How to set up a Hadoop cluster so that it accepts MapReduce jobs from remote computers?

Posted by drasto on Super User, published 2012-12-01.

There is a computer I use for Hadoop map/reduce testing. This computer runs 4 Linux virtual machines (using Oracle VirtualBox). Each of them has Cloudera's Hadoop distribution (CDH3u4) installed and serves as a node of the Hadoop cluster. One of those 4 nodes is the master node, running the namenode and the jobtracker; the others are slave nodes.

Normally I use this cluster from the local network for testing. However, when I try to access it from another network I cannot send any jobs to it. The computer running the Hadoop cluster has a public IP and can be reached over the internet for other services. For example, I am able to reach the HDFS (namenode) administration site and the map/reduce (jobtracker) administration site (on ports 50070 and 50030 respectively) from the remote network. It is also possible to use Hue. Ports 8020 and 8021 are both open.
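
As a rough sanity check (cluster.example.com below is just a placeholder for the cluster's real public address), this is the kind of test that shows those ports answering from the remote network:

# rough connectivity check from a machine on the remote network;
# cluster.example.com is a placeholder for the cluster's public address
nc -zv cluster.example.com 50070   # namenode web UI
nc -zv cluster.example.com 50030   # jobtracker web UI
nc -zv cluster.example.com 8020    # namenode RPC (fs.default.name)
nc -zv cluster.example.com 8021    # jobtracker RPC (mapred.job.tracker)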

What is blocking my map/reduce job submissions from reaching the cluster? Is there some setting I must change before I can submit map/reduce jobs remotely?
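
For illustration, this is roughly how I would expect to submit a job from the remote machine; the jar path, hostname and HDFS paths below are placeholders, not my real setup:

# hypothetical remote submission; cluster.example.com stands for the cluster's public address
hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount \
  -D fs.default.name=hdfs://cluster.example.com:8020 \
  -D mapred.job.tracker=cluster.example.com:8021 \
  /user/drasto/input /user/drasto/output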


Here is my mapred-site.xml file:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>
  <!-- Enable Hue plugins -->
  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
    <description>Comma-separated list of jobtracker plug-ins to be activated.
    </description>
  </property>
  <property>
    <name>jobtracker.thrift.address</name>
    <value>0.0.0.0:9290</value>
  </property>
</configuration>
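
For comparison, my understanding is that a remote client would need its own client-side configuration pointing at the publicly reachable address rather than at the internal name master. A minimal sketch, with cluster.example.com as a placeholder:

<!-- hypothetical client-side mapred-site.xml; cluster.example.com is a placeholder -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>cluster.example.com:8021</value>
  </property>
</configuration>

<!-- hypothetical client-side core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://cluster.example.com:8020</value>
  </property>
</configuration>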

And this is in the /etc/hosts file:

192.168.1.15    master
192.168.1.14    slave1
192.168.1.13    slave2
192.168.1.9     slave3
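
On a remote client, the name master would have to resolve to the public address instead of the 192.168.1.x one; a hypothetical /etc/hosts entry there (203.0.113.10 is a documentation placeholder, not my real IP) might look like:

203.0.113.10    master    # placeholder public IP of the cluster's master node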
