Implications of Multiple JobTracker nodes in a Hadoop cluster?

Posted by Jim Dennis on Server Fault
Published on 2012-08-28T18:33:08Z Indexed on 2012/08/30 21:40 UTC

I get the impression that one can, potentially, have multiple JobTracker nodes configured to share the same set of MR (TaskTracker) nodes. I know that, conventionally, all the nodes in a Hadoop cluster should have the same set of configuration files (under /etc/hadoop/conf/, at least for the Cloudera Distribution of Hadoop (CDH)). Can we define multiple JobTrackers in mapred-site.xml? Something like:

<configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>jt01.mydomain.not:8021</value>
   </property>
   <property>
     <name>mapred.job.tracker</name>
     <value>jt02.mydomain.not:8021</value>
   </property>
...
</configuration>

Or is there some other allowed syntax for this?
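
As a side note, a fragment like the one above repeats the same property name, so it may help to check a config file for such repeats before drawing conclusions about which value is in effect. Below is a minimal Python sketch for that check. The helper name `duplicate_property_names` and the `SAMPLE` string are hypothetical, introduced only for illustration, and the sample omits the elided part of the fragment. The sketch only inspects the XML; it says nothing about how Hadoop itself resolves repeated names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample mirroring the fragment in the question
# (the elided "..." portion is intentionally left out).
SAMPLE = """<configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>jt01.mydomain.not:8021</value>
   </property>
   <property>
     <name>mapred.job.tracker</name>
     <value>jt02.mydomain.not:8021</value>
   </property>
</configuration>"""


def duplicate_property_names(xml_text):
    """Return {name: [values, ...]} for each property name defined more than once."""
    values_by_name = defaultdict(list)
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Keep values in document order so the repeats are easy to compare.
            values_by_name[name.strip()].append((value or "").strip())
    return {n: v for n, v in values_by_name.items() if len(v) > 1}


if __name__ == "__main__":
    for name, values in duplicate_property_names(SAMPLE).items():
        print(name, "is defined", len(values), "times:", values)
```

To run the same check against a real file, read its contents and pass the text to `duplicate_property_names`; an empty result means no property name is repeated.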

What are the implications of doing this? Does each JobTracker get information about the load on each TaskTracker node? In other words, can the two JobTrackers coordinate their scheduling across the TT nodes based only on the gossip information from the TTs, or would they need to talk to one another?

Is this documented anywhere?

© Server Fault or respective owner
