FairScheduling Conventions in Hadoop
- by dan.mcclary
While scheduling and resource allocation control has been present in Hadoop since 0.20, a lot of people haven't discovered or utilized it in their initial investigations of the Hadoop ecosystem. We could chalk this up to many things:

- Organizations are still determining what their dataflow and analysis workloads will comprise.
- Small deployments under test aren't likely to show the signs of strain that would send someone looking for resource allocation options.
- The default scheduling options -- the FairScheduler and the CapacityScheduler -- are not placed in the most prominent position within the Hadoop documentation.
However, for production deployments, it's wise to start with at least the foundations of scheduling in place so that you can tune the cluster as workloads emerge. To do that, we have to ask what the off-the-rack scheduling options are. We have some choices:
     
- The FairScheduler, which works to ensure resource allocations are enforced on a per-job basis.
- The CapacityScheduler, which ensures resource allocations are enforced on a per-queue basis.
- Writing your own implementation of the abstract class org.apache.hadoop.mapred.TaskScheduler, which is an option but usually overkill.
If you're going to have several concurrent users and leverage the more interactive aspects of the Hadoop environment (e.g., Pig and Hive scripting), the FairScheduler is definitely the way to go. In particular, we can define user-specific pools so that default users get their fair share, and specific users are given the resources their workloads require.
 
   
To enable fair scheduling, we're going to need to do a couple of things. First, we need to tell the JobTracker that we want to use the FairScheduler and where we're going to be defining our allocations. We do this by adding the following to the mapred-site.xml file in HADOOP_HOME/conf:
      
        <property>
          <name>mapred.jobtracker.taskScheduler</name>
          <value>org.apache.hadoop.mapred.FairScheduler</value>
        </property>
        <property>
          <name>mapred.fairscheduler.allocation.file</name>
          <value>/path/to/allocations.xml</value>
        </property>
        <property>
          <name>mapred.fairscheduler.poolnameproperty</name>
          <value>pool.name</value>
        </property>
        <property>
          <name>pool.name</name>
          <value>${user.name}</value>
        </property>
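One caveat before restarting the JobTracker: in stock Apache Hadoop the FairScheduler ships as a contrib jar, so it has to be on the JobTracker's classpath for the setting above to take effect (many distributions already ship it in place). The exact path varies by release; it'll be something along these lines:

        cp $HADOOP_HOME/contrib/fairscheduler/hadoop-*-fairscheduler*.jar $HADOOP_HOME/lib/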
     
      
What we've done here is simply tell the JobTracker that we'd like task scheduling to use the FairScheduler class rather than a single FIFO queue. Moreover, we're going to be defining our resource pools and allocations in a file called allocations.xml, and jobs will be assigned to pools according to the pool.name property, which defaults to the submitting user's name.

For reference, the allocation file is reread every 15 seconds or so, which allows for tuning allocations without having to take down the JobTracker.
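Because pool.name defaults to ${user.name}, each user's jobs fall into a pool of their own, but a job can opt into a different pool at submission time by overriding that property. A quick sketch, assuming a pool named "etl" has been defined in allocations.xml and using the stock examples jar as a stand-in for a real job:

        hadoop jar hadoop-examples.jar wordcount -D pool.name=etl input output

This works for any job whose driver goes through ToolRunner/GenericOptionsParser, which is what picks up the -D flag.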
   
     
     
      
Our allocation file is now going to look a little like this:

        <?xml version="1.0"?>
        <allocations>
          <pool name="dan">
            <minMaps>5</minMaps>
            <minReduces>5</minReduces>
            <maxMaps>25</maxMaps>
            <maxReduces>25</maxReduces>
            <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
          </pool>
          <user name="dan">
            <maxRunningJobs>6</maxRunningJobs>
          </user>
          <userMaxJobsDefault>3</userMaxJobsDefault>
          <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
        </allocations>
In this case, I've explicitly set my username to have upper and lower bounds on the number of maps and reduces, and allotted myself double the default number of running jobs. Now, if I run Hive or Pig jobs from either the console or via the Hue web interface, I'll be treated "fairly" by the JobTracker. There's a lot more tweaking that can be done to the allocations file, so it's best to dig into the Fair Scheduler documentation and start trying out allocations that might fit your workload.
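The pool override works from the interactive tools as well, since both Hive and Pig let you set job configuration properties for the session. A sketch, with a hypothetical "reports" pool:

        -- in the Hive CLI
        set pool.name=reports;

        -- in the Pig Grunt shell
        set pool.name 'reports';

It's also worth keeping an eye on the fair scheduler's status page on the JobTracker -- http://<jobtracker-host>:50030/scheduler, assuming the default web port -- to watch how your pools behave as jobs run.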