HPC Server Dynamic Job Scheduling: when jobs spawn jobs

Posted by JoshReuben on Geeks with Blogs
Published on Wed, 10 Oct 2012 11:34:14 GMT

HPC Job Types

HPC Server supports 3 types of jobs – see http://technet.microsoft.com/en-us/library/cc972750(v=ws.10).aspx

· Task Flow – a vanilla sequence of tasks

· Parametric Sweep – concurrently run multiple instances of the same program, each with a different work unit input

· MPI – message passing between master & slave tasks
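The following is only a rough single-machine analogy. It is plain Python, not the HPC Server API, and `work_unit`, `parametric_sweep` and `task_flow` are hypothetical names. A parametric sweep maps one program over many inputs concurrently. A task flow runs its steps in order, with each step feeding the next:

```python
from concurrent.futures import ThreadPoolExecutor


def work_unit(param):
    """Stand-in for 'the same program'; here it just squares its input."""
    return param * param


def parametric_sweep(params, max_workers=4):
    """Run work_unit once per parameter value, concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work_unit, params))


def task_flow(steps, value):
    """Run steps one after another, each consuming the previous step's output."""
    for step in steps:
        value = step(value)
    return value


print(parametric_sweep(range(5)))                         # [0, 1, 4, 9, 16]
print(task_flow([lambda x: x + 1, lambda x: x * 2], 3))   # 8
```

On a real cluster, each sweep instance would be a separate task on a compute node, not a thread.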

But when you go outside the box, with job tasks that spawn jobs and block the parent task while they run, you risk resource starvation, deadlocks, and recursive blow-up that either never converges or grows exponentially.
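The deadlock risk is easy to reproduce in a toy model. The sketch below is plain Python rather than HPC Server code, and everything in it is a simplifying assumption:

- a fixed pool of cores
- FIFO start order
- leaf tasks that finish in the step they start
- parent tasks that keep holding their core while blocked on their children

`simulate` and `Task` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)                    # identity equality: distinct tasks never compare equal
class Task:
    depth: int                          # 0 for the root task
    parent: Optional["Task"] = None     # task that spawned this one
    children_left: int = 0              # spawned children not yet finished


def simulate(cores, branching, max_depth, max_steps=10_000):
    """Toy model of tasks that spawn tasks on a fixed pool of cores.

    A task shallower than max_depth spawns `branching` children, then blocks
    while still holding its core until all of them finish. Leaf tasks finish
    in the step they start. Queued tasks start in FIFO order.
    Returns ("done", step), ("deadlock", step) or ("timeout", max_steps).
    """
    queue = deque([Task(depth=0)])
    running = []
    for step in range(1, max_steps + 1):
        # Start queued tasks while cores are free; parents enqueue their children.
        while queue and len(running) < cores:
            task = queue.popleft()
            running.append(task)
            if task.depth < max_depth:
                task.children_left = branching
                queue.extend(Task(task.depth + 1, task) for _ in range(branching))
        finished = [t for t in running if t.children_left == 0]
        if not finished:
            # Every core is held by a blocked parent, so its children can never start.
            return "deadlock", step
        running = [t for t in running if t.children_left != 0]
        for t in finished:
            if t.parent is not None:
                t.parent.children_left -= 1
        if not running and not queue:
            return "done", step
    return "timeout", max_steps


print(simulate(cores=4, branching=2, max_depth=3))    # ('deadlock', 1)
print(simulate(cores=15, branching=2, max_depth=3))   # ('done', 4)
```

With branching 2 and depth 3, the tree holds (2^4 - 1) / (2 - 1) = 15 tasks. Four cores deadlock immediately because blocked parents occupy every core. Fifteen cores, one per task, let everything finish. A real scheduler differs in the details, but the shape of the problem is the same, and the task count grows exponentially with depth.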

The solution to this is to write some performance monitoring and job scheduling code. You can do this in 2 ways:

Manually – allocate / de-allocate resources, change job priorities, pause & resume tasks, restrict long-running tasks to specific compute clusters
Semi-automatically – set threshold parameters that drive scheduling decisions
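The manual controls (changing priorities, pausing and resuming) can be pictured with a toy in-memory queue. This is a minimal sketch, not the HPC Server scheduler or its API. `ToyJobQueue` is hypothetical. It assumes jobs are picked by priority and then by submission order, and it treats a held ("paused") job as one that stays queued but is not eligible to start.

```python
import itertools
from dataclasses import dataclass


@dataclass
class QueuedJob:
    name: str
    priority: int          # higher value = more urgent (hypothetical scale)
    seq: int               # submission order; breaks priority ties FIFO
    on_hold: bool = False  # held ("paused") jobs stay queued but never start


class ToyJobQueue:
    """Hypothetical in-memory queue: highest priority first, then oldest first."""

    def __init__(self):
        self._jobs = {}
        self._seq = itertools.count()

    def submit(self, name, priority=0):
        self._jobs[name] = QueuedJob(name, priority, next(self._seq))

    def set_priority(self, name, priority):
        self._jobs[name].priority = priority

    def hold(self, name):
        self._jobs[name].on_hold = True

    def release(self, name):
        self._jobs[name].on_hold = False

    def next_to_run(self):
        """Remove and return the name of the next eligible job, or None."""
        eligible = [j for j in self._jobs.values() if not j.on_hold]
        if not eligible:
            return None
        job = min(eligible, key=lambda j: (-j.priority, j.seq))
        del self._jobs[job.name]
        return job.name


q = ToyJobQueue()
for name in ("a", "b", "c"):
    q.submit(name)
q.set_priority("c", 5)   # bump c ahead of earlier submissions
q.hold("a")              # pause a
print(q.next_to_run())   # c
print(q.next_to_run())   # b
print(q.next_to_run())   # None (only the held job is left)
q.release("a")
print(q.next_to_run())   # a
```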

How – Control Job Scheduling

To manage the tasks and resources associated with a job, use the ISchedulerJob interface – http://msdn.microsoft.com/en-us/library/microsoft.hpc.scheduler.ischedulerjob_members(v=vs.85).aspx

This gives you real control over how a job runs – you can read & tweak the following:

max / min resource values
whether job resources can grow / shrink, whether the job can be pre-empted, and whether the job has exclusive use of its nodes
the creator process id & the job pool
timestamp of job creation & completion
job priority, hold time & run time limit
re-queue count
job progress
max / min number of cores, nodes, sockets, RAM
dynamic task list – add / cancel tasks on the fly
job counters
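The min / max and grow / shrink settings can be made concrete with a simplified model in plain Python. `JobResources` and its fields are hypothetical names that loosely mirror those settings. They are not the ISchedulerJob API, and the real scheduler's allocation logic is more involved.

```python
from dataclasses import dataclass


@dataclass
class JobResources:
    """Hypothetical model of a job's core allocation; not the ISchedulerJob API."""
    min_cores: int
    max_cores: int
    can_grow: bool = True
    can_shrink: bool = True
    allocated: int = 0          # 0 means the job is not running yet

    def __post_init__(self):
        if not 1 <= self.min_cores <= self.max_cores:
            raise ValueError("need 1 <= min_cores <= max_cores")

    def start(self, free_cores):
        """Start only if the minimum fits; take what is free, up to the maximum."""
        if free_cores < self.min_cores:
            return 0
        self.allocated = min(free_cores, self.max_cores)
        return self.allocated

    def grow(self, idle_cores):
        """Absorb idle cores, never exceeding max_cores. Returns cores added."""
        if self.allocated == 0 or not self.can_grow:
            return 0
        extra = min(idle_cores, self.max_cores - self.allocated)
        self.allocated += extra
        return extra

    def shrink(self, requested):
        """Give cores back (e.g. to another job), never below min_cores. Returns cores freed."""
        if self.allocated == 0 or not self.can_shrink:
            return 0
        freed = min(requested, self.allocated - self.min_cores)
        self.allocated -= freed
        return freed


job = JobResources(min_cores=2, max_cores=8)
print(job.start(free_cores=4))     # 4
print(job.grow(idle_cores=10))     # 4 (capped at max_cores)
print(job.shrink(requested=100))   # 6 (never below min_cores)
```

Setting `can_grow` / `can_shrink` to False in this model pins the allocation at whatever the job started with. That makes run times more predictable but leaves idle cores unused.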

When – poll perf counters

Base your scheduler tweaks on resource utilization as reported by PerfMon counters – HPC exposes 2 performance objects: Compute Clusters and Compute Nodes

http://technet.microsoft.com/en-us/library/cc720058(v=ws.10).aspx

Monitor running jobs against dynamic thresholds of your own choosing, using counters such as:

Percentage processor time
Number of running jobs
Number of running tasks
Total number of processors
Number of processors in use
Number of processors idle
Number of serial tasks
Number of parallel tasks
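One way to turn these counters into a semi-automatic rule is a gate that a task checks before spawning a child job. This is a minimal sketch that assumes you already sample the counters somehow; reading them is out of scope here. `ClusterSnapshot`, `Thresholds`, `may_spawn_child_job` and the default limits are all hypothetical. The field names are illustrative, not the exact PerfMon counter names.

```python
from dataclasses import dataclass


@dataclass
class ClusterSnapshot:
    """A few counter values sampled from the Compute Clusters object (illustrative names)."""
    processor_time_pct: float
    running_jobs: int
    processors_idle: int


@dataclass
class Thresholds:
    """Hypothetical limits; tune them to your own cluster."""
    max_processor_pct: float = 85.0
    max_running_jobs: int = 50
    idle_per_level: int = 4   # idle processors required per level of nesting


def may_spawn_child_job(snap, limits, depth):
    """Allow a task at nesting `depth` (0 = top-level job) to spawn a child job?

    The idle-processor requirement grows with depth, so deeply nested
    spawning is throttled first and headroom is kept for shallower jobs.
    """
    needed_idle = limits.idle_per_level * (depth + 1)
    return (snap.processor_time_pct < limits.max_processor_pct
            and snap.running_jobs < limits.max_running_jobs
            and snap.processors_idle >= needed_idle)


snap = ClusterSnapshot(processor_time_pct=40.0, running_jobs=10, processors_idle=12)
limits = Thresholds()
print(may_spawn_child_job(snap, limits, depth=0))   # True  (needs 4 idle)
print(may_spawn_child_job(snap, limits, depth=3))   # False (needs 16 idle)
```

Making the requirement depend on depth is one way to make the thresholds dynamic. It curbs exponential fan-out without blocking top-level work.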

Design your algorithms correctly

Finally, don’t assume you have unlimited compute resources in your cluster – design your algorithms with the following factors in mind:

· Branching factor - http://en.wikipedia.org/wiki/Branching_factor - dynamically optimize the number of children per node

· Cutoffs to prevent explosions - http://en.wikipedia.org/wiki/Limit_of_a_sequence - not every function converges within n attempts, so cap the number of iterations. You also need a "good enough" threshold that stops the work once further effort brings diminishing returns

· Heuristic shortcuts - http://en.wikipedia.org/wiki/Heuristic - sometimes an exhaustive search is impractical and a shortcut that gives a good-enough answer is the better choice

· Pruning http://en.wikipedia.org/wiki/Pruning_(algorithm) – remove / de-prioritize unnecessary tree branches

· Avoid local minima / maxima - http://en.wikipedia.org/wiki/Local_minima - sometimes an algorithm can't reach the global optimum because it gets stuck in a local optimum or on a saddle point – try simulated annealing, random-restart hill climbing or genetic algorithms to get out of these ruts

· Watch out for rounding errors - http://en.wikipedia.org/wiki/Round-off_error - many iterations, especially when run in parallel, can quickly amplify small errors & blow up your algo! Compare floating-point values against an epsilon rather than for exact equality, and watch for truncation and approximation errors
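Several of these factors can be combined in one search loop. The sketch below is a depth-first branch and bound in plain Python. `bounded_search`, its parameters and the toy problem are hypothetical illustrations, not code from HPC Server. The sketch shows:

- a capped branching factor
- depth, total-work and good-enough cutoffs
- pruning against an optimistic lower bound
- epsilon comparisons, with `math.fsum` to limit round-off

Keeping only `max_children` children is a heuristic and can miss the true optimum. Escaping local optima (simulated annealing and the like) is not shown.

```python
import math


def bounded_search(root, expand, cost, lower_bound, *, max_children=3,
                   max_depth=8, node_budget=10_000, good_enough=0.0, eps=1e-9):
    """Depth-first branch and bound (minimization) with explicit safeguards.

    expand(node)      -> candidate child nodes
    cost(node)        -> cost of the solution represented by node
    lower_bound(node) -> optimistic (never overestimated) cost of anything below node
    Returns (best_node, best_cost, nodes_visited).
    """
    best, best_cost = root, cost(root)
    stack = [(root, 0)]
    visited = 0
    while stack and visited < node_budget:          # cutoff: cap total work
        node, depth = stack.pop()
        visited += 1
        c = cost(node)
        if c < best_cost - eps:                       # epsilon: ignore float noise
            best, best_cost = node, c
            if best_cost <= good_enough + eps:        # cutoff: good enough, stop
                break
        if depth >= max_depth:                        # cutoff: depth limit
            continue
        # Branching factor: keep only the most promising children (a heuristic
        # shortcut that trades optimality for bounded fan-out).
        children = sorted(expand(node), key=lower_bound)[:max_children]
        for child in reversed(children):              # most promising is popped first
            if lower_bound(child) < best_cost - eps:  # pruning: cannot beat best
                stack.append((child, depth + 1))
    return best, best_cost, visited


# Toy problem: pick values from STEPS whose sum approximates TARGET.
TARGET, STEPS = 0.7, (0.1, 0.2, 0.5)


def toy_expand(node):
    return [node + (s,) for s in STEPS]


def toy_cost(node):
    return abs(TARGET - math.fsum(node))             # fsum limits round-off


def toy_lower_bound(node):
    # All steps are positive, so once the sum overshoots it can only get worse.
    return max(0.0, math.fsum(node) - TARGET)


best, best_cost, visited = bounded_search((), toy_expand, toy_cost,
                                          toy_lower_bound, good_enough=1e-6)
print(best, best_cost, visited)
```

If the search ran over child jobs, each pushed child would correspond to a potential child job. `max_children` and `node_budget` would then cap how many jobs the search can create.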

Happy Coding!