Parallel Environment (PE) on Sun Grid Engine (6.2u5) won't run jobs: "only offers 0 slots"

Posted by Peter Van Heusden on Server Fault See other posts from Server Fault or by Peter Van Heusden
Published on 2011-07-05T12:37:01Z Indexed on 2011/11/24 10:02 UTC
Read the original article Hit count: 425

Filed under:
|

I have Sun Grid Engine set up (version 6.2u5) on a Ubuntu 10.10 server with 8 cores. In order to be able to reserve multiple slots, I have a parallel environment (PE) set up like this:

pe_name            serial
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

This is associated with the all.q on the server in question (let's call the server A). However, when I submit a job that uses 4 threads with e.g. qsub -q all.q@A -pe serial 4 mycmd.sh, it never gets scheduled, and I get the following reasoning from qstat:

cannot run in PE "serial" because it only offers 0 slots

Why is SGE saying "serial" only offers 0 slots, since there are 8 slots available on the server I specified (server A)?

The queue in question is configured thus (server names changed):

qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make orte serial
rerun                 FALSE
slots                 1,[D=32],[C=8], \
              [B=30],[A=8]
tmpdir                /tmp
shell                 /bin/sh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  08:00:00
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY,[A=30g], \
              [B=5g]

© Server Fault or respective owner

Related posts about gridengine

Related posts about sge