gridengine - Developer IT

Garbled UI with vnc, chicken-of-the-vnc, gridengine, qmon

- by bmargulies

Ubuntu 11.04 gridengine version 6.2u5-1ubuntu1 I start a desktop with the vncserver command. My ~/.vnc/xstartup ends with 'gnome-session'. I connect to it using chicken-of-the-VNC on my MacOSX Lion system. I run 'qmon'. Much of qmon works, but several critical tasks show hopelessly garbled grid layouts. I have filed a bug report, but I suspect that there is something I am missing that would render (ahem) them legible.

Read the article

Is there a way to tell SGE to run specific jobs as root on the execution node?

- by Rick Reynolds

The title kinda says it all... We're using SGE/OGE to submit jobs to a set of worker nodes that then do things with specific pieces of equipment. The programs and scripts that have been created that manipulate this equipment rely on running as root. I'd like SGE to handle allocation of resources in a way that is mindful of users, groups, projects, etc., but I also need the actual jobs to run with root permissions. I've read up on How can one run a prologue script as root in gridengine? to see if anything there was pertinent, but it seems that SGE is providing the "user@" kind of spec specifically for prolog and epilog kinds of actions. Is there any similar functionality for the job itself? I'm aware of su/sudo approaches, but that won't really work in this environment because the sudoers file isn't globally managed (i.e. I'd have to add a whole set of users to /etc/sudoers on lots of machines). I'm currently looking into a setuid kind of solution, but that would definitely be an unnecessary kind of work-around if SGE provides me a way to declare that a specific job (or jobs in a specific queue) always needs to run with a specific user's rights.

Read the article

SGE: invoking qmake raises "critical error: can't resolve group"

- by Pierre

I'm new to SGE an I'm trying to run qmake with the simple following Makefile with our very new cluster: merge.txt: job1.txt job2.txt job3.txt ... cat $^ > $@ job1.txt: sleep 1 echo "Hello From " $@ > $@ sleep 1 job2.txt: sleep 2 echo "Hello From " $@ > $@ sleep 2 job3.txt: (...) the following command raises an error: qmake -l arch=lx24-amd64 -cwd -v PATH -- -j 4 sleep 1 dynamic mode sleep 2 dynamic mode sleep 3 dynamic mode sleep 4 dynamic mode critical error: can't resolve group qmake: *** [job3.txt] Error 1 critical error: can't resolve group qmake: *** [job2.txt] Error 1 critical error: can't resolve group qmake: *** [job1.txt] Error 1 critical error: can't resolve group what's wrong ?

Read the article

SGE - limit a user to a certain host, using resource quota configuration

- by pufferfish

Is it possible to limit a user to a particular host, using the Resource Quota Configuration option in qmon for Sun Grid Engine? I'm thinking of a line to the effect of: { ... limit users {john} to hostname=compute-1-1.local } The documentation mentions built in resource types: slots, arch, mem_total, num_proc, swap_total, and the ability to make custom types. Details: SGE 6.1u5 on Rocks update: The above rule seems to be valid, since using an unknown hostname mangling the resource name 'hostname' both cause errors

Read the article

Parallel prologue and epilogue in Grid Engine

- by ajdecon

We have a cluster being used to run MPI jobs for a customer. Previously this cluster used Torque as the scheduler, but we are transitioning to Grid Engine 6.2u5 (for some other features). Unfortunately, we are having trouble duplicating some of our maintenance scripts in the Grid Engine environment. In Torque, we have a prologue.parallel script which is used to carry out an automated health-check on the node. If this script returns a fail condition, Torque will helpfully offline the node and re-queue the job to use a different group of nodes. In Grid Engine, however, the queue "prolog" only runs on the head node of the job. We can manually run our prologue script from the startmpi.sh initialization script, for the mpi parallel environment; but I can't figure out how to detect a fail condition and carry out the same "mark offline and requeue" procedure. Any suggestions?

Read the article

Sun Grid Engine: Automatically Terminating Idle Interactive Jobs

- by dmcer

We're considering using Sun Grid Engine on a small compute cluster. Right now, the current set up is pretty crude and just involves having people ssh to an open machine to run their jobs. We'd like to allow interactive jobs, since that should ease the transition from manually starting jobs to starting them using qsub. But, there is some concern that, if we do, people might accidentally leave their interactive sessions idle and block other jobs from being run on the machines. The issue isn't just theoretical, since we previously tried using OpenPBS and there was a problem with people opening up an interactive job in a screen session and essentially camping on a machine. Is there anyway to configure SGE to automatically kill idle interactive jobs? It looks like this was requested as an enhancement (Issue #:2447) way back in 2007. But, it doesn't seem like the request ever got implemented.

Read the article

Best way to monitor a Grid of computers?

- by marc.riera

I've installed Sun Grid Engine on 10 nodes, and one virtual master host. Now I have to monitor all the resources prior to launching it into production, but I don't know which is the best way. I've tried using xml-qstat, but it seems unstable. Any tips or suggestions? Anyone got experience on this? thanks.

Read the article

alternatives to SGE

- by bmargulies

Once upon at time there was the free SGE from Sun. tricky to install and configure, but functional and free. Now we've got: open source packages on Ubuntu that don't quite work out of the box (details on request). the actual source behind them, with a build process that depends on the c-shell and other obsolescences, available from two competing locations. a commercial packaging from Oracle a commercial package from Univa What I am really wishing for is something with the basic capabilities of this that is simple to install and maintain. Heck, I'd take a front-end to hadoop that just queues and distributes simple shell-script-defined jobs.

Read the article

Parallel Environment (PE) on Sun Grid Engine (6.2u5) won't run jobs: "only offers 0 slots"

- by Peter Van Heusden

I have Sun Grid Engine set up (version 6.2u5) on a Ubuntu 10.10 server with 8 cores. In order to be able to reserve multiple slots, I have a parallel environment (PE) set up like this: pe_name serial slots 999 user_lists NONE xuser_lists NONE start_proc_args /bin/true stop_proc_args /bin/true allocation_rule $pe_slots control_slaves FALSE job_is_first_task TRUE urgency_slots min accounting_summary FALSE This is associated with the all.q on the server in question (let's call the server A). However, when I submit a job that uses 4 threads with e.g. qsub -q all.q@A -pe serial 4 mycmd.sh, it never gets scheduled, and I get the following reasoning from qstat: cannot run in PE "serial" because it only offers 0 slots Why is SGE saying "serial" only offers 0 slots, since there are 8 slots available on the server I specified (server A)? The queue in question is configured thus (server names changed): qname all.q hostlist @allhosts seq_no 0 load_thresholds np_load_avg=1.75 suspend_thresholds NONE nsuspend 1 suspend_interval 00:05:00 priority 0 min_cpu_interval 00:05:00 processors UNDEFINED qtype BATCH INTERACTIVE ckpt_list NONE pe_list make orte serial rerun FALSE slots 1,[D=32],[C=8], \ [B=30],[A=8] tmpdir /tmp shell /bin/sh prolog NONE epilog NONE shell_start_mode posix_compliant starter_method NONE suspend_method NONE resume_method NONE terminate_method NONE notify 00:00:60 owner_list NONE user_lists NONE xuser_lists NONE subordinate_list NONE complex_values NONE projects NONE xprojects NONE calendar NONE initial_state default s_rt INFINITY h_rt 08:00:00 s_cpu INFINITY h_cpu INFINITY s_fsize INFINITY h_fsize INFINITY s_data INFINITY h_data INFINITY s_stack INFINITY h_stack INFINITY s_core INFINITY h_core INFINITY s_rss INFINITY h_rss INFINITY s_vmem INFINITY h_vmem INFINITY,[A=30g], \ [B=5g]

Read the article

Best way to monitorize a Grid of computers?

- by marc.riera

Hello, I've installed sun grid in 10 nodes, and one virtual master host. Now I have to monitorize all the resourses prior to launch it to production, but I don't know which is the best way. I've tried using xml-qstat, but it seems unstable. Any tips or suggestions? Anyone got experience on this? thanks.

Read the article

qsub: How can I find out what DRM middleware exactly is installed on a cluster?

- by gojira

I have a user account on a very big cluster. I have previous experience with Grid Engine and want to use the cluster for array jobs. The documentation tells me to use "qsub" for load balancing / submission of many jobs. Therefore I assumed this means the cluster has Grid Engine. However all my Grid Engine scripts failed to run. I checked the documentation and it is a bit weird. Now I slowly suspect that this cluster does not actually have Grid Engine, maybe it's running something called Torque (?!). The whole terminology in the man pages is a bit weird for me as a Grid Engine user, for example they talk about "bulk jobs" instead of "array jobs". There is no referral to variables on which I rely on, like SGE_TASK_ID etc. Instead they refer to variables starting with PBS_. Still, there are qsub and qstat commands. Also qsub behaves differently, apparently it is not possible to specifiy the command line parameters with bash-script comments etc. There is a documentation for the cluster system, but it does not say what the DRM middleware actually is - it refers to the entire DRM system simply as "qsub". I tried qsub --version qsub: 1.2 2010/8/17 I am not sure what I am actually running when I invoke qsub on that cluster! My question is, how can I find out if I am running Grid Engine or Torque (or whatever it is), and which version?

Read the article

Multiple servers acting like a single one with all the hardware?

- by marc.riera

Hello, by now I have 10 servers for hpc, power computing oriented. My users need to launch several processes using qmake. The users are used to work with ubuntu 9.10, and the software from the repositories is switable for them. I've deployed ubuntu 9.10 to all 10 servers (pxe rocks). By now we work with parallel-ssh and cluster-ssh, which allows as to launch the same process to all servers. With this tools this tools the servers remain as independent but with the same software and the same launched command. Now we would like to go to next step and see all the servers as a single one with all the resources from the other 9 as if was its resources. The difference would be substantial in time to process and also time to design the command to launch. Any advice on wich software to use will be very useful? Thanks

Read the article

Sun Grid Engine (SGE) Jobs Not Visible After Adding virtual_free

- by Gary Richardson

I'm trying to to use virtual_free to limit the number of large memory jobs running each grid node in my cluster. This seems to be working as expected. After I modified my code to submit jobs with the memory instances, qstat -f -q $queueName no longer shows a list of jobs waiting for a slot. The jobs are submitted with a specific queue (-q $queueName). I'm guessing this is happening due to the magic of SGE queue selection. Is there a way to make my jobs show up as before? Thanks! UPDATE I'm using: qstat -f -u * -q $queueName to view the queue. If I drop the queue argument, I can see the jobs. If I examine a specific job, I can see that it has the correct hard_queue_list value set. I'm also using Sun Grid Engine 6.1u4

Read the article

SGE: downtime planning

- by mousee

I need to plan a downtime for a maintenance of my environment (or some part of my environment) by means of Sun Grid Engine. Is it possible to somehow use backfilling information to tell the grid engine to plan only those jobs on cluster which are able to finish (i have backfilling information) till let's say 10 am next day? Can I then at 10 am rely on the fact that all compute nodes are clean, jobs are only queued, no job is planned and so that I can start maintenance? Thank you for your time. mousee

Read the article

Sun Grid Engine : jobs are not well balanced

- by GlinesMome

I use Open Grid Scheduler (a fork/copy of Sun Grid Engine). I have tried this configuration from master: # qconf -mattr exechost complex_values slots=8 slave2 # qconf -mq all.q | grep slots slots 100,[slave1=1],[slave2=8] slave1 is down, then I run 10 qsub with a sleep example (so no CPU consumption) but only 4 jobs are run at the same time on slave2 instead of I have put 8 slots. What does I missed ? PS: my goal is to provide infinite slots to force SGE to schedule only via consummable ressources.

Read the article

links for 2011-01-03

- by Bob Rhubart

Using Solaris zfs + iscsi targets with Oracle VM (Wim Coekaerts Blog) "I was playing with my Oracle VM setup and needed some shared storage that was block based. I did not have a storage array available but I did have a Solaris box, that I use for Oracle VDI, available." - Wim Coekaerts (tags: oracle otn solaris oraclevm virtualization) DanT's GridBlog: Oracle Grid Engine: Changes for a Bright Future at Oracle "Today, we are entering a new chapter in Oracle Grid Engine’s life. Oracle has been working with key members of the open source community to pass on the torch for maintaining the open source code base to the Open Grid Scheduler project hosted on SourceForge." - Dan Templeton (tags: oracle gridengine) Oracle Fusion Middleware Security: How do I secure my services? "I've been up early for a couple of days talking to a customer about how they should secure their services,' says Chris Johnson. "I'm going to tell you what I told them." (tags: oracle fusionmiddleware security) OldSpice your Innovation - Dangers of Status Quo E2.0 | Enterprise 2.0 Blogs "If organizations only leverage E2.0 technologies in a 'me too' fashion, they are essentially using a bucket to bail water from a leaking ship." - John Brunswick (tags: oracle enteprise2.0) The Aquarium: GlassFish in 2011 - What to expect A look into the Glassfish crystal ball... (tags: oracle glassfish) Andrejus Baranovskis's Blog: Fusion Middleware 11g Security - Retrieve Security Groups from ADF 11g Oracle ACE Director Andrejus Baranovskis shows you what to do when you need to access security information directly from an ADF 11g application. (tags: oracle otn fusionmiddleware security adf) @eelzinga: Book review : Oracle SOA Suite 11g R1 Developer's Guide "What I really liked in this book...was the compare/description of the Oracle Service Bus. The authors did a great job on describing functionality of components existing in the SOA Suite and how to model them in your own process." - Oracle ACE Eric ElZinga (tags: oracle oracleace soa bookreview soasuite)

Search Results

Search found 16 results on 1 pages for 'gridengine'.

Page 1/1 | 1

- by bmargulies

- by Rick Reynolds

- by Pierre

- by pufferfish

- by ajdecon

- by dmcer

- by marc.riera

- by bmargulies

- by Peter Van Heusden

- by marc.riera

- by gojira

- by marc.riera

- by Gary Richardson

- by mousee

- by GlinesMome

- by Bob Rhubart