EASY

Start learning about the job scheduler EASY.

Introduction

The EASY scheduler originates from the Extensible Argonne Scheduling sYstem (EASY) developed at Argonne National Laboratory, USA. It has been modified at PDC to better suit our needs. Most of these additions have been done to make the scheduler work within the framework of our particular cluster configurations and also to take into account that we are using AFS and Kerberos. These changes should be largely transparent to the user, but some additional flags and commands have been added. Our notes on EASY are intended as a supplement to the old EASY user's guide from Argonne.

The EASY scheduler is used on all Linux clusters at PDC apart from Lindgren, which uses Torque, and Hebb, which uses LoadLeveler.

Overview of commands

The command:

module add easy

loads the EASY commands into your environment. After loading the EASY module you will be able to use the EASY commands. These are:

esubmit

spattach

spq

spwhen

spstatus

spfree

sprelease

spsummary

spjobsummary

cac

getjid

All EASY commands have a -h option that prints help (a list of their options). Some of the commands also have a -v option that, when combined with -h, gives more verbose information on the options.

Important note

A common pitfall is that a modified startup file (.bashrc, .cshrc, .profile or .tcshrc) prints an error or warning when you run rsh. An rsh command should produce no extra output. You can verify your startup files with the command below, which should print two dates and nothing else:

rsh name_of_loginnode date ; date

Thu Apr 13 19:07:35 CEST 2006

Thu Apr 13 19:07:35 CEST 2006
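
The same check can be scripted. The following is a minimal sketch, not a PDC tool: is_clean_output is a hypothetical helper name, and it assumes that a single remote date call should print exactly one line, so any extra line must come from a startup file.

```shell
# Hypothetical helper: succeeds only if the captured output is exactly one
# non-empty line, which is what a single `date` call should print.
is_clean_output() {
    output=$1
    [ -n "$output" ] || return 1
    line_count=$(printf '%s\n' "$output" | grep -c '')
    [ "$line_count" -eq 1 ]
}

# Usage on a login node (name_of_loginnode is a placeholder):
#   is_clean_output "$(rsh name_of_loginnode date 2>&1)" \
#       || echo "startup files print extra output"
```

Capturing standard error as well (2>&1) means that warnings are counted, not only regular output.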

Reserving a node

The login nodes on PDC clusters are intended for editing and compiling. All other use should be carried out on nodes under batch-system control (interactive nodes or compute nodes). Short test jobs may be executed on the interactive nodes. Reserving dedicated nodes through the batch system gives you exclusive access to the requested resource (you are the only person allowed to log in). To reserve one node for 300 minutes using EASY, execute the following on the login node:

module add easy
esubmit -n 1 -t 300

Once the reservation is effective you can log in to the reserved node, or use it by any distributed means. Use spq and spusage to see which node was reserved. spstatus will display a system summary.

You can also submit a single job with

esubmit -n1 -t 60 ./my_program

Use esubmit -hv to list options, e.g., how to make reservations at a specific time of day, or how to chain reservations. Note that reserving for a particular time only guarantees that the job will not start prior to that time. In order to get the node(s), it must also be your turn in the queue.
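
Since -t is given in minutes, it can help to compute the value from hours. The sketch below only prints the command line so that it can be inspected before use; hours_to_minutes and reservation_command are hypothetical helper names, not EASY commands.

```shell
# -t takes minutes (see the 300-minute example above), so convert from hours.
hours_to_minutes() {
    echo $(( $1 * 60 ))
}

# Print the esubmit command line instead of running it.
reservation_command() {
    nodes=$1
    hours=$2
    echo "esubmit -n $nodes -t $(hours_to_minutes "$hours")"
}

reservation_command 1 5    # prints: esubmit -n 1 -t 300
```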

Submitting parallel jobs

To run a parallel job in batch do

esubmit -n number_of_nodes -t no_minutes my_script my_script_parameters

This will cause the script my_script to be executed on the first node in the node list ($SP_HOSTFILE) generated by EASY. The contents of the script depend on the underlying MPI software and user software.
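
As an illustration, a job script might look something like the sketch below. It assumes an MPI installation whose launcher is called mpirun and accepts -np and -machinefile; those are assumptions about the local MPI software, not part of EASY. The helper names and ./my_program are placeholders. The sketch starts one process per allocated node and only prints the launch command.

```shell
#!/bin/sh
# Sketch of a job script; mpirun and its -np/-machinefile options are
# assumptions about the local MPI installation, not part of EASY.

# Count the allocated hosts: SP_HOSTFILE has one host name per row.
count_hosts() {
    grep -c . "$1"
}

# Build the launch command for a program; printed, not executed, here.
mpi_command() {
    hostfile=$1
    program=$2
    echo "mpirun -np $(count_hosts "$hostfile") -machinefile $hostfile $program"
}

# Inside a job, EASY sets SP_HOSTFILE; outside a job there is nothing to do.
if [ -n "${SP_HOSTFILE:-}" ] && [ -r "$SP_HOSTFILE" ]; then
    mpi_command "$SP_HOSTFILE" ./my_program
fi
```

To launch the program for real, replace the echo in mpi_command with the command itself, once you have checked the launcher syntax for your MPI software.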

Note that your Kerberos tickets are copied when doing esubmit. Make sure to have long enough tickets when submitting a job.

Examining the queue

Use the command spq to list the jobs in the queue. Use spq -w to see information on time-of-day, chained reservations, etc. The command spwhen makes a prediction of when the jobs will start. The command spfree lists the number of free nodes.

Removing a job

To remove a job from the queue, use the command sprelease. This can be used on both running jobs and jobs waiting to start.

CACs

In order to have normal priority in the queue, you need to belong to a Charge Allocation Category (CAC). If you only belong to one CAC, it will be automatically selected when you do esubmit or spattach. To find out which CACs you belong to do

cac members $USER

To find out more about a CAC do

cac examine CAC_NAME

Please note that EASY deals with nodes, not CPUs, when allocating resources. Both Lucidor and Lenngren have two CPUs per node, and thus 1 node hour equals 2 CPU hours. When a CAC has exceeded its monthly quota, PDC staff may put it in the so-called slowlane. When that happens, jobs from this CAC will have lower priority than other jobs. On special occasions PDC staff may put a CAC in the fastlane. This is done, for example, during some exercises of the PDC HPC summer school. (In summary, there are three priority levels: high, normal and low.)
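
The conversion between node hours and CPU hours is a simple multiplication. The sketch below uses a hypothetical helper name and defaults to the two CPUs per node stated above for Lucidor and Lenngren; other systems may differ.

```shell
# Convert node hours to CPU hours; the default of 2 CPUs per node is the
# figure given above for Lucidor and Lenngren.
node_hours_to_cpu_hours() {
    node_hours=$1
    cpus_per_node=${2:-2}
    echo $(( node_hours * cpus_per_node ))
}

node_hours_to_cpu_hours 1000    # prints: 2000
```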

There is no limit on the number of jobs one can submit, although we discourage excessive use. There are limits on how many/how large jobs are actively competing for nodes. In general those limits apply on a per CAC basis.

Finding queue settings

To see queue limits use spq -l and spq -L. (On Lucidor use spq -L -w in order to see NODETIME limits.) Add the -s option to spstatus (spstatus -s) to see reservations of certain runtimes for the near future.

Keeping the job length at or under 4h increases the chance that jobs run during daytime. Keeping the job length at or under 15h increases the chance that they run overnight, and keeping the job length at or under 240/960 hours avoids getting competition from such jobs. Not all systems accept all job lengths.

These limits may change. Please use spq -L to verify them.
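
To see which job class a requested run time falls into, the intervals can be written out as a small function. This is a sketch only: the boundaries and nicknames are copied from the Lucidor example (2006-04-27) in the FAQ on this page and should be treated as assumptions for any other system or date. job_class is a hypothetical helper name.

```shell
# Map a requested run time in minutes to a job-class nickname. The intervals
# are open at the lower end and closed at the upper end, e.g. ]4h,15h].
job_class() {
    minutes=$1
    if   [ "$minutes" -le 60 ];    then echo short
    elif [ "$minutes" -le 240 ];   then echo day
    elif [ "$minutes" -le 900 ];   then echo night
    elif [ "$minutes" -le 3600 ];  then echo weekend
    elif [ "$minutes" -le 14400 ]; then echo week
    elif [ "$minutes" -le 57600 ]; then echo month
    else echo no_no_no
    fi
}

job_class 300    # prints: night
```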

Keeping track of time spent

Use the command

spsummary -c CAC_NAME -M

to get a monthly summary for the present month. Use the options -f and -t to select a time span. To see how much time each member has used, remove the option -M. It is the used time (unode) that counts. To list all the jobs individually, use the command

spjobsummary -c CAC_NAME

 

You may want to use the -w option. The time span options -f and -t are also available.

Keeping track of different jobs

There is presently no feature to tag different jobs with names. However, information on each job is stored in a file. This includes the name of the directory from which the job was submitted. Send an e-mail to support@pdc.kth.se if you want help in finding these files.

FAQ

Q1: What is the longest job I can run?

A1: This is usually machine dependent and may also change over time. Use the spq -L command to find out. Example (Lucidor 2006-04-27):

               h05n35> spq -L
                       INTERVAL  NICKNAME                           NJOB WALLTIME
                 - ]960h,8760h]  no_no_no -        -                   - -
                 -  ]240h,960h]  month    -        -                   - -
                 -   ]60h,240h]  week     -        -                   - -
                 -    ]15h,60h]  weekend  -        -                   - 30h01
                 -     ]4h,15h]  night    -        -                   - 30h01
                 -      ]1h,4h]  day      -        -                   - 20h01
                 -   ]0m01s,1h]  short    -        -                   - 3h
                 -   ]0m01s,1h]  Nshort   -        -                   4 -

 

Here we see that 60h is the max time, since there are no limits (NJOB and WALLTIME) set for longer jobs.
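
The same reading of the table can be automated. The sketch below is an illustration, not an EASY feature: it assumes the column layout of the example above (interval in the second column, NJOB and WALLTIME in the last two) and that upper bounds are given in hours. longest_allowed_class is a hypothetical helper name.

```shell
# Print the largest upper bound among job classes that have an NJOB or
# WALLTIME limit set, reading `spq -L` style rows on standard input.
longest_allowed_class() {
    awk '
        $2 ~ /^\]/ && ($(NF-1) != "-" || $NF != "-") {
            split($2, bounds, ",")          # e.g. "]15h" and "60h]"
            upper = bounds[2]
            sub(/\]$/, "", upper)           # "60h]" -> "60h"
            if (upper + 0 > max) { max = upper + 0; label = upper }
        }
        END { print label }
    '
}

# Rows copied from the example above; prints: 60h
longest_allowed_class <<'EOF'
- ]960h,8760h]  no_no_no -  -  - -
-  ]240h,960h]  month    -  -  - -
-   ]60h,240h]  week     -  -  - -
-    ]15h,60h]  weekend  -  -  - 30h01
-     ]4h,15h]  night    -  -  - 30h01
-      ]1h,4h]  day      -  -  - 20h01
-   ]0m01s,1h]  short    -  -  - 3h
-   ]0m01s,1h]  Nshort   -  -  4 -
EOF
```

On a real system the input would come from spq -L directly, provided its output still has this layout.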

Q2: Why are some of my jobs in status held?

A2: One possible reason is that you and your colleagues have saturated a job class for your CAC. See the output of spq -L for a list of job classes and spq -l -c My_CAC for a list of your jobs divided into job classes. If the column SATURATE is marked saturate, then your CAC is saturated in this job class. As soon as one of the jobs in this job class that is not held starts or is removed, the first held job might enter the queue (not be held anymore).

Q3: Why doesn't my job start even though there are nodes available and my job is first in the queue?

A3: Your job may be too long. Some nodes are set aside for shorter jobs. The output of the command spstatus -s lists these settings. Example (Lenngren 2006-05-10):

               ----- Space Information -----

               D: 287 of 287 available for   4h jobs.
               D: 66 of 287  excluded for  15h jobs, [2006-05-10 13:00:00, 2006-05-10 18:00:00].
               D: 196 of 287  excluded for  60h jobs, [2006-05-08 13:00:00, 2006-05-12 18:00:00].
               D: 212 of 287  excluded for 240h jobs, [2006-05-07 02:00:00, 2006-05-14 02:00:00].

 

Here we see that 287-212=75 nodes are available for jobs longer than 60h. Furthermore, 66 nodes are only available for jobs shorter than 4h during the afternoon. This is done in order to increase throughput of shorter jobs. These 66 nodes are the first 66 nodes as listed by spusage.
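
The subtraction can be done for every job length at once. The sketch below assumes the field layout of the example above (excluded count in the second field, total in the fourth, job length in the seventh); nodes_available is a hypothetical helper name.

```shell
# For each "excluded" row of `spstatus -s` style input, print the job length
# and the number of nodes still usable (total minus excluded).
nodes_available() {
    awk '$1 == "D:" && $5 == "excluded" { print $7, $4 - $2 }'
}

# Rows copied from the example above.
nodes_available <<'EOF'
D: 287 of 287 available for   4h jobs.
D: 66 of 287  excluded for  15h jobs, [2006-05-10 13:00:00, 2006-05-10 18:00:00].
D: 196 of 287  excluded for  60h jobs, [2006-05-08 13:00:00, 2006-05-12 18:00:00].
D: 212 of 287  excluded for 240h jobs, [2006-05-07 02:00:00, 2006-05-14 02:00:00].
EOF
# prints:
#   15h 221
#   60h 91
#   240h 75
```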

Note that these settings may change.

Q4: I printed some files in the /scratch/ directory when running on dedicated nodes. How can I access these files after my job has finished?

A4: You can't. The /scratch/ directory is cleared when a job finishes. You must transfer the files before the job terminates.

Q5: How can I see how much time my group has spent?

A5: With the spsummary command. Example:

               h05n35> spsummary -c CAC_NAME -f 200601 -M

               /pdc/vol/easy/1.6/bin/spsummary: monthly totals

               year month            cac       usr njob     uwall   reqnode     unode

               2006     2              -         -   84    323h25   5125h55   4501h47
               2006     3              -         -  123    382h37   6231h39   5254h11
               2006     4              -         -   84    469h38   3127h20   2945h45
               2006     5              -         -  122    768h23   8683h42   7001h18
               2006     6              -         -   20    114h19    884h50    671h51

 

It is the used time (unode) that counts. How much time per month your project has been allocated can be found with cac examine CAC_NAME. Example:

               h05n35> cac examine CAC_NAME
                         [..]
                                  monthly quota: 1000 [nodehours]
                         [..]

 

Note: you can also do cac -v quota CAC_NAME
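
To compare used time with the quota, the hours-and-minutes values printed by spsummary have to be converted to a common unit. The sketch below does this with two hypothetical helpers, to_minutes and quota_left. Pairing the June figure from the spsummary example with the 1000 node hour quota from the cac example is for illustration only; the two examples need not describe the same CAC.

```shell
# Convert an EASY-style duration such as "671h51" (hours, then minutes)
# to minutes. awk reads "08" as decimal 8, unlike shell arithmetic.
to_minutes() {
    echo "$1" | awk -F h '{ print $1 * 60 + $2 }'
}

# Remaining quota in the same style; a leading "-" means the quota is exceeded.
# $1: used time (unode), $2: monthly quota in whole node hours.
quota_left() {
    used=$(to_minutes "$1")
    left=$(( $2 * 60 - used ))
    sign=""
    if [ "$left" -lt 0 ]; then
        sign="-"
        left=$(( 0 - left ))
    fi
    printf '%s%dh%02d\n' "$sign" $(( left / 60 )) $(( left % 60 ))
}

quota_left 671h51 1000    # prints: 328h09
```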

Q6: Which signals are sent to the processes when the time for a batch job expires?

A6: The queueing system sends all processes of a given batch job a SIGTERM approximately a minute before time runs out. Then the processes are sent a SIGKILL.
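
A batch script can use the SIGTERM to save files before the SIGKILL arrives. The following is a minimal sketch under stated assumptions: SCRATCH_DIR and RESULT_DIR are hypothetical locations that you must adapt, run_with_cleanup and save_results are hypothetical helper names, and the copy has to finish within the roughly one minute between the two signals.

```shell
#!/bin/sh
# Sketch of a job wrapper that copies files out of scratch when SIGTERM
# arrives. SCRATCH_DIR and RESULT_DIR are hypothetical locations.
SCRATCH_DIR=${SCRATCH_DIR:-/scratch}
RESULT_DIR=${RESULT_DIR:-$HOME/results}

save_results() {
    mkdir -p "$RESULT_DIR" &&
        cp -R "$SCRATCH_DIR"/. "$RESULT_DIR"/
}

# Run the command in the background so the shell can react to SIGTERM at
# once; with a foreground child the trap would wait until the child exits.
run_with_cleanup() {
    trap 'save_results; exit 143' TERM
    "$@" &
    child=$!
    status=0
    wait "$child" || status=$?
    save_results
    return "$status"
}

# Usage in a batch script: run_with_cleanup ./my_program
```

The files are also copied when the program ends normally, so the results survive in both cases.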

Q7: None yet

spattach

Spattach is aimed at interactive usage for MPI or similar codes. However, nothing prevents a batch user from making use of it. You run your parallel or serial program either as a command or in a sub-shell when using spattach.

It is all very similar to running an ordinary Unix program.

Spattach exports information such as number of nodes and name of nodes to its sub-shell or sub-program. This makes it quite convenient to use.

Spattach is completely silent unless you tell it otherwise. Among the switches are help, number of nodes, how long to run, whether to send mail, how verbose to be, and a few others.

By adding the option -i you attach to the interactive pool.

spsummary

spsummary displays a short user summary, which includes number of jobs started, dequeued jobs and aggregated allocation time in minutes.

spwhen

spwhen gives an estimate of when a job might start. The estimate is based upon the current situation: how many nodes are considered up and running, the end time of the jobs currently running, and what the jobs waiting in line look like.

The results of spwhen change if a job that is ahead in line terminates sooner than expected. Consider spwhen to show a current best guess.

For example, suppose there are three jobs on the machine: a small one currently running, a big one waiting, and another small one that could be backfilled. Spwhen might estimate that the third job will be able to start immediately.

However, if the job currently running terminates sooner than expected, it is the big job's turn (big as in many nodes), not the third one's, which is then no longer backfilled.

In other words, spwhen is fragile.

spstatus

Add the -s option to spstatus, spstatus -s, to see reservations of certain runtimes for the near future.

Environment variables

The following variables can be used within a batch-script submitted using esubmit:

SP_JID          Job ID given by EASY.
SP_EASY_HOME    Home directory of EASY.
SP_ARGS         From spsubmit's "Command Line Arguments".
SP_INITIALDIR   From spsubmit's "Initial directory".
SP_SUBMIT_HOST  The node from which the job was submitted.
SP_PROCS        Number of allocated nodes.
SP_NODES        Allocated nodes.
SP_HOSTFILE     The file that contains all allocated host names.
                There is one allocated host on each row.
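
As an illustration, a batch script could start by printing a short header built from these variables, which makes it easier to tell the output of different jobs apart. This is a sketch only: print_job_header is a hypothetical helper name, and the "unknown" defaults only matter when the script is tried outside a job.

```shell
# Print a short job header from the variables EASY sets.
print_job_header() {
    echo "job id:      ${SP_JID:-unknown}"
    echo "submitted:   ${SP_SUBMIT_HOST:-unknown}:${SP_INITIALDIR:-unknown}"
    echo "node count:  ${SP_PROCS:-unknown}"
    # SP_HOSTFILE has one host per row; prefix each row with a label.
    if [ -r "${SP_HOSTFILE:-}" ]; then
        sed 's/^/node:        /' "$SP_HOSTFILE"
    fi
}

print_job_header
```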