The EASY scheduler originates from the Extensible Argonne Scheduling sYstem (EASY) developed at Argonne National Laboratory, USA. It has been modified at PDC to better suit our needs: most of the additions adapt the scheduler to our particular cluster configurations and to our use of AFS and Kerberos. These changes should be largely transparent to the user, but some additional flags and commands have been added. Our notes on EASY are intended as a supplement to the old EASY user's guide from Argonne.
Overview of commands
module add easy
loads the EASY commands into your environment. After loading the EASY module the following commands are available:
All EASY commands have a -h option to get help (list their options). Some of the commands also have a -v option that, when combined with -h, gives more verbose information on the options.
A common pitfall is a modified startup file (e.g. .tcshrc) that prints some kind of error or warning output when performing an rsh. There should be no extra output during an rsh. You can verify the correctness of your startup files:
rsh name_of_loginnode date ; date
Thu Apr 13 19:07:35 CEST 2006
Thu Apr 13 19:07:35 CEST 2006
Reserving a node
The login nodes on PDC clusters are intended for editing and compiling. All other use should be carried out on nodes under batch-system control (interactive nodes or compute nodes). Short test jobs may be executed on the interactive nodes. Reserving dedicated nodes through the batch system gives you exclusive access to the requested resource (you are the only person allowed to log in). To reserve one node for 300 minutes using EASY, execute (on the login node):
module add easy
esubmit -n 1 -t 300
Once the reservation is effective you can log in to the reserved node, or use it by any distributed means. Use spq and spusage to see which node was reserved. spstatus will display a system summary.
You can also submit a single job with
esubmit -n1 -t 60 ./my_program
Use esubmit -hv to list options, e.g. how to make reservations at a specific time of day or how to chain reservations. Note that reserving for a particular time only guarantees that the job will not start prior to that time. In order to get the node(s), it must be your turn in the queue.
Submitting parallel jobs
To run a parallel job in batch do
esubmit -n number_of_nodes -t no_minutes my_script my_script_parameters
This will cause the script
my_script to be executed on the first node in the node list ($SP_HOSTFILE) generated by EASY. The contents of the script depend on the underlying MPI software and user software.
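A minimal my_script might look like the following sketch. The MPI launcher invocation is only a placeholder (adapt it to the MPI implementation on the cluster), and the stand-in hostfile exists only so the sketch runs outside EASY; under the batch system $SP_HOSTFILE is already set for you.

```shell
#!/bin/sh
# Sketch of a batch script for esubmit. EASY executes this script on
# the first node listed in $SP_HOSTFILE.

# Stand-in hostfile so the sketch can run outside EASY; under the
# batch system, $SP_HOSTFILE is set for you (one host per row).
if [ -z "$SP_HOSTFILE" ]; then
    SP_HOSTFILE=./hosts.example
    printf 'node1\nnode2\n' > "$SP_HOSTFILE"
fi

NNODES=$(wc -l < "$SP_HOSTFILE")   # one allocated host per row
echo "running on $NNODES node(s):"
cat "$SP_HOSTFILE"

# Launch the parallel program (placeholder launcher and flags;
# adapt to the MPI software installed on the cluster):
# mpirun -np "$NNODES" -machinefile "$SP_HOSTFILE" ./my_program
```

The script itself runs only on the first node; it is the launcher inside it that starts processes on the other allocated nodes.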
Note that your Kerberos tickets are copied when doing
esubmit. Make sure your tickets have a long enough lifetime when submitting a job.
Examining the queue
Use the command
spq to list the jobs in the queue. Use
spq -w to see information on time-of-day, chained reservations, etc. The command
spwhen makes a prediction of when the jobs will start. The command
spfree lists the number of free nodes.
Removing a job
To remove a job from the queue, use the command
sprelease. This can be used on both running jobs and jobs waiting to start.
In order to have normal priority in the queue, you need to belong to a Charge Allocation Category (CAC). If you only belong to one CAC, it will be automatically selected when you do
spattach. To find out which CACs you belong to do
cac members $USER
To find out more about a CAC do
cac examine CAC_NAME
Please note that EASY deals with nodes, not CPUs, when allocating resources. Both Lucidor and Lenngren have two CPUs per node, and thus 1 nodehour equals 2 CPU hours. When a CAC has exceeded its monthly quota, PDC staff may put it in the so-called slowlane. When that happens, jobs from this CAC will have lower priority than other jobs. On special occasions PDC staff may put some CAC in the fastlane; this is e.g. done during some exercises of the PDC HPC summer school. (In summary, there are three priority levels: high, normal, and low.)
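The node-versus-CPU accounting above can be sketched as a small calculation (the quota value is illustrative, taken from the example quota of 1000 nodehours mentioned further down):

```shell
#!/bin/sh
# EASY charges per node, and Lucidor/Lenngren have two CPUs per node,
# so 1 nodehour equals 2 CPU hours.
CPUS_PER_NODE=2
NODE_HOURS=1000     # e.g. a monthly CAC quota in nodehours (illustrative)
CPU_HOURS=$((NODE_HOURS * CPUS_PER_NODE))
echo "$NODE_HOURS nodehours = $CPU_HOURS CPU hours"
```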
There is no limit on the number of jobs one can submit, although we discourage excessive use. There are limits on how many/how large jobs are actively competing for nodes. In general those limits apply on a per CAC basis.
Finding queue settings
To see queue limits use
spq -l and
spq -L. (On Lucidor use spq -L -w in order to see NODETIME limits.) Add the -s option to spstatus, spstatus -s, to see reservations of certain runtimes for the near future.
Keeping the job length at or under 4h increases the possibility of running during daytime. Keeping the job length at or under 15h increases the possibility of running overnight, and keeping the job length at or under 240/960 hours avoids getting competition from such jobs. Not all systems accept all job lengths.
These limits may change. Please use
spq -L to verify them.
Keeping track of time spent
Use the command
spsummary -c CAC_NAME -M
to get a monthly summary for the present month. Use the option
-t to select a time span. To see how much time each member has used, remove the option
-M. It is the used time (
unode) that counts. To list all the jobs individually, use the command
spjobsummary -c CAC_NAME
You may want to use the
-w option. The time span options
-t are also available.
Keeping track of different jobs
There is presently no feature to tag different jobs with names. However, information on each job is stored in a file. This includes the name of the directory from which the job was submitted. Send an e-mail to
firstname.lastname@example.org if you want help in finding these files.
Q1: What is the longest job I can run?
A1: This is usually machine dependent and may also change over time. Use the
spq -L command to find out. Example (Lucidor 2006-04-27):
h05n35> spq -L
    INTERVAL      NICKNAME            NJOB  WALLTIME
 -  ]960h,8760h]  no_no_no  -  -      -     -
 -  ]240h,960h]   month     -  -      -     -
 -  ]60h,240h]    week      -  -      -     -
 -  ]15h,60h]     weekend   -  -      -     30h01
 -  ]4h,15h]      night     -  -      -     30h01
 -  ]1h,4h]       day       -  -      -     20h01
 -  ]0m01s,1h]    short     -  -      -     3h
 -  ]0m01s,1h]    Nshort    -  -      4     -
Here we see that 60h is the max time, since there are no limits (
WALLTIME) set for longer jobs.
Q2: Why are some of my jobs in status held?
A2: One possible reason is that you and your colleagues have saturated a job class for your CAC. See the output for
spq -L for a list of job classes and
spq -l -c My_CAC for a list of your jobs divided into job classes. If the column
SATURATE is marked
saturate then your CAC is saturated in this job class. As soon as one of the jobs that are not held in this job class starts or is removed, the first held job might enter the queue (not be held).
Q3: Why doesn't my job start even though there are nodes available and my job is first in the queue?
A3: Your job may be too long. Some nodes are set aside for shorter jobs. The output of the command
spstatus -s lists these settings. Example (Lenngren 2006-05-10):
----- Space Information -----
D: 287 of 287 available for 4h jobs.
D:  66 of 287 excluded for 15h jobs,  [2006-05-10 13:00:00, 2006-05-10 18:00:00].
D: 196 of 287 excluded for 60h jobs,  [2006-05-08 13:00:00, 2006-05-12 18:00:00].
D: 212 of 287 excluded for 240h jobs, [2006-05-07 02:00:00, 2006-05-14 02:00:00].
Here we see that 287-212=75 nodes are available for jobs longer than 60h. Furthermore, 66 nodes are only available for jobs shorter than 4h during the afternoon. This is done in order to increase the throughput of shorter jobs. These 66 nodes are the first 66 nodes as listed by
Note that these settings may change.
Q4: I printed some files in the /scratch/ directory when running on dedicated nodes. How can I access these files after my job has finished?
A4: You can't. The /scratch/ directory is cleared when a job finishes. You must transfer the files before the job terminates.
Q5: How can I see how much time my group has spent?
A5: With the spsummary command. Example:
h05n35> spsummary -c CAC_NAME -f 200601 -M
/pdc/vol/easy/1.6/bin/spsummary: monthly totals
year  month  cac  usr  njob  uwall    reqnode  unode
2006      2    -    -    84  323h25   5125h55  4501h47
2006      3    -    -   123  382h37   6231h39  5254h11
2006      4    -    -    84  469h38   3127h20  2945h45
2006      5    -    -   122  768h23   8683h42  7001h18
2006      6    -    -    20  114h19    884h50   671h51
It is the used time (
unode) that counts. How much time per month your project has been allocated can be found with
cac examine CAC_NAME. Example:
h05n35> cac examine CAC_NAME
[..]
monthly quota: 1000 [nodehours]
[..]
Note: you can also do
cac -v quota CAC_NAME
Q6: Which signals are sent to the processes when the time for a batch job expires?
A6: The queueing system sends all processes of a given batch job a
SIGTERM approximately a minute before the time runs out. Then the processes are sent a SIGKILL.
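A batch script can catch this SIGTERM and use the remaining grace period to rescue results, e.g. from /scratch, which is cleared when the job finishes (see Q4). The sketch below sends the signal to itself only to simulate what the queueing system does, and the /scratch path in the comment is illustrative:

```shell
#!/bin/sh
# Catch the SIGTERM sent roughly a minute before the time limit and
# use the grace period to save results before the job is killed.
CLEANED=no
cleanup() {
    echo "caught SIGTERM, saving results"
    # cp -r /scratch/myresults "$HOME/results/"   # illustrative path
    CLEANED=yes
}
trap cleanup TERM

# Simulate the scheduler's signal for this demo:
kill -TERM $$
echo "cleanup ran: $CLEANED"
```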
Q7: None yet
Spattach aims at interactive usage for MPI or similar codes. However, there is nothing that forbids a batch user from making use of it. You run your parallel or serial program either as a command or in a sub-shell when using spattach.
It is all very similar to running an ordinary Unix program.
Spattach exports information such as number of nodes and name of nodes to its sub-shell or sub-program. This makes it quite convenient to use.
Spattach is completely silent unless you tell it otherwise. Among the switches are help, number of nodes, how long to run, if to send mail, how verbose to be and a few others.
By adding the option -i you attach to the interactive pool.
spsummary displays a short user summary, which includes number of jobs started, dequeued jobs and aggregated allocation time in minutes.
spwhen gives an estimate of when a job might start. The estimate is based upon the current situation: how many nodes are considered up and running, the end times of the currently running jobs, and what the jobs waiting in line look like.
The results of spwhen change if a job that is ahead in line terminates sooner than expected. Consider spwhen to show a current best guess.
Suppose, for example, that there are three jobs on the machine: a small one currently running, a second big one, and a third small one that can be back-filled. spwhen might estimate that the third, small job will be able to start immediately.
However, if the currently running job terminates sooner than expected, it is now the turn of the big job (the one requiring many nodes), not of the third job, which then cannot be back-filled.
In other words, spwhen is fragile.
Add the -s option to spstatus, spstatus -s, to see reservations of certain runtimes for the near future.
The following variables can be used within a batch-script submitted using esubmit:
SP_JID          Job ID given by EASY.
SP_EASY_HOME    Home directory of EASY.
SP_ARGS         From spsubmit's ``Command Line Arguments''.
SP_INITIALDIR   From spsubmit's ``Initial directory''.
SP_SUBMIT_HOST  The node from which the job was submitted.
SP_PROCS        Number of allocated nodes.
SP_NODES        Allocated nodes.
SP_HOSTFILE     The file that contains all allocated host names. There is one allocated host on each row.
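A hedged sketch of how a batch script might record this information at startup. The fallback values exist only so the sketch runs outside EASY; under esubmit the variables are set for you:

```shell
#!/bin/sh
# Log the EASY-provided job information at the top of a batch script.
# Outside the batch system these variables are unset, so fall back to
# placeholders (illustrative only).
JOB_ID="${SP_JID:-unknown}"
SUBMIT_HOST="${SP_SUBMIT_HOST:-unknown}"
NNODES="${SP_PROCS:-0}"

echo "job id:      $JOB_ID"
echo "submit host: $SUBMIT_HOST"
echo "nodes:       $NNODES (${SP_NODES:-none})"
echo "hostfile:    ${SP_HOSTFILE:-none}"
```

Logging these values makes it easier to match a job's output back to the queue entry and submission directory afterwards.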