How to Run Jobs

Many researchers run their programs on PDC’s computer systems, often simultaneously. To handle this, the computer systems need workload management and job scheduling. For job scheduling, PDC uses the Slurm Workload Manager.

When you log in to the supercomputer with ssh, you land on a designated login node, in your afs home directory. Here you can modify your scripts and manage your files.

https://pdc-web.eecs.kth.se/files/support/images/LoginNodeWarning.PNG

To run your script or program on the compute nodes, use one of the methods described below.

How jobs are scheduled

The queue system uses two main methods to decide which jobs are run: fair-share and backfill. Unlike at some other centers, the time a job has spent in the queue is not a factor.

Fair-share

The goal of the fair-share algorithm is to make sure that all projects can use their fair share of the available resources within a reasonable time frame. The priority given to a job depends on how much of its project’s time quota has been used recently, relative to how much other projects have used of their quotas. The effect of past usage on the priority declines gradually, with a half-life of 14 days. So jobs submitted by projects that have not used much of their quota recently will be given high priority, and vice versa.
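
To get a feel for the 14-day half-life, the sketch below prints how much weight past usage still carries after a given number of days. It assumes a plain exponential decay, weight = 0.5^(age / 14). The exact formula used by the scheduler is not described here, and the function name usage_weight is made up for this illustration, so treat the numbers as a rough guide only.

```shell
#!/bin/bash
# Illustrative only: assumes usage decays as 0.5^(age_in_days / half_life).
HALF_LIFE_DAYS=14

usage_weight() {
    # $1 = age of the usage in days; prints the remaining weight (between 0 and 1)
    awk -v age="$1" -v hl="$HALF_LIFE_DAYS" \
        'BEGIN { printf "%.3f\n", 0.5 ^ (age / hl) }'
}

for days in 0 7 14 28 56; do
    echo "usage from $days days ago counts with weight $(usage_weight "$days")"
done
```

Under this assumption, time used 28 days ago counts only a quarter as much as time used today.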

Backfill

As well as having a main queue, the job scheduling system also implements “backfill” to ensure that the systems are as full as possible. If the next job in the queue is large (that is, it needs lots of nodes to run), the scheduler collects nodes as they become free until there are enough to start running the large job. Backfill means that the scheduler looks for smaller jobs that could start on nodes that are free now, and that would finish before enough nodes are free for the large job to start. For backfill to work well, the scheduler needs to know how long jobs will take. So, to take advantage of backfill, you should set the maximum time your job needs to run as accurately as possible in your submit scripts.
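
As a rough sketch of how a measured runtime could be turned into a tight time limit, the helper below converts an estimate in minutes into the HH:MM:SS form accepted by Slurm's --time option. The function name slurm_time_limit, the 20% safety margin and the script name my_job.sh are assumptions made for this example, not PDC recommendations.

```shell
#!/bin/bash
# Convert a runtime estimate (minutes) into HH:MM:SS, adding a safety
# margin in percent so the job is not cut off just before it finishes.
slurm_time_limit() {
    local minutes=$1 margin_pct=${2:-20}
    # Integer arithmetic, rounded up to the next whole minute.
    local total=$(( (minutes * (100 + margin_pct) + 99) / 100 ))
    printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

# Example: a job that took about 95 minutes in earlier runs.
limit=$(slurm_time_limit 95)
echo "Requesting --time=$limit"

# The value can then be passed on the command line (hypothetical script name):
#   sbatch --time="$limit" my_job.sh
# or written into the submit script as a line of the form:
#   #SBATCH --time=01:54:00
```

A limit that is close to the real runtime gives the scheduler more chances to fit the job into a gap, while the margin guards against the job being stopped when it runs slightly longer than expected.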

Beskow usage

This graph shows the percentage of the nodes on Beskow that were in use on different dates from early 2015 until late 2016. You can see how the scheduler makes good use of Beskow, as nearly all of the available nodes are in use all the time.

Note: All researchers sharing a particular time allocation have the same priority. This means that if other people in your time project have used up lots of the allocated time recently, then any jobs you (or they) submit within that project will be given the same low priority.

Example of scheduling

https://pdc-web.eecs.kth.se/files/support/images/ExampleScheduling.png

Of course both Anna and Björn would like their jobs to be run as soon as possible.

However, in the current situation, the scheduler will give priority to Björn’s job, as his project (B) has recently used less of its time allocation than project A has used of its allocation.

The fact that Anna has not used any time herself does not make any difference as it is the total amount of time recently used by each project that is taken into consideration when deciding which job will be scheduled next.

How to submit jobs

Jobs can be run on PDC clusters in two ways: by submitting them to the job queue, or by booking a node and running on it interactively.
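
The two approaches could look roughly like the sketch below, which uses the standard Slurm commands sbatch (queue a script) and salloc (book nodes interactively). The file name my_job.sh, the job name, the resource values and the REPLACE_WITH_YOUR_PROJECT placeholder are all assumptions for illustration and must be adapted before real use.

```shell
#!/bin/bash
# Write a minimal batch script. Job name, project and resources are placeholders.
cat > my_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_test
#SBATCH --account=REPLACE_WITH_YOUR_PROJECT
#SBATCH --nodes=1
#SBATCH --time=00:10:00

srun hostname
EOF

# Queue the job if Slurm is available in this shell (for example on a login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch my_job.sh
else
    echo "sbatch not found; run this on a login node" >&2
fi

# Interactive alternative: book one node for 10 minutes, then start
# programs on it with srun from the shell that salloc opens.
#   salloc --nodes=1 --time=00:10:00 --account=REPLACE_WITH_YOUR_PROJECT
```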

Accessing Software

PDC hosts a variety of machines, operating systems and projects, and different versions of the same code may each need their own set of programs to run. This means that different users or programs may depend on newer or older versions of a dependency, or prefer a specific piece of software.

To be able to maintain and use this large set of programs we use Modules. Below we present the needed commands to use these modules to your advantage.

Available software

You can find all the software that is available on the different machines on the Available Software page at https://www.pdc.kth.se/software

On the software page you can also find which versions are available. Each software package has instructions on how to load and use it. If you cannot find a specific package or version, you can either install it yourself (temporarily) in your cfs directory or contact PDC and ask us to install it.

Using modules

The HPC cluster hosts an extensive set of software. We use the Environment Modules package (modules for short) to keep the software organized. It lets you pick which software, and which version of it, you want to use. For example, when several versions of the same software package are available, you can select which version to use with a simple load command.

You can load/add a module from your terminal with the following command.

module add <software>/[version]

It is important to know that if you omit the [version], the default version of the software (usually the latest) will be loaded into your environment. This can cause problems when new versions are installed on our system, because they may then be loaded instead of the version you are currently using. So it is good practice to always include the version of the software. This ensures that your jobs stay consistent when the default module changes.
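
One way to enforce this habit in job scripts is a small wrapper that refuses to load a module unless a version is given. This is only a sketch: load_pinned is a hypothetical helper name, and the module name and version in the usage comments are made up, so check module avail for the real ones.

```shell
#!/bin/bash
# Load a module only when an explicit version is given ("name/version").
load_pinned() {
    case "$1" in
        */?*) module add "$1" ;;
        *)    echo "load_pinned: '$1' has no version, refusing to load" >&2
              return 1 ;;
    esac
}

# Usage inside a job script (module name and version are hypothetical):
#   load_pinned fftw/3.3.4    # loads exactly this version
#   load_pinned fftw          # rejected: would silently follow the default
```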

The software name can be found using

module avail

and the available versions can be found using

module avail <software>

You can check the currently loaded modules with

module list

If you have a module loaded that you do not want, you can unload it with

module unload <software>

and you can also swap modules with

module swap <current software module> <software module to load>

Make sure you haven’t loaded multiple versions of one program or forgotten to load the dependencies that certain software needs.
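
A quick check for accidentally loaded duplicates could look like the sketch below. It assumes that the terse listing (module -t list) prints one name/version entry per line, and it treats everything before the first slash as the software name. The helper name find_duplicate_modules is made up for this example.

```shell
#!/bin/bash
# Print module names that appear more than once in a list of
# "name/version" entries (one per line on stdin).
find_duplicate_modules() {
    grep '/' | cut -d/ -f1 | sort | uniq -d
}

# On a cluster, feed it the terse module listing. Some versions of the
# module command print the list to stderr, hence the 2>&1.
if type module >/dev/null 2>&1; then
    module -t list 2>&1 | find_duplicate_modules
fi
```

If the check prints a name, unload or swap one of the versions before submitting your job.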