Queueing jobs

Here is a simplified workflow for queueing jobs on the supercomputer.

https://pdc-web.eecs.kth.se/files/support/images/sbatchflow.PNG

For large, time-consuming programs, submitting the job to the queue system is preferred.

  • You can submit a job script to the Slurm queue system from the login node with:

    sbatch <filename>
    

    By default, any output messages from the job are written to the file slurm-XXX.out, where XXX is the job ID. More information on how to create a job script can be found in Job scripts; a minimal example is also sketched after the warning below.

Warning

Note that programs should ONLY be run with sbatch or by following the instructions in Run interactively. Running programs in any other way will result in the program running on the login node and not on the supercomputer.
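
For reference, a minimal job script could look something like the sketch below. The job name, time limit, node count, partition, and program name are only placeholders and not specific to any particular cluster; see Job scripts for the settings that apply to your system.

    #!/bin/bash -l
    # Minimal sketch of a Slurm job script; all values below are placeholders.
    #SBATCH -J myjob              # job name shown in the queue
    #SBATCH -t 00:10:00           # requested wall-clock time (hh:mm:ss)
    #SBATCH -N 1                  # number of nodes
    #SBATCH -p main               # partition name (cluster specific)

    srun ./my_program             # launch the program on the allocated node(s)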

  • You can remove your job from the queue with:

    scancel <jobid>
    
  • Information about the jobs currently in the queue can be obtained with:

    squeue
    
  • You can restrict the listing to your own jobs by adding the -u flag:

    squeue -u <username>
    

    The state of each job is listed in the ST column. The most common job state codes are:

    • R: Running

    • PD: Pending

    • CG: Completing

    • CA: Cancelled

    For more job state codes, please see Slurm Job State Codes.

  • To get more detailed information about a specific job:

    scontrol show job <jobid>
    
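The output of scontrol show job is fairly verbose. If you are only interested in a few fields, such as the job state or the elapsed run time, one possibility (a sketch, not a cluster-specific recommendation) is to filter the output:

    scontrol show job <jobid> | grep -E 'JobState|RunTime'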

These are the basic commands for submitting, cancelling, and checking jobs in the queue system.
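
Put together, a typical cycle on the login node might look like the following sketch, where jobscript.sh and the job ID are placeholders:

    sbatch jobscript.sh           # submit the job script to the queue
    squeue -u <username>          # check the state of your jobs (ST column)
    scontrol show job <jobid>     # inspect a specific job in more detail
    scancel <jobid>               # cancel the job if it is no longer needed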

Note

Our clusters work a bit differently; the differences are pointed out in the section below. A major difference is that some cluster compute nodes DO NOT have access to the AFS file system. Therefore, all files and scripts must reside in the Lustre file system:

/cfs/klemming/nobackup/<1st letter username>/<username>

In any case, it is always good practice to run all jobs from the Lustre file system.
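
For example, assuming a job script named jobscript.sh (a placeholder name), it could be copied to your Lustre directory and submitted from there:

    cp jobscript.sh /cfs/klemming/nobackup/<1st letter username>/<username>/
    cd /cfs/klemming/nobackup/<1st letter username>/<username>
    sbatch jobscript.sh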