Here is a simplified workflow for queueing jobs on the supercomputer.
For large, time-consuming programs, submitting the job to the queue system is preferred.
You can submit a job script to the Slurm queue system from the login node with:
sbatch <script>
By default, any output messages from the job are written to the file
slurm-XXX.out
where XXX is the job ID. More information on how to create a job script can be found in Job scripts.
Note that programs should ONLY be run with sbatch or by following the instructions in Run interactively. Running programs in any other way will result in the program running on the login node and not on the supercomputer.
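As a rough illustration of the submission step, the sketch below writes a minimal job script and hands it to sbatch. The file name, job name, time limit, node count, and the `srun hostname` payload are all illustrative assumptions; a real cluster will typically also require site-specific options (for example an account or partition), which are described in Job scripts.

```shell
#!/bin/bash
# Sketch only: file name, job name, and resource values are illustrative assumptions.
JOB_SCRIPT="my_job.sh"

# Write a minimal job script. Lines starting with #SBATCH are read by sbatch as options.
cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --time=00:10:00
#SBATCH --nodes=1

# Replace this line with the program you want to run.
srun hostname
EOF

# Submit only if Slurm is available (that is, on a login node of the cluster).
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$JOB_SCRIPT"
else
    echo "sbatch not found: run this on a login node of the cluster"
fi
```

The guard around sbatch only keeps the sketch harmless on a machine without Slurm; on the login node the submission happens as usual.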
You can remove your job from the queue with:
scancel <jobid>
Information about the jobs in the queue can be obtained with:
squeue
You can also list only your own jobs by adding the -u flag:
squeue -u <username>
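The state of each job is listed in the ST column. The most common job state codes are:
PD  Pending: the job is waiting for resources
R   Running: the job is executing
CG  Completing: the job is finishing
CD  Completed: the job has finished successfully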
For more job state codes please visit Slurm Job State Codes.
To get further information about your jobs:
scontrol show job <jobid>
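The commands above can be chained into one small helper, sketched here under some assumptions: the function names `extract_job_id` and `submit_and_inspect` are hypothetical, and the parsing relies on sbatch printing its usual confirmation line of the form "Submitted batch job <jobid>".

```shell
#!/bin/bash
# Sketch only: the helper names below are hypothetical, not part of Slurm.

# Extract the numeric job ID from sbatch's usual confirmation line,
# e.g. "Submitted batch job 123456".
extract_job_id() {
    printf '%s\n' "$1" | awk '/Submitted batch job/ {print $4}'
}

# Submit a job script, then show its queue entry and its full details.
submit_and_inspect() {
    local script="$1" output jobid
    output="$(sbatch "$script")" || return 1
    jobid="$(extract_job_id "$output")"
    echo "Job ID: $jobid"
    squeue -j "$jobid"          # one-line status, including the ST column
    scontrol show job "$jobid"  # detailed information about the job
    # To cancel the job later: scancel "$jobid"
}

# Only call the Slurm commands when they are available and a script was given.
if command -v sbatch >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    submit_and_inspect "$1"
fi
```

Saved under a name of your choice, this would be run on the login node with the job script as its only argument.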
These are the basic commands for submitting, cancelling, and checking jobs in the queue system.
Our clusters work a bit differently. The differences are pointed out in the section below. A major difference is that some cluster compute nodes DO NOT have access to the AFS file system. Therefore, all files and scripts must reside in the Lustre file system:
/cfs/klemming/nobackup/<1st letter username>/<username>
Even on nodes that can access AFS, it is always good practice to run any type of job in the Lustre file system.
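To make the directory pattern concrete, the sketch below builds the Lustre path from a username and changes into it before any job is submitted. The helper name `klemming_dir` is hypothetical, and the snippet assumes your cluster username matches `$USER`.

```shell
#!/bin/bash
# Sketch only: klemming_dir is a hypothetical helper name.

# Build the Lustre directory path for a given username, following the
# pattern /cfs/klemming/nobackup/<1st letter username>/<username>.
klemming_dir() {
    local user="$1"
    local first
    first="$(printf '%s' "$user" | cut -c1)"
    printf '/cfs/klemming/nobackup/%s/%s\n' "$first" "$user"
}

target="$(klemming_dir "${USER:-$(id -un)}")"
echo "Lustre directory: $target"

# Change into it before submitting, if it exists on this machine.
if [ -d "$target" ]; then
    cd "$target" || exit 1
fi
```

Submitting from inside this directory keeps the job script and its output files on Lustre, where the compute nodes can reach them.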