Guidelines /cfs/klemming

Guidelines on the usage of Klemming, PDC's high-performance file system.

Klemming is a parallel Lustre file system that is intended for sitewide use.

There are no disk quotas enforced on Klemming, but remember that Klemming is intended for temporary storage and should not be used for long-term storage. Only files needed by, or recently produced by, jobs running on PDC's compute resources should be on Klemming.

Also NONE OF THE FILES ON KLEMMING ARE BACKED UP.

Klemming is divided into two parts:

Scratch

Use for most files that are used by jobs running at PDC (but do not fall into the nobackup category). This branch is cleaned automatically by removing files that have not been accessed within a certain time. This time will be adjusted when needed so that:

  • The files are available while a job is running on the cluster
  • After the job has run there is a reasonable chance to move the files to some other storage.
  • There is a low likelihood of jobs failing due to a full file system
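Because scratch is cleaned based on access time, it can help to check which of your files have not been read for a while, so that you can move them to other storage first. The following is a minimal sketch only: the function name `find_stale_files` and the `max_idle_days` threshold are hypothetical, and the actual cleaning interval used by PDC is not stated here.

```python
import os
import time


def find_stale_files(root, max_idle_days):
    """Return (path, idle_days) pairs for files under root whose last
    access time is older than max_idle_days (an assumed threshold)."""
    now = time.time()
    cutoff = now - max_idle_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append((path, (now - atime) / 86400))
    return stale
```

Walking a directory tree issues one metadata request per file, which is itself a load on a Lustre system, so a script like this should be run sparingly and only on your own directories.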

Nobackup

Use for files that, while needed as input by jobs frequently running at PDC, are not of a transient nature. Examples include large input data sets that are used by several jobs running over several months. In other words, nobackup is a cache for frequently used data and exists to alleviate staging problems. PDC will manually monitor the usage of this branch, and it will be cleaned if the need arises. If frequent misuse proves it necessary, PDC can and will monitor this branch for files that more properly belong in scratch.

Characteristics of a Lustre File system

Lustre file systems such as Klemming perform quite differently from the local disks that are common on other machines. Lustre was developed to provide fast access to the large data files needed by large parallel applications. Lustre file systems are particularly bad at dealing with small files and with large numbers of files, and both should be avoided as much as possible.

Good practice on a Lustre System

To get the best performance out of a Lustre system, you should use as few files as possible, and each time you access a file you should read as much data at a time as you can. An ideal program using Lustre would read in a single data file using parallel IO (e.g. MPI IO), process the data, and at the end write out a single file, again using parallel IO, with no intermediate use of the disk.
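The advice to read as much data at a time as possible can be illustrated with a short serial sketch. This is not the parallel IO approach recommended above, only an illustration of using a few large reads instead of many small ones. The helper names and the 64 MiB block size are assumptions chosen for the example, not values recommended by PDC.

```python
import hashlib

# Assumed block size for illustration; a suitable value depends on the
# application and on how the file is laid out on the file system.
BLOCK_SIZE = 64 * 1024 * 1024


def iter_blocks(path, block_size=BLOCK_SIZE):
    """Yield the contents of a file in large blocks, so that each read
    request transfers a lot of data."""
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break  # end of file
            yield block


def sha256_of_file(path, block_size=BLOCK_SIZE):
    """Example consumer: hash a file using only large sequential reads."""
    digest = hashlib.sha256()
    for block in iter_blocks(path, block_size):
        digest.update(block)
    return digest.hexdigest()
```

The same pattern applies to writing: collect results in memory and write them out in a few large pieces rather than many small ones.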

Bad practice on a Lustre system

As Lustre is designed for reading a small number of large files quickly, certain IO patterns that are perfectly fine on other systems cause very high load on a Lustre system, e.g.

  • Small reads
  • Opening many files
  • Seeking within a file to read a small piece of data

These practices are very common in applications that were designed to run on systems where each node has its own local scratch disk.
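One way to reduce the number of files an application opens is to bundle many small files into a single archive and have jobs read from that one file instead. The sketch below uses Python's standard `tarfile` module; the function names `bundle_directory` and `read_member` are hypothetical, and whether this approach suits a given workflow is an assumption to check against your own application.

```python
import os
import tarfile


def bundle_directory(src_dir, archive_path):
    """Pack every file under src_dir into one uncompressed tar archive.

    archive_path must lie outside src_dir, otherwise the archive would
    try to include itself. Returns the number of files added.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Store paths relative to src_dir inside the archive.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count


def read_member(archive_path, member_name):
    """Read one member's bytes from the archive without unpacking it."""
    with tarfile.open(archive_path, "r") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        return extracted.read()
```

Packing still touches every small file once, so it is best done where the files are produced, before they are placed on the Lustre file system. Later jobs then open a single large file.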

Many software packages (e.g. Quantum Espresso) have input options that reduce the disk IO.

If you need help converting your code to make better use of the Lustre file system, contact support.