Klemming (on Dardel)

Klemming is the center storage system at PDC. It uses the Lustre parallel file system, which is optimized for handling data from many clients at the same time. The total size of Klemming is 12 PB (12,000 TB). Note that only the home area is backed up.

Storage areas

Klemming is divided into three main storage areas, shown in the table below. Replace u/username with the first letter of your username followed by your full username, and projectname with the name of the project.

Area      Path                                     Alias          Size        File count   Backup
Home      /cfs/klemming/home/u/username            $SNIC_BACKUP   25 GB       100 K        Yes
Projects  /cfs/klemming/projects/snic/projectname  -              Varies      Varies       No
Scratch   /cfs/klemming/scratch/u/username         $SNIC_TMP      Unlimited   Unlimited    No
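
For example, for a user named user working in a project named my_proj (the same hypothetical names used in the examples further down this page), the three areas would be:

/cfs/klemming/home/u/user
/cfs/klemming/projects/snic/my_proj
/cfs/klemming/scratch/u/user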

Use the projinfo command to show the status of your projects and quotas.
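
For example, running it without any arguments shows the status of your projects and their quotas:

projinfo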

Home area

Use this area for files that do not belong to a specific project. The area is considered personal, and PDC will not grant access to it to anyone else unless the owner requests it. The size is limited to 25 GB and 100,000 files. The data will remain available for at least 6 months after the user's last allocation has ended.

Projects area

Used for most non-temporary files that are read and written by jobs running at PDC. Each project directory belongs either to a SNIC storage allocation requested through SUPR or to the default storage of a compute allocation. The amount of data and the number of files that can be stored are determined by the active allocation. Once the allocation ends, the project directory can be inherited by a subsequent allocation by the same PI.

The PI has full access to all data in the project directory. If the file system permissions do not allow the PI to access requested data, PDC will change those permissions upon request from the PI. All members should be able to write to the project directory. It is up to each member, under the responsibility of the PI, to create suitable subdirectories with the appropriate permissions needed by the project.

All data in a project directory will be deleted 3 months after the project ends. The PI will be notified prior to the deletion via the email address registered in SUPR.

If the space allocated to your project is starting to run out, you should first consider whether some files are no longer needed and can be removed. To find out which subdirectories of your project are using the most space, run the following command. However, please do not run it more than necessary, as it can put a lot of stress on the file system.

du -sh /cfs/klemming/projects/snic/my_proj/* | sort -hr

If you want to see how much space each member of the project is using, run the following command. Note that this can also cause stress on the file system.

find /cfs/klemming/projects/snic/my_proj -printf "%s %u\n" | awk '{arr[$2]+=$1} END {for (i in arr) {print arr[i],i}}' | numfmt --to=iec --suffix=B | sort -hr

Scratch area

Use this area for temporary files that are read and written by jobs running at PDC, for example files that are only needed during a job, or checkpoint files that become obsolete once the next job starts.

The scratch area is automatically cleaned by removing files that have not been changed in 30 days.
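
If you want to check your own files, a command along the following lines lists the files in your scratch directory that have not been modified in the last 30 days and are therefore candidates for removal (replace the path with your own scratch directory):

find /cfs/klemming/scratch/u/username -type f -mtime +30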

Performance considerations

Lustre file systems perform quite differently from the local disks that are common on other machines. Lustre was developed to provide fast access to the large data files needed by large parallel applications. It is particularly bad at dealing with small files and with many small operations on such files, so those access patterns should be avoided as much as possible.

Good practice on a Lustre system

To get the best performance out of a Lustre system, use as few files as possible and, each time you access a file, read or write as much data at a time as you can. An ideal program using Lustre would read in a single data file using parallel IO (e.g. MPI IO), process the data, and at the end write out a single file, again using parallel IO, with no intermediate use of the disk.

If the software is using large files, it can be beneficial to stripe them across several file servers. A common pattern is to use the following command to stripe files in the output directory across as many servers as possible:

lfs setstripe -c -1 output/
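
To check which striping settings are currently applied to a directory or file, you can use lfs getstripe:

lfs getstripe output/

A fixed stripe count, for example 4 servers, can also be chosen with lfs setstripe -c 4 output/ instead of striping across all of them.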

More information about striping is available on the Lustre wiki: https://wiki.lustre.org/Configuring_Lustre_File_Striping.

Bad practice on a Lustre system

As Lustre is designed for reading a small number of large files quickly, certain IO patterns that are perfectly fine on other systems cause very high load on a Lustre system, for example:

  • Small reads

  • Opening many files

  • Seeking within a file to read a small piece of data

These practices are very common in applications that were designed to run on systems where each node has its own local scratch disk. Many software packages (e.g. Quantum Espresso) have input options that reduce the disk IO.
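
As an illustration (check the Quantum Espresso documentation for the exact values supported by your version), pw.x accepts a disk_io keyword in the &CONTROL namelist that limits how much intermediate data is written to disk:

&CONTROL
  disk_io = 'low'  ! settings such as 'low' or 'none' reduce the disk IO during the run
/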

General best practices

In addition to these guidelines, general storage best practices should be followed:

  • Minimize the number of input/output (I/O) operations; larger I/O operations are more efficient than small ones, so where possible reads and writes should be aggregated into larger blocks.

  • Avoid creating too many files, since post-processing a large number of files is hard on the file system.

  • Avoid creating directories with a large number of files. Instead, create directory hierarchies, which also improves responsiveness (see the sketch after this list).
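
As a hypothetical sketch of the last point, files with names like sample_0001.dat can be grouped into subdirectories based on part of their name, instead of keeping all of them in a single flat directory:

# Hypothetical example: group files such as sample_0001.dat into
# subdirectories named after the first two digits of their index,
# so that no single directory ends up holding all of the files.
for f in sample_????.dat; do
    prefix=${f:7:2}        # e.g. "00" for sample_0012.dat
    mkdir -p "$prefix"
    mv "$f" "$prefix/"
done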

Managing access permissions

By default, only you can access the files in your home and scratch directories, and only project members can access their project directory. But sometimes it is useful to change these defaults.

Basic Unix permissions

Each file and directory has an owner (user) and a group. The owner can decide whether the owner, members of the group, and others should be able to read (r), write (w) and execute (x) the file. The command ls -l displays these permissions in that order: owner, group, others. For instance, -rw-r----- means that the owner can read and write the file, group members can read it, and others are denied access. The command also shows who the owner is and which group the file belongs to. For more details about how to work with file permissions see this page.
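
For example, to additionally allow members of the group to read a file (here a hypothetical results.txt), the permissions can be changed with chmod:

# before: -rw------- (only the owner can read and write the file)
chmod g+r results.txt
# after:  -rw-r----- (group members can now also read the file)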

Access Control Lists

For more advanced use cases, Lustre also supports POSIX ACLs (Access Control Lists). An ACL allows the owner of a file or directory to control access to it on a user-by-user or group-by-group basis. The commands getfacl and setfacl are used to view and modify ACLs. Detailed documentation is available by running setfacl -h and getfacl -h.

To view the access ACL of a directory in Lustre, run the command:

getfacl -a /cfs/klemming/home/u/user/test

You might see output like this:

# file: /cfs/klemming/home/u/user/test
# owner: me
# group: users
user::rwx
group::r-x
other::---

You can then grant access to another user with

setfacl -m u:<uid>:r-x -R /cfs/klemming/home/u/user/test

where u:<uid>:<permission> sets the access ACL for a user; you can specify either a user name or a UID. The -R flag applies the change recursively to all subdirectories. The x permission is needed to allow traversal through directories, so for a user to reach your subdirectories they need x all the way from your top directory (/cfs/klemming/home/u/user).

If you want to give another user write permissions, replace r-x with rwx.
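
Remember that the user also needs the x permission on every directory above test. For example, to let the user traverse your home directory without being able to list its contents, you can grant execute-only access on it:

setfacl -m u:<uid>:x /cfs/klemming/home/u/user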

Similarly, you can grant access to another group with

setfacl -m g:<gid>:r-x -R /cfs/klemming/home/u/user/test

where g:<gid>:<permission> sets the access ACL for a group. You can specify either a group name or a GID.

If you want to give another group write permissions, replace r-x with rwx.

Granted permissions can be removed with the -x flag. The following command removes all permissions granted to another user:

setfacl -x u:<uid> -R /cfs/klemming/home/u/user/test

More details about POSIX ACLs can be found on this page.