Lustre

Lustre is a parallel file system optimized for handling data from many clients at the same time. This section provides guidelines on using Lustre at PDC.

Note

You can find Lustre at /cfs/klemming

Warning

Files on Lustre are NOT backed up!

Key features

  • Storage size: large volume of storage (over 5 PB in total, about 100 times more than AFS)

  • File access speed: fast access (good for files accessed for computation)

  • Backup: files are not backed up

  • Accessibility: files stored on Lustre cannot be accessed directly via the internet - you need to log in to a PDC computer to access Lustre

  • Access from Beskow: files on Lustre can be accessed from Beskow’s compute nodes (any data or program files that you need for running programs on Beskow must be stored on Lustre)

  • Access from Tegner: files on Lustre can be accessed from Tegner’s compute nodes - so large amounts of data for Tegner computation should be stored on Lustre (small amounts of data are also okay on Lustre)

  • Suitability: good for storing any large files and program code

  • File access: Lustre supports standard (POSIX) Access Control Lists

  • Main uses:

    • project directories - shared areas to be used for input/output of running jobs - no backup - files should be moved elsewhere when they are not needed for jobs

    • cluster scratch - user areas for temporary files - no backup - old files will be deleted periodically by the system

    • nobackup directories - small user areas to be used for files that do not belong to specific projects - no backup

Lustre has three parts

In the Lustre file system you can find project directories and two kinds of user directories, scratch and nobackup, at /cfs/klemming/. Note that the three parts currently reside in the same file system and therefore share resources. This means that if one part gets overloaded, so do the others. It also means that you can move files between the different parts, e.g. with ‘mv’, rather than having to copy the data over. There is no way to recover deleted files in any of the parts.

Projects

Used for most files that are used and produced by jobs running at PDC and that are not of a transient nature. Each project directory belongs to a SNIC storage allocation requested through SUPR, or to the default storage of a compute allocation. The PI has full access to all data therein. If the file system permissions do not allow the PI to access requested data, PDC will change those permissions upon request from the PI. All members should be able to write to the project directory. It is up to each member, under the responsibility of the PI, to create suitable subdirectories with the appropriate permissions needed by the project. The amount of data and number of files that can be stored is decided by the active allocation. The project directory, and the data therein, can be inherited by, or taken over by, a subsequent allocation by the same PI.

All the data in a project directory will be deleted 3 months after the project ends, but the PI will be notified prior to the deletion via the email address registered in SUPR. The project directories can be found under /cfs/klemming/projects/snic/ followed by the name chosen by the PI of the allocation.

To see what projects you belong to, run:

projinfo

Project groups

The project disk usage is recorded using a separate group per project. These groups are named pg-<big number>. To find the group name of a project directory called my_proj, you can run:

ls -ld /cfs/klemming/projects/snic/my_proj

For files to be accounted to the correct allocation, they all need to belong to that group. New files are taken care of automatically by the permissions set on the directory in which they are stored, i.e. the set-group-ID bit. When files are moved from another location, e.g. a user’s nobackup directory or another project’s directory, they need to be updated manually with the new group, and the user should take responsibility for updating the group association directly after the files have been moved. If you have moved files into a subdirectory called new_dir somewhere in the project directory called my_proj, you can change the group association using:

chgrp -hR --reference=/cfs/klemming/projects/snic/my_proj new_dir
find new_dir -type d -exec chmod g+s {} \;

The same can also be done simply using the following command:

fixgroup new_dir

The system will periodically make sure that all the files in a project directory are associated with the right group and that all the directories have the set-group-ID bit set. This will not be done often, in order not to strain the file system.
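The effect of the set-group-ID bit mentioned above can be illustrated on any Linux system. This is a minimal local sketch (the directory name is made up); on Lustre the bit is what makes new files inherit the project group automatically:

```shell
# Create a directory and set the set-group-ID bit on it
mkdir -p proj_data
chmod g+s proj_data

# The group execute slot of the permission string now shows 's',
# e.g. drwxr-sr-x; new files created inside will inherit the
# directory's group rather than the creating user's primary group
ls -ld proj_data
```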

Project quota

To see how much data and how many files are currently stored in a project directory, any member of the allocation can use the following command, where pg-XXXXXXX denotes the project group.

lfs quota -hg pg-XXXXXXX /cfs/klemming

The used space is listed under used and the number of files under files. The respective quotas are listed under limit. This information is also available in SUPR, updated daily.

If the space allocated to your project is starting to run out, first consider whether some files are no longer needed and can be removed. To find out which subdirectories of your project are using the most space, run the following command. Please do not run it more than necessary, as it puts a lot of stress on the file system.

du -sh /cfs/klemming/projects/snic/my_proj/* | sort -hr

If you want to see how much space each member of the project is using, run the following command. Note that this can also cause stress on the file system.

find /cfs/klemming/projects/snic/my_proj -printf "%s %u\n" | awk '{arr[$2]+=$1} END {for (i in arr) {print arr[i],i}}' | numfmt --to=iec --suffix=B | sort -hr

Scratch

Use for files that are used and produced by jobs running at PDC and that are of a transient nature. Examples could be temporary files that are only needed during a job, or checkpoint files that become obsolete after the next job starts. This branch will be cleaned automatically by removing files that have not been changed within a certain time. This time will be adjusted when needed so that:

  • The files are available while a job is running on the cluster

  • After the job has run there is a reasonable chance to move the files to some other storage

  • There is a low likelihood of jobs failing due to a full scratch area

The total size of the scratch area is 200 TB, but this might change in the future depending on need and usage. Currently all files older than 30 days are eligible for deletion during cleaning. The cleaning is usually done at least once a week, but will be done as frequently as deemed necessary to fulfill the above goals.
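Since files older than 30 days may be removed, it can be useful to check which of your scratch files are at risk before the next cleaning. A small sketch, using a hypothetical helper function built on find’s -mtime test:

```shell
# Hypothetical helper: list regular files in a directory that have not
# been modified for more than the given number of days
old_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime +"$days"
}

# Example usage (the path is illustrative; use your own scratch directory):
# old_files /cfs/klemming/scratch/s/svensson 30
```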

Your directory is located at /cfs/klemming/scratch/[1st letter of username]/[username]. For example, if your username is svensson, your directory is at

/cfs/klemming/scratch/s/svensson
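Because the path follows a fixed pattern, you can derive it from any username. A minimal sketch (the helper name scratch_path is made up for illustration):

```shell
# Hypothetical helper: build the scratch path from a username
# (first letter of the username, then the username itself)
scratch_path() {
    user=$1
    printf '/cfs/klemming/scratch/%s/%s\n' "$(printf '%s' "$user" | cut -c1)" "$user"
}

scratch_path svensson   # -> /cfs/klemming/scratch/s/svensson
```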

Nobackup

Use for files that do not belong to a specific project. This area is considered personal and PDC will not grant access to anyone if not requested by the owner. The size is limited to 25GB and 50000 files, and the data will be available at least 6 months after the user’s last allocation ended.

Note

During the transition phase, if a user is above these limits, the data will be deleted 3 months after the user’s last allocation ended.

Similar to scratch, your nobackup directory is under

/cfs/klemming/nobackup/s/svensson

You can see how much data and how many files you currently have stored in your directory using the command:

lfs quota -g $USER /cfs/klemming

Characteristics of a Lustre file system

Lustre file systems perform quite differently from the local disks that are common on other machines. Lustre was developed to provide fast access to the large data files needed by large parallel applications. It is particularly bad at dealing with small files and with performing many small operations on files, so those cases should be avoided as much as possible.

Good practice on a Lustre system

To get the best performance out of a Lustre system, you should use as small a number of files as possible, and each time you access a file you should read or write as much data at a time as you can. An ideal program using Lustre would read in a single data file using parallel IO (e.g. MPI IO), process the data, and then at the end write out a single file, again using parallel IO, with no intermediate use of the disk.
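The effect of transfer size can be sketched with dd on any system. Both commands below write the same 8 MiB, but the first does it in two operations while the second issues 16384; on a parallel file system like Lustre, the many-small-operations pattern is the one to avoid. (The file names are illustrative.)

```shell
# Few large writes: 2 operations of 4 MiB each
dd if=/dev/zero of=large_blocks.dat bs=4M count=2 status=none

# Many small writes: 16384 operations of 512 B each - same total
# amount of data, but far more load on a parallel file system
dd if=/dev/zero of=small_blocks.dat bs=512 count=16384 status=none
```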

Bad practice on a Lustre system

As Lustre is designed for reading a small number of large files quickly, certain IO patterns that are perfectly fine on other systems cause very high load on a Lustre system, for example:

  • Small reads

  • Opening many files

  • Seeking within a file to read a small piece of data

These practices are very common in applications that were designed to run on systems where each node has its own local scratch disk.

Many software packages (e.g. Quantum Espresso) have input options that reduce the disk IO.

Viewing and modifying access

For simple cases, the regular Unix file permission model can be used. In this model, each file and directory has an owner (user) and a group. The owner can decide whether the owner, members of the group, and others should be able to read (r), write (w) and execute (x) each file. The command ls -l displays the permissions in that order. For instance, -rw-r----- means that the owner can read and write the file, group members can read it, and others are denied access. The command also shows the owner and the group of the file. For more details about how to work with file permissions see this page.
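As a concrete sketch, the -rw-r----- permissions described above can be set with chmod, in either octal or symbolic notation (the file name is illustrative):

```shell
# Create a file and restrict it to: owner read/write, group read, others none
touch results.txt
chmod 640 results.txt           # octal: 6 = rw-, 4 = r--, 0 = ---
# equivalently: chmod u=rw,g=r,o= results.txt

ls -l results.txt               # first column shows -rw-r-----
```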

For more advanced file permission schemes, Lustre also supports POSIX ACLs (Access Control Lists). These are similar to, but not the same as, ACLs in the AFS file system. An ACL allows the owner of a file or directory to control access to it on a user-by-user or group-by-group basis. To view and modify an ACL, the commands getfacl and setfacl are used. Detailed documentation is available by running setfacl -h and getfacl -h.

To check the access permissions of a directory in Lustre, run the command:

getfacl -a /cfs/klemming/nobackup/u/user/test

You might see output like this:

# file: /cfs/klemming/nobackup/u/user/test
# owner: me
# group: users
user::rwx
group::r-x
other::---

You can then grant access to another user with:

setfacl -m u:<uid>:r-x -R /cfs/klemming/nobackup/u/user

where u:<uid>:<permission> sets the access ACL for a user. You can specify a user name or a UID. The -R flag recursively applies the same permissions to subdirectories. The x permission is needed to allow traversal through directories, so for a user to access your subdirectories they need the x permission all the way from your top directory (/cfs/klemming/nobackup/u/user).

If you want to give another user write permissions, replace r-x with rwx.

Similarly, you can grant access to another group with:

setfacl -m g:<gid>:r-x -R /cfs/klemming/nobackup/u/user

where g:<gid>:<permission> sets the access ACL for a group. You can specify a group name or a GID.

If you want to give another group write permissions, replace r-x with rwx.

Granted permissions can be removed with the -x flag. The following command removes all permissions granted to another user.

setfacl -x u:<uid> -R /cfs/klemming/nobackup/u/user

More details about POSIX ACLs can be found on this page.