Choosing storage

Overview

PDC provides a range of storage alternatives for your data. Choosing the right storage can be essential to the success of your application. This guide helps you choose a storage option that suits your problem.

Know your data

To find the most suitable storage option for your data set, please consider the following aspects of your data.

  • Lifetime - Is your data temporary, that is, can it be deleted after your job finishes?
  • Backup - Can your data be regenerated if it is lost, or does it need to be backed up?
  • Locality - Does your data need to be accessed only by a single node (locally), or by processes on several nodes?
  • Input size - How large is the input data you need?
  • Output size - How much data do you generate, and how often (number of consecutive or simultaneous jobs)?
  • Data structure - How is your data organized? How many files and directories are there, and how are they arranged?
  • I/O pattern - How does your program access the disk while it runs, for example many small reads and writes or a few large sequential ones?
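As a rough illustration, the answers to these questions can be mapped onto the storage options described in this guide. The sketch below is a hypothetical Python helper, not a PDC tool: the `DataProfile` fields and the `suggest_storage` function are invented for this example, and the thresholds (5 GB for a home directory after a quota increase, 50 GB per project volume) are taken from the figures given later in this guide. It ignores the I/O pattern and the available scratch capacity, which you should still check yourself.

```python
from dataclasses import dataclass

# Hypothetical thresholds based on the figures quoted in this guide.
HOME_QUOTA_MAX_GB = 5        # home quota can be raised to 5 GB on request
PROJECT_VOLUME_MAX_GB = 50   # target upper size of one AFS project volume


@dataclass
class DataProfile:
    """Answers to the 'Know your data' questions (hypothetical structure)."""
    temporary: bool            # can the data be deleted after the job finishes?
    shared_across_nodes: bool  # must processes on several nodes access it?
    needs_backup: bool         # would losing it be a problem?
    size_gb: float             # expected total size in GB


def suggest_storage(p: DataProfile) -> str:
    """Return a rough first suggestion, following the recommendations in this guide."""
    if p.temporary and not p.needs_backup:
        if p.shared_across_nodes:
            # Node-local scratch is not shared between nodes.
            return "parallel file system"
        return "node-local scratch"
    # Permanent data that should be backed up goes to AFS.
    if p.size_gb <= HOME_QUOTA_MAX_GB:
        return "AFS home directory"
    # Larger data needs one or more project volumes (each up to ~50 GB).
    return "AFS project volume(s)"
```

For example, intermediate files used by a single process map to node-local scratch, while 20 GB of results that must be kept map to a project volume.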

Node local disks

Process-specific temporary data, such as intermediate results, is best stored on local disk. File systems dedicated to this purpose are usually called "scratch" file systems.

In the PDC environment, scratch file systems are available on all major machines and are mounted as:

/scratch

Note that scratch file systems are unique to every node, and no data is shared between nodes. If scratch files must be shared between nodes, a parallel file system has to be used instead.

The capacity of scratch file systems varies with system size. Prefer scratch file systems over other storage alternatives whenever appropriate. Scratch disks are accessed much like ordinary temporary file systems.

Scratch disks on batch nodes are cleared after each completed job to ensure that the specified space is always available. On scratch areas used interactively there is no automatic cleanup; users are responsible for deleting their own files.

PDC may delete files from interactive scratch areas at any time to free space. In addition, no scratch disk is backed up. In summary, scratch disk is:

  • good for process-specific temporary data
  • usually the fastest available secondary storage medium
  • not a good solution for permanent storage
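Because scratch disks on batch nodes are cleared after each job, a common pattern is to copy input to scratch, work there, and copy the results to permanent storage before the job ends. The Python sketch below shows that pattern under stated assumptions: `/scratch` is the mount point given above, each run gets its own temporary subdirectory, and `compute` is a hypothetical callback standing in for your application. In a batch job the same steps are often written directly in the job script.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(inputs, results_dir, compute, scratch_root="/scratch"):
    """Stage inputs on node-local scratch, run `compute`, copy results back.

    `compute(workdir)` is a hypothetical callback: it reads the staged
    inputs from `workdir` and returns the paths of the output files it
    wrote there.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    # A private working directory on the node's scratch disk,
    # removed automatically when the with-block ends.
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        workdir = Path(tmp)
        for src in inputs:
            shutil.copy2(src, workdir / Path(src).name)
        outputs = compute(workdir)
        # Copy results to permanent storage before scratch is cleaned up.
        saved = []
        for out in outputs:
            dest = results_dir / Path(out).name
            shutil.copy2(out, dest)
            saved.append(dest)
    # The scratch directory is gone here; only the copied results remain.
    return saved
```

Cleaning up the scratch directory yourself also matters on interactive nodes, where nothing is removed automatically.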

User home directories are in AFS

All PDC users are intended to have their home directories in the distributed file system AFS. New users initially get a quota of about 500 MB, which can be raised to 5 GB on request.
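To see roughly how close you are to the quota, the sketch below sums file sizes under a directory. It is an approximation written for this guide: AFS accounts usage in 1 KB blocks, so the authoritative figure is the one reported by the AFS `fs listquota` command. The 500 MB constant mirrors the initial quota stated above and is assumed here to mean 500 × 1024² bytes; the ~/OldFiles backup volume is skipped so the nightly snapshot is not counted a second time. The function names are hypothetical.

```python
import os
from pathlib import Path

QUOTA_BYTES = 500 * 1024**2  # initial home quota (~500 MB), assumed MiB


def usage_bytes(root):
    """Sum the sizes of regular files below `root`, skipping symlinks."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the OldFiles backup volume.
        dirnames[:] = [d for d in dirnames if d != "OldFiles"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def quota_fraction(root=Path.home(), quota_bytes=QUOTA_BYTES):
    """Estimated fraction of the quota in use (1.0 means full)."""
    return usage_bytes(root) / quota_bytes
```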

Home directories are backed up by the native AFS backup system. Typically, each user's home directory is a single volume. For the first level of backup, AFS creates a copy of each volume every night. This copy is called a "backup volume", and you can find it in the ~/OldFiles directory of each home directory. If you have just deleted or damaged a file that existed the day before, type "cd ~/OldFiles" to find the previous day's version and copy it back into your home directory.
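Restoring from ~/OldFiles is just a file copy. The Python helper below is a hypothetical sketch that does this for one file; it is equivalent to copying `~/OldFiles/<path>` back to `~/<path>` by hand. Keeping the damaged file under a `.damaged` suffix is a choice made for this example, not PDC policy, and the `home` parameter exists only so the helper can be tried outside a real home directory.

```python
import shutil
from pathlib import Path


def restore_from_oldfiles(relative_path, home=None):
    """Copy the previous day's version of a file from ~/OldFiles back home.

    `relative_path` is the file's path relative to the home directory.
    An existing (damaged) file is kept with a '.damaged' suffix.
    """
    home = Path(home) if home is not None else Path.home()
    backup = home / "OldFiles" / relative_path
    target = home / relative_path
    if not backup.is_file():
        raise FileNotFoundError(f"no backup copy at {backup}")
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # Keep the damaged version instead of overwriting it.
        target.rename(target.with_name(target.name + ".damaged"))
    shutil.copy2(backup, target)
    return target
```

Remember that ~/OldFiles only holds the state from the last nightly snapshot; changes made since then are not in it.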

In addition, a TSM backup runs each night and backs up all home volumes, as well as project volumes for which backup has explicitly been requested. If you need to restore an older version of a deleted or damaged file, contact PDC Support.

Project volumes in AFS

If your home directory does not provide enough space, you can request additional AFS space. For performance reasons, and to limit the impact on the infrastructure, we try to keep each volume to at most 50 GB of backed-up storage. Use this form to request more AFS storage.
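When more than 50 GB of backed-up storage is needed, the data has to be spread over several volumes. The minimum number of volumes is a simple ceiling division, sketched below in Python; `volumes_needed` is a hypothetical helper and the 50 GB figure is the per-volume target stated above.

```python
import math

MAX_VOLUME_GB = 50  # per-volume target for backed-up AFS storage


def volumes_needed(total_gb, max_volume_gb=MAX_VOLUME_GB):
    """Smallest number of volumes of at most max_volume_gb holding total_gb."""
    if total_gb <= 0:
        return 0
    return math.ceil(total_gb / max_volume_gb)
```

For example, 120 GB needs at least ⌈120 / 50⌉ = 3 volumes. This is a lower bound that assumes the data can be split freely; large directories that must stay together may require more. Whether several volumes are granted is up to PDC.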