Know your data
To find the most suitable storage option for your data set, please consider the following aspects of your data.
- Lifetime - Is your data temporary, i.e. can it be deleted after your job finishes?
- Backup - Can your data be regenerated if lost, or do you require safety mechanisms to keep it safe?
- Locality - Does your data need to be accessed only by a single node (locally), or by processes on several nodes?
- Input size - What size is the input data you need?
- Output size - What amount of data do you generate? How often do you generate this data (number of consecutive or simultaneous jobs)?
- Data structure - What organization does your data have? How many files/folders? How are these organized?
- I/O pattern - How does your program access disk during its runs?
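Several of the questions above (input size, number of files and folders) can be answered by measuring an existing data set. The following is a minimal sketch, not a PDC-provided tool; the function name summarize_tree is chosen here for illustration only.

```python
import os


def summarize_tree(root):
    """Return (file_count, dir_count, total_bytes) for everything under root."""
    file_count = 0
    dir_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dir_count += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symbolic links so their targets are not counted
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; leave it out
            file_count += 1
    return file_count, dir_count, total_bytes


if __name__ == "__main__":
    files, dirs, size = summarize_tree(".")
    print(f"{files} files in {dirs} subdirectories, {size / 1e6:.1f} MB in total")
```

Many small files and a few large files of the same total size can behave very differently on a shared file system, so the file count is worth noting alongside the total size.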
Node local disks
In the PDC environment, scratch file systems are available on all major machines and are mounted as:
Note that scratch file systems are unique to every node, and no data is shared between nodes. If scratch files must be shared, some type of parallel file system has to be used.
The capacity of scratch file systems varies with system size. Use scratch file systems in favor of other storage alternatives whenever appropriate. Scratch disks are accessed much like ordinary temporary file systems. Scratch disks on batch nodes are cleared after each completed job to ensure that the specified space is always available. On scratch areas with interactive users there is no automatic cleanup procedure; users are responsible for deleting their own material.
PDC may delete files from interactive scratch areas at any time to free space. Further, no backups are taken of any scratch disk. To summarize, scratch disk is:
- good for process-specific temporary data
- usually the fastest available secondary storage medium
- not a good solution for permanent storage
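The pattern summarized above (keep temporary data on scratch, copy only the results you need to permanent storage) could be sketched as follows. This is an illustration under stated assumptions: the environment variable SCRATCH_DIR and the function run_in_scratch are hypothetical names, not part of the PDC environment, and the actual scratch mount point must be substituted.

```python
import os
import shutil
import tempfile


def run_in_scratch(work, results_dir, scratch_root=None):
    """Run work(tmpdir) in a private scratch directory.

    work must return a list of paths (inside tmpdir) that should be kept.
    Those files are copied to results_dir; everything else is discarded.
    Returns the list of destination paths.
    """
    # SCRATCH_DIR is a hypothetical variable; fall back to the system temp dir.
    scratch_root = (scratch_root
                    or os.environ.get("SCRATCH_DIR")
                    or tempfile.gettempdir())
    os.makedirs(results_dir, exist_ok=True)
    kept = []
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmpdir:
        for src in work(tmpdir):
            dst = os.path.join(results_dir, os.path.basename(src))
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            kept.append(dst)
    # tmpdir and all remaining temporary files have been removed here
    return kept
```

Copying results out before the job ends matters because scratch disks on batch nodes are cleared after each job and are never backed up.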
User home directories are in AFS
The intention is that all PDC users should have their home directories in the distributed file system AFS. Initially, new users get a quota of about 500 MB. This can be raised to 5 GB upon request.
Home directories are backed up by the native AFS backup system. Typically, each user's home directory is a single volume. For the first level of backup, AFS creates a copy of each volume every night. This copy is called a "backup volume". You can find this backup volume in the ~/OldFiles directory in each home directory. If you have just deleted or damaged a file that existed the day before, type "cd ~/OldFiles" to find the previous day's version of it and copy it back into your home directory.
In addition, a second level of backup (using TSM) runs each night. It covers all home volumes, as well as those project volumes for which backup has been explicitly requested. If you need to restore an older version of a deleted or damaged file, contact PDC Support.
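Restoring a file from the nightly copy in ~/OldFiles amounts to an ordinary file copy. The sketch below assumes that ~/OldFiles mirrors the layout of the home directory; the function name restore_from_oldfiles and the ".restored" suffix are illustrative choices, not part of AFS.

```python
import os
import shutil


def restore_from_oldfiles(relative_path, home=None):
    """Copy home/OldFiles/relative_path back into home and return the new path.

    An existing file is never overwritten: if the target already exists,
    the restored copy gets a ".restored" suffix instead.
    """
    home = home or os.path.expanduser("~")
    backup = os.path.join(home, "OldFiles", relative_path)
    target = os.path.join(home, relative_path)
    if not os.path.isfile(backup):
        raise FileNotFoundError(f"no copy of {relative_path} in {backup}")
    if os.path.exists(target):
        target += ".restored"  # keep the current (possibly damaged) file
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copy2(backup, target)
    return target
```

For example, restore_from_oldfiles("thesis/chapter1.tex") would bring back yesterday's version of that (hypothetical) file. Since the backup volume is recreated every night, a file should be restored before the next nightly copy replaces it.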
Project volumes in AFS
If the space provided by the home directory does not meet your needs, you can request additional AFS space. For performance reasons, and to limit the impact on infrastructure, we try to limit each volume to 50 GB of backed-up storage. Use this form to request more AFS storage.