Overview
PDC provides a range of storage options for your data. Choosing a good storage solution can be essential to the success of your application. This guide helps you choose an option that suits your problem.
Know your data
To find the most suitable storage option for your data set, please consider the following aspects of your data.
- Lifetime - Is your data temporary, i.e. can it be deleted after your job finishes?
- Backup - Can your data be regenerated if lost, or does it need to be backed up?
- Locality - Does your data need to be accessed only by a single node (locally), or by multiple processes?
- Input size - How large is the input data you need?
- Output size - How much data do you generate? How often do you generate it (number of consecutive or simultaneous jobs)?
- Data structure - How is your data organized? How many files and folders are there, and how are they arranged?
- I/O pattern - How does your program access the disk while it runs?
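Some of these questions, in particular input size and data structure, can be answered by inspecting the data set directly. The following is a minimal sketch, not a PDC tool: the function name summarize_tree is hypothetical, and it assumes the data set lives in a single directory tree that you can read.

```python
import os


def summarize_tree(root):
    """Return (file_count, dir_count, total_bytes) for everything under root.

    Symbolic links are not followed and are not counted as files.
    """
    file_count = 0
    dir_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dir_count += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so linked data is not counted
            file_count += 1
            total_bytes += os.path.getsize(path)
    return file_count, dir_count, total_bytes
```

Calling summarize_tree on your input directory gives a rough picture of whether you have a few large files or many small ones, which can matter as much as the total size when comparing storage options.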
Node local disks
Process-specific temporary data, such as intermediate results, is best stored on a node-local disk. File systems dedicated to this purpose are usually called "scratch" file systems.
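As an illustration of this pattern, the sketch below keeps intermediate files in a temporary directory and copies only the final result to a persistent location. It rests on assumptions: this section does not name the scratch path, so the code falls back to Python's default temporary directory (which honours the TMPDIR environment variable), and the names run_with_scratch, work, result_dir and scratch_root are hypothetical.

```python
import os
import shutil
import tempfile


def run_with_scratch(work, result_dir, scratch_root=None):
    """Run work(scratch_dir) in a temporary directory and keep its result.

    work must return the path of the one file worth keeping. That file is
    copied to result_dir; everything else in the temporary directory is
    deleted when the function returns.
    """
    # Assumption: if no scratch location is given, use Python's default
    # temporary directory, which follows TMPDIR when it is set.
    scratch_root = scratch_root or tempfile.gettempdir()
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch_dir:
        result_path = work(scratch_dir)
        os.makedirs(result_dir, exist_ok=True)
        # copy2 returns the path of the newly created copy
        return shutil.copy2(result_path, result_dir)
```

Because the temporary directory is removed automatically, intermediate results never reach shared storage; only the file returned by work is copied out.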

