Quick Start for Dardel¶
Brief description of Dardel¶
Dardel is a liquid-cooled, blade-based HPE Cray EX supercomputer. Phase 1 of the Dardel installation includes the Dardel CPU partition. Each CPU node has two AMD EPYC Zen2 2.25 GHz 64-core processors, for a total of 128 cores per node. The nodes are interconnected with an HPE Slingshot network using a Dragonfly topology, with a bandwidth of 100 GB/s. An open-source, massively parallel, distributed Lustre file system with 12 PB of storage capacity is mounted on the Dardel login nodes and compute nodes.
In Dardel phase 2, the GPU partition will be installed in 2022. Each GPU node will be equipped with one AMD EPYC processor and four AMD Instinct MI250X GPUs.
How to log in¶
If you have used Beskow or Tegner, keep your current Kerberos and SSH configuration and log in to Dardel with
kinit -f <user name>@NADA.KTH.SE
ssh <user name>@dardel.pdc.kth.se
For more details on how to log in, see How to log in with SSH keys.
The user home directories and the SNIC storage project directories are stored on the Lustre file system which is mounted to the Dardel compute and login nodes.
The home directories have a quota of 25 GB.
The project directories are not backed up. You are advised to always move data, and keep additional copies of the most important input and output files.
Unlike Beskow and Tegner, Dardel has no AFS file system.
For more details on storage, see Klemming (on Dardel).
Transfer of files¶
Files and directories can be transferred to and from Dardel with the scp (secure copy) command.
# To transfer a file from local computer to Dardel
scp localfile <user name>@dardel.pdc.kth.se:./<directory>
# To transfer a file from Dardel to local computer
scp <user name>@dardel.pdc.kth.se:./<directory>/<file> .
For more details on how to transfer files, see Using scp/rsync.
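To copy a whole directory rather than a single file, scp accepts the -r (recursive) option. The following is a minimal sketch, not a PDC tool: the helper name dardel_put_cmd and its arguments are hypothetical, and it only prints the command so that you can inspect it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not provided by PDC): print the scp command that
# would copy a local directory recursively to a directory on Dardel.
DARDEL_HOST="dardel.pdc.kth.se"   # host name as used in the examples above

dardel_put_cmd() (
    # $1 = user name, $2 = local directory, $3 = remote directory
    # -r tells scp to copy directories recursively
    printf 'scp -r %s %s@%s:./%s\n' "$2" "$1" "$DARDEL_HOST" "$3"
)

# Usage: print the command, check it, then paste it into the terminal
# dardel_put_cmd <user name> results <directory>
```

Printing the command instead of executing it keeps the sketch safe to try on any machine, including one without access to Dardel.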
The Lmod module system¶
Dardel uses the Lmod environment module system. Lmod lets you dynamically add installed software packages to, and remove them from, the running environment. To access, list, and search among the installed application programs, libraries, and tools, use the module command (or its shortcut ml)
ml # lists the loaded software modules
ml avail <program name> # lists the available versions of a given software
Many software packages are not directly listed by the above command because they are built within different Cray programming environments. To find all versions of a program together with its dependencies, use
ml spider <program name> # lists the available versions of a given software and what dependent modules need to be loaded
When you have found the program you are looking for, use
ml <program> # to load the program module
Many software modules become available after loading the latest
For more details on how to use modules, see How to use module to load different softwares into your environment.
The Cray programming environment¶
The Cray Programming Environment (CPE) provides a consistent interface to multiple compilers and libraries. On Dardel you can load the cpe module to enable a specific version of the CPE. For example
In addition to the cpe module, there are also the PrgEnv- modules that provide compilers for different programming environments
PrgEnv-cray: loads the Cray compiling environment (CCE) that provides compilers for Cray systems.
PrgEnv-gnu: loads the GNU compiler suite.
PrgEnv-aocc: loads the AMD AOCC compilers.
By default the PrgEnv-cray is loaded upon login. You can switch to different compilers by loading another
ml PrgEnv-gnu
ml PrgEnv-aocc
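Before building, it can be useful to confirm which programming environment is currently active. The sketch below assumes that the module system maintains the colon-separated LOADEDMODULES environment variable, as Lmod normally does; the function name active_prgenv is hypothetical.

```shell
#!/bin/sh
# Sketch: print the loaded PrgEnv- module, if any.
# Assumption: the module system keeps a colon-separated list of loaded
# modules in the LOADEDMODULES environment variable.
active_prgenv() {
    printf '%s\n' "${LOADEDMODULES:-}" | tr ':' '\n' | grep '^PrgEnv-'
}

# Usage (for example in a build script):
# echo "Building with $(active_prgenv)"
```

The function returns a non-zero exit status when no PrgEnv- module is loaded, so it can also be used in an if statement.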
After loading the cpe and the PrgEnv- modules, you can now build your parallel applications using compiler wrappers for C, C++ and Fortran
cc -o myexe.x mycode.c # cc is the wrapper for C compiler
CC -o myexe.x mycode.cpp # CC is the wrapper for C++ compiler
ftn -o myexe.x mycode.f90 # ftn is the wrapper for Fortran compiler
The compiler wrappers will choose the required compiler version, target architecture options, and will automatically link to the math and MPI libraries. There is no need to add any -L flags for the Cray-provided libraries.
There are the cray-fftw modules that are designed to provide optimal performance on Cray systems. The cray-libsci module provides BLAS/LAPACK/ScaLAPACK and supports OpenMP. The number of threads can be controlled by the
There is the cray-mpich module, which is based on ANL MPICH and has been optimized for the Cray programming environment.
All software at PDC is installed using a specific CPE, and the software installed using the latest CPE can be accessed by
PDC modules are directly related to the CPE version, and a number of older software modules can also be viewed by looking at older
Build your first program¶
Example 1: Build an MPI parallelized Fortran code within the PrgEnv-cray environment
In this example we build and test run a Hello World code hello_world_mpi.f90.
program hello_world_mpi
include "mpif.h"
integer myrank,size,ierr
call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD,myrank,ierr)
call MPI_Comm_size(MPI_COMM_WORLD,size,ierr)
write(*,*) "Processor ",myrank," of ",size,": Hello World!"
call MPI_Finalize(ierr)
end program
The build is done within the PrgEnv-cray environment using the Cray Fortran compiler, and the testing is done on a Dardel CPU node reserved for interactive use.
# Check which compiler the compiler wrapper is pointing to
ftn --version
# returns Cray Fortran : Version 12.0.3
# Compile the code
ftn hello_world_mpi.f90 -o hello_world_mpi.x
# Test the code in an interactive session.
# First queue to get one reserved node for 10 minutes
salloc -N 1 -t 0:10:00 -A <project name> -p main
# wait for the node. Then run the program using 128 MPI ranks with
srun -n 128 hello_world_mpi.x
# with program output to standard out
# ...
# Processor 123 of 128 : Hello World!
# ...
# Processor 47 of 128 : Hello World!
# ...
Because the ftn compiler wrapper was used, linking to the cray-mpich library required no explicit linking flags. As expected for this code, at runtime each MPI rank writes its Hello World to standard output without any synchronization with the other ranks, so the lines appear in arbitrary order.
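If you want to read the output in rank order, it can be sorted after the fact. This is a small sketch with a hypothetical helper name, sort_by_rank, which assumes the "Processor <rank> of <size>" line format written by the code above.

```shell
#!/bin/sh
# Sketch: order the Hello World lines by MPI rank.
# The rank is the second whitespace-separated field of each line,
# so a numeric sort on field 2 restores rank order.
sort_by_rank() {
    grep 'Processor' | sort -k2,2n
}

# Usage (inside the interactive allocation):
# srun -n 128 hello_world_mpi.x | sort_by_rank
```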
Example 2: Build a C code with PrgEnv-gnu. The code requires linking to a Fourier transform library.
# Download a C program that illustrates the use of the FFTW library
wget https://people.math.sc.edu/Burkardt/c_src/fftw/fftw_test.c
# Change from the PrgEnv-cray to the PrgEnv-gnu environment
ml swap PrgEnv-cray/8.1.0 PrgEnv-gnu/8.1.0
# Due to MODULEPATH changes, the following have been reloaded:
# 1) cray-mpich/8.1.9
# Check which compiler the compiler wrapper is pointing to
cc --version
# gcc (GCC) 11.2.0 20210728 (Cray Inc.)
ml avail
# The listing reveals that cray-libsci/21.08.1.2 is already loaded.
# In addition, the program needs linking also to a
# Fourier transform library.
ml spider fftw
# gives a listing of available Fourier transform libraries.
# Load a recent version of the Cray-FFTW library with
ml cray-fftw/126.96.36.199
# Build the code with
cc fftw_test.c -o fftw_test.x
# Test the code in an interactive session.
# First queue to get one reserved core for 10 minutes on
# the shared partition
salloc -n 1 -c 1 -t 0:10:00 -A <project name> -p shared
# wait for the core. Then run the program with
srun fftw_test.x
Having loaded the cray-fftw module, no additional linking flag(s) were needed for the cc compiler wrapper.
Example 3: Build a program with the EasyBuild cpeGNU/21.09 toolchain
ml EasyBuild-user
# Look for a recipe for the Libxc library
eb -S Libxc
# Returns a list of available EasyBuild easyconfig files.
# Choose an easyconfig file for the cpeGNU/21.09 toolchain.
# Make a dry-run
eb libxc-5.1.6-cpeGNU-21.09.eb --robot --dry-run
# Check if dry-run looks reasonable. Then proceed to build with
eb libxc-5.1.6-cpeGNU-21.09.eb --robot
# The program is now locally installed in the user's
# ~/easybuild_user directory and available with
ml spider <program name>
ml avail <program name>
How to use EasyBuild¶
At PDC we have EasyBuild installed to simplify the installation of HPC software and several easyconfig software recipes are available via the command line. In order to use EasyBuild in your local folder
Software installed with EasyBuild is built into the ~/easybuild_user folder and is automatically available as modules.
For more information on how to use EasyBuild at PDC, see Installing software using EasyBuild.
Submit a batch job to the queue¶
PDC uses the Slurm Workload Manager to schedule jobs.
You are advised to always submit jobs from a directory within the project and scratch partitions of the file system.
Keep additional copies of the most important input and output files in your home directory.
The Dardel CPU nodes have 128 cores. Please note that if you request a full node, your project allocation will be charged for the use of 128 cores, even if your program uses fewer cores. You are advised to submit jobs that use fewer than 128 cores to the shared partition.
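The difference can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that usage is counted as cores multiplied by wall-clock hours and that a job on the shared partition is charged only for the cores it requests; the function names are hypothetical, and the actual accounting rules are those of your project allocation.

```shell
#!/bin/sh
# Sketch of the charging arithmetic, under the assumptions stated above.
CORES_PER_NODE=128   # cores per Dardel CPU node, as stated in the text

# Full nodes: every core of every node is charged.
# $1 = number of nodes, $2 = wall-clock hours (integer)
core_hours_full_nodes() ( echo $(( $1 * CORES_PER_NODE * $2 )) )

# Shared partition (assumed): only the requested cores are charged.
# $1 = number of cores, $2 = wall-clock hours (integer)
core_hours_shared() ( echo $(( $1 * $2 )) )

# Example: a 64-core job running for 2 hours
# core_hours_full_nodes 1 2    # one full node
# core_hours_shared 64 2       # 64 cores on a shared node
```

Under these assumptions, the same 64-core job costs half as much on a shared node as on a full node.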
For more details on the partition, see Dardel partitions.
Example 1: Submit a batch job to run on 64 cores of a node that is shared with other jobs.
In this example we will run a batch job for the hello_world_mpi.f90 code. To this end, we prepare a jobscript.sh
#!/bin/bash -l
# The -l above is required to get the full environment with modules

# Set the allocation to be charged for this job
# not required if you have set a default allocation
#SBATCH -A <project name>

# The name of the script is myjob
#SBATCH -J myjob

# 10 minutes wall-clock time will be given to this job
#SBATCH -t 00:10:00

# The partition
#SBATCH -p shared

# The number of tasks requested
#SBATCH -n 64

# The number of cores per task
#SBATCH -c 1

echo "Script initiated at `date` on `hostname`"
srun -n 64 hello_world_mpi.x
echo "Script finished at `date` on `hostname`"
The batch job is submitted to the job queue with the sbatch command. After submission with
The status of the job (pending in queue, running, etc) can be monitored with the squeue command.
squeue -u $USER
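Instead of calling squeue repeatedly by hand, a script can poll the queue until the job has left it. The following is a minimal sketch with a hypothetical function name, wait_for_job; it relies on the standard Slurm options squeue -h (no header) and squeue -j (select a job ID), and treats an empty listing as "job no longer in the queue".

```shell
#!/bin/sh
# Sketch: block until a job no longer appears in the queue.
# $1 = job ID, $2 = polling interval in seconds (default 30)
wait_for_job() {
    # squeue -h prints no header, so the output is empty once the job
    # is gone; errors for an unknown job ID are discarded.
    while [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]; do
        sleep "${2:-30}"
    done
}

# Usage (on Dardel; sbatch --parsable prints the job ID):
# jobid=$(sbatch --parsable jobscript.sh)
# wait_for_job "$jobid"
# cat "slurm-${jobid}.out"
```

A polling interval of 30 seconds or more avoids putting unnecessary load on the scheduler.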
The standard output of the program is written to a file slurm-<job number>.out. We inspect the output
Script initiated at thu oct 28 14:52:26 CEST 2021 on nid001064
..
Processor 25 of 64 : Hello World!
Processor 34 of 64 : Hello World!
..
Script finished at thu oct 28 14:52:28 CEST 2021 on nid001064
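A quick sanity check is to count the Hello World lines and compare the count with the number of tasks requested. This is a small sketch; the function name count_hello_lines is hypothetical.

```shell
#!/bin/sh
# Sketch: count how many ranks reported in a Slurm output file.
# $1 = output file, for example slurm-<job number>.out
count_hello_lines() {
    grep -c 'Hello World' "$1"
}

# Usage: for the 64-task job above the count should be 64
# count_hello_lines slurm-<job number>.out
```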
Example 2: Submit a batch job for a center-installed software package
In this example we will perform a calculation on two Dardel CPU compute nodes with the ABINIT package for modeling of condensed matter. The example calculation is a density functional theory (DFT) simulation of the properties of the material SrVO3. ABINIT is available as a PDC center installed software, as listed on the page Available Software.
We activate the ABINIT software module with
ml PDC
ml ABINIT/9.6.2-cpeGNU-21.11
To see which environment variables were set by the ml command, use
ml show ABINIT/9.6.2-cpeGNU-21.11
# which reveals that
# /pdc/software/21.11/eb/software/ABINIT/9.6.2-cpeGNU-21.11/bin
# was appended to the PATH
To set up the simulation for SrVO3 we need an abi input file and a set of pseudopotentials for the chemical elements. These are contained in the ABINIT 9.6.2 release.
Download and extract the ABINIT 9.6.2 release
wget https://www.abinit.org/sites/default/files/packages/abinit-9.6.2.tar.gz
tar xf abinit-9.6.2.tar.gz
# where the files and directories needed for this example are
# abinit-9.6.2/tests/tutoparal/Input/tdmft_1.abi
# abinit-9.6.2/tests/Psps_for_tests/
To run the calculation as a batch job on two nodes, prepare a job script jobscriptabinit.sh, where <your-project-account> should be an active compute project and <Psps_for_tests> should be the path to the pseudopotentials.
#!/bin/bash -l
# time allocation
#SBATCH -A <your-project-account>
# name of this job
#SBATCH -J abinit-job
# wall time for this job
#SBATCH -t 00:30:00
# number of nodes
#SBATCH --nodes=2
# The partition
#SBATCH -p main
# number of MPI processes per node
# (2 nodes x 128 processes per node = 256 processes in total)
#SBATCH --ntasks-per-node=128

ml PDC
ml ABINIT/9.6.2-cpeGNU-21.11
export ABI_PSPDIR=<Psps_for_tests>
srun -n 256 abinit tdmft_1.abi > out.log
The batch job is submitted to the job queue with the sbatch command
The status of the job (pending in queue, running, etc) can be monitored with the squeue command.
squeue -u $USER
The standard output of the program was directed to the file out.log. We inspect the last 20 lines of the output
tail -n 20 out.log
# which prints
--- !FinalSummary
program: abinit
version: 9.6.2
start_datetime: Fri Nov 26 11:32:57 2021
end_datetime: Fri Nov 26 11:33:07 2021
overall_cpu_time: 2489.9
overall_wall_time: 2632.7
exit_requested_by_user: no
timelimit: 0
pseudos:
    V : e583d1cc132dd79ce204b31204bd83ed
    Sr : 02b29cc3441fa9ed5e1433b119e79fbc
    O : c8ba4c11dba269a1224b8b74498fed92
usepaw: 1
mpi_procs: 256
omp_threads: 1
num_warnings: 1
num_comments: 0
...
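Individual fields of the final summary, such as mpi_procs or num_warnings, can be picked out with a short awk filter, for instance to confirm that the run used the number of MPI processes requested. This is a sketch with a hypothetical function name, summary_field; it assumes the "key: value" layout shown above and returns the first matching line in the file.

```shell
#!/bin/sh
# Sketch: print the value of one "key: value" line from a log file.
# $1 = field name (e.g. mpi_procs), $2 = log file (e.g. out.log)
summary_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)         # drop leading indentation
        }
        index(line, key ":") == 1 {
            sub(/^[^:]*:[ \t]*/, "", line)   # drop "key:" and blanks after it
            print line
            exit                             # first match only
        }
    ' "$2"
}

# Usage:
# summary_field mpi_procs out.log
# summary_field num_warnings out.log
```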
Exercise: The final summary reports one warning. Search out.log for warning messages. What do they indicate about how well the hardware requested for the job matches the problem size?
More details on this particular example can be found in the ABINIT tutorial on DFT+DMFT.