Quick Start Guide
Intro
To use the system, you prepare your job on the login node, hebb.pdc.kth.se, and then submit it through LoadLeveler to run on the Blue Gene/L. Everything your program needs to run (the executable and its input and output files) has to reside in the GPFS filesystem, which is accessible on the login node under a path such as /gpfs/scratch/t/testuser. Note that this is a scratch filesystem, so there are no backups in case of hardware failures or user mistakes. To store files safely, you can use AFS or HSM as usual at PDC. After logging in, run module add bgl to set up a suitable environment.
Compiling
There are two sets of compilers installed: GNU and IBM XL. The XL compilers are generally preferred unless the code relies on GNU-specific features. The easiest way to use the compilers is through the following mpi* wrapper scripts:
| Language | XL | GNU |
|---|---|---|
| Fortran 77 | mpixlf77 | mpif77 |
| Fortran 90 | mpixlf90 | |
| C | mpixlc | mpicc |
| C++ | mpixlcxx | mpicxx |
These scripts compile for the right architecture, and also link with the right MPI library automatically. You can find more information about compilers and libraries here.
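As a rough illustration of the compile step, the sketch below writes a minimal MPI program and builds it with the XL C wrapper. The file name hello_world.c and the program body are assumptions made for this example; only the mpixlc wrapper name comes from this guide. The guard simply skips compilation when the wrapper is not on the PATH, for instance before module add bgl has been run.

```shell
# Write a minimal MPI test program (hypothetical file name and contents).
cat > hello_world.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile with the XL wrapper if it is available in this environment.
if command -v mpixlc >/dev/null 2>&1; then
    mpixlc -o hello_world hello_world.c
else
    echo "mpixlc not found; run 'module add bgl' on the login node first" >&2
fi
```

For code that depends on GNU extensions, the same command should work with mpicc in place of mpixlc.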
Running
To submit a job through LoadLeveler you have to write a job control file describing your job. A simple one might look like this:
    # My first job control file
    #
    # @ job_name = test-job-1
    # @ job_type = bluegene
    # @ comment = "First small test job"
    # @ error = $(job_name).$(jobid).err
    # @ output = $(job_name).$(jobid).out
    # @ environment = COPY_ALL;
    # @ wall_clock_limit = 00:20:00
    # @ notification = always
    # @ bg_size = 32
    # @ bg_connection = mesh
    # @ queue
    mpirun -cwd /gpfs/scratch/t/testuser -verbose 2 ./hello_world
This runs hello_world on 32 compute nodes and writes the output to a file called test-job-1.$(jobid).out. It also writes a lot of information about what it is doing to test-job-1.$(jobid).err, which can help with debugging if things don't work as expected. To send the job to the queue, run llsubmit with the name of the job control file; you will get a job ID back. With llq you can see which jobs are queued and running, and with llcancel <job ID> you can kill one of your jobs. You can find more information about LoadLeveler and mpirun here.
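A typical submit-and-check session might look like the sketch below. It assumes the job control file above has been saved as test-job-1.cmd, which is a hypothetical name chosen for this example. The guard only prevents errors when the commands are run somewhere other than the login node.

```shell
# Hypothetical name of the file holding the job control file shown above.
JOB_FILE=test-job-1.cmd

if command -v llsubmit >/dev/null 2>&1; then
    llsubmit "$JOB_FILE"    # submit the job; the job ID is printed on success
    llq                     # list the jobs that are queued and running
    # llcancel <job ID>     # cancel a job, using the ID reported by llsubmit
else
    echo "LoadLeveler commands not found; run this on the login node" >&2
fi
```

Once the job has finished, the .out and .err files named in the job control file hold the program output and the mpirun diagnostics.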


