Quick Start Guide

An introduction to using the Blue Gene/L system at PDC.

Intro

To use the system, you prepare your job on the login node, hebb.pdc.kth.se, and then submit it through LoadLeveler to run on the Blue Gene/L. Everything your program needs to run (the executable and all input and output files) has to reside in the GPFS filesystem, which is accessible on the login node as e.g. /gpfs/scratch/t/testuser. Note that this is a scratch filesystem, so there are no backups in case of hardware failures or user mistakes. To store files safely, you can use AFS or HSM as usual at PDC. After logging in, you should run module add bgl to set up a suitable environment.
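The preparation step can be sketched roughly as follows. This is only an illustration: the helper stage_files, the run1 subdirectory and the file names are hypothetical, and the scratch path is the example path from the text, not your own.

```shell
# Assumed value for illustration; substitute your own scratch directory.
SCRATCH_DIR="${SCRATCH_DIR:-/gpfs/scratch/t/testuser}"

# Hypothetical helper: copy an executable and its input files into a
# run directory on GPFS, creating the directory first if needed.
# Usage: stage_files DEST FILE...
stage_files() {
    dest="$1"
    shift
    mkdir -p "$dest" && cp -- "$@" "$dest"/
}

# A typical session on hebb.pdc.kth.se might then look like this
# (left as comments so the sketch is harmless on other machines):
#   module add bgl
#   stage_files "$SCRATCH_DIR/run1" ./hello_world ./input.dat
#   cd "$SCRATCH_DIR/run1"
```

Since the scratch filesystem has no backups, results worth keeping would still need to be copied to AFS or HSM afterwards.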

Compiling

There are two sets of compilers installed: GNU and IBM XL. The XL compilers are generally preferred unless the code relies on GNU-specific features. The easiest way to use the compilers is through the mpi* wrapper scripts:

  Language      XL          GNU
  Fortran 77    mpixlf77    mpif77
  Fortran 90    mpixlf90
  C             mpixlc      mpicc
  C++           mpixlcxx    mpicxx

These scripts compile for the right architecture, and also link with the right MPI library automatically. You can find more information about compilers and libraries here.
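As an illustration of the wrapper scripts, the sketch below writes a minimal MPI test program and compiles it with mpixlc. The file name hello_world.c and the program itself are assumptions made for this example, as is the use of the common -o option to name the executable. The compile step is skipped when mpixlc is not on the PATH.

```shell
# Write a minimal MPI program (hypothetical example source).
# The quoted 'EOF' keeps the shell from touching the C code.
cat > hello_world.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's number */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile with the XL wrapper if it is available, i.e. on the login
# node after "module add bgl"; the wrapper adds the MPI library itself.
if command -v mpixlc >/dev/null 2>&1; then
    mpixlc -o hello_world hello_world.c
else
    echo "mpixlc not found; run this on the login node after 'module add bgl'" >&2
fi
```

Replacing mpixlc with mpicc would build the same source with the GNU compiler instead.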

Running

To submit a job through LoadLeveler you have to write a job control file describing your job. A simple one might look like this:

  # My first job control file
  #
  # @ job_name            = test-job-1
  # @ job_type            = bluegene
  # @ comment             = "First small test job"
  # @ error               = $(job_name).$(jobid).err
  # @ output              = $(job_name).$(jobid).out
  # @ environment         = COPY_ALL;
  # @ wall_clock_limit    = 00:20:00
  # @ notification        = always
  # @ bg_size             = 32
  # @ bg_connection       = mesh
  # @ queue

  mpirun -cwd /gpfs/scratch/t/testuser -verbose 2 ./hello_world

This will run hello_world on 32 compute nodes and send the output to a file called test-job-1.$(jobid).out, where $(jobid) is replaced by the job's ID. It will also write a lot of information about what it is doing to test-job-1.$(jobid).err, which can be helpful for debugging if things don't work as expected. To send the job to the queue, use llsubmit together with the name of the job control file; you will then get a job ID back. With llq you can see which jobs are queued and running, and with llcancel <job ID> you can kill one of your jobs. You can find more information about LoadLeveler and mpirun here.
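The submit cycle described above can be sketched as a small script. The helper make_job_file and the file name test-job-2.cmd are hypothetical, introduced only for this example; the keywords are the same ones used in the job control file above, with the job name, node count and time limit made into parameters. The LoadLeveler commands are only run if they are available.

```shell
# Hypothetical helper: print a job control file.
# Usage: make_job_file NAME NODES WALL_CLOCK_LIMIT WORKDIR EXECUTABLE
# The \$ escapes keep $(job_name) and $(jobid) literal, so that
# LoadLeveler, not the shell, expands them.
make_job_file() {
    name="$1"; nodes="$2"; limit="$3"; workdir="$4"; exe="$5"
    cat <<EOF
# @ job_name            = $name
# @ job_type            = bluegene
# @ comment             = "Generated test job"
# @ error               = \$(job_name).\$(jobid).err
# @ output              = \$(job_name).\$(jobid).out
# @ environment         = COPY_ALL;
# @ wall_clock_limit    = $limit
# @ notification        = always
# @ bg_size             = $nodes
# @ bg_connection       = mesh
# @ queue

mpirun -cwd $workdir -verbose 2 $exe
EOF
}

make_job_file test-job-2 32 00:20:00 /gpfs/scratch/t/testuser ./hello_world > test-job-2.cmd

# Submit and inspect the queue only where LoadLeveler is installed.
if command -v llsubmit >/dev/null 2>&1; then
    llsubmit test-job-2.cmd   # reports the job ID
    llq                       # lists queued and running jobs
    # llcancel <job ID>       # would remove the job again
fi
```

Generating the file from parameters makes it easy to repeat a test with a different bg_size or time limit without editing the file by hand.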
