
Run Programs

Compile and run programs on Lucidor.

Examples available in directory /misc/pdc/sp2/simple/

Introduction

This document describes how to compile and run serial and parallel programs on Lucidor, the IPF (Itanium Processor Family) cluster at PDC.

Lucidor can be viewed as a cluster of workstations, with the difference that the network connecting the individual workstations is significantly faster than on a normal workstation cluster. When you connect to the cluster you are connected to one of the available login nodes (workstations). On this node you can compile your programs, prepare input files and carry out all the other tasks that can be done on any kind of Unix workstation. Note, however, that the login node is shared among all users logged into the system at the time, so if one user starts a compute- and/or I/O-intensive program, all logged-in users suffer from the load on the login node. Hence there may be limits on memoryuse and cputime set on this node. (To find the current settings, use the command limit if your shell is tcsh, or the command ulimit -a if your shell is bash.) The name of the login node is blumino.pdc.kth.se.

Before running a program on Lucidor you have to allocate a "resource" in the system to run on, that is, you need to "book" one or more nodes. You can book two types of resources:

Interactive nodes
Interactive nodes are shared in much the same way as the login node. This means that at any moment more than one user can be running on each interactive node. Also, when you allocate several interactive nodes to run a parallel program, you may find several instances of your job running on each interactive node, that is, the interactive nodes are used as pseudo-nodes. A five-way parallel job run on interactive nodes may actually run as three instances on interactive node no. 1 and two instances on interactive node no. 2. Interactive nodes are intended for short tests.
Dedicated nodes
Dedicated nodes, or batch nodes, are used as unique nodes in the parallel execution. On a dedicated node you are guaranteed to be the only user on the node. Commonly you will run one or two instances of your parallel program on each dedicated node you have allocated, which makes it possible to use 100% of the computing power of each node for your program. (There are 4 CPUs per node. Note, however, that they share the same memory bus.)
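
The pseudo-node behaviour of interactive nodes can be illustrated with a little arithmetic. The sketch below is only an illustration: the function name distribute is made up for this example, and the even spread it computes is an assumption, since the real placement is decided by the queuing system.

```shell
#!/bin/bash
# Illustration only: spread NPROCS program instances over NNODES shared
# interactive nodes as evenly as possible. The actual placement on Lucidor
# is decided by the queuing system; this even spread is an assumption.
distribute() {
    local nprocs=$1 nnodes=$2
    local base=$(( nprocs / nnodes ))    # instances every node gets
    local extra=$(( nprocs % nnodes ))   # nodes that get one more
    local node count
    for (( node = 1; node <= nnodes; node++ )); do
        count=$base
        if (( node <= extra )); then
            count=$(( base + 1 ))
        fi
        echo "interactive node $node: $count instance(s)"
    done
}

# The five-way job on two interactive nodes from the text above.
distribute 5 2
```

For the five-way job this prints three instances for interactive node 1 and two for interactive node 2, matching the example in the text.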

Modules - Managing your environment

Software is added to your executable path through the use of modules.
module add i-compilers
This module adds the current version of the Intel compilers to your executable path. You need to have valid Kerberos tickets when loading this module.
module add mpi
This module adds the current default MPI implementation to your executable path. As of 2008-06-16, it is mpichmx/1.2.7..5-intel.
module add easy
This module adds the commands necessary to interact with the queuing system to your executable path.

Running a serial (non MPI) program

A very simple serial Fortran90 example code may look like the example1.f code. As stated above, programs should normally not be run on the login node but rather on one of the interactive nodes.

Running a serial program on interactive nodes

To compile and execute the program on an interactive node do the following:
> module add i-compilers
> ifort -FR -o example1 -O2 example1.f
> ./example1
Number of iterations: 14000 Result: 13994.0905473407
The option -FR is needed to tell the compiler that the file is free format Fortran90. By default the compiler assumes that files with the extension .f are fixed format and that files with the extension .f90 are free format.
To find the names of the available interactive nodes, run the following on the login node:
> module add easy
> spusage | grep interactive

Running a serial program on dedicated nodes

To compile and submit the program do the following on the login node:
> module add i-compilers easy
> ifort -FR -o example1 -O2 example1.f
(Note that if you do not have module add i-compilers in a suitable login file, you need to submit a script that first loads the module i-compilers and then runs the program.)
> esubmit -n1 -t5 ./example1
> spq
  Q          JID  USER     STATE    CAC          RESOURCE TIME         
  - 061609400454  ulfa     run      -                  1A 2003-06-16 21:40:00
  1 061612293392  ulfa     wait     ta.ulfa            1A 0h05           

Note that the job does not appear in the queue immediately; it takes a couple of seconds. The output of a program submitted this way is delivered by e-mail. You can monitor the state of your jobs with

> watch -n10 spq -u $USER
This shows a list of your jobs that refreshes automatically every 10 seconds. Stop monitoring your jobs by pressing Control-C.
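
The note above about submitting a script that loads i-compilers before running the program could be sketched as follows. The file name run_with_module.sh and the function name are made up for this example. It is also an assumption that the module command is available inside a batch job; if it is not, the site-specific module setup would have to be sourced first.

```shell
#!/bin/bash
# run_with_module.sh (hypothetical name): load a module, then run a program.
# Usage: run_with_module.sh MODULE PROGRAM [ARGS ...]

run_with_module() {
    local mod=$1
    shift
    # "module" is normally a shell function defined by the login files,
    # so it may be missing in a batch job; check before calling it.
    if type module >/dev/null 2>&1; then
        module add "$mod"
    else
        echo "warning: module command not found, skipping 'module add $mod'" >&2
    fi
    # Run the program with its remaining arguments; its exit status
    # becomes the exit status of this function.
    "$@"
}

# Only run when a module and a program are given on the command line.
if [ "$#" -ge 2 ]; then
    run_with_module "$@"
fi
```

Such a script would then be submitted in place of the program itself, for example:
> esubmit -n1 -t5 ./run_with_module.sh i-compilers ./example1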

Running an MPI program on interactive nodes

Running an MPI program using spattach -i

The example1 code used above is easily parallelized. An example is shown in example2.f.

To compile the program, attach to 3 virtual nodes and execute the program, first check your Kerberos tickets and then run the commands that follow:
> klist -Tf
Credentials cache: FILE:/tmp/krb5cc_22557
        Principal: smeds@NADA.KTH.SE
  Issued           Expires        Flags    Principal                         
Aug 17 17:48:48  Aug 18 03:48:48  FI     krbtgt/NADA.KTH.SE@NADA.KTH.SE      
[...]        
Check that you have the "F" (forwardable) flag set above. If not, execute the command kinit -f.

> module add i-compilers mpi easy
> spattach -i -p3
> mpif90 -FR -o example2 -O2 example2.f
> mpirun -nolocal -np $SP_PROCS -machinefile $SP_HOSTFILE ./example2

 Host number:  0  (h01n07-e.pdc.kth.se)  Number of iterations:  4668   Result: 4615.8663 
 Host number:  2  (h01n07-e.pdc.kth.se)  Number of iterations:  4666   Result: 4613.1696
 Host number:  1  (h01n07-e.pdc.kth.se)  Number of iterations:  4666   Result: 4613.1696
 --------------------------------
 Host number:  0  Total number of iterations: 14000  Result: 13842.2056029682

> exit
The result differs slightly from the serial example because the sequences of random numbers differ between the two examples.
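
The per-host iteration counts in the output above (4668, 4666 and 4666) add up to the 14000 iterations of the serial run. The source of example2.f is not reproduced here, so the sketch below only shows one split that is consistent with those numbers: every process gets the integer quotient, and process 0 also takes the remainder. The function name split_iterations is made up for this illustration.

```shell
#!/bin/bash
# Illustration only: one way to split TOTAL iterations over NPROCS MPI
# processes that reproduces the counts printed above. Whether example2.f
# really does it this way is an assumption.
split_iterations() {
    local total=$1 nprocs=$2 rank
    local base=$(( total / nprocs ))   # share of every process
    local rest=$(( total % nprocs ))   # leftover iterations
    for (( rank = 0; rank < nprocs; rank++ )); do
        if (( rank == 0 )); then
            # Process 0 takes the leftover iterations as well.
            echo "Host number: $rank  Number of iterations: $(( base + rest ))"
        else
            echo "Host number: $rank  Number of iterations: $base"
        fi
    done
}

split_iterations 14000 3
```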

To run an MPI program as a batch job, submit it through a batch script. An example of a batch script file for an MPICH program is /afs/pdc.kth.se/misc/pdc/mpich/mpich.lxl, which is used in the submit line that follows. The file is located in AFS on PDC systems. This script spawns num MPI processes per node. The name of the MPI program and its arguments are stated on the submit line.

Submit by
> esubmit -n3 -t15 -c MyUserCAC ./mpich.lxl [-p num] mpiprogram [arg1 arg2 ...]

For most programs this script will be enough. However, in some cases you may need to make your own expanded version of this script. If you do, we recommend that you test it on the interactive nodes first. To test with N nodes and num processes per node, run:

> spattach -i -p N
> ./my_mpich.lxl [-p num] mpiprogram [arg1 arg2 ...]

If it does what you want you may then finally do a dedicated attach (spattach -pN) and test your script as shown below. Once you get the prompt back from spattach:

  1. Check the JID of your session with spq -u $USER.
  2. Log in to the first of the nodes you have been allotted.
  3. Run spattach -j JID.
  4. Execute your script and check that it does the right thing for your application.
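
As a starting point for your own version of the launch script, the sketch below shows roughly what a script such as my_mpich.lxl could do. It is not the contents of mpich.lxl. It assumes that $SP_HOSTFILE, as used with mpirun earlier in this document, lists one host name per line; the helper name expand_hosts is made up for this example.

```shell
#!/bin/bash
# my_mpich.lxl (hypothetical sketch): start NUM MPI processes on every
# allocated node. Assumes $SP_HOSTFILE holds one host name per line.
# Usage: my_mpich.lxl [-p num] mpiprogram [arg1 arg2 ...]

# Read host names on stdin and print each one NUM times.
expand_hosts() {
    local num=$1 host i
    while read -r host; do
        [ -n "$host" ] || continue      # skip empty lines
        for (( i = 0; i < num; i++ )); do
            echo "$host"
        done
    done
}

main() {
    local num=1
    if [ "$1" = "-p" ]; then
        num=$2
        shift 2
    fi
    if [ "$#" -eq 0 ] || [ -z "${SP_HOSTFILE:-}" ]; then
        echo "usage: [-p num] mpiprogram [args ...] (inside an allocation)" >&2
        return 1
    fi
    local machinefile nprocs rc
    machinefile=$(mktemp) || return 1
    # Build a machine file with NUM entries per node.
    expand_hosts "$num" < "$SP_HOSTFILE" > "$machinefile"
    nprocs=$(( $(wc -l < "$machinefile") ))
    mpirun -nolocal -np "$nprocs" -machinefile "$machinefile" "$@"
    rc=$?
    rm -f "$machinefile"
    return $rc
}

# Only launch when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Keeping the host expansion in a separate function makes it possible to check the generated machine file on its own before any MPI processes are started.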