
How to Run Programs on Lenngren

Guide to how to compile and run programs on Lenngren.

Examples also available in directory /misc/pdc/sp2/simple/

Introduction

In this document we will describe how to compile and run both a serial and a parallel program on the cluster Lenngren at PDC. You will find more information about Lenngren here.

Lenngren can be viewed as a cluster of workstations, with the difference that the network connecting the individual workstations is significantly faster than on a normal workstation cluster. When you connect to the cluster you are connected to one of the available login nodes (workstations). On this node you can compile programs, prepare input files and carry out any other task that can be done on a Unix workstation. Note, however, that the login node is shared among all users logged into the system at the time, so if one user starts a compute-intensive or I/O-intensive program, all logged-in users suffer from the load on the login node. For this reason there may be limits on memoryuse and cputime on this node. (To see the current settings, use the command limit if your shell is tcsh, or the command ulimit -a if your shell is bash.) The name of the login node is lise.pdc.kth.se.

Before running a program on Lenngren you must allocate a "resource" in the system, i.e. you need to "book" one or more nodes to run on. You can book two types of resources:

Interactive nodes
Interactive nodes are shared in much the same way as the login node. At any moment more than one user may be running on each interactive node, and when you allocate several interactive nodes for a parallel program, several instances of your job may run on the same interactive node. In other words, the interactive nodes are used as pseudo-nodes: a five-way parallel job on interactive nodes may actually run as three instances on interactive node no. 1 and two instances on interactive node no. 2. Interactive nodes are intended for short tests.
Dedicated nodes
Dedicated nodes, or batch nodes, are used as unique nodes in the parallel execution. On a dedicated node you are guaranteed to be the only user. You will commonly run one or two instances of your parallel program on each dedicated node you have allocated, which makes it possible to use 100% of the computing power of each node for your program. (There are 2 CPUs per node. Note, however, that they share the same memory bus.)
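
The way instances end up on interactive nodes can be illustrated with a small sketch. It assumes that processes are handed out to the nodes in round-robin order, which is consistent with the mpimon line in the interactive MPI example later in this guide but is not a documented guarantee. The function name place_processes is made up for this illustration.

```shell
#!/bin/bash
# Sketch: print how many instances land on each node when nprocs processes
# are dealt out round-robin over nnodes nodes (assumed placement policy).
place_processes() {
    nprocs=$1
    nnodes=$2
    node=0
    while [ "$node" -lt "$nnodes" ]; do
        # Every node gets the integer share ...
        count=$((nprocs / nnodes))
        # ... and the first (nprocs mod nnodes) nodes get one extra instance.
        if [ "$node" -lt $((nprocs % nnodes)) ]; then
            count=$((count + 1))
        fi
        echo "node $((node + 1)): $count"
        node=$((node + 1))
    done
}

# A five-way job on two interactive nodes: 3 instances on node 1, 2 on node 2.
place_processes 5 2
```

On dedicated nodes the number of instances per node is something you choose yourself, so no such sharing takes place.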

Modules - Managing your environment

Software is added to your executable path through the use of modules.
module add i-compilers
This module adds the current version of the Intel compilers to your executable path. You need valid Kerberos tickets when loading this module.
module add scampi
This module adds the default Scali MPI Connect implementation of MPI to your executable path.
module add easy
This module adds the commands needed to interact with the queuing system to your executable path.

Running a serial (non MPI) program

A very simple serial Fortran90 example code may look like the example1.f code. As stated above, programs should normally not be run on the login node but rather on one of the interactive nodes.

 

Running a serial program on interactive nodes

To compile and execute the program on an interactive node do the following:
> module add i-compilers
> ifort -FR -o example1 -O2 example1.f
> ./example1
Number of iterations: 14000 Result: 13994.0905473407
The option -FR tells the compiler that the file is free format Fortran90. By default the compiler assumes that files with the extension .f are fixed format and that files with the extension .f90 are free format.

 

To find the names of the available interactive nodes, do the following on the login node:
> module add easy
> spusage | grep interactive

 

Running a serial program on dedicated nodes

 

To compile and submit the program do the following on the login node:
> module add i-compilers easy
> ifort -FR -o example1 -O2 example1.f
> esubmit -n1 -t5 ./example1
> spq
  Q          JID  USER     STATE    CAC          RESOURCE TIME         
  - 061609400454  ulfa     run      -                  1A 2003-06-16 21:40:00
  1 061612293392  ulfa     wait     ta.ulfa            1A 0h05           

Note that the job does not appear in the queue immediately. It takes a couple of seconds. The output of a program submitted this way appears in an e-mail. You can monitor the state of your jobs with

> watch -n10 spq -u $USER
You will then see a list of your jobs, which refreshes automatically every 10 seconds. Stop monitoring your jobs by pressing Control-C on the keyboard.
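
If you want to use the state of a job in a script rather than watch it on screen, the spq listing can be filtered. The sketch below assumes the column layout of the sample spq output above (JID in the second column, STATE in the fourth). The function name job_state is made up for this illustration.

```shell
#!/bin/bash
# Sketch: read spq-style output on stdin and print the STATE column of one job.
# Assumes the layout shown above: Q, JID, USER, STATE, CAC, RESOURCE, TIME.
job_state() {
    # Appending "" forces a string comparison so leading zeros in the JID count.
    awk -v jid="$1" '($2 "") == (jid "") { print $4 }'
}

# Hypothetical usage on the login node (after "module add easy"):
#   spq -u "$USER" | job_state 061612293392
```

An empty result means that the job is not in the listing, for example because it has not yet appeared in the queue or has already finished.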

 

 

Running an MPI program on interactive nodes

Running an MPI program using spattach -i

The example1 code used above is easily parallelized. An example is shown in example2.f.

To connect to 3 virtual nodes, compile the program and execute it, first check your tickets:
> klist -Tf
Credentials cache: FILE:/tmp/krb5cc_22557
        Principal: smeds@NADA.KTH.SE
  Issued           Expires        Flags    Principal                         
Aug 17 17:48:48  Aug 18 03:48:48  FI     krbtgt/NADA.KTH.SE@NADA.KTH.SE      
[...]        
Check that the "F" flag is set in the Flags column above. If it is not, execute the command kinit -f. Then do:

> module add i-compilers scampi easy
> spattach -i -p3
> mpif77 -ccl ifort -FR -o example2 -O2 example2.f
> mpirun -np $SP_PROCS -machinefile $SP_HOSTFILE ./example2

Taking nodenames from "/tmp/SPnodes-md99-hho-0", number of nodes specified by -np
/opt/scali/bin/mpimon -stdin all  ./example2  --  d14n31.pdc.kth.se 1 d14n32.pdc.kth.se 1 d14n31.pdc.kth.se 1
 Host number:           0  (d14n31.pdc.kth.se)  Number of iterations: 4666   Result: 4644.16774464921
 Host number:           2  (d14n31.pdc.kth.se)  Number of iterations: 4666   Result: 4644.16774464921
 Host number:           1  (d14n32.pdc.kth.se)  Number of iterations: 4668   Result: 4646.47446958455
 --------------------------------
 Host number:           0                 Total number of iterations: 14000  Result: 13934.8099588830

> exit

The result differs slightly from the serial example since the sequence of random numbers is different in the two examples.
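
The per-process iteration counts in the output above are consistent with a simple block split in which every process gets the integer share of the 14000 iterations and one process also takes the remainder. The source of example2.f is not reproduced here, so this is an assumption based on the printed numbers. The function name split_iterations is made up for this illustration.

```shell
#!/bin/bash
# Sketch: split `total` iterations over `nprocs` processes; all processes get
# the integer share and one of them additionally takes the remainder.
split_iterations() {
    total=$1
    nprocs=$2
    base=$((total / nprocs))
    rest=$((total % nprocs))
    echo "share per process: $base, process with remainder: $((base + rest))"
}

split_iterations 14000 3   # share per process: 4666, process with remainder: 4668
```

With 6 processes the same arithmetic gives 2333 and 2335, which matches the six-process run shown later in this guide.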

Running an MPI program on dedicated nodes

Running an MPI program using spattach

The procedure for a non-interactive spattach is almost identical to the procedure described above for an interactive (shared) spattach. The main differences are the following:

  • You are guaranteed to be the only user of the nodes.
  • You will have to wait for your nodes since dedicated nodes are allocated via the queuing system (EASY).
  • The node hours allocated will be charged to a CAC (time allocation). The time measured is wall-clock time. If you belong to only one CAC, you need not specify it. To see which CACs you belong to, do cac members $USER
  • You must specify the time period for which you will use the nodes. This is required for the queuing system to be able to find a time slot in the machine that fits your request.
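
To get a feeling for what a request may cost, the node hours can be estimated as nodes times wall-clock time. The sketch below assumes that the -t option is given in minutes (the spq sample earlier shows a -t5 request as 0h05) and treats the requested time as an upper bound. Whether the CAC is charged for the requested or the actually used time is not stated here. The function name node_hours is made up for this illustration.

```shell
#!/bin/bash
# Sketch: upper bound on the node hours for a request of `nodes` nodes for
# `minutes` minutes of wall-clock time (assumes -t is given in minutes).
node_hours() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.2f\n", n * m / 60 }'
}

node_hours 3 15   # three nodes for 15 minutes: 0.75 node hours
```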
To connect to 3 dedicated nodes and execute the program do:
> module add i-compilers scampi easy
> spattach -p3 -t15 -c MyUserCAC
> mpif77 -ccl ifort -FR -o example2 -O2 example2.f
> mpirun -np $SP_PROCS -machinefile $SP_HOSTFILE ./example2
 Host number:  1  (d03n36.pdc.kth.se)  Number of iterations: 4666   Result: 4613.1696
 Host number:  2  (d03n37.pdc.kth.se)  Number of iterations: 4666   Result: 4613.1696 
 Host number:  0  (d03n35.pdc.kth.se)  Number of iterations: 4668   Result: 4615.8663
 ----------------------------------
 Host number:  0   Total number of iterations:       14000   Result:    13842.2056029
An alternative is to use a script file to start the SCAMPI program. A small example script myjob.sh could look like this:
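#!/bin/bash
# Number of MPI processes to start on each allocated node (2 CPUs per node).
processes_per_node=2
# SP_PROCS holds the number of allocated nodes.
total_processes=$((processes_per_node * SP_PROCS))
# The first argument is the program; the remaining arguments are passed on to it.
PRG="$1"
shift
# "$@" keeps arguments that contain spaces intact; tee copies stdout to the file output.
/opt/scali/bin/mpirun -np "$total_processes" -npn "$processes_per_node" "$PRG" "$@" | tee output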
The script above prints your stdout (program output) and also saves it to a log file named output. You then run your job with
> ./myjob.sh ./example2
/opt/scali/bin/mpimon -stdin all  ./example2  --  d03n36.pdc.kth.se 2 d03n37.pdc.kth.se 2 d03n38.pdc.kth.se 2
 Host number:           0  (d03n36.pdc.kth.se)  Number of iterations: 2333   Result: 2355.44928686096
 Host number:           1  (d03n36.pdc.kth.se)  Number of iterations: 2333   Result: 2355.44928686096
 Host number:           3  (d03n37.pdc.kth.se)  Number of iterations: 2333   Result: 2355.44928686096
 Host number:           2  (d03n37.pdc.kth.se)  Number of iterations: 2333   Result: 2355.44928686096
 Host number:           4  (d03n38.pdc.kth.se)  Number of iterations: 2333   Result: 2355.44928686096
 Host number:           5  (d03n38.pdc.kth.se)  Number of iterations: 2335   Result: 2357.66276472216
 --------------------------------
 Host number:           0   Total number of iterations:       14000   Result: 14134.9091990270
> exit
As you can see, the random sequence is the same on all nodes. In a real-life application this is usually not desired. Typically you either generate the whole sequence on one node or use a well-behaved parallel random number generator.

An important note about the procedure above is that the shell opened by spattach runs on the node where you executed spattach, which is typically the login node of the system. The environment in this shell, however, is set up so that parallel (MPI) programs execute on the nodes allocated from the system.
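
One way to see which nodes the shell will send MPI processes to is to summarize the machine file named by $SP_HOSTFILE. The sketch below assumes that this file contains one host name per line, with a host repeated once per process. That format is an assumption and is not confirmed in this guide. The function name hosts_summary is made up for this illustration.

```shell
#!/bin/bash
# Sketch: count how often each host occurs in a machine file with one host
# name per line (assumed format) and print "host: count".
hosts_summary() {
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 }'
}

# Hypothetical usage inside the shell opened by spattach:
#   hosts_summary "$SP_HOSTFILE"
```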

In the example above, we used both CPUs on the nodes. It is important to know that these two CPUs share the memory bus. For some applications there is no benefit from using the second CPU, since the memory bandwidth is saturated by using one CPU. For other applications, the performance doubles by using both CPUs.
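
Since the best number of processes per node depends on the application, it can be convenient to choose it without editing the script. The sketch below is a possible variant of the command construction in myjob.sh. The environment variable PPN and the function name build_mpirun_command are made up for this illustration, and the function only prints the mpirun command line instead of running it, so it can be tried anywhere. The mpirun options are the ones used in myjob.sh above.

```shell
#!/bin/bash
# Sketch: build the mpirun command line with the number of processes per node
# taken from the (hypothetical) variable PPN, defaulting to 2 CPUs per node.
build_mpirun_command() {
    ppn=${PPN:-2}
    if [ "$ppn" -lt 1 ] || [ "$ppn" -gt 2 ]; then
        echo "PPN must be 1 or 2 (Lenngren has two CPUs per node)" >&2
        return 1
    fi
    # SP_PROCS is the number of allocated nodes, as in myjob.sh above.
    total=$((ppn * SP_PROCS))
    echo "/opt/scali/bin/mpirun -np $total -npn $ppn $*"
}

# With 3 allocated nodes and PPN=1, "build_mpirun_command ./example2" prints:
#   /opt/scali/bin/mpirun -np 3 -npn 1 ./example2
```

Running the same job once with one and once with two processes per node is a simple way to find out whether your application is limited by memory bandwidth.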

It is possible to reserve dedicated nodes in advance for interactive use:
> esubmit -n6 -t5 -T 2005-01-03/15:26:20
> spattach -j <JID>
Note that this only guarantees that you will not receive the nodes before the requested time. Whether you receive them at the requested time or later depends on your place in the queue.
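
The start time in the example above has the form YYYY-MM-DD/HH:MM:SS. A time relative to now can be produced in that form with the date command. The sketch below relies on the -d option of GNU date, which may not be available on every system. The function name start_time_in is made up for this illustration.

```shell
#!/bin/bash
# Sketch: print the time N minutes from now as YYYY-MM-DD/HH:MM:SS, the form
# used with esubmit -T above. Relies on GNU date's -d option.
start_time_in() {
    date -d "+$1 minutes" '+%Y-%m-%d/%H:%M:%S'
}

# Hypothetical usage: request 6 nodes for 5 minutes, starting in 90 minutes.
#   esubmit -n6 -t5 -T "$(start_time_in 90)"
```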

 

Running an MPI program in batch mode

The difference between an spattach onto dedicated nodes and an esubmit is that the esubmit command never reads any keyboard input. Instead it assumes that all its input (if there is any) comes from a batch script. In our simple example there is no input at all, and running the program in batch mode is as easy as the following example:

To run the example program in batch mode:
> module add i-compilers scampi easy
> mpif90 -FR -o example2 -O2 example2.f
> esubmit -n3 -t15 -c MyUserCAC ./myjob.sh ./example2
(The example script myjob.sh given above has no option that controls the number of processes per node, so you must edit the script to make sure that this number is correct.
Lenngren has two CPUs per node, so running more than two processes per node is not recommended. Change your myjob.sh to fit your problem!)
Wait about 20-30 seconds, then check the queue:
> spq
You will receive two e-mails from the EASY scheduling system for each job: one when your program starts and one at the end of your run containing the output of your execution. It is vital that you have arranged for e-mail forwarding to your home institution so that you will see these mails (see the Arranging for e-mail forwarding document for instructions).

It is also important that, at the time you submit your job, you have forwardable (F) Kerberos tickets that are valid for the entire execution of your program (i.e. queuing time + execution time). A recommended procedure is:

Recommended submit procedure
Check the life time of your current tickets
> klist -f
If the tickets have a short remaining lifetime, renew them:
> kinit -f -l timetolive
Continue and submit the job
> esubmit -n3 -t15 ./myjob.sh ./example2
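
The check for the forwardable flag can also be scripted. The sketch below assumes the line layout of the klist -Tf sample shown earlier in this guide (three fields for the issue time, three for the expiry time, then the flags and the principal). Other Kerberos versions may print a different layout. The function name tgt_is_forwardable is made up for this illustration.

```shell
#!/bin/bash
# Sketch: read klist output on stdin and succeed only if the ticket-granting
# ticket (principal starting with krbtgt/) has the F flag set.
# Assumed field layout: issued (3 fields), expires (3 fields), flags, principal.
tgt_is_forwardable() {
    awk '$8 ~ /^krbtgt\// { found = 1; if ($7 ~ /F/) ok = 1 }
         END { exit !(found && ok) }'
}

# Hypothetical usage before submitting a job:
#   klist -Tf | tgt_is_forwardable || echo "no forwardable ticket, run kinit -f"
```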
The major differences between esubmit and a dedicated spattach are:
  • The programs will be spawned from the first host among your allocated hosts and not from the login node.
  • The program does not have access to an interactive user. You can't read interactive input from a terminal and the like. However, the program has access to the STDIN device.
  • The queuing system will automatically launch your submitted script (or program). You will not need to be present to start your analysis as you need to with spattach.

Finding more information

More information about the commands used to interact with the queuing system EASY is available online:

> spattach -h
> esubmit -h
> esubmit -h -v
> spq -h
> spstatus -h
> sprelease -h
> spfree -h
> spsummary -h
> spjobsummary -h

Batch script files

Script usable for spattach -i, spattach, esubmit

In the typical case there may be several things needed to be done before actually executing your program. Perhaps you'd like to specify the communication procedure, set up some input files, move files into scratch directories and the like before your program starts. Likewise you may want to do some clean up after your program is finished.

This is done by using a batch script file that is submitted to EASY, the scheduling system.
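
As an illustration of such set-up and clean-up steps, the sketch below runs a command inside a private scratch directory and removes the directory afterwards. The use of mktemp and TMPDIR for the scratch location is an assumption, and the function name run_in_scratch is made up for this illustration. Use the scratch area that is appropriate for the nodes your job runs on.

```shell
#!/bin/bash
# Sketch: run a command inside a private scratch directory and remove the
# directory afterwards, passing the command's exit status on to the caller.
run_in_scratch() {
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX") || return 1
    (
        cd "$scratch" || exit 1
        # Set-up steps (copying input files and the like) would go here.
        "$@"
    )
    status=$?
    # Clean up even if the command failed.
    rm -rf "$scratch"
    return "$status"
}

# Hypothetical usage inside a batch script, with the program given by its full path:
#   run_in_scratch /opt/scali/bin/mpirun -np 6 -npn 2 /path/to/example2
```

Results that you want to keep must be copied out of the scratch directory before it is removed.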

Submit by
> esubmit -n3 -t15 -c MyUserCAC ./myjob.sh ./someprogram [arg1 arg2 ...]

For most programs this script will be enough. However, in some cases you may need to write your own expanded version of it. If you do, we recommend that you test it on the interactive nodes first. To test on N nodes, with the number of processes per node set in the script, do:

> spattach -i -p N
> ./myjob.sh mpiprogram [arg1 arg2 ...]

If it does what you want you may then finally do a dedicated attach (spattach -pN) and test your script as shown below. Once you get the prompt back from spattach:

  1. Check the JID of your session with spq -u $USER
  2. Login on the first of the nodes you have been allotted.
  3. Do an spattach -j JID
  4. Execute your script and see if it seems to do the right thing for your application.