[PDC - Center for Parallel Computers, KTH]

MPI Lab Exercises: Virtual Topologies

Jacobi iteration on a two-dimensional domain

Write a Jacobi iteration code for a general two-dimensional computational domain, i.e., one that is not necessarily square. Vary the grid size in each of the two dimensions independently over 10, 100 and 1000 points and measure the performance.
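As a starting point, here is a minimal serial sketch of one Jacobi sweep for the 5-point Laplace stencil on an nx x ny interior grid. It assumes the array is stored row by row with one layer of boundary points on each side, i.e. (nx+2) x (ny+2) values. The name jacobi_sweep and the convergence measure (sum of squared changes) are illustrative assumptions, not taken from the provided code.

```c
#include <stddef.h>

/* Index of point (i, j) in an (nx+2) x (ny+2) array stored row by row. */
static size_t idx(int i, int j, int ny)
{
    return (size_t)i * (size_t)(ny + 2) + (size_t)j;
}

/* One Jacobi sweep over the interior points i = 1..nx, j = 1..ny.
   Rows 0 and nx+1 and columns 0 and ny+1 hold fixed boundary values,
   which must already be present in both u and unew.
   Returns the sum of squared changes as a convergence measure. */
double jacobi_sweep(int nx, int ny, const double *u, double *unew)
{
    double diff = 0.0;
    for (int i = 1; i <= nx; i++) {
        for (int j = 1; j <= ny; j++) {
            double v = 0.25 * (u[idx(i - 1, j, ny)] + u[idx(i + 1, j, ny)]
                             + u[idx(i, j - 1, ny)] + u[idx(i, j + 1, ny)]);
            double d = v - u[idx(i, j, ny)];
            unew[idx(i, j, ny)] = v;   /* new values go to a separate array */
            diff += d * d;
        }
    }
    return diff;
}
```

Because nx and ny are separate parameters, the same routine covers non-square domains such as 10 x 1000.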

You will be given the code for a one-dimensional decomposition (decomposed along the y-axis) including

Your task is to produce the code for a general two-dimensional decomposition using MPI and then measure the execution time on at least 12 nodes, organized as 1 x 12, 2 x 6 and 3 x 4 processor grids.
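In an MPI code the processor grid would typically be created with MPI_Cart_create (dims {px, py}, no periodicity), with MPI_Cart_coords giving a process's grid coordinates and MPI_Cart_shift(comm, dir, 1, &src, &dest) its neighbours in direction dir. The sketch below computes the same quantities by hand so the index arithmetic can be checked without an MPI installation. It assumes the row-major rank numbering that MPI defines for Cartesian topologies, a non-periodic grid, and that each dimension is split into near-equal contiguous blocks. block_range and local_block are hypothetical helpers, and -1 stands in for MPI_PROC_NULL.

```c
/* Split global indices 1..n into `parts` contiguous blocks whose sizes
   differ by at most one; block k (0-based) owns indices *s..*e. */
void block_range(int n, int parts, int k, int *s, int *e)
{
    int base = n / parts, extra = n % parts;
    *s = k * base + (k < extra ? k : extra) + 1;
    *e = *s + base - 1 + (k < extra ? 1 : 0);
}

/* Subdomain and neighbours of `rank` on a px x py processor grid.
   Row-major numbering: rank = cx * py + cy (as for MPI Cartesian
   topologies). Neighbours outside the non-periodic grid are -1,
   playing the role of MPI_PROC_NULL. */
void local_block(int nx, int ny, int px, int py, int rank,
                 int *sx, int *ex, int *sy, int *ey, int nbr[4])
{
    int cx = rank / py, cy = rank % py;
    block_range(nx, px, cx, sx, ex);       /* x range from first coordinate */
    block_range(ny, py, cy, sy, ey);       /* y range from second coordinate */
    nbr[0] = cx > 0      ? rank - py : -1; /* neighbour at x - 1 */
    nbr[1] = cx < px - 1 ? rank + py : -1; /* neighbour at x + 1 */
    nbr[2] = cy > 0      ? rank - 1  : -1; /* neighbour at y - 1 */
    nbr[3] = cy < py - 1 ? rank + 1  : -1; /* neighbour at y + 1 */
}
```

For the 3 x 4 grid, local_block(nx, ny, 3, 4, rank, ...) gives each of the 12 ranks its index ranges; the 1 x 12 and 2 x 6 runs only change px and py.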

Instructions

The source code for the one-dimensional solution is at

Copy the file of your choice and make sure that the file is writable (chmod u+w my_twod.f90).

Compile and link the code with mpif77 -ccl ifort -FR on Lenngren.

The procedure for the C version is similar.

During development you may run the code on interactive nodes. Get access to interactive nodes by doing:

spattach -p12 -i

The parameters of the programs are read from the terminal (standard input) in the Fortran version and given as command-line arguments in the C version. The C version contains a usage description in the comments of the source files.

Note: On Lenngren you need to hit Enter twice after giving the input values.

Remember to check that your new code version converges in the same number of steps as the original code.

Once the code works, compile it with optimization (e.g. -O3) before timing it for the performance measurements. To get consistent timings you must be the only user on the nodes, i.e., you need to use batch nodes.

On Lenngren you get batch nodes with

        esubmit -n12 -t10 -c MyUserCAC ./scali.sh
where the script file scali.sh contains these two lines:
#!/bin/bash
/opt/scali/bin/mpirun -np $SP_PROCS ./my_code < input > output
where the file input contains the value of nx and any other variables your code reads. Alternatively, edit your code so that it does not ask for input. Your job is then placed in the batch queue; check your position in the queue with the command spq.

Extra assignment: Make sure that your code writes data to the file oned.out in the same order as the original code. A sample result from the C version is oned.cout.
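One possible approach: rank 0 receives each process's interior block together with its index range, places it in a global array, and then writes that array with the original output loop. The sketch below shows only the placement step. place_block is a hypothetical helper, and it assumes the local block is stored row by row with one ghost layer, (ex-sx+3) x (ey-sy+3) values, and that the global nx x ny array is stored row by row without boundary points; adjust the indexing if the original code writes in a different order.

```c
#include <stddef.h>

/* Copy the interior of one process's local block (global indices
   sx..ex, sy..ey, 1-based) into a global nx x ny array stored row by
   row with 0-based offsets. Only ny is needed for the global stride. */
void place_block(double *global, int ny, const double *local,
                 int sx, int ex, int sy, int ey)
{
    int lny = ey - sy + 1;  /* local interior width in j */
    for (int i = sx; i <= ex; i++)
        for (int j = sy; j <= ey; j++)
            global[(size_t)(i - 1) * (size_t)ny + (size_t)(j - 1)] =
                local[(size_t)(i - sx + 1) * (size_t)(lny + 2)
                      + (size_t)(j - sy + 1)];  /* skip the ghost layer */
}
```

Holding the whole grid on rank 0 costs nx * ny doubles, which is 8 MB for a 1000 x 1000 grid.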

Hint: To get shorter execution times during development you can use a looser (larger) convergence tolerance. The iteration then converges in fewer steps.
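The effect of the tolerance on the number of steps can be tried out with a small serial driver. This sketch assumes a fixed test problem (u = 1 on the boundary row i = 0, u = 0 on the rest of the boundary and in the interior) and stops when the sum of squared changes in one sweep drops below tol. The function name jacobi_iterations and this stopping criterion are assumptions and may differ from the provided code.

```c
#include <stdlib.h>

/* Run Jacobi sweeps on an nx x ny interior grid until the sum of
   squared changes in one sweep is below tol. Returns the number of
   sweeps, or -1 if maxit is reached or allocation fails. */
int jacobi_iterations(int nx, int ny, double tol, int maxit)
{
    size_t n = (size_t)(nx + 2) * (size_t)(ny + 2);
    double *u = calloc(n, sizeof *u), *un = calloc(n, sizeof *un);
    if (!u || !un) { free(u); free(un); return -1; }
    for (int j = 0; j < ny + 2; j++)
        u[j] = un[j] = 1.0;                    /* boundary row i = 0 */
    int it;
    for (it = 1; it <= maxit; it++) {
        double diff = 0.0;
        for (int i = 1; i <= nx; i++)
            for (int j = 1; j <= ny; j++) {
                size_t k = (size_t)i * (size_t)(ny + 2) + (size_t)j;
                size_t row = (size_t)(ny + 2); /* distance to i +/- 1 */
                un[k] = 0.25 * (u[k - row] + u[k + row] + u[k - 1] + u[k + 1]);
                diff += (un[k] - u[k]) * (un[k] - u[k]);
            }
        double *t = u; u = un; un = t;         /* new values become current */
        if (diff < tol) break;
    }
    free(u); free(un);
    return it <= maxit ? it : -1;
}
```

With this setup, jacobi_iterations(10, 10, 1e-2, 10000) needs fewer sweeps than the same call with tol = 1e-6.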

Note: See Section 4.1 in Using MPI, 2nd edition, by Gropp, Lusk and Skjellum for a more detailed description of the numerical method. Figure 4.1 on page 71 corresponds to nx=ny=7. Note that the domain decomposition in the code is not exactly the same as the one drawn in Figure 4.3 on page 72.

Three different solutions are available: twod.f90, twod_v2.f90 and twod_v3.f90. On Lenngren, twod_v2.f90 was the most efficient solution. The corresponding C versions are twod.c and twod_v3.c. Since C lacks Fortran's array slices, twod.c is comparable to twod_v2.f90. More comments on the solutions are available on this webpage. A performance model is given here.
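In Fortran an edge of the local array can be written as an array section. In C, with the local array stored row by row as (lnx+2) x (lny+2) values, the edge with a fixed second index j is strided in memory (lny+2 elements apart). It must then either be described with a derived datatype, e.g. MPI_Type_vector(lnx, 1, lny + 2, MPI_DOUBLE, &t) starting at the address of point (1, j), or copied to a contiguous buffer before sending. This sketch shows the manual copy; pack_column and unpack_column are hypothetical names and the array layout is an assumption.

```c
#include <stddef.h>

/* Copy the strided edge u(1..lnx, j) into a contiguous buffer of
   lnx doubles, ready to be sent as one message. */
void pack_column(const double *u, int lnx, int lny, int j, double *buf)
{
    for (int i = 1; i <= lnx; i++)
        buf[i - 1] = u[(size_t)i * (size_t)(lny + 2) + (size_t)j];
}

/* Copy a received contiguous buffer back into u(1..lnx, j),
   typically a ghost column (j = 0 or j = lny + 1). */
void unpack_column(double *u, int lnx, int lny, int j, const double *buf)
{
    for (int i = 1; i <= lnx; i++)
        u[(size_t)i * (size_t)(lny + 2) + (size_t)j] = buf[i - 1];
}
```

The derived datatype avoids the extra copy in user code, while manual packing keeps the communication calls identical for both directions; which is faster depends on the MPI implementation.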


Changed by: hanke, 2008/08/21 09:29:44