Brief Introduction to Profiling
A very brief introduction to profiling and optimization of parallel (MPI) codes.
Rule one of code optimization
Rule one of code optimization says "Don't do it yourself unless you have to". In other words, use existing libraries (BLAS, LAPACK, ARPACK, FFTW, netCDF, METIS, ScaLAPACK etc.) whenever possible. If no suitable package exists, we recommend the following procedure for optimizing your code.
Procedure
- Optimize a serial version of the code. Use these tools:
  - Profilers (gprof etc.) to find the bottlenecks of your code.
  - Performance counters to find out the performance (GFlop/s) of your code. You can also get cache hit rates etc. to investigate why performance is poor.
  You may also experiment with different compiler optimization options.
- Use a trace tool (Vampir, jumpshot) or mpiP, which is a lightweight profiling library for MPI applications, to evaluate your parallel performance.
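The GFlop/s figure mentioned in the procedure above can be derived from two raw counter readings: the number of floating-point operations and the number of CPU cycles. The following is a minimal sketch, not a site-specific recipe. The counter values are made up for illustration, and the clock frequency of 2.4 GHz is an assumption that you would replace with the value for your own machine.

```shell
#!/bin/sh
# Hypothetical counter readings, e.g. as reported for the PAPI preset
# events PAPI_FP_OPS and PAPI_TOT_CYC. These numbers are invented.
fp_ops=4200000000      # floating-point operations counted
tot_cyc=6000000000     # total CPU cycles counted
clock_hz=2400000000    # ASSUMED fixed clock frequency (2.4 GHz)

# Elapsed time in seconds = cycles / clock frequency.
seconds=$(awk -v c="$tot_cyc" -v hz="$clock_hz" \
    'BEGIN { printf "%.2f", c / hz }')

# GFlop/s = floating-point operations / elapsed seconds / 1e9.
gflops=$(awk -v f="$fp_ops" -v c="$tot_cyc" -v hz="$clock_hz" \
    'BEGIN { printf "%.2f", f / (c / hz) / 1e9 }')

echo "elapsed: ${seconds} s, rate: ${gflops} GFlop/s"
```

Comparing the resulting rate with the theoretical peak of the processor gives a rough idea of how much headroom is left. Note that converting cycles to seconds this way is only meaningful if the clock frequency stays constant during the run.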
Further information
See the PDC software page for a list of available tools. gprof is
available on all PDC computers. (Note for older versions of the compiler: gprof works poorly on Lucidor; we
recommend using hpcprof from HPCToolkit instead.)
On Lucidor and Lenngren you can run
    module show perftools
to get a list of installed and recommended performance tools. They include papiex, HPCToolkit, mpiP and Jumpshot.
A good book on the subject is:
High Performance Computing, 2nd edition;
by Kevin Dowd & Charles Severance; O'Reilly, 1998.
You should also consider attending the PDC Introduction to High-Performance Computing summer school to learn more about writing efficient HPC codes.
Technical details
How to use the tools mentioned above differs from computer to computer. However, here are some general guidelines.
- The profiler gprof:
  - Compile with function profiling turned on. This option is usually -pg. (On the Intel compilers it is -qp.)
  - Execute the code. A file called gmon.out will appear.
  - Create a profile and redirect the result into a file with:
        gprof ./myprog gmon.out > gprof.txt
- Performance counters. Unless papi is installed on the computer in question, you need to figure out the computer-specific way to do this. If papiex is installed, you can measure GFlop/s with:
        papiex -e PAPI_FP_OPS -e PAPI_TOT_CYC -- ./myprog
- Trace tools and mpiP. Typically you need to relink your MPI code to instrument it with wrappers around the MPI calls. With jumpshot you link with -mpilog.
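The three gprof steps listed above can be tried end to end on a toy program. The script below is a sketch under stated assumptions: it assumes GCC and gprof are on the PATH (it reports "skipped" otherwise), and myprog.c with its function work() is a hypothetical stand-in for your own code.

```shell
#!/bin/sh
# Sketch of the gprof workflow on a hypothetical toy program.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A stand-in program with one deliberately slow function.
cat > myprog.c <<'EOF'
#include <stdio.h>

/* Harmonic sum; slow enough to show up in the profile. */
static double work(long n) {
    double s = 0.0;
    for (long i = 1; i <= n; i++) s += 1.0 / (double)i;
    return s;
}

int main(void) {
    printf("%f\n", work(50000000L));
    return 0;
}
EOF

status="skipped"   # stays "skipped" if gcc or gprof is missing
if command -v gcc >/dev/null 2>&1 && command -v gprof >/dev/null 2>&1; then
    # 1. Compile with function profiling turned on (-pg for GCC).
    # 2. Execute the code; this writes gmon.out to the current directory.
    # 3. Create the profile and redirect the result into a file.
    if gcc -pg -o myprog myprog.c \
        && ./myprog > /dev/null \
        && gprof ./myprog gmon.out > gprof.txt; then
        status="profiled"
    else
        status="failed"
    fi
fi
echo "gprof workflow: $status"
```

If the run succeeds, gprof.txt contains a flat profile followed by a call graph. In a real code, the functions at the top of the flat profile are the first candidates for optimization.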


