
Archive for December, 2012

Profiling tools available: PAPI and TAU

Posted on Friday, 7 December, 2012

The Performance API (PAPI) and TAU are two of the most common open source profiling tools, and they are now available for PACE users, including support for hardware counters and threading.

PAPI description, from their website:

The PAPI project specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors. PAPI provides two interfaces to the underlying counter hardware; a simple, high level interface for the acquisition of simple measurements and a fully programmable, low level interface directed towards users with more sophisticated needs.

TAU description, from their website:

TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, Python. This tool is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements.


TAU uses PAPI for event collection and provides two tools for visualization: the text-based tool is called pprof and the graphical tool is called paraprof.

A *very* short guide to using TAU on PACE clusters

* First, you need to recompile your code with TAU wrappers.

  • Load the modules your code needs (compiler, MPI, etc)
module load gcc/4.4.5 mvapich2/1.6
  • Load the latest tau module (currently tau/2.22-p1; older versions are known to have bugs)
module load tau/2.22-p1

(This will load PDT and PAPI modules too, if you don’t have them loaded already)

  • The TAU module will set the correct TAU Makefile in your environment. Check if you have it right:

  • Compile your code using one of the compiler wrapper scripts.

E.g., for a f90 code: -L${PAPIDIR}/lib -lpfm loop_test.f90 -o loop_test

Note that the “-L${PAPIDIR}/lib -lpfm” part is necessary on PACE clusters to avoid the system default libpfm, which is not compatible with TAU. If you don’t specify this, you will get this message:

Error: Reverting to a Regular Make
To suppress this message and revert automatically, please add -optRevert to your TAU_OPTIONS environment variable
Press Enter to continue

* Run the code as usual (not on the headnode!!) 

 mpirun -np 4 ./loop_test

 You will see profile files named in the format “profile.A.B.C” in the same folder, which indicates that TAU ran and collected profiling data.

* Finally, run pprof or paraprof from the same directory to see the results!

    • pprof -ea   (sort by exclusive time and show all details)
    • paraprof
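
Two of the steps above are easy to check from the shell, before compiling and after the run. The following is a minimal sketch and not part of the TAU distribution: the function names are hypothetical, and it assumes that the tau module exports the Makefile path in the TAU_MAKEFILE environment variable and that profiles are written as “profile.A.B.C” files, as described above.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of TAU.

# Succeeds only if TAU_MAKEFILE is set and points to a readable file.
# Assumption: the tau module exports TAU_MAKEFILE.
check_tau_makefile() {
    if [ -z "${TAU_MAKEFILE:-}" ]; then
        echo "TAU_MAKEFILE is not set; load the tau module first" >&2
        return 1
    fi
    if [ ! -r "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE is not a readable file: $TAU_MAKEFILE" >&2
        return 1
    fi
    echo "Using TAU Makefile: $TAU_MAKEFILE"
}

# Prints the number of profile.A.B.C files in a directory
# (default: the current directory).
count_tau_profiles() {
    dir=${1:-.}
    count=0
    for f in "$dir"/profile.*.*.*; do
        # If nothing matches, the pattern stays literal and -f is false.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Calling check_tau_makefile before compiling and count_tau_profiles after the run gives a quick indication of whether the environment was set up and whether TAU wrote any data; with the 4-process run above you would expect one profile file per MPI rank.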

Remember, these are very brief instructions. Please refer to PAPI and TAU documentation for more details:

PAPI Reference

TAU User Guide








New and Updated Software: Java, MUMPS, SCOTCH, ParMETIS, OpenFOAM, trf, CUDA, lagan, MPJ Express, R, Wireshark, Sharktools

Posted on Thursday, 6 December, 2012

I’ve been putting off this update for other reasons, so we have a lot of new and updated software to cover this time.
Remember that all of this software is available through the “modules” system installed on all PACE-managed Red Hat Enterprise Linux 6 computers.
For basic usage instructions on PACE systems see the Using Software Modules page.

Java 7

Here is a brief summary of the enhancements included with the Java 7 release:

  • Improved performance, stability and security.
  • Enhancements in the Java Plug-in for Rich Internet Applications development and deployment.
  • Java programming language enhancements that make it easier for developers to write and optimize Java code.
  • Enhancements in the Java Virtual machine to support Non-Java languages.

There are a large number of enhancements in JDK 7.
See the JDK 7 website for more information.

Using it

$ module avail java 

$ module load java/1.7.0
#Checking that you are using the right version
$ which java
$ which javac

Note: The java/1.7.0 module adds “.” to the CLASSPATH environment variable.
If you don’t know what that means, see the Wikipedia page.
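
In short, a “.” entry tells Java to also look for classes in the current working directory. As a quick check after loading the module, the sketch below reports whether a colon-separated class path contains a “.” entry. The function name is hypothetical, and the Unix “:” separator is assumed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given colon-separated class path
# contains an entry that is exactly ".".
classpath_has_dot() {
    # Wrapping the value in colons lets one pattern match the first,
    # middle and last entries alike.
    case ":$1:" in
        *:.:*) return 0 ;;
        *)     return 1 ;;
    esac
}

if classpath_has_dot "${CLASSPATH:-}"; then
    echo "CLASSPATH includes the current directory"
else
    echo "CLASSPATH does not include the current directory"
fi
```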

Scotch and PT-Scotch 5.1.12

Scotch is a software package and set of libraries for sequential and parallel graph partitioning, static mapping and clustering, sequential mesh and hypergraph partitioning, and sequential and parallel sparse matrix block ordering.

Using it

#First load a compiler - almost any compiler will work: 
$ module load gcc/4.6.2
#Load an MPI distribution - any of them should work:
$ module load openmpi/1.4.3
#Compile an application using the ptscotch library:
$ mpicc mpi_application.c ${LDFLAGS} -lptscotch
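
The compile line relies on the loaded modules having added the right -L directories to LDFLAGS. A minimal sketch for confirming that before linking follows. The function name is hypothetical, and it assumes that LDFLAGS holds space-separated -L<dir> entries with no spaces inside directory names.

```shell
#!/bin/sh
# Hypothetical helper: print the first -L directory in LDFLAGS that
# contains lib<name>.a or lib<name>.so; fail if none does.
find_lib_in_ldflags() {
    name=$1
    # Unquoted on purpose: split LDFLAGS into individual flags.
    for flag in ${LDFLAGS:-}; do
        case $flag in
            -L?*)
                dir=${flag#-L}
                if [ -e "$dir/lib$name.a" ] || [ -e "$dir/lib$name.so" ]; then
                    echo "$dir"
                    return 0
                fi
                ;;
        esac
    done
    echo "lib$name not found in any -L directory of LDFLAGS" >&2
    return 1
}

# Example: look for the PT-Scotch library used in the compile line above.
find_lib_in_ldflags ptscotch || echo "Check which modules are loaded"
```

The same check applies to any other library in this post that is linked through ${LDFLAGS}.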

ParMETIS 3.2.0 and 4.0.2

ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph-partitioning, adaptive repartitioning, and parallel multi-constrained partitioning schemes developed in our lab.

ParMETIS provides the following five major functions:

  • Graph Partitioning
  • Mesh Partitioning
  • Graph Repartitioning
  • Partitioning Refinement
  • Matrix Reordering

Using it

#First load a compiler - almost any compiler will work: 
$ module load intel/12.1.4
#Load an MPI distribution - any of them should work:
$ module load mvapich2/1.6
#Compile an application using the parmetis library:
$ mpicc mpi_application.c ${LDFLAGS} -lparmetis -lmetis

MUMPS 4.10.0

MUMPS is a (MU)ltifrontal (M)assively (P)arallel sparse direct (S)olver.
Main Features:

  • Solution of large linear systems with symmetric positive definite matrices; general symmetric matrices; general unsymmetric matrices;
  • Version for complex arithmetic;
  • Parallel factorization and solve phases (uniprocessor version also available);
  • Iterative refinement and backward error analysis;
  • Various matrix input formats: assembled format; distributed assembled format; elemental format;
  • Partial factorization and Schur complement matrix (centralized or 2D block-cyclic);
  • Interfaces to MUMPS: Fortran, C, Matlab and Scilab;
  • Several orderings interfaced: AMD, AMF, PORD, METIS, PARMETIS, SCOTCH, PT-SCOTCH.

Using it

#First load a compiler - almost any compiler will work: 
$ module load gcc/4.6.2
#Load an MPI distribution - any of them should work:
$ module load openmpi/1.4.3
# Load the rest of the prerequisites (other solvers and libraries)
$ module load mkl/10.3 scotch/5.1.12 parmetis/3.2.0
#Compile your application and link against the correct mumps library:
$ mpicc mpi_application.c ${LDFLAGS} -lcmumps
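
Which library is “correct” depends on the arithmetic your application uses. MUMPS provides one library per arithmetic, named with the prefix s, d, c or z (single precision, double precision, complex and double complex), so -lcmumps above selects the single-precision complex solver. The mapping can be sketched as follows; the function name is hypothetical, and whether further libraries must be listed on the link line depends on how MUMPS was built on your system.

```shell
#!/bin/sh
# Hypothetical helper: map an arithmetic name to the matching MUMPS
# link flag (libsmumps, libdmumps, libcmumps, libzmumps).
mumps_link_flag() {
    case $1 in
        single)         echo "-lsmumps" ;;
        double)         echo "-ldmumps" ;;
        complex)        echo "-lcmumps" ;;
        double-complex) echo "-lzmumps" ;;
        *)
            echo "unknown arithmetic: $1" >&2
            return 1
            ;;
    esac
}

# Usage (not executed here):
#   mpicc mpi_application.c ${LDFLAGS} $(mumps_link_flag double)
```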

OpenFOAM 2.1.x

OpenFOAM is a free, open source CFD software package developed by OpenCFD Ltd at ESI Group and distributed by the OpenFOAM Foundation. It has a large user base across most areas of engineering and science, from both commercial and academic organisations. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.

Using it

#Unload any compiler and MPI modules you may have loaded: 
$ module list
pgi/12.3 openmpi/1.5.4 acml/5.2.0 #pgi/12.3 and openmpi/1.5.4 are just examples.
$ module rm openmpi/1.5.4 pgi/12.3
# Load the openfoam module
$ module load openfoam/2.1.x
ERROR: The directory ~/scratch/OpenFOAM/2.1.x must exist
OpenFOAM module not loading
execute "mkdir -p ~/scratch/OpenFOAM/2.1.x" to create this directory
#Oops - the openfoam module requires that we have a particular directory for openfoam to work with.
$ mkdir -p ~/scratch/OpenFOAM/2.1.x
#Now load the openfoam module again
$ module load openfoam/2.1.x
#Test that openfoam is OK
$ foamInstallationTest
#If this command succeeded, everything is OK.
#Testing openfoam
$ cd ~/scratch/OpenFOAM/2.1.x
$ cp -r ${FOAM_TUTORIALS}/basic .
$ cd basic/laplacianFoam/flange/
$ ./Allclean
$ ./Allrun
ansysToFoam: converting mesh flange.ans
Running laplacianFoam on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange
Running foamToFieldview9 on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange
Running foamToEnsight on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange
Running foamToVTK on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange

trf (Tandem Repeats Finder)

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.

Using it

$ module load trf/4.07b 
$ trf

CUDA 5.0.35

CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Using it

$ module load cuda/5.0.35 
#Use nvcc to compile a CUDA application
$ nvcc application.cu


LAGAN

The LAGAN toolkit is a set of tools for local, global, and multiple alignment of DNA sequences.

Using it

#Load a compiler module 
$ module load gcc/4.7.2
#Load the lagan module
$ module load lagan/2.0

MPJ Express

MPJ Express is an open source Java message passing library that allows application developers to write and execute parallel applications for multicore processors and compute clusters/clouds.

Using it

#MPJ needs to store log files and cannot do so in the system-install location. 
#We need to create a place for it to put log data.
$ mkdir -p ~/mpj/logs
$ module load mpj/0.38
#Inside a job script:
$ mpjboot machinefile
$ ... application.jar
$ mpjhalt machinefile
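
The machinefile passed to mpjboot and mpjhalt is assumed here to be a plain list of hostnames, one per line. Inside a batch job, the scheduler typically provides a node list with one line per allocated core, so duplicates have to be removed first. The sketch below shows one way to do that. It assumes a Torque/PBS-style $PBS_NODEFILE variable, which this post does not mention, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the unique hostnames from a scheduler
# node list (one line per core) to a machine file (one line per host).
make_machinefile() {
    nodefile=$1
    out=$2
    sort -u "$nodefile" > "$out"
}

# Inside a job script. Assumption: the scheduler sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ]; then
    make_machinefile "$PBS_NODEFILE" machinefile
fi
```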

R 2.15.2

R is a free software environment for statistical computing and graphics.

Using it

$ module load R/2.15.2 
$ R

Wireshark 1.4.15, 1.6.12, 1.8.4

Wireshark is the world’s foremost network protocol analyzer. It lets you capture and interactively browse the traffic running on a computer network. It is the de facto (and often de jure) standard across many industries and educational institutions.

Using it

$ module load wireshark/1.8.4 
$ wireshark


Sharktools

Sharktools is a Matlab and Python frontend to Wireshark.

Using it

#Load the necessary prerequisites 
$ module load wireshark/1.4.15 matlab/r2011b python/2.7.2
#Load sharktools
$ module load sharktools/0.15
$ python
>>> import pyshark