CRAY T3E Performance Tools
Last change: 2001-Apr-03
This page describes performance tools for the CRAY T3E and their
installation and configuration at the John von Neumann Institute for
Computing (NIC) / Central Institute for Applied Mathematics (ZAM) of the
Research Centre Juelich.
PAT / VAMPIR
For the CRAY T3E, Silicon Graphics/Cray Research implemented and provides
two performance analysis tools, Apprentice and PAT. Apprentice is a profiling
tool that instruments source code through compiler switches and
provides statistics at the level of functions and basic blocks. PAT,
the Performance Analysis Tool, is actually several tools in one.
It provides profiling through sampling and access to hardware performance
information. It also includes an object code instrumenter that can be used
for detailed call site profiling and for gathering function-level hardware
performance statistics. In a collaboration between Silicon Graphics/Cray
Research and Forschungszentrum Jülich, PAT was extended to also support
event tracing.
Additional information and links:
- A technical report describes how the extended PAT and VAMPIR can be used
  to analyze message passing programs on the CRAY T3E. VAMPIR is an event
  trace browser developed by Forschungszentrum Jülich, ZAM, and Technical
  University Dresden, ZHR (now commercially distributed by PALLAS). Its
  trace browsing features complement PAT's object instrumentation and
  tracing functionality.
- Step-by-step instructions are also available.
- The necessary additional tools (e.g., pif2bpv) and files not distributed
  with CRAY UNICOS/mk are available here for PrgEnv 3.4 and PrgEnv 3.5
  (gzipped tar files).
- The slides (PostScript, in German) used in our CRAY T3E training courses
  give a good overview of PAT.
- Slides of my talk "Performance Analysis on the Cray T3E with Apprentice,
  PAT, and VAMPIR"
- Vampir Tutorial from Pallas (PostScript document, source and trace file
  examples)
VAMPIRtrace
The VAMPIRtrace profiling tool for MPI applications produces trace files
that can be analyzed with the VAMPIR performance analysis tool.
VAMPIRtrace is a commercial product produced and distributed by PALLAS.
In addition to recording all calls to the MPI library and all transmitted
messages, VAMPIRtrace allows users to define and record arbitrary events
of their own. Instrumentation can be switched on or off at run time, and a
filtering mechanism helps to limit the amount of trace data generated.
VAMPIRtrace is an add-on for existing MPI implementations; using it merely
requires relinking the application with the VAMPIRtrace profiling library.
This will enable the tracing of all calls to MPI routines, as well as of
all explicit message-passing.
To define and trace user-defined events, or to use the profiling control
functions, calls to the VAMPIRtrace API must be inserted into the
application's source code. All affected source modules must then be
recompiled.
A special "dummy" version of the profiling libraries contains empty
definitions for all VAMPIRtrace API routines. Linking against it "switches
off" tracing without having to remove the VAMPIRtrace API calls and
recompile.
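The relationship between the tracing library and its "dummy" counterpart can be illustrated with a small Python analogy. This is only a sketch of the idea: the class and method names below (EventTracer, DummyTracer, define, begin, end, trace_off, trace_on) are hypothetical and are not the VAMPIRtrace API, which is a linked library for C and Fortran programs. The point is that the instrumented application code stays the same and only the object behind the calls changes.

```python
class EventTracer:
    """Records begin/end events for user-defined symbols (hypothetical names)."""

    def __init__(self):
        self.enabled = True   # tracing can be toggled at run time
        self.symbols = {}     # event code -> symbol name
        self.events = []      # recorded (kind, symbol name) tuples

    def define(self, code, name):
        self.symbols[code] = name

    def begin(self, code):
        if self.enabled:
            self.events.append(("begin", self.symbols[code]))

    def end(self, code):
        if self.enabled:
            self.events.append(("end", self.symbols[code]))

    def trace_off(self):
        self.enabled = False

    def trace_on(self):
        self.enabled = True


class DummyTracer:
    """Same interface as EventTracer, but every routine is empty."""

    events = ()

    def define(self, code, name):
        pass

    def begin(self, code):
        pass

    def end(self, code):
        pass

    def trace_off(self):
        pass

    def trace_on(self):
        pass


def solve(tracer):
    """Instrumented application code; identical for both tracer versions."""
    tracer.define(1, "setup")
    tracer.define(2, "iterate")
    tracer.begin(1)
    tracer.end(1)
    tracer.trace_off()        # events in this region are not recorded
    tracer.begin(2)
    tracer.end(2)
    tracer.trace_on()
    tracer.begin(2)
    tracer.end(2)
    return list(tracer.events)
```

Calling solve(EventTracer()) returns the recorded events, while solve(DummyTracer()) returns an empty list without any change to solve itself. On the T3E the same switch happens at link time rather than by passing a different object.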
In order to use VAMPIRtrace on the local FZJ/NIC CRAY T3Es, the environment
variable PAL_ROOT must be set to /usr/local/vt, e.g.,
ksh: export PAL_ROOT=/usr/local/vt
tcsh: setenv PAL_ROOT /usr/local/vt
For more information see the VAMPIRtrace Installation and User's Guide (PDF).
Also, VAMPIRtrace users who program in C++ might find VT++.h handy; it makes
the VAMPIRtrace API routines much easier to use.
TOPAS - Automatic Performance Statistics Collection on the CRAY T3E
TOPAS (T3E Observative Performance Analysis System) is a tool to
automatically and transparently monitor usage and performance
of every parallel job executed on a CRAY T3E. We modified the
UNICOS/mk compiler wrapper scripts to automatically link the
TOPAS measurement module to every user application whenever it is
recompiled. No modification is necessary in the user's program
or build procedures. At run time, two PEs of the parallel application
are picked to perform the measurement for the parallel job
as a whole. The measurement consists of executing special code immediately
before and after the execution of the program, so there is no measurement
overhead during the execution of the application itself. The TOPAS module
is very simple (about 250 lines of code). It is based on the Performance
Counter Library (PCL), a common interface for portable performance counting
on microprocessors, also developed at NIC/ZAM.
Through environment variables, users can request the printing of the
recorded information at the end of the execution, choose to measure integer,
load, or store operations instead of floating point, and specify the PEs
which should be used for performing the measurement.
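The "measure only before and after the program" scheme described above can be sketched in a few lines of Python. This is an illustration of the pattern only, not the TOPAS module or the PCL interface: measure_job, FakeCounter, and the demo job are hypothetical names, and the counter here is an ordinary variable standing in for a hardware performance counter.

```python
import time


def measure_job(job, read_counter, clock=time.perf_counter):
    """Sample a counter and the clock only before and after job() runs."""
    start_count, start_time = read_counter(), clock()
    job()  # no measurement code executes while the job itself runs
    end_count, end_time = read_counter(), clock()
    return {
        "operations": end_count - start_count,
        "wall_seconds": end_time - start_time,
    }


class FakeCounter:
    """Stand-in for a hardware counter (assumption for this sketch)."""

    def __init__(self):
        self.value = 0

    def read(self):
        return self.value


counter = FakeCounter()


def demo_job():
    # Pretend the application performed 1000 counted operations.
    counter.value += 1000


result = measure_job(demo_job, counter.read)
```

Because the counter is read exactly twice, the cost of measuring does not depend on how long the job runs, which is the property the text attributes to TOPAS.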
The following information is recorded for every parallel job:
- date and time
- user name
- name of the executable
- number of the PE which took the measurement
- number of PEs used
- CPU clock rate in MHz
- execution mode: batch or interactive
- programming language used (C, C++, KCC, f77, f90)
- message passing library used (MPI, PVM, shmem, or none)
- user, system and wall clock time
- number of floating point operations
- number of integer operations
- level 1 data cache misses
In addition to the TOPAS measurement module, we implemented a tool that
lets a system administrator calculate statistics from this data, such as
the typical Mflop/s rates achieved by user programs and the usage of
programming languages and message passing libraries. Most of this
information is not available through regular T3E system accounting.
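As a rough illustration of the kind of statistics such a tool might derive, the following Python sketch computes Mflop/s rates and usage counts from a few job records. Everything here is assumed for the example: the JobRecord fields are a subset of the list above under hypothetical names, the sample values are made up, the rate is taken as floating point operations divided by wall clock time, and the median is used as one possible reading of "typical". The actual tool and its record format are not described on this page.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class JobRecord:
    """Hypothetical subset of the per-job information listed above."""
    user: str
    executable: str
    num_pes: int
    language: str        # e.g. "C", "f90"
    msg_library: str     # e.g. "MPI", "shmem", "none"
    wall_seconds: float
    flops: int           # floating point operations counted for the job


def mflops(record):
    """Mflop/s rate, assuming flops and wall time refer to the same PE."""
    return record.flops / record.wall_seconds / 1e6


def summarize(records):
    """Aggregate rates and usage counts over all recorded jobs."""
    return {
        "median_mflops": median(mflops(r) for r in records),
        "languages": Counter(r.language for r in records),
        "msg_libraries": Counter(r.msg_library for r in records),
    }


# Made-up sample data, for illustration only.
records = [
    JobRecord("alice", "cfd", 64, "f90", "MPI", 100.0, 5_000_000_000),
    JobRecord("bob", "md", 32, "C", "shmem", 200.0, 4_000_000_000),
    JobRecord("carol", "qcd", 128, "f90", "MPI", 50.0, 6_000_000_000),
]

summary = summarize(records)
```

With these sample records the three rates are 50, 20, and 120 Mflop/s, so the median is 50, and the usage counts show two f90/MPI jobs and one C/shmem job.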
For further information, please contact Forschungszentrum Jülich, ZAM,
B.Mohr@fz-juelich.de
27-Oct-1999
URL:<http://www.fz-juelich.de/zam/RD/coop/cray/crayperf.html>