Bernd Mohr
b.mohr@fz-juelich.de
Oct 1, 1998

Tracing Programs on the Cray T3E with PAT
=========================================

This describes PAT Version 2.0.0.0

---------------------------------------------------------------------------
Part A: Basic Instructions
---------------------------------------------------------------------------

The following text is a step-by-step description of how to generate event
traces of an application that can be analyzed with VAMPIR (an event trace
browser developed by us and available from a company named PALLAS, see
http://www.pallas.de/). The instrumentation and tracing are done with the
Cray tool PAT.

On the T3Es of Research Centre Juelich, the PAT extensions are stored in
/usr/local/choose/tools/pat2. On other computers they may be installed
somewhere else.

Step 0.
Make sure that you have the latest version by calling

    pat -V

This should print "PAT: version 2.0.0.0". If you still get the old version
("PAT: version 1.0.0.#"), use the module command to switch to the latest
craytools version:

    module switch craytools craytools.3.1.0.0

Step 1.
Re-link your application with the PAT library and the necessary wrappers.
This is done by adding -lpat, -lwrapper, and pat.cld at the end of your
link command. E.g.,

    cc  main.o sub.o -o myprog -lmpi -lpat -lwrapper pat.cld
    f90 main.o sub.o -o myprog -lmpi -lpat -lwrapper pat.cld

Step 2.
Write all functions to be instrumented in a file (e.g. MYLIST), one per
line.

IMPORTANT: The name of each function must be spelled exactly as the
compiler presents it to the linker. For Cray Fortran that means that the
function name is all UPPER-CASE. Fortran local or module functions are
mangled the following way:

    LocalFuncName_IN_GlobalFuncName
    FuncName_IN_ModuleName

C++ users unfortunately need to use the C++ mangled names.
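
The Fortran spelling rules above can be sketched as two small shell
helpers. This is only an illustration: the helper names fortran_name and
fortran_nested_name are hypothetical, and upper-casing both halves of an
_IN_ name is an assumption derived from the all-UPPER-CASE rule, not
something PAT documents here.

```shell
# Hypothetical helper: upper-case a Fortran routine name, following the
# rule that Cray Fortran presents names to the linker in all UPPER-CASE.
fortran_name() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: build the name of a local or module routine
# following the pattern Inner_IN_Outer given in the text.
# ASSUMPTION: both parts are upper-cased like ordinary Fortran names.
fortran_nested_name() {
    printf '%s_IN_%s\n' "$(fortran_name "$1")" "$(fortran_name "$2")"
}

fortran_name solve                 # prints SOLVE
fortran_nested_name helper solve   # prints HELPER_IN_SOLVE
```

Redirecting the output with ">> MYLIST" would append the generated names
to the function list, one per line.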

To ease the instrumentation of message-passing code (MPI, PVM, or SHMEM),
the directory /usr/local/choose/tools/pat2/ contains predefined lists with
the names of all functions defined for a message-passing library (*_LIST).
There are two versions of each list: one containing all functions
(*_ALL_LIST) and one containing only the functions for which special
wrappers exist (*_HOOK_LIST). Start by copying the required list, then add
the names of your own functions and remove the names of the message-passing
functions you don't want to be traced.

IMPORTANT: Do NOT change the spelling of the functions in the *_LIST files
(i.e. do NOT change to uppercase for Fortran).

IMPORTANT: When tracing MPI functions, the function MPI_Init must always be
included.

IMPORTANT: Many SHMEM functions are just "aliases" for another SHMEM
function. Any SHMEM point-to-point function NOT mentioned in the
SHMEM_HOOK_LIST is an alias and should not be added to the list. However,
removing functions is fine.

IMPORTANT: When tracing PVM, if pvm_recv, pvm_nrecv, or pvm_trecv is used,
it is also necessary to instrument the pvm_upk* functions.

Step 3.
Instrument your program. Call PAT with the options -L -B followed by the
function list file and the name of the executable:

    pat -L -B MYLIST myprog

This will generate a new, instrumented executable named "a.out". If "a.out"
was the name of the "input" executable, PAT will use "b.out". An already
existing "a.out" (or "b.out" respectively) will be overwritten.

For small test cases, step 2 can be skipped and the instrumentation can be
done in interactive mode instead. To do this, call PAT with the option -L
and the name of the executable:

    pat -L myprog

At the PAT prompt (=>) you can instruct PAT to instrument a function "foo"
with the "insttrace" command:

    => insttrace foo

It is possible to list more than one function after the insttrace command
as well as to issue multiple insttrace commands. After instrumenting, you
can leave PAT with the "quit" command.

Step 4.
Run the instrumented program. This generates a pif file containing the
trace. The event trace is already merged and sorted. The name of the pif
file is the name of the executable with the suffix ".pif" added. If this
file already exists, an additional number is inserted after the name of the
executable, e.g.,

    a.out.pif
    a.out.0.pif
    a.out.1.pif
    ...

By default, the PAT tracing runtime system uses a buffer of 8192 events per
PE. It also continues tracing after the buffer is flushed. This behavior
can be changed through the environment variables PAT_NUM_TRACE_ENTRIES and
PAT_TRACE_LIMIT. For example, to get the same behavior as the VAMPIR
tracing system use:

    sh,ksh:    export PAT_NUM_TRACE_ENTRIES=20000
               export PAT_TRACE_LIMIT=20000

    csh,tcsh:  setenv PAT_NUM_TRACE_ENTRIES 20000
               setenv PAT_TRACE_LIMIT 20000

Also, by default, the wrappers for collective communication do not record
message traffic because it is not known how the MPI, PVM, and SHMEM
collective communication routines implement it. By setting the environment
variable PAT_TRACE_COLLECTIVE to "1", however, the user can have the
"logical" pattern of the communication recorded.

    sh,ksh:    export PAT_TRACE_COLLECTIVE=1

    csh,tcsh:  setenv PAT_TRACE_COLLECTIVE 1

Step 5.
Convert the trace into VAMPIR format using the command PIF2BPV. Call
PIF2BPV with the name of the pif file generated in step 4 and the name for
the VAMPIR trace file. If no output filename is specified, the generated
trace will be named like the pif file with the suffix ".bpv" added.

    /usr/local/choose/tools/pat2/pif2bpv a.out.pif mytrace.bpv

For now, you can use "pat -t piffile" to look at the generated trace or
write your own trace file converter for your trace browser using the
libpif.a routines (see below).
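
When steps 4 and 5 are scripted, it helps to know which pif file the next
run will write. The following is a minimal sketch of the naming scheme from
step 4. The function name next_pif_name is hypothetical, and it is an
assumption that the numbering simply continues with the first unused number
beyond the examples listed above.

```shell
# Hypothetical helper: print the pif file name the next run of the given
# executable is expected to create in the current directory.
# ASSUMPTION: PAT picks "<exe>.pif" first and then the first unused
# "<exe>.<n>.pif" with n = 0, 1, 2, ...
next_pif_name() {
    npn_exe=$1
    npn_candidate="$npn_exe.pif"
    npn_n=0
    while [ -e "$npn_candidate" ]; do
        npn_candidate="$npn_exe.$npn_n.pif"
        npn_n=$((npn_n + 1))
    done
    printf '%s\n' "$npn_candidate"
}
```

A script could store the result before starting the run, e.g.
pif=$(next_pif_name a.out), and pass "$pif" to pif2bpv afterwards.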

---------------------------------------------------------------------------
Part B: Advanced Tracing
---------------------------------------------------------------------------

Selective Tracing
-----------------

It is possible to switch tracing on and off for specific parts of your
application. C or C++ programs can use the functions

    TRACE_ON() and TRACE_OFF()

after including the header file "pat.h". Fortran users can call functions
with the names

    PIF$TRACEON() and PIF$TRACEOFF()

defined in the header file "pat.fh".

Write your own Wrappers
-----------------------

In order to use your own trace hooks for functions contained in system or
third-party libraries, you have to write two functions. They must have
exactly the same function signature (the same return type and the same
number and types of arguments) as the original function, but are named

    origFuncName_trace_entry and origFuncName_trace_exit

When instrumenting function "origFuncName", PAT now inserts calls to
origFuncName_trace_entry and origFuncName_trace_exit and passes them the
same argument list as the original function call (instead of calling its
own generic hook functions).

As an example, we want to measure the amount of data written in a C or C++
application. This can easily be done by writing wrappers for the functions
"main" and "write":

    /* bytes written so far, indexed by file descriptor */
    static int fdtable[OPEN_MAX];

    /* called on entry to write(): add the requested byte count */
    size_t write_trace_entry (int fildes, const void *buf, size_t nbyte) {
      fdtable[fildes] += nbyte;
    }

    /* called on exit from write(): nothing to do */
    size_t write_trace_exit (int fildes, const void *buf, size_t nbyte) {}

    int main_trace_entry () {
      int j;
      for (j=0; j