Performance properties

Time

Description:
Total time spent on program execution, including the idle times of CPUs reserved for slave threads during OpenMP sequential execution. This pattern assumes that every thread of a process is allocated a separate CPU during the entire runtime of the process.
Unit:
Seconds
Parent:
None
Children:
Execution, Overhead
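
The Parent and Children fields in this reference form a tree rooted at Time (Visits is a separate root without children). As a minimal sketch, the tree can be transcribed into a dictionary and walked from any pattern up to its root. The names CHILDREN and path_to_root are illustrative and not part of the tool.

```python
# Parent -> children, transcribed from the Parent/Children fields of this reference.
CHILDREN = {
    "Time": ["Execution", "Overhead"],
    "Execution": ["MPI"],
    "MPI": ["Communication", "MPI I/O", "Init/Exit", "Synchronization"],
    "Communication": ["Collective", "Point-to-point"],
    "Collective": ["Early Reduce", "Late Broadcast", "Wait at N x N"],
    "Point-to-point": ["Late Receiver", "Late Sender"],
    "Late Sender": ["Messages in Wrong Order (Late Sender)"],
    "Synchronization": ["Barrier"],
    "Barrier": ["Barrier Completion", "Wait at Barrier"],
}


def path_to_root(pattern):
    """Return the chain of patterns from `pattern` up to its root."""
    # Invert the mapping so that each child points to its parent.
    parent_of = {c: p for p, cs in CHILDREN.items() for c in cs}
    path = [pattern]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


print(" < ".join(path_to_root("Late Sender")))
```

Reading the chain from left to right shows which more general patterns include the time attributed to a specific one.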

Visits

Description:
Number of times a certain call path has been visited.
Unit:
Counts
Parent:
None
Children:
None

Execution

Description:
Time spent on program execution but without the idle times of slave threads during OpenMP sequential execution. Note that for pure MPI applications, this pattern is equal to Time.
Unit:
Seconds
Parent:
Time
Children:
MPI

Overhead

Description:
Time spent performing major tasks related to trace generation, such as time synchronization or dumping the trace-buffer contents to a file. Note that the normal per-event overhead is not included.
Unit:
Seconds
Parent:
Time
Children:
None

MPI

Description:
This pattern refers to the time spent in MPI calls.
Unit:
Seconds
Parent:
Execution
Children:
Communication, MPI I/O, Init/Exit, Synchronization

Communication

Description:
This pattern refers to the time spent in MPI communication calls.
Unit:
Seconds
Parent:
MPI
Children:
Collective, Point-to-point

MPI I/O

Description:
This pattern refers to the time spent in MPI I/O calls.
Unit:
Seconds
Parent:
MPI
Children:
None

Init/Exit

Description:
This pattern refers to the time spent in MPI initialization and finalization calls. It applies to MPI_Init() and MPI_Finalize().
Unit:
Seconds
Parent:
MPI
Children:
None

Synchronization

Description:
This pattern refers to the time spent in MPI synchronization calls, that is, in MPI barriers.
Unit:
Seconds
Parent:
MPI
Children:
Barrier

Collective

Description:
Time spent for MPI collective communication.
Unit:
Seconds
Parent:
Communication
Children:
Early Reduce, Late Broadcast, Wait at N x N

Early Reduce

Description:
Collective communication operations that send data from all processes to one destination process (i.e., n-to-1) may suffer from waiting times if the destination process enters the operation earlier than its sending counterparts, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation. It applies to the MPI calls MPI_Reduce(), MPI_Gather() and MPI_Gatherv().


Early Reduce Example

Unit:
Seconds
Parent:
Collective
Children:
None
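
As a rough illustration of Early Reduce, the waiting time of the destination process can be estimated from enter timestamps. The sketch below assumes that the destination waits until the first sender enters the operation; this reference does not state the exact formula, and the function name and timestamps are hypothetical.

```python
def early_reduce_wait(root_enter, sender_enters):
    """Estimate the waiting time of the destination (root) process.

    Assumption: the root cannot receive anything before the first
    sender has entered the n-to-1 operation.
    """
    first_sender = min(sender_enters)
    # No waiting time if some sender was already inside the operation.
    return max(0.0, first_sender - root_enter)


# Root enters at t=1.0; the senders enter at t=3.0, 2.5 and 4.0.
print(early_reduce_wait(1.0, [3.0, 2.5, 4.0]))
```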

Late Broadcast

Description:
Collective communication operations that send data from one source process to all processes (i.e., 1-to-n) may suffer from waiting times if destination processes enter the operation earlier than the source process, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation. It applies to the MPI calls MPI_Bcast(), MPI_Scatter() and MPI_Scatterv().


Late Broadcast Example

Unit:
Seconds
Parent:
Collective
Children:
None
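
The Late Broadcast waiting time can be sketched per destination process as the interval between its own entry into the operation and the entry of the source process. The function name, rank numbers, and timestamps below are hypothetical, and the formula is an assumption derived from the description above.

```python
def late_broadcast_waits(root_enter, dest_enters):
    """Map each destination rank to its assumed waiting time.

    dest_enters maps a rank to the time it entered the 1-to-n operation.
    A destination waits only if it entered before the source (root).
    """
    return {rank: max(0.0, root_enter - t) for rank, t in dest_enters.items()}


# The source enters at t=4.0; rank 3 arrives after it and does not wait.
print(late_broadcast_waits(4.0, {1: 1.0, 2: 3.5, 3: 5.0}))
```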

Wait at N x N

Description:
Collective communication operations that send data from all processes to all processes (i.e., n-to-n) exhibit an inherent synchronization among all participants, that is, no process can finish the operation until the last process has started it. This pattern covers the time spent in an n-to-n operation until all processes have reached it. It applies to the MPI calls MPI_Reduce_scatter(), MPI_Allgather(), MPI_Allgatherv(), MPI_Allreduce(), MPI_Alltoall() and MPI_Alltoallv().


Wait at N x N Example

Unit:
Seconds
Parent:
Collective
Children:
None
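
For Wait at N x N, the description implies that each process waits from its own entry until the last participant has entered. A minimal sketch with hypothetical timestamps, where the list index stands for the process rank:

```python
def wait_at_nxn(enters):
    """Return the waiting time of each process in an n-to-n operation.

    enters[i] is the time at which process i entered the operation.
    """
    last = max(enters)
    # The last process to arrive has a waiting time of zero.
    return [last - t for t in enters]


print(wait_at_nxn([1.0, 2.0, 4.0]))
```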

Point-to-point

Description:
This pattern refers to the time spent in MPI point-to-point communication calls.
Unit:
Seconds
Parent:
Communication
Children:
Late Receiver, Late Sender

Late Receiver

Description:
A send operation blocks until the corresponding receive operation is called. This can happen for two reasons: either the MPI implementation operates in synchronous mode by default, or the size of the message to be sent exceeds the available MPI-internal buffer space, so that the operation blocks until the data has been transferred to the receiver. The pattern refers to the time spent waiting as a result of this situation.


Late Receiver Example

Unit:
Seconds
Parent:
Point-to-point
Children:
None
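
The two causes named in the Late Receiver description can be expressed as a simple condition. The sketch below is a simplified model, not the behavior of any particular MPI implementation: buffer_bytes is a hypothetical stand-in for the MPI-internal buffer space, and the timestamps are invented.

```python
def late_receiver_wait(send_enter, recv_enter, msg_bytes, buffer_bytes,
                       synchronous=False):
    """Estimate the time a send spends blocked on a late receive.

    Assumption: the send blocks only in synchronous mode or when the
    message does not fit into the internal buffer; otherwise it
    returns immediately and no waiting time is attributed.
    """
    blocks = synchronous or msg_bytes > buffer_bytes
    if not blocks:
        return 0.0
    # The sender waits only if the receive was posted after the send.
    return max(0.0, recv_enter - send_enter)


# A 1 MiB message against a 64 KiB buffer; the receive is posted 2 s late.
print(late_receiver_wait(1.0, 3.0, 1 << 20, 65536))
```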

Late Sender

Description:
This pattern refers to the waiting time caused by a blocking receive operation (e.g., MPI_Recv() or MPI_Wait()) that is posted earlier than the corresponding send operation.


Late Sender Example

Unit:
Seconds
Parent:
Point-to-point
Children:
Messages in Wrong Order (Late Sender)
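
Late Sender mirrors Late Receiver: here the receiver is the one that waits. A minimal sketch, assuming the waiting time is the interval between the entry into the receive and the entry into the matching send (function name and timestamps are hypothetical):

```python
def late_sender_wait(recv_enter, send_enter):
    """Estimate the time a blocking receive waits for a late send."""
    # No waiting time if the send was posted before the receive.
    return max(0.0, send_enter - recv_enter)


# The receive is posted at t=1.0, the matching send only at t=2.5.
print(late_sender_wait(1.0, 2.5))
```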

Messages in Wrong Order (Late Sender)

Description:
A Late Sender situation may be the result of messages that are received in the wrong order. If a process expects messages from one or more processes in a certain order, although these processes are sending them in a different order, the receiver may need to wait for a message if it tries to receive a message early that has been sent late. The situation can be avoided by receiving messages in the order in which they are sent instead. This pattern refers to the time spent in a wait state as a result of this situation.


Messages in Wrong Order (Late Sender) Example

Unit:
Seconds
Parent:
Late Sender
Children:
None
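
One way to detect the wrong-order situation is to check, for every receive that waits on a late sender, whether a message received later had already been sent earlier. The criterion, the function name, and the timestamps below are assumptions made for illustration; the reference does not spell out the exact test.

```python
def wrong_order_wait(receives):
    """Sum the Late Sender waiting time attributable to wrong ordering.

    receives is a list of (recv_enter, send_enter) pairs for one
    receiver, listed in the order in which the receives are posted.
    """
    total = 0.0
    for i, (recv_enter, send_enter) in enumerate(receives):
        wait = max(0.0, send_enter - recv_enter)
        # Assumed criterion: a message received later was sent earlier.
        out_of_order = any(s < send_enter for _, s in receives[i + 1:])
        if wait > 0.0 and out_of_order:
            total += wait
    return total


# The first receive waits for a message sent at t=5.0, although the
# message received second was already sent at t=2.0.
print(wrong_order_wait([(1.0, 5.0), (5.5, 2.0)]))
```

Swapping the two receives in this example would let the receiver pick up the early message first, which is the remedy the description suggests.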

Barrier

Description:
This pattern refers to the time spent in MPI barriers.
Unit:
Seconds
Parent:
Synchronization
Children:
Barrier Completion, Wait at Barrier

Barrier Completion

Description:
This pattern refers to the time spent in MPI barriers after the first process has left the operation.


Barrier Completion Example

Unit:
Seconds
Parent:
Barrier
Children:
None

Wait at Barrier

Description:
This pattern covers the time spent waiting in front of an MPI barrier, which is the time inside the barrier call until the last process has reached the barrier. A large amount of waiting time in front of barriers can indicate load imbalance.


Wait at Barrier Example

Unit:
Seconds
Parent:
Barrier
Children:
None
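
The two Barrier sub-patterns can be contrasted in one sketch: Wait at Barrier is measured from a process's own entry up to the entry of the last process, and Barrier Completion from the exit of the first process up to a process's own exit. The function name and timestamps are hypothetical, and the formulas are one reading of the descriptions above.

```python
def barrier_times(enters, exits):
    """Return (wait, completion) lists for one barrier instance.

    enters[i] and exits[i] are the times at which process i entered
    and left the barrier call.
    """
    last_enter = max(enters)
    first_exit = min(exits)
    wait = [last_enter - t for t in enters]       # Wait at Barrier
    completion = [t - first_exit for t in exits]  # Barrier Completion
    return wait, completion


wait, completion = barrier_times([1.0, 2.0, 4.0], [4.5, 4.25, 5.0])
print(wait, completion)
```

A wide spread in the wait values across processes is the kind of signal that the Wait at Barrier description associates with load imbalance.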

SCALASCA    Copyright © 1998-2006 Forschungszentrum Jülich
Copyright © 2003-2006 University of Tennessee