Total time spent for program execution including the idle times of CPUs
reserved for slave threads during OpenMP sequential execution. This pattern
assumes that every thread of a process is allocated a separate CPU during the
entire runtime of the process.
Time spent on program execution but without the idle times of slave threads
during OpenMP sequential execution. Note that for pure MPI applications,
this pattern is equal to Time.
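
The relationship between the two patterns above can be illustrated with a small calculation. This is only a sketch under simplifying assumptions: a single process, a fixed number of threads, and a known amount of wall-clock time spent in sequential (non-parallel) OpenMP execution. The function and parameter names are hypothetical and are not part of any tool.

```python
def time_and_execution(wall_time, num_threads, sequential_time):
    """Return (time, execution) for one process, in CPU-seconds.

    Assumption: every thread holds its own CPU for the whole run, so the
    total is wall_time * num_threads. During sequential execution all
    threads except the master are idle.
    """
    total = wall_time * num_threads
    idle = sequential_time * (num_threads - 1)
    return total, total - idle


# 10 s run, 4 threads, 2 s of it sequential: 3 slave threads idle for 2 s each.
print(time_and_execution(10.0, 4, 2.0))  # (40.0, 34.0)
```

With a single thread per process, as in a pure MPI application, the idle term is zero and both values coincide.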
Time spent performing major tasks related to trace generation, such as time
synchronization or dumping the trace-buffer contents to a file. Note that
the normal per-event overhead is not included.
Collective communication operations that send data from all processes to
one destination process (i.e., n-to-1) may suffer from waiting times if the
destination process enters the operation earlier than its sending
counterparts, that is, before any data could have been sent. The pattern
refers to the time lost as a result of this situation. It applies to the
MPI calls MPI_Reduce(), MPI_Gather() and MPI_Gatherv().
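
As an illustration, the waiting time of the destination process could be estimated from event timestamps as follows. This is a sketch, not the tool's actual algorithm: it assumes the wait ends when the first sender enters the operation, and that it cannot extend past the destination's own exit. All names are hypothetical.

```python
def early_reduce_wait(root_enter, root_exit, sender_enters):
    """Estimate the n-to-1 waiting time at the destination (root) process.

    Assumption: no data can arrive before the earliest sender has entered
    the operation, so the root waits from its own enter time until then.
    """
    first_sender = min(sender_enters)
    return max(0.0, min(first_sender, root_exit) - root_enter)


print(early_reduce_wait(1.0, 5.0, [3.0, 4.0]))  # 2.0 (root entered 2 s early)
print(early_reduce_wait(4.0, 5.0, [3.0, 3.5]))  # 0.0 (senders were already there)
```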
Collective communication operations that send data from one source process
to all processes (i.e., 1-to-n) may suffer from waiting times if
destination processes enter the operation earlier than the source process,
that is, before any data could have been sent. The pattern refers to the
time lost as a result of this situation. It applies to the MPI calls
MPI_Bcast(), MPI_Scatter() and MPI_Scatterv().
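
The 1-to-n case can be sketched in the same spirit. The example below assumes that each destination waits from its own enter time until the source enters, limited by the destination's exit time; this is a simplified model with hypothetical names, not the tool's implementation.

```python
def late_broadcast_waits(source_enter, dest_intervals):
    """Estimate per-destination waiting time in a 1-to-n operation.

    dest_intervals: list of (enter, exit) timestamps, one per destination.
    Assumption: a destination cannot receive data before the source has
    entered the operation.
    """
    return [
        max(0.0, min(source_enter, exit_) - enter)
        for enter, exit_ in dest_intervals
    ]


# Source enters at t=3.0; the first destination arrived at 1.0, the second at 3.5.
print(late_broadcast_waits(3.0, [(1.0, 4.0), (3.5, 4.0)]))  # [2.0, 0.0]
```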
Collective communication operations that send data from all processes to
all processes (i.e., n-to-n) exhibit an inherent synchronization among all
participants, that is, no process can finish the operation until the last
process has started it. This pattern covers the time spent in n-to-n
operations until all processes have entered them. It applies to the MPI calls
MPI_Reduce_scatter(), MPI_Allgather(), MPI_Allgatherv(), MPI_Allreduce(),
MPI_Alltoall() and MPI_Alltoallv().
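
Because every participant depends on the last one to arrive, the per-process waiting time can be sketched relative to the latest enter timestamp. The following is a minimal model under that assumption; the function name and the (enter, exit) representation are hypothetical.

```python
def wait_at_nxn(intervals):
    """Estimate per-process waiting time in an n-to-n operation.

    intervals: list of (enter, exit) timestamps, one per process.
    Assumption: each process waits from its own enter time until the last
    process has entered, but never longer than its own time in the call.
    """
    last_enter = max(enter for enter, _ in intervals)
    return [
        max(0.0, min(last_enter, exit_) - enter)
        for enter, exit_ in intervals
    ]


print(wait_at_nxn([(1.0, 5.0), (2.0, 5.0), (4.0, 5.0)]))  # [3.0, 2.0, 0.0]
```

The last process to arrive has no waiting time under this model; the earlier a process arrives, the more it accumulates.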
A send operation is blocked until the corresponding receive operation is
called. This can happen for several reasons. Either the MPI implementation
is working in synchronous mode by default or the size of the message to be
sent exceeds the available MPI-internal buffer space and the operation is
blocked until the data is transferred to the receiver. The pattern refers
to the time spent waiting as a result of this situation.
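
For a single blocked send, the waiting time can be sketched as the part of the send call that elapses before the matching receive is posted. The model and the names below are assumptions made for illustration only.

```python
def late_receiver_wait(send_enter, send_exit, recv_enter):
    """Estimate how long a blocking send waits for its receive.

    Assumption: the send cannot make progress until the receiver has
    entered the matching receive call; the wait is capped by the duration
    of the send call itself.
    """
    return max(0.0, min(recv_enter, send_exit) - send_enter)


print(late_receiver_wait(1.0, 4.5, 4.0))  # 3.0 (receive posted 3 s after the send)
print(late_receiver_wait(4.0, 4.5, 1.0))  # 0.0 (receive was already posted)
```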
A Late Sender situation may be the result of messages that are
received in the wrong order. If a process expects messages from one or more
processes in a certain order, although these processes are sending them in
a different order, the receiver may have to wait if it tries to receive
early a message that is sent late. The situation can be avoided by
receiving messages in the order in which they are sent.
This pattern refers to the time spent in a wait state as a result of this
situation.
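
The effect of the receive order can be illustrated with a small simulation. It assumes a receiver that issues blocking receives one after another and spends a fixed amount of work on each message before posting the next receive; the sender labels, timestamps, and function name are made up for the example.

```python
def total_receive_wait(send_times, receive_order, work_per_message, start=0.0):
    """Sum the time a receiver spends blocked for a given receive order.

    send_times: dict mapping a sender label to the time its message is sent.
    receive_order: sender labels in the order the receiver posts receives.
    Assumption: a message is available the moment it is sent (no latency).
    """
    now = start
    waited = 0.0
    for sender in receive_order:
        available = send_times[sender]
        if available > now:
            waited += available - now  # blocked until the message is sent
            now = available
        now += work_per_message  # process the message before the next receive
    return waited


send_times = {"A": 1.0, "B": 3.0}
print(total_receive_wait(send_times, ["A", "B"], 2.0))  # 1.0 (order of sending)
print(total_receive_wait(send_times, ["B", "A"], 2.0))  # 3.0 (wrong order)
```

In this model, receiving in the order of sending lets the work on the first message overlap with the delay of the second one, which is why the waiting time drops.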
This pattern covers the time spent waiting in front of an MPI barrier,
which is the time inside the barrier call until the last process has
reached the barrier. A large amount of waiting time spent in front of
barriers can be an indication of load imbalance.