GTBW - Methods and Tools, Software Support
Goals
Developing and running applications on a Metacomputer requires a broad
range of new tools or at least modifications to existing ones. These
include metacomputing-aware communication libraries and tools for
debugging, performance analysis and tuning of software in a
distributed environment. The aim of this subproject is to provide such
libraries and tools for the metacomputing applications in the Gigabit
Testbed West and to support the corresponding projects in using them.
Communication Libraries
A natural programming model for a Metacomputer, with its
geographically distributed memory, is message passing. Message passing
is also widely used within massively parallel supercomputers (MPPs),
where MPI has become the standard API. Almost all MPP vendors deliver
their systems with highly tuned MPI libraries. However, those libraries
usually provide communication only within the MPP.
Better support for metacomputing is included in the new MPI-2
standard, which was released in August 1997. An example is the
ability to establish communication between two independently started
MPI-2 applications. There are two drawbacks: the standard still does
not address the question of interoperability, and no implementations
of MPI-2 (apart from the parallel I/O part) are available for the
supercomputers used in the project.
Several MPI implementations address some of the issues mentioned above:
- MPICH-G is an MPI-1.2 implementation based on MPICH. It is developed
in the Globus project. Communication is layered on top of the Nexus
library, which allows for communication inside as well as between
MPPs.
- LAM has the ability to set up heterogeneous clusters of workstations
which are connected by a TCP/IP network. It supports large parts of
the MPI-2 standard.
- PACX-MPI is layered on top of vendor MPI implementations. It supports
a subset of MPI-1 and offers efficient communication inside and between
MPPs. PACX-MPI is an ongoing project at the Computer Center of the
University of Stuttgart (RUS), with growing functionality.
MetaMPI
Since no implementation fully meets the requirements of the Gigabit
Testbed West, the project partners have contracted Pallas GmbH to develop
'MetaMPI'. MetaMPI is an interoperable MPI library with 'multiprotocol
support' for the platforms that are used within the testbed.
Until late 1999, when the first MetaMPI prototype became available,
PACX-MPI 2.0 was used by the metacomputing applications in the
Gigabit Testbed. The project partners ported this library to the
IBM SP2 and tuned its socket communication for high-speed, low-latency
networks like the Gigabit Testbed.
MetaMPI Architecture
MetaMPI is able to transparently couple two or more parallel
computers. For an MPI-application that is linked with MetaMPI, these
computers appear as a single machine with a common MPI_COMM_WORLD.
The implementation is based on MPICH 1.1. The ADI and the device codes
of MPICH were modified so that multiple devices can be used
simultaneously. For communication that takes place inside each
machine, the native communication device is used. Messages that are
exchanged between nodes on different machines are transferred via
TCP/IP socket connections. On each machine, so-called router nodes are
configured; they handle these sockets and are dedicated to relaying
messages. The number of router nodes and socket connections can be
configured by the user to match the communication requirements of the
actual application. The configuration can be read either from
config files or from a central SQL database. For the latter, a GUI-based
configuration editor (MetaConf) is available.
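
The idea of a common MPI_COMM_WORLD spanning several machines, with a
different device for internal and external traffic, can be illustrated
with a minimal Python sketch. It is not taken from the MetaMPI sources:
the machine names and process counts, the function names locate and
choose_device, and the assumption that global ranks are numbered
machine by machine are all hypothetical. Router nodes are not modelled;
the sketch only decides whether a message stays inside one machine or
has to cross a TCP/IP connection.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical layout: (machine name, number of MPI processes on it).
MACHINES = [("t3e", 4), ("sp2", 2), ("sun", 2)]

# Prefix sums of the process counts: OFFSETS[i] is the first global rank
# on machine i, and OFFSETS[-1] is the size of the common world.
OFFSETS = [0] + list(accumulate(size for _, size in MACHINES))


def locate(global_rank):
    """Map a rank in the common world to (machine name, local rank)."""
    if not 0 <= global_rank < OFFSETS[-1]:
        raise ValueError(f"rank {global_rank} is outside a world of size {OFFSETS[-1]}")
    index = bisect_right(OFFSETS, global_rank) - 1
    return MACHINES[index][0], global_rank - OFFSETS[index]


def choose_device(src, dst):
    """Return 'native' inside one machine and 'tcp-router' across machines."""
    return "native" if locate(src)[0] == locate(dst)[0] else "tcp-router"
```

With this layout, choose_device(0, 3) selects the native device because
both ranks live on the first machine, while choose_device(3, 4) selects
the routed TCP/IP path.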
MPI Functionality
MetaMPI supports the following parts of the MPI-1 and MPI-2 standards:
- The complete MPI-1 standard (as supported by MPICH 1.1)
- Parts of chapter 4 'Miscellany' of the MPI-2 standard (Info object, new datatypes, memory allocation, most of the language interoperability functions)
- Chapter 5 'Process Creation and Management' of the MPI-2 standard, including the name services
- Chapter 7 'Extended Collective Operations' of the MPI-2 standard
Supported Platforms
MetaMPI is able to couple any number of the following machines in any
combination:
- Cray T3E
- Cray T90
- IBM SP2
- Sun SMPs (running Solaris)
- SGI SMPs (running IRIX)
Status
The implementation was finished in September 1999. MetaMPI is
currently used by the metacomputing applications within the Gigabit
Testbed West. It is planned to make it available to other users after
the end of the project in early 2000.
VAMPIR
VAMPIR is a tool for the graphical analysis of trace files generated by
MPI applications. It was developed by the ZAM/FZJ and is now
distributed by Pallas GmbH. For use within the Gigabit Testbed,
some extensions to VAMPIR have been made. These include a
wrapper library that allows PACX-MPI applications to generate
VAMPIR-readable trace files, and the ability to process several groups
of trace files generated by several MPPs with non-synchronous clocks.
The figure above shows an excerpt of computation and communication of
the application TRACE run on four nodes, two from the CRAY T3E and two
from the IBM SP2, visualized with VAMPIR.
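
Processing trace files from machines with non-synchronous clocks
requires mapping all timestamps onto one time axis. How VAMPIR does
this is not described here. The following Python sketch shows one
common approach under stated assumptions: the clock offset is estimated
from a single ping-pong message with equal latency in both directions,
and clock drift is ignored. The function names and the trace format (a
time-sorted list of (timestamp, label) tuples) are hypothetical.

```python
import heapq


def estimate_offset(t1, t2, t3, t4):
    """Estimate how far clock B is ahead of clock A from one ping-pong.

    t1: A sends (A clock)      t2: B receives (B clock)
    t3: B replies (B clock)    t4: A receives (A clock)
    Assumes the latency is the same in both directions.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


def merge_traces(trace_a, trace_b, offset_b):
    """Merge two time-sorted traces onto the time axis of machine A.

    offset_b is subtracted from every timestamp of trace_b before the
    two traces are merged in timestamp order.
    """
    shifted_b = [(t - offset_b, label) for t, label in trace_b]
    return list(heapq.merge(trace_a, shifted_b))
```

For example, if machine B's clock runs 100 seconds ahead and each
message takes 0.5 seconds, estimate_offset(10.0, 110.5, 110.75, 11.25)
yields 100.0, and merging with that offset places B's receive event
between A's send and A's receive.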
Publications
Forschungszentrum Jülich, ZAM, Th.Eickermann@fz-juelich.de
14-Oct-1999
URL: <http://www.fz-juelich.de/gigabit/gtbw_tools.html>