GTBW - Methods and Tools, Software Support

Partners:
  • Forschungszentrum Jülich GmbH,
        Central Institute for Applied Mathematics
  • GMD - National Research Center for Information Technology Sankt Augustin GmbH,
       Institute for Algorithms and Scientific Computing
  • Contact Person: Dr. Thomas Eickermann

    *  Goals

    Developing and running applications on a metacomputer requires a broad range of new tools, or at least modifications to existing ones. These include metacomputing-aware communication libraries and tools for debugging, performance analysis, and tuning of software in a distributed environment. The aim of this subproject is to provide such libraries and tools for the metacomputing applications in the Gigabit Testbed West and to support the corresponding projects in using them.

    *  Communication Libraries

    A natural programming model for a metacomputer, with its geographically distributed memory, is message passing. Message passing is also widely used within massively parallel supercomputers (MPPs), where MPI has become the standard API. Almost all MPP vendors deliver their systems with highly tuned MPI libraries. However, those libraries usually provide communication only within the MPP.

    Better support for metacomputing is included in the new MPI-2 standard, which was released in August 1997. One example is the ability to establish communication between two independently started MPI-2 applications. There are two drawbacks: the standard still does not address interoperability, and no implementations of MPI-2 (apart from the parallel I/O) are available for the supercomputers used in the project.

    Several MPI implementations address some of the issues mentioned above:

    *  MetaMPI

    Since no implementation fully meets the requirements of the Gigabit Testbed West, the project partners have contracted Pallas GmbH to develop 'MetaMPI'. MetaMPI is an interoperable MPI library with 'multiprotocol support' for the platforms that are used within the testbed.

    Until the first MetaMPI prototype became available in late 1999, the metacomputing applications in the Gigabit Testbed used PACX-MPI 2.0. The project partners ported this library to the IBM SP2 and tuned its socket communication for high-speed, low-latency networks such as the Gigabit Testbed.
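
    The text does not say which socket parameters were tuned. As a rough illustration only, the sketch below shows two adjustments that are commonly made for high-bandwidth links: sizing the socket buffers to the bandwidth-delay product and disabling Nagle's algorithm. The function names and the link figures in the usage note are hypothetical and are not taken from PACX-MPI.

```python
import socket


def bandwidth_delay_product(bits_per_second: int, round_trip_us: int) -> int:
    """Bytes that can be in flight on a link (integer arithmetic only)."""
    return bits_per_second * round_trip_us // 8_000_000


def tune_socket(sock: socket.socket, buffer_bytes: int) -> None:
    """Apply illustrative tuning options to a TCP socket (assumed, not PACX-MPI's)."""
    # Send small messages immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Request buffers large enough to keep the link full; the operating
    # system may round or cap the requested size.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
```

    For example, with an assumed 1 Gbit/s link and a 1 ms round-trip time, bandwidth_delay_product(1_000_000_000, 1000) suggests buffers of about 125000 bytes.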

    MetaMPI Architecture

    MetaMPI can transparently couple two or more parallel computers. To an MPI application that is linked with MetaMPI, these computers appear as a single machine with a common MPI_COMM_WORLD. The implementation is based on MPICH 1.1. The ADI and the device code of MPICH were modified so that multiple devices can be used simultaneously. Communication inside each machine uses the native communication device. Messages exchanged between nodes on different machines are transferred via TCP/IP socket connections.

    On each machine, so-called router-nodes are configured; they handle these sockets and are dedicated to relaying messages. The user can configure the number of router-nodes and socket connections to match the communication requirements of the application. The configuration can be read either from config files or from a central SQL database. For the latter, a GUI-based configuration editor (MetaConf) is available.
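
    The device selection described above can be illustrated with a small model. This is a minimal sketch, not MetaMPI code: the Machine class, the build_world and route functions, and the modulo policy for choosing a router-node are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Machine:
    name: str         # hypothetical label for one coupled computer
    num_procs: int    # application processes running on it
    num_routers: int  # dedicated router-nodes (user-configurable)


def build_world(machines: List[Machine]) -> List[Tuple[int, int]]:
    """Give every process a global rank in one common world.

    Entry g of the result is (machine index, local rank) for global rank g.
    """
    world = []
    for m_idx, machine in enumerate(machines):
        for local_rank in range(machine.num_procs):
            world.append((m_idx, local_rank))
    return world


def route(world: List[Tuple[int, int]], machines: List[Machine],
          src: int, dst: int) -> tuple:
    """Decide how a message from global rank src to dst would travel."""
    src_m, _ = world[src]
    dst_m, _ = world[dst]
    if src_m == dst_m:
        # Same machine: use the vendor's native communication device.
        return ("native", machines[src_m].name)
    # Different machines: relay over TCP via router-nodes. Spreading ranks
    # over routers by modulo is an assumed policy, not the documented one.
    out_router = src % machines[src_m].num_routers
    in_router = dst % machines[dst_m].num_routers
    return ("tcp", machines[src_m].name, out_router,
            machines[dst_m].name, in_router)
```

    With four processes on one machine and two on another, global ranks 0 to 3 would live on the first machine and ranks 4 and 5 on the second, so a message from rank 0 to rank 3 stays on the native device while a message from rank 1 to rank 5 passes through one router-node on each side.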

    MPI Functionality

    MetaMPI supports the following parts of the MPI-1 and MPI-2 standards:

    Supported Platforms

    MetaMPI is able to couple any number of the following machines in any combination:

    Status

    The implementation was finished in September 1999. MetaMPI is currently used by the metacomputing applications within the Gigabit Testbed West. It is planned to make it available to other users after the end of the project in early 2000.

    *  VAMPIR

    VAMPIR is a tool for the graphical analysis of trace files generated by MPI applications. It was developed by ZAM/FZJ and is now distributed by Pallas GmbH. For use within the Gigabit Testbed, VAMPIR has been extended in two ways: a wrapper library allows PACX-MPI applications to generate VAMPIR-readable trace files, and VAMPIR can now process several groups of trace files generated by several MPPs with non-synchronous clocks.
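
    How the extension reconciles the clocks is not described here. One common approach, shown below purely as an assumed sketch, is to map each machine's timestamps onto a reference clock with a linear offset and drift correction and then merge the corrected streams. The Event type, the function names, and the linear model are illustrative and do not reflect VAMPIR's trace format.

```python
import heapq
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    time: float    # timestamp in seconds on the recording machine's clock
    machine: str   # hypothetical machine label
    rank: int      # process rank on that machine
    what: str      # event description, e.g. "send" or "recv"


def correct(events: List[Event], offset: float, drift: float = 0.0) -> List[Event]:
    """Map local timestamps to the reference clock (assumed linear model).

    t_ref = offset + (1 + drift) * t_local; order is preserved if drift > -1.
    """
    return [e._replace(time=offset + (1.0 + drift) * e.time) for e in events]


def merge_traces(groups: Dict[str, Tuple[List[Event], float, float]]) -> List[Event]:
    """Correct each group of time-sorted events, then merge by corrected time.

    groups maps a machine label to (events, offset, drift).
    """
    corrected = [correct(ev, off, dr) for ev, off, dr in groups.values()]
    return list(heapq.merge(*corrected, key=lambda e: e.time))
```

    In practice the offset and drift for each machine would have to be estimated, for example from timestamped messages exchanged between the machines; that estimation is outside the scope of this sketch.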

    The figure above shows an excerpt of computation and communication of the application TRACE run on four nodes, two from the CRAY T3E and two from the IBM SP2, visualized with VAMPIR.

    *  Publications



    Forschungszentrum Jülich, ZAM, Th.Eickermann@fz-juelich.de
    14-Oct-1999
    URL: <http://www.fz-juelich.de/gigabit/gtbw_tools.html>