Many computational problems can be subdivided into independent or loosely dependent tasks, which can be distributed among a group of processors or systems and executed in parallel. This technique often permits the main problem to be solved faster than would be possible if all the tasks were performed by a single processor or system. In favorable cases, the processing time can be reduced nearly in inverse proportion to the number of processors or systems working on the sub-tasks.
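By way of illustration only, the subdivision described above might be sketched as follows. The function names (split, sum_of_squares, parallel_sum_of_squares) and the choice of problem are hypothetical and are not drawn from any particular system. Threads stand in here for the cooperating processors or systems, and the sketch is meant to show the decomposition rather than to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """One independent sub-task: it needs no data from any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Distribute the chunks among workers, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Because each chunk is processed without reference to the others, the sub-tasks in this sketch are independent; only the final summation requires the partial results to be brought together.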
Cooperating processors and systems can be coordinated as necessary by transmitting messages between them. Messages can also be used to distribute work and to collect results. Some partitions or decompositions of problems can place significant demands on a message passing infrastructure, either by sending and receiving a large number of messages, or by transferring large amounts of data within the messages.
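The use of messages to distribute work and to collect results might look roughly like the following sketch. It assumes an in-process queue as the message channel and threads as the cooperating workers. The message layout (tuples tagged "work", "result", and "stop") and the names worker and run are assumptions made for illustration, not part of any standard.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive work messages until a stop message arrives."""
    while True:
        msg = inbox.get()
        if msg[0] == "stop":
            break
        _, task_id, payload = msg
        # Reply with a result message tagged by the task it answers.
        outbox.put(("result", task_id, sum(payload)))


def run(tasks, n_workers=2):
    """Send each task out as a message and gather the result messages."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(tasks):
        inbox.put(("work", task_id, payload))
    for _ in threads:
        inbox.put(("stop",))  # one stop message per worker
    results = {}
    for _ in tasks:
        _, task_id, value = outbox.get()
        results[task_id] = value  # results may arrive in any order
    for t in threads:
        t.join()
    return [results[i] for i in range(len(tasks))]
```

In this sketch every task costs one outbound and one inbound message, so a decomposition into many small tasks raises the message count, while a decomposition into a few large tasks raises the amount of data carried per message.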
Messages may be transferred from process to process over a number of different communication channels, or “fabrics.” For example, processes executing on the same physical machine may be able to communicate efficiently using shared memory or point-to-point processor interconnections on multi-processor machines. Processes on different machines may communicate through a high-speed network such as InfiniBand® (a registered trademark of the InfiniBand Trade Association), Myrinet® (a registered trademark of Myricom, Inc. of Arcadia, Calif.), Scalable Coherent Interface (“SCI”), or QSNet by Quadrics, Ltd. of Bristol, United Kingdom. These networks may provide a native operational mode that exposes all of the features available from the fabric, as well as an emulation mode that permits the network to be used with legacy software. A commonly-provided emulation mode may be a Transmission Control Protocol/Internet Protocol (“TCP/IP”) mode, in which the high-speed network is largely indistinguishable from a traditional network such as Ethernet. Emulation modes may not be able to transmit data as quickly as a native mode. Processes may also communicate via traditional networks such as Ethernet.
A standard set of message passing functions may be defined, and libraries provided to perform the standard functions over each type of fabric. One standard library definition is the Message Passing Interface (“MPI”) from the members of the MPI Forum (see MPI: A Message-Passing Interface Standard Version 2.1, Message Passing Interface Forum, Jun. 23, 2008, available at www.mpi-forum.org#docs#, where “/” is replaced with “#” in the URL to avoid an active link from within this document). An MPI (or similar) library may provide the standard functions over one or more fabrics.
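The idea of one set of functions provided over several fabrics can be sketched as follows. This is not the MPI API. The class names (Fabric, QueueFabric, SocketFabric) and the helper echo_roundtrip are hypothetical; an in-process queue stands in for a shared-memory channel and a local socket pair stands in for a network channel. For brevity, each object acts as both ends of its channel.

```python
import queue
import socket
import struct
from abc import ABC, abstractmethod


class Fabric(ABC):
    """The common functions that callers use, whatever the channel."""

    @abstractmethod
    def send(self, data: bytes) -> None: ...

    @abstractmethod
    def recv(self) -> bytes: ...


class QueueFabric(Fabric):
    """In-process stand-in for a shared-memory channel."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, data):
        self._q.put(bytes(data))

    def recv(self):
        return self._q.get()


class SocketFabric(Fabric):
    """Stream-socket channel; a 4-byte length prefix frames each message."""

    def __init__(self):
        self._tx, self._rx = socket.socketpair()

    def send(self, data):
        self._tx.sendall(struct.pack("!I", len(data)) + data)

    def recv(self):
        (length,) = struct.unpack("!I", self._read(4))
        return self._read(length)

    def _read(self, n):
        # A stream socket may return fewer bytes than requested, so loop.
        buf = b""
        while len(buf) < n:
            chunk = self._rx.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("channel closed")
            buf += chunk
        return buf

    def close(self):
        self._tx.close()
        self._rx.close()


def echo_roundtrip(fabric, payload):
    """Caller code is identical for every fabric."""
    fabric.send(payload)
    return fabric.recv()
```

The calling code in echo_roundtrip does not change when the fabric changes, which is the property a standard library definition is intended to provide; the fabric-specific work, such as message framing on a byte stream, stays inside each implementation.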