Many computational problems can be subdivided into independent or loosely dependent tasks, which can be distributed among a group of processors or systems and executed in parallel. This technique often permits the main problem to be solved faster than would be possible if all the tasks were performed by a single processor or system. In some cases, the processing time can be reduced in proportion to the number of processors or systems working on the sub-tasks. Each process can compute independently except when it needs to exchange data with another process.
Cooperating processors and systems can be coordinated as necessary by transmitting messages between them. Messages can also be used to distribute work and to collect results. Some partitions or decompositions of problems can place significant demands on a message passing infrastructure, either by sending and receiving a large number of messages, or by transferring large amounts of data within the messages.
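The use of messages to distribute work and collect results can be illustrated with a minimal sketch. It is not MPI code: Python threads stand in for the cooperating processors, `queue.Queue` objects stand in for the message channels, and the names `worker` and `sum_of_squares` are hypothetical, chosen only for illustration. Because the workers here are threads in a single interpreter, the sketch shows the message flow rather than any real speedup.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive work messages until a None sentinel arrives; reply with partial sums."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work for this worker
            break
        task_id, chunk = msg
        outbox.put((task_id, sum(x * x for x in chunk)))


def sum_of_squares(data, num_workers=4):
    """Split `data` into sub-tasks, send one to each worker, and combine the replies."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, results)) for inbox in inboxes
    ]
    for t in threads:
        t.start()

    # Distribute work: each worker receives one independent slice, then a sentinel.
    for i, inbox in enumerate(inboxes):
        inbox.put((i, data[i::num_workers]))
        inbox.put(None)

    # Collect results: exactly one reply message is expected per worker.
    partials = [results.get() for _ in range(num_workers)]
    for t in threads:
        t.join()
    return sum(value for _, value in partials)
```

In this decomposition each sub-task is fully independent, so the only messages are the initial work assignment and the final result. A decomposition with dependencies between sub-tasks would exchange additional messages during the computation, which is where the demands on the messaging infrastructure grow.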
Messages may be transferred from process to process over a number of different communication channels, or “fabrics.” For example, processes executing on the same physical machine may be able to communicate efficiently using shared memory or point-to-point processor interconnections on multi-processor machines. Processes on different machines may communicate through a high-speed network such as InfiniBand® (a registered trademark of the InfiniBand Trade Association), Myrinet® (a registered trademark of Myricom, Inc. of Arcadia, Calif.), Scalable Coherent Interface (“SCI”), or QSNet by Quadrics, Ltd. of Bristol, United Kingdom. These networks may provide a native operational mode that exposes all of the features available from the fabric, as well as an emulation mode that permits the network to be used with legacy software. Processes may also communicate via traditional networks such as Ethernet.
A standard set of message passing functions may be defined, and libraries provided to perform the standard functions over each type of fabric. The Message Passing Interface (“MPI”) is an industry standard defining the basic application programming interface (API) for programming distributed memory and shared memory systems in terms of message passing. The MPI standard was defined by the members of the MPI Forum (see MPI: A Message-Passing Interface Standard Version 2.1, Message Passing Interface Forum, Jun. 23, 2008, available at xwwwx.mpi-forum.org/docs/, where “www” is replaced with “xwwwx” in the URL to avoid an active link from within this document). An MPI (or similar) library may provide the standard functions over one or more fabrics.
Multiple threads within a process are sometimes used to share resources such as memory, with the advantage that the threads do not need message-passing mechanisms to communicate. Threads are especially useful in taking advantage of the different processor cores in multiprocessor systems. Operating systems in multiprocessor systems can allocate tasks among threads running on the different processor cores and take advantage of the data sharing that is possible for threads running within a common address space and with the processor interconnections available within the multiprocessor environment.
Within an MPI environment, however, multiple threads within one process must follow special implementation techniques. Under the MPI standard, each MPI process is typically mapped to a unique operating system process. Another process can access a process's address space only by calling MPI library functions. As pointed out in the MPI-2 specification, section 12.4 “MPI and Threads,” each thread within a process can issue MPI calls; however, threads are not separately addressable because the parameters in a send or receive call identify a process, not a thread. A message sent to a process can be received by any thread in this process. The fact that a process is multi-threaded does not affect the external interface of the process.
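The addressing rule described above can be sketched in a few lines. This is a simplified model rather than an MPI implementation: the `mailboxes` dictionary, the `send` and `receiver_thread` functions, and the thread names are all hypothetical. The point it illustrates is that the sender names only a destination rank, so either thread waiting on that rank's mailbox may receive a given message.

```python
import queue
import threading

NUM_RANKS = 2
# One mailbox per process rank; there is no per-thread mailbox.
mailboxes = {rank: queue.Queue() for rank in range(NUM_RANKS)}


def send(dest_rank, payload):
    """Address a message to a process rank; a thread cannot be named."""
    mailboxes[dest_rank].put(payload)


def receiver_thread(rank, name, received, lock):
    """Block until any message for `rank` arrives, then record who received it."""
    payload = mailboxes[rank].get()
    with lock:
        received.append((name, payload))


def demo():
    """Start two threads in 'process' 1 and send two messages to rank 1."""
    received = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=receiver_thread, args=(1, f"thread-{i}", received, lock))
        for i in range(2)
    ]
    for t in threads:
        t.start()
    send(1, "a")
    send(1, "b")
    for t in threads:
        t.join()
    return received
```

Each thread receives exactly one of the two messages, but which thread receives which message depends on scheduling and is not determined by the sender.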
To comply with the MPI standard, as stated in the MPI-2 specification, section 12.4 “MPI and Threads,” a thread-compliant implementation must ensure that all MPI calls are thread-safe and that blocking MPI calls block the calling thread only, allowing other threads to execute, if available. Meeting this requirement, however, means that static and global variables used by threads making MPI calls must be protected using, for example, mutual exclusion primitives that allow access by only one thread at a time. A thread-compliant implementation typically requires a programmer to rewrite the source code using techniques such as POSIX threads or a mixed MPI/OpenMP programming model. These sophisticated programming paradigms increase program complexity and may decrease overall program performance. Such an implementation would eliminate one of the advantages of using threads, which can use static and global variables to communicate without the overhead of sending messages through a shared memory mechanism.
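The kind of protection described above can be sketched with a mutual exclusion primitive guarding a global variable. The sketch uses Python's `threading.Lock` as a stand-in for whatever primitive a given implementation might use; the variable `message_count` and the functions `record_messages` and `run` are hypothetical and do not correspond to any MPI library internals.

```python
import threading

message_count = 0              # global state shared by every thread in the process
count_lock = threading.Lock()  # mutual exclusion primitive guarding message_count


def record_messages(n):
    """Increment the shared counter n times, one thread at a time."""
    global message_count
    for _ in range(n):
        with count_lock:       # only one thread may update the global at once
            message_count += 1


def run(num_threads=4, per_thread=10000):
    """Reset the counter, update it from several threads, and return the total."""
    global message_count
    message_count = 0
    threads = [
        threading.Thread(target=record_messages, args=(per_thread,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return message_count
```

Every access to the global now pays the cost of acquiring and releasing the lock, which is the overhead and added complexity the paragraph above refers to.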