1. Field of the Invention
The present invention generally relates to message passing between multiple processes and more particularly to message passing in a multiple computer processor system.
2. Background Art
A computer system executes applications and programs in order to perform necessary and desired functions. Typically, one computer processor is responsible for executing the applications of the computer system in which the processor resides. However, there is a trend toward using multiple processors for executing applications.
Such multiprocessor systems have a plurality of processors, in close communication with one another, sharing system resources, e.g., memory and peripheral devices. Thus, by using a multiprocessor system, execution throughput can be increased.
A typical architecture used by multiprocessor systems is the symmetric multiprocessing (“SMP”) architecture, in which each of a number of processors in a machine shares the memory available on the machine. SMP architecture provides fast performance by making different processors available to a number of processes on the machine simultaneously. In order to gain further performance out of SMP systems, many such systems are connected together, and the processes running on each system communicate with each other using some inter-process communication paradigm. A group of SMPs connected together to run an application is called a cluster of SMPs, and each SMP system in the cluster is referred to as an SMP node.
In a cluster of SMPs, different SMP nodes may have different numbers of processes. In order for the cluster of SMP nodes to execute and run a computer application, i.e., a user program, processes running on the processors in the same and/or different SMP nodes communicate with each other.
Several mechanisms have been developed to facilitate transfer of data among processors, and more specifically, between processing node memories. One mechanism for exchanging messages is referred to as “message passing.” Several known message passing specifications have been developed, including Message Passing Interface (“MPI”) and Parallel Virtual Machine (“PVM”). Generally, in message passing, in order to transfer data from one process to another, the transferring process generates a message including the data and transfers the message to another process. For example, when a first process needs to retrieve data from a second process, the first process generates a message, including a data retrieval request, and transfers the message to the second process from which data is to be retrieved. Subsequently, the second process executes the data retrieval request and transfers the data to the first process in a message as described above.
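The request and reply exchange described above can be illustrated with a minimal sketch. It does not use MPI or PVM; the `Message` and `Process` classes, the "request"/"reply" tags, and the key "x" are hypothetical names chosen only for illustration, and both processes are modeled as plain objects in one program.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    kind: str        # "request" or "reply" (hypothetical tags)
    payload: object


class Process:
    """Stand-in for a process that owns some data and an inbox of messages."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})
        self.inbox = deque()

    def send(self, other, kind, payload):
        # Generate a message that includes the data and transfer it.
        other.inbox.append(Message(self.name, kind, payload))

    def serve_one(self, peers):
        # Execute one pending data retrieval request and reply with the data.
        msg = self.inbox.popleft()
        if msg.kind == "request":
            self.send(peers[msg.sender], "reply", self.data[msg.payload])


first = Process("first")
second = Process("second", {"x": 42})
peers = {"first": first, "second": second}

first.send(second, "request", "x")   # first process asks for the value of "x"
second.serve_one(peers)              # second process executes the request
reply = first.inbox.popleft()        # first process receives the data
```

Here the first process never reads the second process's data directly; everything it learns arrives inside a message.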
In a computer system constructed according to a distributed memory architecture, messages using the message passing mechanism are transferred between processing nodes over a network and processed or otherwise handled by a respective processing node when they arrive at a respective destination. In a computer system constructed according to a shared memory architecture, several buffer arrangements have been developed to facilitate message transfer.
FIG. 1 shows a typical prior art embodiment of a shared memory architecture message transfer mechanism in which a plurality of processes are provided with a plurality of buffers, each associated with a particular process pair. FIG. 1 shows a first SMP node (20), on which two processes, an ‘i-th’ process (24) and a ‘k-th’ process (26), reside, and a second SMP node (22), on which a ‘j-th’ process (36) resides. When the ‘i-th’ process (24) transfers a message to the ‘j-th’ process (36), the ‘i-th’ process (24) deposits the message into a buffer, B(i,j) (38), that is specifically maintained to store data for the ‘i-th’ process (24) and the ‘j-th’ process (36) when the ‘i-th’ process (24) is a sending process, i.e., a process that sends a message, and the ‘j-th’ process (36) is a receiving process, i.e., a process that receives a message. Similarly, when the ‘j-th’ process (36) transfers a message to the ‘i-th’ process (24), the ‘j-th’ process (36) deposits the message into a buffer, B(j,i) (28), that is specifically maintained to store data for the ‘j-th’ process (36) and the ‘i-th’ process (24) when the ‘j-th’ process (36) is the sending process and the ‘i-th’ process (24) is a receiving process.
Once a message is deposited into a particular buffer, the receiving process can copy the contents of that buffer to its region of memory. For example, once the ‘i-th’ process (24) has deposited a message to B(i,j) (38), the ‘j-th’ process (36) can then retrieve the message by copying it to its region of memory, after which the ‘i-th’ process (24) can again transfer a message to the ‘j-th’ process (36).
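As a rough sketch of the per-pair arrangement of FIG. 1, the following models each buffer B(sender, receiver) as a single slot. The `PairBuffer` class and its method names are assumptions made for illustration, and the rule that a sender must wait until the slot has been emptied is inferred from the description above rather than stated as a requirement.

```python
class PairBuffer:
    """One buffer dedicated to a single (sender, receiver) pair."""

    def __init__(self):
        self.slot = None

    def deposit(self, message):
        # The sender may deposit only when the previous message was retrieved.
        if self.slot is not None:
            return False
        self.slot = bytes(message)
        return True

    def retrieve(self):
        # The receiver copies the contents to its own memory and frees the slot.
        message, self.slot = self.slot, None
        return message


# One buffer for every ordered pair of distinct processes i, j, k.
names = ["i", "j", "k"]
buffers = {(s, r): PairBuffer() for s in names for r in names if s != r}

ok_first = buffers[("i", "j")].deposit(b"hello")
ok_second = buffers[("i", "j")].deposit(b"again")   # slot still occupied
received = buffers[("i", "j")].retrieve()
ok_third = buffers[("i", "j")].deposit(b"again")    # slot is free again
```

Because every ordered pair has its own buffer, a deposit by one sender never competes with a deposit by another sender.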
The mechanism described above for message transfers between the ‘i-th’ process (24) and the ‘j-th’ process (36) can also be applied to message transfers between the ‘i-th’ process (24) and the ‘k-th’ process (26), using buffers B(i,k) (30) and B(k,i) (32), and between the ‘j-th’ process (36) and the ‘k-th’ process (26), using buffers B(k,j) (40) and B(j,k) (34).
Allocating buffers in the manner described above with reference to FIG. 1 ensures that there is no contention for buffers, e.g., memory space, between processes attempting to transmit messages to the same process. However, since each buffer has to be sufficiently large to accommodate a large message, a significant portion of memory address space may be required to maintain the plurality of buffers. Further, the portion of memory address space needed to maintain the plurality of buffers increases as the number of processes increases.
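The growth can be made concrete with a short calculation. FIG. 1 uses one buffer per ordered pair of processes, so N processes need N × (N − 1) buffers. The function names and the 1 MiB buffer size below are assumptions for illustration only.

```python
def pairwise_buffer_count(num_processes):
    # One buffer B(s, r) for every ordered pair of distinct processes.
    return num_processes * (num_processes - 1)


def pairwise_buffer_bytes(num_processes, buffer_bytes):
    # Total space when every buffer is sized for a large message.
    return pairwise_buffer_count(num_processes) * buffer_bytes


ONE_MIB = 1024 * 1024  # hypothetical per-buffer size

count_fig1 = pairwise_buffer_count(3)          # the three processes of FIG. 1
count_64 = pairwise_buffer_count(64)
mib_64 = pairwise_buffer_bytes(64, ONE_MIB) // ONE_MIB
```

With three processes the count is six, matching the six buffers (28, 30, 32, 34, 38, 40) of FIG. 1; with 64 processes and the assumed buffer size, the same scheme needs 4032 buffers, or 4032 MiB.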
In another mechanism, shown in FIG. 2, shared buffers are provided that include a plurality of buffers available to all processes. FIG. 2 shows a first SMP node (also referred to as “SMP Node 1”) (42), on which an ‘i-th’ process (50), a ‘k-th’ process (52), and a first shared buffer pool (54), reside, and a second SMP node (also referred to as “SMP Node 2”) (44), on which a ‘j-th’ process (62) and a second shared buffer pool (66) reside. Typically, buffers in the first and second shared buffer pools (54, 66) have sizes on the order of the sizes of the process pair buffers (28, 30, 32, 34, 38, 40) shown in FIG. 1. In addition to being provided a shared buffer pool, a process is provided with a relatively small buffer (also referred to as “postbox,” “pbx,” or “postbox portion”) to transfer messages to a particular process.
When the ‘i-th’ process (50) transfers a message to the ‘j-th’ process (62), the ‘i-th’ process (50) first attempts to store the message in a postbox, pbx(i,j) (60), that is specifically maintained for message transfers between the ‘i-th’ process (50) and the ‘j-th’ process (62) when the ‘i-th’ process (50) is a sending process and the ‘j-th’ process (62) is a receiving process. If the message fits in pbx(i,j) (60), then the message is stored there for subsequent retrieval by the ‘j-th’ process (62). However, if the message does not fit in pbx(i,j) (60), the ‘i-th’ process (50) allocates a buffer from the ‘j-th’ process's (62) shared buffer pool (66), i.e., the second shared buffer pool (66), loads the message into the allocated buffer, and loads a pointer to the allocated buffer in pbx(i,j) (60) for the ‘j-th’ process (62). Thereafter, the ‘j-th’ process (62) can detect that a message or pointer has been loaded into pbx(i,j) (60) and then retrieve the message or pointer. If pbx(i,j) (60) contains a pointer, the ‘j-th’ process (62) can use the pointer to identify the allocated buffer from its shared buffer pool (66) and then retrieve the message by copying it to its region of memory. After the ‘j-th’ process (62) has retrieved the message from the allocated buffer from its shared buffer pool (66), the allocated buffer, i.e., memory, can then be returned to its shared buffer pool (66). In the case that the ‘j-th’ process (62) transfers a message to the ‘i-th’ process (50), the ‘j-th’ process (62) loads a message or pointer into a postbox, pbx(j,i) (46), that is specifically maintained for message transfers between the ‘i-th’ process (50) and the ‘j-th’ process (62) when the ‘j-th’ process (62) is the sending process and the ‘i-th’ process (50) is the receiving process.
The mechanism described above with reference to FIG. 2 for message transfers between the ‘i-th’ process (50) and the ‘j-th’ process (62) can also be applied to message transfers between the ‘i-th’ process (50) and the ‘k-th’ process (52), using postboxes pbx(k,i) (48) and pbx(i,k) (56), and between the ‘j-th’ process (62) and the ‘k-th’ process (52), using postboxes pbx(k,j) (64) and pbx(j,k) (58).
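The postbox and shared buffer pool arrangement of FIG. 2 can be sketched as follows. The 64-byte postbox capacity, the `SharedPool` class, and the `send` and `receive` helpers are all assumptions for illustration; the sketch runs in a single thread and therefore omits any synchronization of the pool.

```python
POSTBOX_BYTES = 64  # hypothetical postbox capacity


class SharedPool:
    """Pool of large buffers owned by one receiving process."""

    def __init__(self, num_buffers):
        self.buffers = [None] * num_buffers
        self.free = list(range(num_buffers))


def send(postboxes, sender, receiver, receiver_pool, message):
    if len(message) <= POSTBOX_BYTES:
        # Small message: store it directly in pbx(sender, receiver).
        postboxes[(sender, receiver)] = ("inline", message)
    else:
        # Large message: allocate from the receiver's pool, leave a pointer.
        index = receiver_pool.free.pop()
        receiver_pool.buffers[index] = message
        postboxes[(sender, receiver)] = ("pointer", index)


def receive(postboxes, sender, receiver, own_pool):
    kind, value = postboxes.pop((sender, receiver))
    if kind == "inline":
        return bytes(value)
    # Follow the pointer, copy the message, and return the buffer to the pool.
    message = bytes(own_pool.buffers[value])
    own_pool.buffers[value] = None
    own_pool.free.append(value)
    return message


pool_j = SharedPool(num_buffers=2)
postboxes = {}

send(postboxes, "i", "j", pool_j, b"short")
small_kind = postboxes[("i", "j")][0]
small = receive(postboxes, "i", "j", pool_j)

large_message = b"x" * 1000
send(postboxes, "i", "j", pool_j, large_message)
large_kind = postboxes[("i", "j")][0]
free_while_pending = len(pool_j.free)
large = receive(postboxes, "i", "j", pool_j)
```

Only the large message consumes a pool buffer, and that buffer is returned to the pool as soon as the receiver has copied the message.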
The message transfer mechanism described with reference to FIG. 2 requires less memory space than the message transfer mechanism described with reference to FIG. 1, because the postboxes (46, 48, 56, 58, 60, 64) shown in FIG. 2 require less space than the buffers (28, 30, 32, 34, 38, 40) shown in FIG. 1. Moreover, the number of shared buffer pools increases only linearly with the number of processes. However, some contention for memory space is possible with the message transfer mechanism described with reference to FIG. 2. Typically, a lock/unlock mechanism is provided to synchronize access to the shared buffer pools, so that a current sending process does not deposit a message into a buffer in a shared buffer pool while that buffer still holds a message, from a prior sending process, that the receiving process has not yet retrieved. However, lock/unlock mechanisms, which are used to stop subsequent processes from altering memory before a receiving process can copy that memory, can become a performance bottleneck when a large number of processes transfer messages simultaneously.
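A minimal sketch of such a lock/unlock mechanism is given below, using threads in place of processes. The `LockedPool` class and its methods are hypothetical; the point is only that every allocation and release passes through one lock, which is where senders queue up when many of them transfer messages at once.

```python
import threading


class LockedPool:
    """Shared buffer pool whose free list is guarded by a single lock."""

    def __init__(self, num_buffers):
        self._lock = threading.Lock()
        self._free = list(range(num_buffers))

    def allocate(self):
        # Every sender must take the lock, so concurrent senders serialize here.
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, index):
        with self._lock:
            self._free.append(index)

    def free_count(self):
        with self._lock:
            return len(self._free)


pool = LockedPool(num_buffers=4)
handed_out = []
handed_out_lock = threading.Lock()


def sender():
    index = pool.allocate()
    if index is not None:
        with handed_out_lock:
            handed_out.append(index)


# Eight senders compete for four buffers.
threads = [threading.Thread(target=sender) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock guarantees that no buffer is handed to two senders, at the cost of making every sender wait its turn for the pool.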
FIG. 3 shows another prior art mechanism used to handle message transfer. This mechanism uses a sender owned buffer that is managed by each sending process. FIG. 3 depicts nodes 1 . . . n (70, 72, 74, 76), where n represents the number of nodes in a particular system. In order to depict the mechanism shown in FIG. 3 more clearly, processes on nodes 2 . . . n (72, 74, 76) transfer messages to processes on node 1 (70). However, those skilled in the art will appreciate that processes on node 1 (70) can transfer messages to processes on nodes 2 . . . n (72, 74, 76).
Nodes 1 . . . n (70, 72, 74, 76) each have processes 1 . . . m, where m represents the number of processes on a particular node. Note that m can be different for different nodes. Each of the processes on nodes 1 . . . n (70, 72, 74, 76) is allocated a region of common memory, which it uses in its processing operations. In the case of multiple nodes as shown in FIG. 3, each process is allocated a piece of memory on each node including the node on which the process itself resides. With reference to FIG. 3, the allocated memory regions (78, 80, 82) for processes 1 and 2 on node 2 (72) and process m on node n (76) are shown, respectively. Other processes shown on nodes 1 . . . n (70, 72, 74, 76) also have allocated memory regions which are not shown in FIG. 3.
The allocated memory regions (78, 80, 82) are each divided into p postbox blocks and a buffer pool having q buffer blocks, where p represents the number of processes on the node to which the owning process sends messages and where q represents the number of buffer blocks in a particular buffer pool.
If process 1 on node 2 (72) needs to send a message to process 2 on node 1 (70), process 1 on node 2 (72) first attempts to load the message into a postbox in its allocated memory region (78) that is specifically maintained for transfers between process 1 on node 2 (72) and process 2 on node 1 (70) when process 1 on node 2 (72) is the sending process and process 2 on node 1 (70) is the receiving process. If the message is successfully loaded into the postbox, then process 2 on node 1 (70) can thereafter retrieve the message by copying it from the postbox to its region of common memory.
However, if the message is too large to fit in the postbox, then process 1 on node 2 (72) selects one of the buffer blocks from its allocated memory region (78), loads the message into the selected buffer block, and stores a pointer to the selected buffer block in the postbox that is specifically maintained for transfers between process 1 on node 2 (72) and process 2 on node 1 (70) when process 1 on node 2 (72) is the sending process and process 2 on node 1 (70) is the receiving process. Thereafter, process 2 on node 1 (70), the receiving process, can retrieve the message by first retrieving the pointer from the postbox described above, then using the pointer stored in the postbox by process 1 on node 2 (72), the sending process, to identify the buffer block into which the message was loaded, and then copying the message from the buffer block into its region of common memory. Afterwards, the receiving process notifies the sending process that it can reuse the postbox for a future message.
Note that the mechanism described with reference to FIG. 3 can also be applied to message transfers between any of the processes shown in FIG. 3.
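The sender owned arrangement of FIG. 3 can be sketched in the same style. The `SenderRegion` class, the 64-byte postbox capacity, and the `consumed` set used to model the receiver's notification are assumptions for illustration. Freeing the buffer block when the notification is processed is also an assumption; the description above states only that the postbox becomes reusable.

```python
POSTBOX_BYTES = 64  # hypothetical postbox capacity


class SenderRegion:
    """Memory region owned by one sender: p postboxes and q buffer blocks."""

    def __init__(self, receivers, num_blocks):
        self.postboxes = {r: None for r in receivers}   # p postbox blocks
        self.blocks = [None] * num_blocks               # q buffer blocks
        self.free_blocks = list(range(num_blocks))
        self.consumed = set()   # receivers that have signaled "done"

    def _reclaim(self):
        # Only the owning sender touches the free list, so no lock is used.
        for receiver in list(self.consumed):
            kind, value = self.postboxes[receiver]
            if kind == "pointer":
                self.blocks[value] = None
                self.free_blocks.append(value)
            self.postboxes[receiver] = None
        self.consumed.clear()

    def send(self, receiver, message):
        self._reclaim()
        if self.postboxes[receiver] is not None:
            return False   # previous message not yet retrieved
        if len(message) <= POSTBOX_BYTES:
            self.postboxes[receiver] = ("inline", message)
        else:
            index = self.free_blocks.pop()
            self.blocks[index] = message
            self.postboxes[receiver] = ("pointer", index)
        return True


def receive(region, receiver):
    kind, value = region.postboxes[receiver]
    message = value if kind == "inline" else region.blocks[value]
    local_copy = bytes(message)        # copy into the receiver's own memory
    region.consumed.add(receiver)      # notify the sender: postbox reusable
    return local_copy


region = SenderRegion(receivers=["p1", "p2"], num_blocks=2)
big = b"x" * 200
sent_big = region.send("p2", big)
sent_too_early = region.send("p2", b"hi")
got_big = receive(region, "p2")
sent_after_ack = region.send("p2", b"hi")
got_small = receive(region, "p2")
```

The receiver only reads from the sender's region and sets a flag; all allocation decisions stay with the sender.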
Moreover, with reference to FIG. 3, the number of allocated memory regions that need to be maintained by a system is known ahead of time since each sending process owns one allocated memory region. Also, since each allocated memory region is owned and maintained by one process, a lock/unlock mechanism is not needed.