The invention relates generally to the field of digital computer systems, and more particularly to mechanisms for facilitating transfer of information between and among a plurality of processes.
Computers typically execute programs in one or more processes or threads (generally "processes") on one or more processors. If a program comprises a number of cooperating processes which can be processed in parallel on a plurality of processors, sometimes groups of those processes need to communicate to cooperatively solve a particular problem. Two basic architectures have been developed for multi-processor computer systems, namely, distributed memory systems and shared memory systems. In a computer system constructed according to the distributed memory architecture, processors and memory are allocated to processing nodes, with each processing node typically having a processor and an associated "node memory" portion of the system memory. The processing nodes are typically interconnected by a fast network to facilitate transfer of data from one processing node to another when needed for, for example, processing operations performed by the other processing node. Typically in a computer constructed according to the distributed memory architecture, a processor is able to access data stored in its node memory faster than it would be able to access data stored in node memories on other processing nodes. However, contention for the node memory on each processing node is reduced since there is only one processor, that is, the processor on the processing node, which accesses the node memory for its processing operations, and perhaps a network interface which can access the node memory to store therein data which it received from another processing node, or to retrieve data therefrom for transfer to another processing node.
Typically, in a computer system constructed according to the shared memory architecture, the processors share a common memory, with each processor being able to access the entire memory in a uniform manner. This obviates the need for a network to transfer data, as is used in a computer system constructed according to the distributed memory architecture; however, contention for the shared memory can be greater than in a computer system constructed according to the distributed memory architecture. To reduce contention, each processor can be allocated a region of the shared memory which it uses for most of its processing operations. Although each processor's region is accessible to the other processors so that they (that is, the other processors) can transfer data thereto for use in processing by the processor associated with the respective region, typically most accesses of a region will be by the processor associated with the region.
A computer system can be constructed according to a combination of the distributed and shared memory architectures. Such a computer system comprises a plurality of processing nodes interconnected by a network, as in a computer system constructed according to the distributed memory architecture. However, each processing node can have a plurality of processors which share the memory on the respective node, in a manner similar to a computer constructed according to the shared memory architecture.
Several mechanisms have been developed to facilitate transfer of data among processors, or more specifically, between processing node memories, in the case of a computer system constructed according to the distributed memory architecture, and/or memory regions, in the case of a computer system constructed according to the shared memory architecture. In one popular mechanism, termed "message passing," processors transfer information by passing messages thereamong. Several well-known message passing specifications have been developed, including MPI and PVM. Generally, in message passing, to transfer data from one processor to another, the transferring processor generates a message including the data and transfers the message to the other processor. On the other hand, when one processor wishes to retrieve data from another processor, the retrieving processor generates a message including a retrieval request and transfers the message to the processor from which the data is to be retrieved; thereafter, the processor which receives the message executes the retrieval request and transfers the data to the requesting processor in a message as described above.
In a computer system constructed according to the distributed memory architecture, the messages using the message passing mechanism are transferred between processing nodes over the network and processed or otherwise handled by the respective processing node when they arrive at the respective destination. In a computer system constructed according to the shared memory architecture, several buffer arrangements have been developed to facilitate message transfer. In one arrangement, each process is provided with a plurality of buffers, each associated with one of the other processes. When an "i-th" process wishes to transfer a message to another "j-th" process, it (that is, the "i-th" process) deposits the message in a buffer B(i,j) that is maintained therefor. Similarly, when the "j-th" process wishes to transfer a message to the "i-th" process, it (that is, the "j-th" process) will deposit the message in another buffer B(j,i) maintained therefor. Once a message has been deposited in the buffer B(i,j), the "j-th" process can retrieve the message by copying it to its region of memory, after which the "i-th" process can again transfer a message to the "j-th" process. Allocating buffers in this manner ensures that there will be no contention for buffers as among processes attempting to transmit messages to the same process; thus, after the "i-th" process has deposited a message for the "j-th" process in buffer B(i,j), the "k-th" process can also transfer a message to the "j-th" process by depositing the message in the buffer B(k,j) before the "j-th" process has copied the "i-th" process's message from the buffer B(i,j).
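The per-pair buffer arrangement described above can be sketched as follows. This is only an illustrative model, not an implementation from the source: the class name `PairBuffers` and its methods are hypothetical, ordinary Python lists stand in for regions of shared memory, and each buffer is assumed to hold a single message at a time.

```python
class PairBuffers:
    """Hypothetical model: B[i][j] carries messages from process i to process j."""

    def __init__(self, n):
        self.n = n
        # None marks an empty buffer (assumption: one message per buffer).
        self.b = [[None] * n for _ in range(n)]

    def send(self, i, j, message):
        """Process i deposits a message in B(i,j).

        Returns False if the previous message has not yet been retrieved.
        """
        if self.b[i][j] is not None:
            return False
        self.b[i][j] = bytes(message)
        return True

    def receive(self, i, j):
        """Process j copies the message out of B(i,j), freeing the buffer."""
        message = self.b[i][j]
        self.b[i][j] = None
        return message


bufs = PairBuffers(3)
bufs.send(0, 1, b"from process 0")
# Process 2 can deposit for process 1 without waiting, since B(2,1) is separate.
bufs.send(2, 1, b"from process 2")
```

Because every sender has its own buffer for every receiver, two senders never compete for the same buffer; the only wait is a sender waiting for its own earlier message to be retrieved.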
However, since each buffer is typically relatively large, generally sufficiently large as to be able to accommodate a relatively large message, it will be appreciated that a significant portion of the memory address space may be required for the buffers, and further that the portion will increase, with increasing numbers of processes, on the order of N^2, where "N" is the number of processes.
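The quadratic growth can be made concrete with a short calculation. The function name `pair_buffer_bytes` and the 64 KiB buffer size are assumptions chosen for illustration; the source gives no specific sizes. The count N*(N-1) follows from each process having one buffer for each of the other processes.

```python
def pair_buffer_bytes(n, buffer_size):
    """Total space for one buffer per ordered pair of distinct processes."""
    return n * (n - 1) * buffer_size


BUFFER_SIZE = 64 * 1024  # assumed buffer size in bytes, for illustration only

for n in (4, 16, 32):
    total = pair_buffer_bytes(n, BUFFER_SIZE)
    print(f"N={n:3d}: {total / 2**20:8.2f} MiB")
```

Doubling the number of processes roughly quadruples the space reserved for buffers, whether or not those buffers are ever used.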
In another mechanism, instead of providing relatively large buffers B(i,j) for the respective processes, a buffer pool is provided comprising a plurality of buffers available to all of the processes, with the buffers B(x) in the pool having sizes on the order of the sizes of the buffers B(i,j). In addition, each process is provided with a relatively small buffer, referred to as a postbox P, for each of the other processes. When the "i-th" process wishes to transfer a message to the "j-th" process, if the message will fit into the postbox, it (that is, the "i-th" process) will store the message in its postbox P(i,j) for the "j-th" process. On the other hand, if the message will not fit into the postbox, the "i-th" process will allocate a buffer B(x) from the pool, load the message into the buffer B(x) and load a pointer to the buffer B(x) in its postbox P(i,j) for the "j-th" process. Thereafter, the "j-th" process can detect that a message or pointer has been loaded into its postbox P(i,j) and retrieve it (that is, the message or pointer) therefrom. If the postbox P(i,j) contains a pointer, the "j-th" process can use the pointer to identify the buffer B(x) which contains the message and retrieve it (that is, the message) therefrom by copying it to its region of memory. After the "j-th" process has retrieved the message from the buffer B(x), it can return the buffer to the buffer pool. This mechanism has the advantage of requiring less memory space than the mechanism described above, since the postboxes P(i,j) require far less space than the buffers B(i,j), and the number of buffers B(x) can be bounded, with the number being fixed, or growing with "N," the number of processes, perhaps linearly, or the like. However, some contention for buffers is possible with this mechanism.
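A minimal sketch of this postbox-plus-shared-pool arrangement follows. All names (`SharedPool`, `PostboxSystem`, `POSTBOX_SIZE`) are hypothetical, the 64-byte postbox capacity is an assumption, and a `threading.Lock` stands in for whatever synchronization guards the pool in a real shared-memory system.

```python
import threading

POSTBOX_SIZE = 64  # assumed postbox capacity in bytes


class SharedPool:
    """Hypothetical pool of large buffers B(x) shared by all processes."""

    def __init__(self, count):
        self._lock = threading.Lock()  # every allocate/release serializes here
        self._free = list(range(count))
        self.buffers = [None] * count

    def allocate(self):
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, index):
        with self._lock:
            self.buffers[index] = None
            self._free.append(index)


class PostboxSystem:
    def __init__(self, n, pool_size):
        self.pool = SharedPool(pool_size)
        # postbox[i][j] is P(i,j): small slot for messages from i to j.
        self.postbox = [[None] * n for _ in range(n)]

    def send(self, i, j, message):
        if self.postbox[i][j] is not None:
            return False  # previous entry not yet retrieved
        if len(message) <= POSTBOX_SIZE:
            self.postbox[i][j] = ("inline", bytes(message))
            return True
        x = self.pool.allocate()
        if x is None:
            return False  # pool exhausted: contention among senders
        self.pool.buffers[x] = bytes(message)
        self.postbox[i][j] = ("pointer", x)
        return True

    def receive(self, i, j):
        entry = self.postbox[i][j]
        if entry is None:
            return None
        kind, payload = entry
        if kind == "pointer":
            message = self.pool.buffers[payload]  # copy out of B(x)
            self.pool.release(payload)            # return B(x) to the pool
        else:
            message = payload
        self.postbox[i][j] = None
        return message
```

With a pool of one buffer, a second large message from any sender is refused until the first has been retrieved, which illustrates the contention this arrangement can introduce.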
In addition, some mechanism needs to be provided to synchronize access to the buffers, to ensure that, after the "i-th" process has deposited a message for the "j-th" process in a buffer B(x), another "k-th" process does not deposit a message in the same buffer B(x) before the "j-th" process has retrieved the message. Typically, such lock/unlock mechanisms can become a bottleneck, particularly if a large number of processes wish to send messages contemporaneously.
The invention provides a new and improved system and method for allocating buffers for message passing in a shared-memory computer system, thereby to facilitate transfer of messages among processes which share the computer system's memory.
In brief summary, the invention provides a communication arrangement that facilitates transfer of messages among a plurality of processes in a computer, the computer having a memory shared by the processes. The communication arrangement comprises, allocated to each process, a plurality of buffers, and a plurality of postboxes each associated with one of the other processes. Each process includes a message size determination module and a message transfer module. The message size determination module is configured to determine whether a message to be transferred to another process can be accommodated by a postbox. The message transfer module is configured to (i) in response to a positive determination by the message size determination module, store the message in the postbox associated with the process as allocated to the other process, and (ii) in response to a negative determination by the message size determination module, store the message in one of the buffers allocated thereto, and provide a pointer to the one of the buffers in the postbox associated with the process as allocated to the other process.
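One possible reading of the summarized arrangement is sketched below, under several stated assumptions. The names `Process`, `fits_in_postbox`, `transfer` and `retrieve` are hypothetical; the 64-byte postbox capacity is assumed; the postboxes are modeled as held by the sending process, which is only one interpretation of the summary; and having the receiver mark the sender's buffer free is an assumption, since the summary does not say how buffers are reclaimed.

```python
POSTBOX_SIZE = 64  # assumed postbox capacity in bytes


class Process:
    """Hypothetical process with its own buffers and one postbox per peer."""

    def __init__(self, pid, n, buffers_per_process):
        self.pid = pid
        # Buffers allocated to this process alone; no shared pool.
        self.buffers = [None] * buffers_per_process
        # One postbox for each of the other processes.
        self.postboxes = {j: None for j in range(n) if j != pid}


def fits_in_postbox(message):
    """Message size determination: can a postbox accommodate the message?"""
    return len(message) <= POSTBOX_SIZE


def transfer(sender, dest_pid, message):
    """Message transfer: inline in the postbox, or pointer to a sender buffer."""
    if sender.postboxes[dest_pid] is not None:
        return False  # earlier entry for this destination not yet retrieved
    if fits_in_postbox(message):
        sender.postboxes[dest_pid] = ("inline", bytes(message))
        return True
    # Only the sender searches its own buffers, so no other sender competes.
    for x, slot in enumerate(sender.buffers):
        if slot is None:
            sender.buffers[x] = bytes(message)
            sender.postboxes[dest_pid] = ("pointer", x)
            return True
    return False  # all of the sender's own buffers are in use


def retrieve(sender, dest_pid):
    """Destination reads its postbox at the sender and copies the message out."""
    entry = sender.postboxes[dest_pid]
    if entry is None:
        return None
    kind, payload = entry
    if kind == "pointer":
        message = sender.buffers[payload]
        sender.buffers[payload] = None  # assumption: receiver marks buffer free
    else:
        message = payload
    sender.postboxes[dest_pid] = None
    return message
```

In this sketch the number of large buffers grows linearly with the number of processes, and one sender exhausting its own buffers does not prevent another sender from transferring a large message.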