The invention relates generally to the field of digital computer systems and methods, and more particularly to a system and method for allocating and using arrays in computer system, such as a symmetric multi-processor, constructed according to the shared memory architecture.
Computers typically execute programs in one or more processes or threads (generally xe2x80x9cprocessesxe2x80x9d) on one or more processors. If a program comprises a number of cooperating processes which can be processed in parallel on a plurality of processors, sometimes groups of those processes need to communicate to cooperatively solve a particular problem. Two basic architectures have been for multi-processor computer systems, namely, distributed memory systems and shared memory systems. In a computer system constructed according to the distributed memory architecture, processors and memory are allocated to processing nodes, with each processing node typically having a processor and an associated xe2x80x9cnode memoryxe2x80x9d portion of the system memory. The processing nodes are typically interconnected by a fast network to facilitate transfer of data from one processing node to another when needed for, for example, processing operations performed by the other processing node. Typically in a computer constructed according to the distributed memory architecture, a processor is able to access data stored in its node memory faster that it would be able to access data stored in node memories on other processing nodes. However, contention for the node memory on each processing node is reduced since there is only one processor, that is, the processor on the processing node, which accesses the node memory for its processing operations, and perhaps a network interface which can access the node memory to store therein data which it received from another processing node, or to retrieve data therefrom for transfer to another processing node.
Typically, in a computer system constructed according to the shared memory architecture, the processors share a common memory, with each processor being able to access the entire memory in a uniform manner. This obviates the need for a network to transfer data, as is used in a computer system constructed according to the distributed memory architecture; however, contention for the shared memory can be greater than in a computer system constructed according to the distributed memory architecture. To reduce contention, each processor can be allocated a region of the shared memory which it uses for most of its processing operations. Although each processor""s region is accessible to the other processors so that they (that is, the other processors) can transfer data thereto for use in processing by the processor associated with the respective region, typically most accesses of a region will be by the processor associated with the region.
A computer system can be constructed according to a combination of the distributed and shared memory architectures. Such a computer system comprises a plurality of processing nodes interconnected by a network, as in a computer system constructed according to the distributed memory architecture. However, each processing node can have a plurality of processors which share the memory on the respective node, in a manner similar to a computer constructed according to the shared memory architecture.
Several mechanisms have been developed to facilitate transfer of data among processors, or more specifically, between processing node memories, in the case of a computer system constructed according to the distributed memory architecture, and/or memory regions, in the case of a computer system constructed according to the shared memory architectures. In one popular mechanism, termed xe2x80x9cmessage passing,xe2x80x9d processors transfer information by passing messages thereamong. Several well-known message passing specifications have been developed, including MPI and PVM. Generally, in message passing, to transfer data from one processor to another, the transferring processor generates a message including the data and transfers the message to the other processor. On the other hand, when one processor wishes to retrieve data from another processor, the retrieving processor generates a message including a retrieval request and transfers the message to the processor from which the data is to be retrieved; thereafter, the processor which receives the message executes the retrieval request and transfers the data to the requesting processor in a message as described above.
A number of programs require processing of data organized in the form of multi-dimensional arrays. In a program processed in parallel by a number of processors in a computer system constructed according to either the shared memory architecture or the distributed memory architecture, portions of an array are distributed among the processors that are processing the program, and the processors process the data in their respective portions. If an n-dimensional array has dimensions m0xm1x . . . xmnxe2x88x921 (mx, x=0,1, . . . ,nxe2x88x921 being an integer), each item of data in the array is indexed by an xe2x80x9cn-tuplexe2x80x9d [i0,i1, . . . ,inxe2x88x921](i0∈0, . . . ,m0xe2x88x921, i1∈0, . . . ,m1xe2x88x921, . . . , inxe2x88x921∈0, . . . ,mnxe2x88x921xe2x88x921). If, for example, the array is distributed among the xe2x80x9cPxe2x80x9d processors according to the last xe2x80x9cm-thxe2x80x9d dimension, then the set of data items associated with a xe2x80x9cfirstxe2x80x9d processor p0 will, if xe2x80x9cmnxe2x88x921xe2x80x9d is divisible by xe2x80x9cP,xe2x80x9d correspond to [i0,i1, . . . ,inxe2x88x921](i0∈0, . . . ,m0xe2x88x921, i1∈0, . . . ,m1xe2x88x921, . . . , inxe2x88x921∈0, . . . ,(mnxe2x88x921/P)xe2x88x921), the set associated with a xe2x80x9csecondxe2x80x9d processors p1 will correspond to [i0,i1, . . . ,inxe2x88x921](i0∈0, . . . ,m0xe2x88x921, i1∈0, . . . ,m1xe2x88x921, . . . , inxe2x88x921∈(mnxe2x88x921/P), . . . ,(2mnxe2x88x921/P)xe2x88x921), and so forth, with generally, the set associated with a xe2x80x9cp-thxe2x80x9d processor Pp will correspond to [i0,i1, . . . ,inxe2x88x921] (i0∈0, . . . ,m0xe2x88x921, i1∈0, . . . ,m1xe2x88x921, . . . , inxe2x88x921∈((p+1)mnxe2x88x921/P), . . . ,mnxe2x88x921xe2x88x921). Decompositions and distributions along other dimensions will be apparent to those skilled in the art.
If the data needs to be redistributed, for example, during a matrix transpose, to form another array, the processors can transmit the data in their respective portions to the memories of the other processors as described above. In the following discussion, it will be assumed that the parallel program is being processed by a computer system constructed in accordance with the shared memory architecture. Generally, in that operation, the transfer is performed using a common pool of shared memory that is accessible by all of the processors. When a processor is to transfer data to another processor, it copies the data to a predetermined location in the common pool. The processor to which the data is being transferred, when it determines that the data has been copied to the predetermined location, then copies the data to its local memory space. Thus, transferring data from one processor to another requires to copy operations.
The invention provides a new and improved system and method for allocating and using arrays in a computer system, such as a symmetric multi-processor, constructed according to the shared memory architecture.
In brief summary, the invention provides a computer system comprising a. plurality of processes, each having an associated memory region, and a shared memory region shared by the processes. One of the processes is configured to control allocation of space for an array in the shared memory region, generate a descriptor therefor pointing to the allocate space and transmit the descriptor to the other processes. Therafter, all of the processes are configured to identify regions of the array for which they are to process data therein, and to perform predetermined processing operations in connection therewith.