1. Field of the Invention
The present invention pertains generally to the field of high speed digital processing systems, and more particularly to a method of modifying the extent of an array dimension in order to simplify addressing across processor elements.
2. Background Information
Massively parallel processing systems have received a great deal of attention because of their potential for orders of magnitude increases in processing power over conventional systems while maintaining competitive costs. One such massively parallel processing (MPP) system is illustrated in the block diagram of FIG. 1A. As can be seen in FIG. 1A, massively parallel processing system 100 includes hundreds or thousands of processor elements 102 (PE's) linked together by an interconnect network 104. System 100 of FIG. 1A is a distributed memory system in that system memory is distributed as individual local memories 106 connected to each processor element 102. Typically, each processor element 102 has a favored low latency, high bandwidth path to a group of local memory banks within its associated local memory 106, and a longer latency, lower bandwidth path to the local memories 106 associated with other processor elements 102 over interconnect network 104. The longer latency memory referenced across the interconnect network is typically referred to as remote or global memory. References to such remote memory 106 traverse interconnect network 104 to some uniquely identifiable processor element 102 attached to network 104.
Because memory in system 100 is distributed as local memories 106 connected to each of the processor elements 102, it can be advantageous to address all memory within system 100 as if it occupied a single address space, albeit one with a non-uniform access time. In such a globally addressed system, each memory reference is first examined to determine whether it addresses the local memory 106 associated with the issuing processor element 102. If it does not, the request is routed out onto network 104 to the appropriate processor element 102. One embodiment of a method of routing data across a toroidal mesh interconnect is described in U.S. patent Ser. No. 07/983,979, entitled "DIRECTION ORDER ROUTING IN MULTIPROCESSING SYSTEMS," filed Nov. 30, 1992, by Thorsen, which disclosure is hereby incorporated by reference.
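The local-versus-remote check described above can be sketched as follows. This is a minimal illustration only, assuming a global address has already been resolved into a (PE number, offset) pair. The names `GlobalMemory`, `load`, `store`, and `remote_requests` are hypothetical, and the interconnect network is modeled as a simple counter rather than as an actual routed network.

```python
class GlobalMemory:
    """Toy model of a globally addressed distributed memory (hypothetical)."""

    def __init__(self, n_pes, words_per_pe):
        # One local memory per processor element.
        self.local = [[0] * words_per_pe for _ in range(n_pes)]
        # Counts references that would have to traverse the interconnect.
        self.remote_requests = 0

    def _examine(self, issuing_pe, target_pe):
        # A reference is local only when it targets the issuer's own memory;
        # anything else stands in for a request routed over the network.
        if target_pe != issuing_pe:
            self.remote_requests += 1

    def load(self, issuing_pe, target_pe, offset):
        self._examine(issuing_pe, target_pe)
        return self.local[target_pe][offset]

    def store(self, issuing_pe, target_pe, offset, value):
        self._examine(issuing_pe, target_pe)
        self.local[target_pe][offset] = value
```

In this sketch every reference sees the same single address space, and only the counter distinguishes the cheaper local path from the more expensive remote one.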
The global address model permits data objects distributed across all of the PE's to be viewed as if there were a single address space. In one approach described by MacDonald et al. in "Addressing in Cray Research's MPP Fortran," Proceedings of the Third Workshop on Compilers for Parallel Computers, July 1992, data distribution is defined through a set of directives that indicate how a data object is to be distributed.
No matter what approach is taken for data distribution, each memory reference to an element within that data object must be analyzed to extract the processor element 102 where the element is located and the offset into the local memory 106 of that PE 102 needed to access the element. The calculation of the PE number and the offset is nontrivial; its complexity grows with the number of dimensions that are distributed across processor elements. Methods for extracting the PE and offset from an address in a globally addressed distributed memory system are well known in the art. For instance, two methods are described in the MacDonald et al. reference cited above, which is hereby incorporated by reference. Typically, such methods rely on a number of time-consuming integer division and integer modulus operations. MacDonald et al., however, show that these calculations can be streamlined by requiring that all dimensions have an extent (or size) that is a power of two. Such an approach simplifies PE and offset extraction by converting integer divisions to right shifts, modulo operations to masking operations, and multiplications to left shifts, all of which are faster operations. In addition, such an approach is inherently simpler to implement in hardware.
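The difference between the general calculation and the power-of-two calculation can be illustrated with a short sketch. It assumes, purely for illustration, a one-dimensional array distributed block-cyclically across the PEs; this distribution and the function names `locate_divmod` and `locate_shift` are assumptions made here and are not taken from the MacDonald et al. reference.

```python
def locate_divmod(index, block, n_pes):
    """General case: PE and offset via integer division and modulus."""
    blk = index // block                      # which block holds the element
    pe = blk % n_pes                          # blocks are dealt out round-robin
    offset = (blk // n_pes) * block + index % block
    return pe, offset


def locate_shift(index, log2_block, log2_pes):
    """Power-of-two case: the same result using only shifts and masks."""
    blk = index >> log2_block                 # division    -> right shift
    pe = blk & ((1 << log2_pes) - 1)          # modulus     -> mask
    offset = ((blk >> log2_pes) << log2_block) | (index & ((1 << log2_block) - 1))
    return pe, offset                         # multiply    -> left shift


# Element 37 with blocks of 4 elements on 8 PEs lands on PE 1 at offset 5.
print(locate_divmod(37, 4, 8), locate_shift(37, 2, 3))
```

Both functions return identical results whenever the block size and the PE count are powers of two; the second form is available only under that restriction, which is what motivates the extent constraint.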
To capitalize on this simpler approach, commercial globally addressed distributed memory massively parallel processing systems by Thinking Machines and by MasPar require that arrays be defined such that the extent of each dimension in the array is a power of two. Programmers working with such extent-constrained systems must keep these dimension extent restrictions in mind. Typically, however, the only help the programmer receives in following these constraints is a compiler error message generated when the compiler reviews the program code and finds that the extent of an array dimension is other than a power of two. The programmer must then revise the program code to bring the extent of each of the array dimensions to a power of two. Such a limitation is especially onerous when the programmer wishes to define an array dimension as a function of some run time variable.
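The kind of check that produces such an error message can be sketched in a few lines. This is an assumed illustration only; the function names `is_power_of_two` and `check_extents` are hypothetical and do not describe any particular vendor's compiler.

```python
def is_power_of_two(extent):
    """True when extent is 1, 2, 4, 8, ... (exactly one bit set)."""
    return extent > 0 and (extent & (extent - 1)) == 0


def check_extents(extents):
    """Return the (dimension index, extent) pairs that violate the constraint."""
    return [(dim, e) for dim, e in enumerate(extents) if not is_power_of_two(e)]


# A 64 x 100 x 8 array would be rejected because of its second dimension.
print(check_extents([64, 100, 8]))
```

A check of this form can only run at compile time when every extent is a compile-time constant, which is one reason run time extents are awkward under the constraint.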
There is a need for a method of defining array bounds which permits the simpler power of two addressing while at the same time granting programmers greater flexibility in specifying the extent of an array dimension within their program code.