The demand for ever-faster computation has caused a switch from single central processing unit (CPU) computers to multiple CPU computers. The theory behind this switch is that what one CPU can accomplish, multiple CPUs can accomplish more quickly by operating in parallel.
The realization of the full potential of multiple CPU computers has been impeded by memory architecture. One type of prior multiple CPU computer is based upon a distributed memory architecture. In a distributed memory computer each CPU is connected directly to one section of memory, called local memory. Processing time in distributed memory computers can be greatly decreased provided that data is apportioned among the local memories such that the majority of data accesses are local. Non-local data accesses typically result in higher network traffic and reduce the overall system performance of a distributed memory computer.
To reduce non-local data accesses, data arrays in programs are often decomposed into contiguous subarrays or "blocks" to preserve "locality of reference" in programs that calculate each array element's value from the values of its immediate neighbors. This kind of program is fairly common and helps to preserve the abstraction of a global array when it is distributed over a distributed memory system.
The use of a single global addressing scheme also helps preserve the abstraction of a global array when it is distributed over a distributed memory computer. Unfortunately, the typical addressing scheme in a distributed memory computer does not accommodate a single global addressing scheme without significant overhead or increased compiler complexity. To illustrate, consider FIG. 1. A global array, constituting the data to be distributed, is shown in FIG. 1A with global coordinates for each global element indicated below it in parentheses. The global array can be decomposed into subarrays, as shown in FIG. 1B. Indicated below each element are its global coordinates, in parentheses, and its subarray coordinates, in brackets.
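The decomposition described for FIG. 1 can be sketched as follows. This is only an illustration: the 4x4 global array, the 2x2 block size, the zero-based row-by-row numbering of subarrays, and the function name `decompose` are assumptions chosen to agree with the coordinates quoted in the text, since the figure itself is not reproduced here.

```python
def decompose(global_rows, global_cols, block_rows, block_cols):
    """Map each global coordinate to (subarray number, subarray coordinates).

    Assumes the global dimensions are exact multiples of the block
    dimensions and that subarrays are numbered row by row from 0.
    """
    blocks_per_row = global_cols // block_cols
    mapping = {}
    for x in range(global_rows):
        for y in range(global_cols):
            # Which block the element falls in, counted row by row.
            sub_id = (x // block_rows) * blocks_per_row + (y // block_cols)
            # Position of the element inside its block.
            mapping[(x, y)] = (sub_id, (x % block_rows, y % block_cols))
    return mapping


# Assumed sizes: a 4x4 global array split into four 2x2 subarrays.
mapping = decompose(4, 4, 2, 2)
for global_coords in [(0, 0), (2, 1), (3, 3)]:
    sub_id, local_coords = mapping[global_coords]
    print(global_coords, "-> subarray", sub_id, "local", list(local_coords))
```

Under these assumptions, global element (3,3) falls in subarray 3 at subarray coordinates [1,1].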
Each subarray is stored in a local memory in sequential order according to the subarray coordinates. For example, with a row major language such as C the array elements would be stored in the following order: [0,0], [0,1], [1,0], [1,1]. In a column major language like FORTRAN the array elements would be stored in the following order: [0,0], [1,0], [0,1], [1,1]. The CPU associated with the local memory initially stores as its base pointer the memory location of the subarray base element; e.g. [0,0]. The local address of any array element within any subarray may be calculated from its local indices, using an offset from the unshifted base pointer. The formula for a row major language is:
$$O = X \cdot Y_{dim} + Y$$
where $O$ is the offset of the element from the unshifted base pointer, $X$ and $Y$ are the element's local coordinates, and $Y_{dim}$ is the size of the local subarray in the Y dimension.
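As a minimal sketch, the row major offset formula can be checked against the storage orders listed above. The function names are hypothetical, and the column major formula (offset = Y * X_dim + X) is not given in the text; it is the standard counterpart, included here only for comparison.

```python
def row_major_offset(x, y, y_dim):
    """Offset from the base pointer in a row major layout: O = X * Y_dim + Y."""
    return x * y_dim + y


def column_major_offset(x, y, x_dim):
    """Column major counterpart (an assumption, not stated in the text)."""
    return y * x_dim + x


# Local coordinates of a 2x2 subarray.
coords = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Sorting by offset recovers the order in which elements sit in memory.
row_order = sorted(coords, key=lambda c: row_major_offset(c[0], c[1], 2))
col_order = sorted(coords, key=lambda c: column_major_offset(c[0], c[1], 2))

print("row major:   ", [list(c) for c in row_order])
print("column major:", [list(c) for c in col_order])
```

Sorting the 2x2 subarray coordinates by each offset reproduces the two storage orders given for C and FORTRAN.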
A simple addressing scheme for global data accesses is to subtract the subarray's global minima (the global coordinates of the subarray's base element) from the desired array element's global coordinates to obtain local coordinates. A subarray offset is then calculated using local coordinates and subarray dimensions. For a row major language, the subarray offset may be calculated by Equation (1).
$$O_{sub} = (X_b - X_{min}) \cdot Y_{dim} + (Y_b - Y_{min}) \qquad (1)$$
where:
$O_{sub}$ is the subarray offset;
$X_b$, $Y_b$ are the global coordinates of the desired array element;
$X_{min}$, $Y_{min}$ are the global coordinates of the subarray element with the lowest global coordinates; i.e. the subarray global minima;
$Y_{dim}$ is the subarray dimension in the Y direction.
Note that Equation 1 will retrieve the appropriate array element only if the node containing the array element is addressed. In other words, this addressing scheme requires knowledge of the location of array elements within the distributed memory. How that information is obtained does not affect the addressing scheme.
For example, consider accessing global element (3,3) within subarray 3. First, the global minima for subarray 3 are determined. The global coordinates of the global minima are (2,2). Second, the global minima are subtracted from the global coordinates (3,3) to obtain local coordinates of [1,1]. Finally, the local offset for global element (3,3) within subarray 3 is calculated according to Equation 1 above, where $Y_{dim} = 2$.
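The worked example can be reproduced with a short sketch of Equation 1. The function name `subarray_offset` and the list standing in for the local memory of subarray 3 are illustrative assumptions; the list simply holds, in row major order, labels for the four global elements assumed to belong to that subarray.

```python
def subarray_offset(xb, yb, x_min, y_min, y_dim):
    """Equation 1: offset of global element (xb, yb) within its subarray."""
    local_x = xb - x_min  # subtraction performed on every access
    local_y = yb - y_min  # subtraction performed on every access
    return local_x * y_dim + local_y


# Assumed contents of subarray 3, stored in row major order and
# labelled by global coordinates; its global minima are (2, 2).
local_memory = ["(2,2)", "(2,3)", "(3,2)", "(3,3)"]

offset = subarray_offset(3, 3, x_min=2, y_min=2, y_dim=2)
print("offset:", offset)
print("element:", local_memory[offset])
```

With global minima (2,2) and a Y dimension of 2, the local coordinates are [1,1] and the offset is 1 * 2 + 1 = 3, the last slot of the subarray. The two subtractions inside the function are the per-access work that this scheme requires.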
A disadvantage of this scheme of global addressing is that the subtraction of subarray global minima from global coordinates must occur every time any array element is addressed. The subtraction step alone can result in a significant cost in performance.
Other methods that offer the ability to address array elements by their global coordinates often support global addressing only in very restrictive addressing patterns. Often these methods result in parallel programs that can only run on a specific number of processors and must be recompiled if the user wishes to run the program with more or fewer processors.