1. Field of the Invention
The present invention relates to a technique applicable to data processors of multiprocessor construction and the like and, more particularly, to a multiprocessor data processor that operates a plurality of data-element processing mechanisms in parallel, distributing the processing load to shorten processing time. In particular, the invention relates to a technique suitable for a data processor that partitions the data elements of a multi-dimensional array into a plurality of sub-arrays and assigns one or more sub-arrays to each node for parallel data processing.
2. Description of the Background Art
Jacobi relaxation is a classical relaxation process used to solve heat conduction problems and the like. Taking a two-dimensional plane as an example, in each relaxation calculation cycle the process sets the data value of every grid point on the plane to the average of the data values of its four adjacent grid points from the immediately preceding cycle. Repeating this cycle makes the data value of each grid point in the area of interest converge on a value determined by the externally established boundary conditions.
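Written as an update rule, one relaxation calculation cycle on the two-dimensional plane can be sketched as follows. The notation is introduced here for illustration only and does not appear in the original figures: $u^{(t)}_{i,j}$ denotes the data value at grid point $(i, j)$ after cycle $t$, and the rule is applied to every grid point inside the area of interest while boundary values stay fixed.

```latex
u^{(t+1)}_{i,j} = \frac{1}{4}\left( u^{(t)}_{i-1,j} + u^{(t)}_{i+1,j} + u^{(t)}_{i,j-1} + u^{(t)}_{i,j+1} \right)
```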
FIG. 7 illustrates an example program that implements arithmetic processing by Jacobi relaxation on a shared-memory multiprocessor. The example program of FIG. 7 uses two two-dimensional arrays (231, 232) to hold the data elements of one of the partitioned sub-arrays. In odd-numbered relaxation calculation cycles, the value of each data element in the sub-array 231 is calculated from the values of four data elements in the sub-array 232. In even-numbered relaxation calculation cycles, the value of each data element in the sub-array 232 is calculated from the values of four data elements in the sub-array 231. In this manner, the Jacobi relaxation changes the values of the data elements in the sub-arrays in every relaxation calculation cycle.
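The alternating two-array scheme described for FIG. 7 can be sketched in Python. This is a minimal illustration under stated assumptions, not the program of FIG. 7: the function name `jacobi_two_buffers`, the list-of-lists representation, and the convention that the outermost ring of the grid holds fixed boundary values are all introduced here. The buffers `a` and `b` play the roles of the arrays 231 and 232.

```python
def jacobi_two_buffers(grid, cycles):
    """Run `cycles` Jacobi relaxation cycles on a 2-D grid (hypothetical helper).

    The outermost ring of `grid` is treated as fixed boundary values;
    only interior points are updated. Returns the buffer written last.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    a = [row[:] for row in grid]  # plays the role of array 231
    b = [row[:] for row in grid]  # plays the role of array 232
    for cycle in range(1, cycles + 1):
        # Odd cycles compute a from b; even cycles compute b from a.
        dst, src = (a, b) if cycle % 2 == 1 else (b, a)
        for i in range(1, n_rows - 1):
            for j in range(1, n_cols - 1):
                dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                    + src[i][j - 1] + src[i][j + 1])
    # After an odd number of cycles `a` holds the newest values, otherwise `b`.
    return a if cycles % 2 == 1 else b


# Example: boundary held at 1.0, interior initially 0.0.
grid = [[1.0] * 5] + [[1.0, 0.0, 0.0, 0.0, 1.0] for _ in range(3)] + [[1.0] * 5]
after_one = jacobi_two_buffers(grid, 1)
after_many = jacobi_two_buffers(grid, 200)
```

Because each cycle reads only from one buffer and writes only to the other, no value computed in the current cycle is consumed before the cycle ends; this is what distinguishes Jacobi relaxation from in-place (Gauss-Seidel style) updating.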
In a conventional data processor, as shown in FIG. 8, the data elements at two-dimensional grid points A2 (i, j) have been numbered sequentially in the direction parallel to the rows of the two-dimensional array and thereby associated with data elements at one-dimensional grid points A1 (k), where k = 8 × (i - 1) + (j - 1). The two-dimensional data array has thus been transformed into a one-dimensional data array, and each data element has been located in memory at the address corresponding to its element number in the transformed one-dimensional data array.
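The row-major numbering of FIG. 8 amounts to the following mapping. The sketch assumes, as the formula implies, eight grid points per row; the helper names `element_number` and `grid_point` are hypothetical.

```python
N_COLS = 8  # grid points per row, as implied by k = 8 * (i - 1) + (j - 1)


def element_number(i, j, n_cols=N_COLS):
    """One-dimensional element number k of A2(i, j); i, j are 1-based, k is 0-based."""
    return n_cols * (i - 1) + (j - 1)


def grid_point(k, n_cols=N_COLS):
    """Inverse mapping from element number k back to the 1-based (i, j)."""
    i0, j0 = divmod(k, n_cols)
    return i0 + 1, j0 + 1


# Moving along a row changes k by 1; moving down a column changes k by N_COLS.
print(element_number(1, 1), element_number(1, 2), element_number(2, 1))  # 0 1 8
```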
When the Jacobi relaxation described above is processed by a shared-memory multiprocessor, the whole data array is partitioned into a plurality of tile-shaped sub-arrays as shown in FIG. 9 so that the load is distributed among the processor nodes. Because the conventional data processor numbers the data elements of each sub-array as shown in FIG. 9, the data elements on the edges of a sub-array parallel to its rows have sequential element numbers, whereas the data elements on the edges parallel to its columns do not.
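The following sketch lists the element numbers on the four edges of one tile under the conventional row-major numbering. The 8-column array, the 4 × 4 tile size, and the helper names `k_of` and `tile_edge_numbers` are illustrative assumptions; FIG. 9 may use different dimensions.

```python
N_COLS = 8  # assumed row length, matching k = 8 * (i - 1) + (j - 1)


def k_of(i, j):
    """Conventional row-major element number of grid point (i, j), 1-based."""
    return N_COLS * (i - 1) + (j - 1)


def tile_edge_numbers(top, left, size):
    """Element numbers on the four edges of a size x size tile whose
    upper-left grid point is (top, left)."""
    rows = range(top, top + size)
    cols = range(left, left + size)
    return {
        "top": [k_of(top, j) for j in cols],                # parallel to rows
        "bottom": [k_of(top + size - 1, j) for j in cols],  # parallel to rows
        "left": [k_of(i, left) for i in rows],              # parallel to columns
        "right": [k_of(i, left + size - 1) for i in rows],  # parallel to columns
    }


edges = tile_edge_numbers(1, 1, 4)
for name, ks in edges.items():
    print(name, ks)
```

For the tile whose upper-left point is A2 (1, 1), the top edge yields 0, 1, 2, 3 while the left edge yields 0, 8, 16, 24: consecutive along the rows, but a stride of one full row along the columns.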
FIG. 10 illustrates the element numbers of the data elements located adjacent to the sub-array boundaries in the data structure used by the conventional data processor. As the example of FIG. 10 shows, in the conventional data processor the two data elements of each pair facing each other across the boundary between adjacent sub-arrays have different element numbers, so the element numbers of the opposed data elements must be calculated separately.
In the Jacobi relaxation shown in FIG. 7, the minimum unit of program processing involves one data element and its four adjacent data elements. Consequently, calculating the data elements of one sub-array obtained by partitioning the whole data array requires, from outside that sub-array, only the data elements located on the facing edges of the four adjacent sub-arrays. For this calculation, each node of the multiprocessor accesses only those edge data elements of the adjacent sub-arrays, which are located in the memories of other nodes.
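The statement that a sub-array needs only the facing edges of its four neighbours can be checked by applying the four-point stencil to every point of a tile and collecting the points that fall outside it. The function `remote_points`, the 1-based (i, j) coordinates, and the 12 × 12 array with a 4 × 4 tile are assumptions chosen for illustration.

```python
def remote_points(top, left, size, n_rows, n_cols):
    """Grid points (1-based) outside a size x size tile that a four-point
    stencil needs when it is applied to every point of the tile.
    Points outside the whole array (fixed boundary values) are excluded."""
    inside = {(i, j) for i in range(top, top + size)
              for j in range(left, left + size)}
    needed = set()
    for i, j in inside:
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            p = (i + di, j + dj)
            if p not in inside and 1 <= p[0] <= n_rows and 1 <= p[1] <= n_cols:
                needed.add(p)
    return needed


# Central 4 x 4 tile of a 12 x 12 array (illustrative sizes).
halo = remote_points(5, 5, 4, 12, 12)
print(len(halo))  # 16
```

For an interior 4 × 4 tile the result is the 16 points on the four facing edges; no diagonal (corner) neighbours appear because the stencil uses only the four orthogonal neighbours.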
Each node of the shared-memory multiprocessor has used a cache memory to access the main memories in the node itself and in other nodes efficiently. The cache memory is organized into cache lines, each consisting of a plurality of data elements stored at sequential memory addresses, and it accesses main memory one cache line at a time. Suppose, for example, that one cache line holds four data elements. In the conventional data processor shown in FIG. 10, when the node processing the sub-array 101 accesses the data elements on the vertical edges of the two adjacent sub-arrays 102, 103, each access brings in a whole group of four data elements forming one cache line, such as those designated 112, 113. The conventional data processor therefore accesses a large number of data elements unnecessarily, which significantly increases both the amount of communication and the number of cache lines used to access data elements in adjacent sub-arrays.
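The cache-line cost can be illustrated by counting how many distinct lines an edge touches. This sketch assumes four data elements per line (as in the example above), an 8-column row-major array, element k stored at address k, and lines aligned on multiples of four; the names `k_of` and `cache_lines` are hypothetical.

```python
LINE_SIZE = 4  # data elements per cache line, as in the example in the text
N_COLS = 8     # assumed row length of the row-major array


def k_of(i, j):
    """Conventional row-major element number of grid point (i, j), 1-based."""
    return N_COLS * (i - 1) + (j - 1)


def cache_lines(points):
    """Set of cache-line indices touched, assuming element k sits at
    address k and lines start at multiples of LINE_SIZE."""
    return {k_of(i, j) // LINE_SIZE for i, j in points}


vertical_edge = [(i, 4) for i in range(1, 5)]    # parallel to the columns
horizontal_edge = [(4, j) for j in range(1, 5)]  # parallel to the rows

for name, edge in (("vertical", vertical_edge), ("horizontal", horizontal_edge)):
    lines = cache_lines(edge)
    fetched = len(lines) * LINE_SIZE
    print(f"{name}: {len(lines)} lines, {fetched} elements fetched for {len(edge)} used")
```

Under these assumptions a four-element vertical edge touches four different lines (16 elements transferred for 4 used), whereas a four-element horizontal edge lying within one aligned line touches a single line.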
In particular, the Jacobi relaxation changes the values of all data elements in all sub-arrays in every relaxation calculation cycle. Data loaded into the cache memory of one node from another node are therefore used for only one relaxation calculation cycle, and the next cycle requires new data elements to be loaded into that node from the other node. Because this reloading recurs in every cycle, the large number of cache lines to be accessed greatly increases the processing time.