(1) Field of the Invention
The present invention relates to a parallel computer that processes data in parallel by transferring partitioned data between a plurality of processor elements.
(2) Description of the Related Art
In recent years, parallel computers, which utilize a plurality of processor elements (PE's) simultaneously, have been used to speed up numerical calculations that involve a massive amount of data.
Of the various models being developed, distributed-memory type parallel computers have been put into practical use. This model is particularly suitable for array calculations, because the massive amount of data used in a numerical calculation is generally described by arrays, and this model stores both the programs and the distributed data (partitioned arrays) in the PE's so that the calculation can be done in parallel.
In an array calculation, the array elements are used repetitively: depending on the program, an array is partitioned into a plurality of sets of elements in the direction of one of its subscripts, and the resulting array data (sets of array elements) are distributed to the PE's and stored into their respective memories each time the partition direction changes. Typical arrays are two- or three-dimensional, and the allocation of the array data to the PE's depends on the direction of partition. Hence, it frequently happens that a program and the array data necessary for that program are not stored in the same PE.
In such a case, or when each PE demands array data partitioned in a different direction, the array elements are transferred between the PE's. More precisely, each PE sends the array data it no longer needs to the other PE's while receiving the array data it needs from the other PE's via a network interconnecting the PE's. This data transfer enables the PE's to calculate on the array, partitioned in different directions, simultaneously.
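As a minimal sketch of the two partition directions (written for this description, not taken from the related art; all names are hypothetical), a 4 x 4 array distributed over four PE's can be modeled as follows:

```python
# Sketch: partitioning a 4 x 4 array among four PE's in each subscript
# direction. All names here are illustrative, not from the related art.

N_PES = 4

def partition_j(array):
    # Second-subscript (j) direction: PE p holds the set a(1,p+1)..a(4,p+1).
    return [[row[p] for row in array] for p in range(N_PES)]

def partition_i(array):
    # First-subscript (i) direction: PE p holds the set a(p+1,1)..a(p+1,4).
    return [list(array[p]) for p in range(N_PES)]

# a(i, j) = 10*i + j, so each element displays its own subscripts.
a = [[10 * i + j for j in range(1, 5)] for i in range(1, 5)]

print(partition_j(a)[0])  # PEa's set under the j-partition: [11, 21, 31, 41]
print(partition_i(a)[0])  # PEa's set under the i-partition: [11, 12, 13, 14]
```

A program step written for one direction cannot run on a PE holding the other direction's set until the elements have been transferred over the network.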
In parallel with the advancement of the computers, programming languages have been developed as well. For example, HPF (High Performance Fortran Language) and ADETRAN (ADENART FORTRAN) have been proposed for the above distributed-memory type parallel computers. [For further information, see "High Performance Fortran Language Specification Version 1.0", High Performance Fortran Forum, May 1993, and "Parallel Programming Language ADETRAN", Tatsuo Nogi, Memoirs of the Faculty of Engineering, Kyoto University, Vol. 51, No. 4, 1989.]
FIG. 1 is a block diagram depicting the structure of a conventional distributed-memory type parallel computer. This computer includes a plurality of PE's interconnected via a network 3. Although the number of PE's varies from several to hundreds or more, assume herein that the computer includes four PE's (PEa-PEd); only PEa and PEb are shown in FIG. 1 for simplification. Note that all the PE's are of the same structure.
Each PE comprises a local memory 1, a data transfer unit 2, a processor 4, an address bus 5, and a data bus 6.
More precisely, the local memory 1 stores a program for the processor 4 and the array data. The data transfer unit 2 sends array data from the local memory 1 to the other PE's via the network 3 while storing the array data sent from the other PE's into the local memory 1. The processor 4 runs the program, or calculates, using the array data stored in the local memory 1. The address bus 5 sends out an address signal while the data bus 6 sends out a data signal each time the array data are read out from or written into the local memory 1.
FIG. 2 is a source program composed of twenty-four statements in ADETRAN: seven declaration statements, seven pass statements, and ten executable statements. The declaration statements are: Parameter Statement (1), specifying the array size, i.e., the number of array elements in one direction; Declaration Statements (2), (3), showing that two-dimensional arrays a, b are subject to calculation; and Declaration Statements (4)-(7), instructing each PE to acquire storage areas in its local memory 1 for the array partitioned in the direction of the subscript enclosed in slashes.
FIG. 3 is a memory map of the storage areas for the arrays a, b acquired in each local memory 1 according to Statements (4)-(7). To be more specific, Storage Areas 33a-33d are acquired by Declaration Statement (6) to store the array a partitioned in the direction of the first subscript (i-direction): Storage Areas 33a, 33b, 33c, and 33d are acquired for the sets of array elements a(/1/,1)-a(/1/,4), a(/2/,1)-a(/2/,4), a(/3/,1)-a(/3/,4), and a(/4/,1)-a(/4/,4), respectively. Similarly, Storage Areas 34a-34d are acquired by Declaration Statement (7) to store the array b partitioned in the same direction. Also, Storage Areas 31a-31d and 32a-32d are acquired by Declaration Statements (4), (5) to store the arrays a, b partitioned in the direction of the second subscript (j-direction), respectively.
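For reference, the PEa portion of this layout can be tabulated for n = 4; the subscript pairs below follow the element lists of the respective declaration statements (a simplified reconstruction for illustration, not the actual memory map of FIG. 3):

```python
# Illustrative reconstruction of the PEa portion of the FIG. 3 memory map
# (n = 4). Each entry lists the (i, j) subscripts of the elements held.
memory_map_PEa = {
    "31a": [(i, 1) for i in range(1, 5)],  # array a per Statement (4): a(1,/1/)-a(4,/1/)
    "32a": [(i, 1) for i in range(1, 5)],  # array b per Statement (5): b(1,/1/)-b(4,/1/)
    "33a": [(1, j) for j in range(1, 5)],  # array a per Statement (6): a(/1/,1)-a(/1/,4)
    "34a": [(1, j) for j in range(1, 5)],  # array b per Statement (7): b(/1/,1)-b(/1/,4)
}
print(memory_map_PEa["33a"])  # [(1, 1), (1, 2), (1, 3), (1, 4)]
```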
Pass Statements (8)-(10) instruct the PE's to partition the array a(n, n) in the j-direction and distribute the resulting array data (sets of array elements) to the respective PE's.
Executable Statements (11)-(15) instruct each processor 4 to process the array data distributed in accordance with Pass Statements (8)-(10) simultaneously.
Pass Statements (16)-(19) instruct the PE's to re-partition the array a(n, n) in the i-direction and distribute the resulting array data to the respective PE's.
Executable Statements (20)-(24) instruct each processor 4 to process the array data distributed in accordance with Pass Statements (16)-(19) simultaneously.
For further understanding, the operation of the conventional distributed-memory parallel computer when run under the above source program will now be described.
First, the source program is translated into a machine language by a compiler for parallel computers, and Statements (4)-(24) are loaded into each local memory 1 via the network 3 by an unillustrated control unit.
Accordingly, each PE acquires the storage areas shown in FIG. 3 in accordance with translated Declaration Statements (4)-(7). Note that although the storage areas are acquired, no array data are stored at this point. The array data, i.e., the array partitioned in the j-direction, are then allocated to the PE's and stored into their respective storage areas in accordance with translated Pass Statements (8)-(10). In the case of PEa, for example, the data transfer unit 2 receives the set of array elements a(1,/1/)-a(4,/1/) via the network 3 and writes them into the local memory 1 via the data bus 6. PEb-PEd operate in the same way; as a result, the array a is partitioned in the j-direction and distributed to Storage Areas 31a-31d as shown in FIG. 3.
Next, the PE's compute in parallel to find b(i,/j/) using the distributed array elements in accordance with translated Executable Statements (11)-(15), and the b(i,/j/) thus found are written into their local memories 1. In the case of PEa, for example, the processor 4 processes the array elements in accordance with translated Executable Statement (13). PEb-PEd operate in the same way; as a result, the array b is partitioned in the j-direction and distributed to Storage Areas 32a-32d as shown in FIG. 3.
Next, each PE transfers array elements via the network 3 to re-partition the arrays a, b in the i-direction and distribute the resulting array data to the respective PE's in accordance with translated Pass Statements (16)-(19). In the case of PEa, for example, the data transfer unit 2 sends the array elements currently stored in Storage Areas 31a, 32a to the other PE's via the network 3, which is illustrated by FIG. 4: the array elements a(2,/1/), a(3,/1/), and a(4,/1/) in Storage Area 31a are sent to Storage Area 33b in PEb, 33c in PEc, and 33d in PEd, respectively, and the array elements b(2,/1/), b(3,/1/), and b(4,/1/) in Storage Area 32a are sent to the other PE's in the same way. Meanwhile, the data transfer unit 2 writes the array elements sent from the other PE's into Storage Areas 33a, 34a, which is illustrated by FIG. 5: the array elements a(1,/2/) in PEb, a(1,/3/) in PEc, and a(1,/4/) in PEd are sent to Storage Area 33a, while the array elements b(1,/2/) in PEb, b(1,/3/) in PEc, and b(1,/4/) in PEd are sent to Storage Area 34a.
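This redistribution is, in effect, a distributed transpose: each PE keeps the element of its column whose first subscript names itself, sends each remaining element to the PE named by that element's first subscript, and receives one element from every other PE. A minimal sketch under that reading (written for this description; names are hypothetical):

```python
# Sketch of the j-to-i redistribution for four PE's and a 4 x 4 array.
# Not code from the related art; the routing rule is as described above.

N_PES = 4

def repartition_j_to_i(columns):
    # columns[p][i] holds a(i+1, p+1), the j-partitioned set of PE p
    # (cf. Storage Areas 31a-31d). Element a(i+1, p+1) is routed to PE i,
    # which collects its full row into the i-partitioned area (33a-33d).
    rows = [[None] * N_PES for _ in range(N_PES)]
    for p in range(N_PES):          # sending PE
        for i in range(N_PES):      # receiving PE for element a(i+1, p+1)
            rows[i][p] = columns[p][i]
    return rows

# a(i, j) = 10*i + j, distributed column-wise as in the description.
a = [[10 * i + j for j in range(1, 5)] for i in range(1, 5)]
columns = [[a[i][p] for i in range(N_PES)] for p in range(N_PES)]
rows = repartition_j_to_i(columns)
print(rows[0])  # PEa now holds a(/1/,1)-a(/1/,4): [11, 12, 13, 14]
```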
Subsequent to this array-element transfer, the PE's process the array elements thus stored in parallel to find b(/i/,j) in accordance with Executable Statements (20)-(24), and the b(/i/,j) thus found are written into their respective local memories 1. In the case of PEa, for example, the processor 4 processes the array elements in accordance with Executable Statement (22); PEb-PEd operate in the same way.
As has been explained, the inbound and outbound array elements are transferred simultaneously, so that each PE must have, for each array, a storage capacity at least twice as large as one set of partitioned array data. If a PE had a storage capacity only as large as one such set, the array elements in the storage area might be overwritten by those received from the other PE's before they had been sent to the other PE's. This undesirably increases the required memory capacity, particularly when the arrays are of higher dimension, where there are as many partition directions as subscripts.
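To illustrate the point (a simulation written for this description, not code from the related art), compare an exchange into separate receive areas with a naive exchange that reuses a single area per array:

```python
# Sketch: why the simultaneous exchange needs two areas per array.

N_PES = 4

def exchange_double_buffer(send_area):
    # Separate receive areas (cf. Storage Areas 33a-33d alongside 31a-31d):
    # the send area is never overwritten, so every element arrives intact.
    recv_area = [[None] * N_PES for _ in range(N_PES)]
    for p in range(N_PES):
        for q in range(N_PES):
            recv_area[q][p] = send_area[p][q]
    return recv_area

def exchange_single_area(area):
    # With only one area per array, an incoming element can overwrite an
    # element that has not yet been sent, corrupting the result.
    for p in range(N_PES):
        for q in range(N_PES):
            area[q][p] = area[p][q]
    return area

data = [[10 * i + j for j in range(1, 5)] for i in range(1, 5)]
correct = exchange_double_buffer([row[:] for row in data])
corrupt = exchange_single_area([row[:] for row in data])
print(correct != corrupt)  # True: the single-area exchange lost elements
```

The double storage capacity is thus the price of letting every PE send and receive at the same time.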