The present invention relates to a method of evaluating data division, which is necessary for allowing a sequential program for technical calculations or the like to be executed efficiently by a distributed memory parallel computer. The invention also relates to a program converting method which uses such an evaluating method to convert the sequential program into parallel programs to be executed by the parallel computer.
In numerical simulation, as the objects to be analyzed become more varied and greater precision is required of the analysis, there is an increasing demand to advance research and development by obtaining the results of large scale calculations in a short time. Expectations are growing for the massively parallel computer of the distributed memory type as a powerful machine which can satisfy such a demand.
As shown in FIG. 20, a distributed memory parallel computer is a parallel computer in which a number of processors 200, each connected to its own local memory 100, are coupled by a network 300. Each local memory is used as the main memory of the processor connected to it, and holds the data allocated to that processor and a program to execute the processes allocated to that processor. Such a distributed memory parallel computer has the advantage that each processor can execute its program asynchronously with the other processors.
When parallel programs to be executed by such a parallel computer are produced, however, the time needed to execute them largely depends on how the programs are formed.
In the conventional technique, the work of producing parallel programs which allow the processes required in the sequential program to be executed efficiently by such a massively parallel computer is, in many cases, performed manually by users.
FIG. 2 is a diagram showing an example of a sequential source program. It shows the forward elimination part of a program using Gaussian elimination. Numeral 203 denotes a process to divide the elements of the column just under a diagonal element of a two-dimensional array a[i][j] by that diagonal element. Numeral 205 denotes a process to update the matrix elements of the uneliminated portion using the elements in the row and column of the present diagonal element. The whole program is constructed as a triple loop made up of loops 201, 202, and 204. Among them, the loops 202 and 204 can be executed in parallel. That is, different iterations of each of those two loops can be executed by different processors.
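For orientation, the kind of computation described above can be sketched as follows. This is only an illustrative sketch: FIG. 2 is not reproduced here, so the function name `forward_eliminate`, the use of Python, the absence of pivoting, and the exact nesting of the loops are assumptions and need not match the figure.

```python
def forward_eliminate(a):
    """In-place forward elimination (no pivoting) on a square list-of-lists matrix.

    Hypothetical sketch; the loop nesting is assumed, not taken from FIG. 2.
    """
    n = len(a)
    for k in range(n - 1):                  # outer loop over the diagonal elements
        for i in range(k + 1, n):           # divide the column under the diagonal
            a[i][k] = a[i][k] / a[k][k]     # (the division step, cf. numeral 203)
        for j in range(k + 1, n):           # update the uneliminated portion
            for i in range(k + 1, n):       # (the update step, cf. numeral 205)
                a[i][j] = a[i][j] - a[i][k] * a[k][j]
    return a


result = forward_eliminate([[2.0, 1.0], [4.0, 5.0]])
```

For a fixed value of `k`, the iterations of the division loop do not depend on one another, and neither do the iterations of the update loop over `j`, which is why such loops are candidates for parallel execution.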
When such a sequential program is converted into parallel programs to be executed by the respective processors of the parallel computer, at least the following three items must be determined.
(1) Division of data (namely, an array).
(2) Allocation of the data groups obtained by the division to each processor.
(3) Allocation of the processes to each processor.
(a) Data which is processed by a sequential source program to be converted into parallel programs is divided into a plurality of data groups in accordance with one of a plurality of data dividing pattern candidates.
(b) Each of the data groups is allocated to one of a plurality of processors included in a distributed memory parallel computer in accordance with predetermined rules.
(c) A plurality of different partial processes among the processes of the sequential source program are allocated to the processors in accordance with predetermined rules.
(d) In a state in which the plurality of data groups have been allocated to the processors by step (b), an amount related to the executing time necessary to execute in parallel the plurality of partial processes allocated to the processors by step (c) is estimated as evaluation information of that data dividing pattern candidate.
(e) Steps (a) to (d) are repeated for each of the other data dividing pattern candidates.
Further, according to a more desirable aspect of the invention, an evaluation program which evaluates a sequential program while the sequential program is executed by a sequential computer is produced.
(a) The data which is included in a sequential program and is to be processed by a plurality of processors is divided into a plurality of data groups in accordance with a predetermined data dividing pattern.
(b) The plurality of data groups are allocated to the processors in accordance with the correspondence between data groups and processors which has been predetermined for the data dividing pattern.
(c) The process specified by the sequential program is divided into a plurality of partial processes so that a statement which defines data in the data groups allocated to a given processor is executed by that processor. Each partial process is allocated to the corresponding processor.
(d) A transmission command and a reception command for the data transfers which become necessary when the data and the processes have been allocated to the processors by steps (b) and (c) are inserted into the sequential program.
Generally, since the number of elements of an array to be processed is larger than the number of processors, those elements are divided into groups each containing a plurality of elements. As will be explained later, various kinds of data dividing patterns are known.
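As a concrete illustration of what a data dividing pattern can look like, the sketch below maps array indices to processors under two widely used patterns, block division and cyclic division. These two patterns, the function names, and the sizes `n` and `p` are assumptions chosen for illustration; they are not necessarily the patterns treated in this specification.

```python
def block_owner(index, n, p):
    """Processor owning element `index` when n elements form p contiguous blocks."""
    block_size = -(-n // p)          # ceiling of n / p
    return index // block_size


def cyclic_owner(index, p):
    """Processor owning element `index` when elements are dealt out round-robin."""
    return index % p


n, p = 8, 4                          # hypothetical array length and processor count
block_map = [block_owner(i, n, p) for i in range(n)]
cyclic_map = [cyclic_owner(i, p) for i in range(n)]
```

With these values, block division gives each processor two neighboring elements, while cyclic division gives each processor every fourth element.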
It is decided which iterations of the loops that can be executed in parallel in the sequential program are executed by each processor.
In the distributed memory parallel computer mentioned above, the program allocated to each processor is formed so that, of the data allocated to that processor, the data needed by other processors is transmitted to them at a proper timing, and so that, of the data allocated to other processors, the data which is needed by that processor and has been transmitted from them is used at a proper timing.
The time required to transfer certain data between processors is far longer than the time required to execute arithmetic operations on that data in any one of the processors. Therefore, the executing time of the parallel programs also largely depends on the amount of data transferred between the processors and on the time required for each transfer. The data amount depends on the processes written in the sequential program and on how the above three items are determined.
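The dependence of the executing time on communication can be illustrated with a very simple cost model. The sketch below is an assumption made for illustration, not the estimation method of the invention: it charges a fixed time per arithmetic operation, a start-up time per message, and a time per transferred word, assumes that computation and communication do not overlap, and takes the slowest processor as the overall time. All names and numeric values are hypothetical.

```python
def estimated_time(ops, messages, words, t_op=1.0, t_startup=50.0, t_word=10.0):
    """Estimate parallel executing time as the maximum per-processor cost.

    ops[i], messages[i], words[i]: operation count, message count, and
    transferred word count for processor i (hypothetical inputs).
    """
    per_processor = [
        o * t_op + m * t_startup + w * t_word
        for o, m, w in zip(ops, messages, words)
    ]
    return max(per_processor)


# Two hypothetical divisions with identical arithmetic work per processor
# but different communication volumes.
time_a = estimated_time(ops=[100, 100], messages=[1, 1], words=[10, 10])
time_b = estimated_time(ops=[100, 100], messages=[5, 5], words=[50, 50])
```

Under these assumed parameters the second division is several times slower even though the arithmetic work is the same, because the transfer terms dominate.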
Although the executing time of the parallel programs depends on the above three items as mentioned above, the conventional technique cannot determine them automatically so as to reduce the executing time of the parallel programs.
Consequently, there is the inconvenience that the user must decide the above three items.
Furthermore, since the executing time of the parallel programs depends on the data communication amount between the processors and the like as mentioned above, there is a problem that the user cannot always make an ideal decision from the viewpoint of reducing the executing time of the parallel programs.
One method for solving the above problems is described in "Proceedings of the Fifth Distributed Memory Computing Conference", pages 1160 to 1170, 1990. According to that method, after the user manually designates a data dividing pattern considered optimum for the array processed by a sequential program, the allocation of data (array elements) to each processor is determined in accordance with a format fixed by the designated data dividing pattern and, further, the processes to be allocated to each processor among the processes written in the sequential program are decided automatically. That is, items (2) and (3) among the above three items (1) to (3) are determined automatically.
According to this conventional technique, the processes allocated to each processor are determined so that the processor executes the statements which define (assign to) the array elements allocated to it. When data used in a process allocated to one processor is allocated to another processor, program statements to transfer the data from the other processor to the processor which uses it, and program statements to confirm reception of the transferred data before it is used, are also automatically added to the original sequential program.
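The allocation rule described above, in which the processor holding an array element executes the statement that defines it, can be sketched as follows. The statement `x[i] = y[i] + y[i + 1]`, the block division of both arrays, and all function names are hypothetical choices made for illustration; the sketch only plans which processor executes each iteration and which elements would have to be transferred, and does not perform real message passing.

```python
def owner(index, n, p):
    """Processor holding element `index` under an assumed block division."""
    block_size = -(-n // p)          # ceiling of n / p
    return index // block_size


def plan_owner_computes(n, p):
    """Plan x[i] = y[i] + y[i + 1] for i = 0 .. n-2 with x and y block-divided.

    Returns (work, transfers): the iterations executed by each processor and
    a list of (source processor, destination processor, index of y) tuples.
    """
    work = {proc: [] for proc in range(p)}
    transfers = []
    for i in range(n - 1):
        executor = owner(i, n, p)    # the owner of x[i] executes the assignment
        work[executor].append(i)
        for used in (i, i + 1):      # elements of y read by this iteration
            source = owner(used, n, p)
            if source != executor:   # a send/receive pair would be inserted here
                transfers.append((source, executor, used))
    return work, transfers


work, transfers = plan_owner_computes(8, 2)
```

In this example only the iteration at the block boundary reads an element held by the other processor, so a single transfer is planned.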