(1) Field of the Invention
The present invention relates to an apparatus and a method for detecting whether a program can be processed in parallel, and to an apparatus that utilizes the detection results of the above apparatus in translating a program so that it is applicable to a parallel computer.
(2) Description of the Related Art
Recently, emphasis has been placed on the development of parallel computer systems capable of processing a program in parallel. A parallel computer, which possesses plural PEs (Processing Elements) for executing programs, implements parallel processing by providing each PE with a program.
Generally, a source program written in a higher-level language such as FORTRAN is made for serial processing. When such a source program is applied to the parallel computer, the compiler that generates an object program from the source program needs to be constructed so that the generated object program is applicable to parallel processing.
A conventional compiler manipulates the source program to generate the object program applicable to the parallel computer. That is, parallelism is extracted from the iterations of loop processing, such as a do loop, in the source program. Then, each of the iterations is assigned to a PE to be executed in parallel; details are given in Padua, D. A., et al. (1986), Advanced compiler optimizations for supercomputers, Communications of the ACM, pp. 1184-1201. Here, "loop" refers to a set of instructions which can be executed repeatedly, and each repetition corresponds to an iteration.
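The assignment of iterations to PEs can be illustrated by the following minimal Python sketch. It is not taken from the cited reference; the function name assign_iterations and the round-robin policy are assumptions chosen only for illustration, and a real compiler may distribute iterations differently.

```python
def assign_iterations(lower, upper, num_pes):
    """Distribute iterations lower..upper (inclusive) over num_pes PEs.

    Hypothetical helper: iterations are handed out round-robin, so PE 0
    receives the 1st, (num_pes+1)-th, ... iteration, and so on.
    """
    assignment = [[] for _ in range(num_pes)]
    for k, iteration in enumerate(range(lower, upper + 1)):
        assignment[k % num_pes].append(iteration)
    return assignment


# Ten iterations (control variable 1..10) spread over four PEs.
plan = assign_iterations(1, 10, 4)
for pe, iterations in enumerate(plan):
    print(f"PE {pe}: iterations {iterations}")
```

Such an assignment is only valid when the iterations are independent of one another, which is why the compiler must first judge whether a data-dependence relation exists.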
Construction of the conventional compiler is shown in the block diagram of FIG. 1.
An input unit 1 is equipped to receive the source program from the outside.
A memory 2 is equipped to hold programs such as a program for implementing compilation, the source program inputted by the input unit 1, and the object program generated in the compilation.
A processor 3 is equipped to execute compilation based on the program held in the memory 2.
An output unit 4 is equipped to output the object program, which is generated by the processor 3 and is held in the memory 2, to an output device (e.g., a display device, a printer, a file device).
The function of the conventional compiler, that is, the compilation performed by the processor 3 based on the program in the memory 2, is shown in the block diagram of FIG. 2.
An input unit 21 receives the source program from the outside.
A loop detector 22 detects loop processing from the source program.
An array detector 23 detects an array from the source program.
A storage unit 24 stores the loop detected by the loop detector 22 and the array detected by the array detector 23.
A judgement unit 25 judges if iterations of the loop can be processed in parallel by referring to what is in the storage unit 24.
An object generation unit 26 generates the object program applicable to parallel processing of the loop when the judgement unit 25 judges that parallel processing is possible, and generates the general object program when the judgement unit 25 judges that parallel processing of the loop is impossible.
An output unit 27 outputs the object program to the display device, the printer, the file device and the like.
The operation of the conventional compiler, whose construction and function are described above, is described with reference to the flow chart of FIG. 3.
First, the source program to be compiled is inputted to the input unit 21 (step 31). An example of such a source program is shown in FIG. 4, where two loops (a "Do 20" loop for a variable j and a "Do 10" loop for another variable j) are nested inside an outer loop 400 (a "Do 10" loop for a variable i).
Second, loop processing is detected by the loop detector 22 from the source program, and the detection result is stored in the storage unit 24 (step 32). In FIG. 4, each pair of a do-statement and a continue-statement is detected from the source program as loop processing. Third, whether or not loop processing has been detected is examined (step 33). In FIG. 4, loop processing of the loop 400 has been detected.
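The pairing of a do-statement with its continue-statement in step 32 can be sketched as follows. This is a minimal illustration under stated assumptions, not the loop detector 22 itself: the regular expressions, the function name detect_loops, and the sample source lines are hypothetical and do not reproduce FIG. 4. Only labelled do loops terminated by a continue-statement are handled.

```python
import re

# Hypothetical patterns for "do <label> <var> = <lo>, <hi>" and "<label> continue".
DO_RE = re.compile(r"^\s*do\s+(\d+)\s+(\w+)\s*=\s*(\w+)\s*,\s*(\w+)", re.IGNORECASE)
CONTINUE_RE = re.compile(r"^\s*(\d+)\s+continue\b", re.IGNORECASE)


def detect_loops(lines):
    """Return (label, variable, start_line, end_line) for each do/continue pair."""
    open_loops = []  # stack of (label, variable, index of the do-statement)
    found = []
    for idx, line in enumerate(lines):
        do_match = DO_RE.match(line)
        if do_match:
            open_loops.append((do_match.group(1), do_match.group(2), idx))
            continue
        end_match = CONTINUE_RE.match(line)
        if end_match:
            label = end_match.group(1)
            # In this sketch a continue closes every innermost open loop
            # that carries the same label.
            while open_loops and open_loops[-1][0] == label:
                lab, var, start = open_loops.pop()
                found.append((lab, var, start, idx))
    return found


# Hypothetical source fragment (not FIG. 4).
sample = [
    "      do 30 i = 1, 10",
    "      do 20 j = 2, 10",
    "      x(j) = x(j-1) + 1",
    "   20 continue",
    "   30 continue",
]
for loop in detect_loops(sample):
    print(loop)
```

Inner loops are reported before the loops that enclose them, because a loop is recorded only when its continue-statement is reached.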
Fourth, when loop processing has been detected, the arrays in loop processing are detected by the array detector 23 and the detection result is stored in the storage unit 24 (step 34). In FIG. 4, the arrays a(1), c(i), d(i), a(j), e(i, j) on the left side of the statements and the arrays a(j-1), c(j), d(j), a(j) on the right side of the statements in the loop 400 are detected. If loop processing had not been detected, the general object program for serial processing would be generated.
Fifth, the judgement unit 25 judges whether the loop can be processed in parallel by examining whether or not any data-dependence relation exists across iterations of the loop, based on what is in the storage unit 24 (step 35). More precisely, it is examined whether or not the array on the right side refers to the array on the left side in the same iteration of the loop, considering the upper limit value and the lower limit value of the control variable. In FIG. 4, the judgement unit 25 judges that parallel processing of the loop is possible. Concretely, in the "Do 20" loop the subscript of the array a(j-1) on the right side of the statement varies between 1 and 9, since the lower limit value and the upper limit value of the control variable j are 2 and 10 respectively (j = 2 to 10). This range of elements of a(j-1) has been defined before the loop 400 by a(1) and by the array a(j), whose subscript varies between 2 and 10, so that no data-dependence relation exists across iterations of the loop 400. The possibility of processing in parallel the loops nested in the loop 400, including the "Do 20" loop and the "Do 10" loop, is left unjudged here.
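The kind of check performed in step 35 can be illustrated by the minimal Python sketch below. It is an assumption-laden illustration, not the judgement unit 25: the function name has_cross_iteration_dependence is hypothetical, subscripts are modelled as Python functions of the control variable, and the iterations are simply enumerated between the lower and upper limit values, whereas an actual compiler would reason about the subscripts symbolically. The arrays x and y are hypothetical and are not those of FIG. 4.

```python
def has_cross_iteration_dependence(lower, upper, writes, reads):
    """Report whether two different iterations touch the same array element.

    writes and reads are lists of (array_name, subscript_function), where
    subscript_function maps the control variable to the subscript value.
    """
    written_by = {}  # (array_name, subscript value) -> iterations writing it
    for it in range(lower, upper + 1):
        for name, subscript in writes:
            written_by.setdefault((name, subscript(it)), set()).add(it)

    # Two different iterations writing the same element.
    if any(len(iterations) > 1 for iterations in written_by.values()):
        return True

    # An element read in one iteration and written in another.
    for it in range(lower, upper + 1):
        for name, subscript in reads:
            writers = written_by.get((name, subscript(it)), set())
            if writers - {it}:
                return True
    return False


# x(j) = x(j-1) + 1 for j = 2..10: iteration j reads what iteration j-1 wrote.
print(has_cross_iteration_dependence(
    2, 10, [("x", lambda j: j)], [("x", lambda j: j - 1)]))

# x(j) = y(j-1) * 2 for j = 2..10: no element is shared between iterations.
print(has_cross_iteration_dependence(
    2, 10, [("x", lambda j: j)], [("y", lambda j: j - 1)]))
```

The first call reports a dependence and the second does not, which shows how the same subscript pattern is judged differently depending on which arrays appear on the left side and the right side.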
Finally, when no data-dependence relation exists, the object program is generated to process the loop in parallel (step 36). If there were any data-dependence relation across iterations of the loop, the general object program would be generated (step 37). In FIG. 4 the object program is generated to process the loop 400 in parallel.
As described above, whether or not a data-dependence relation exists is conventionally judged by referring to the upper and lower limit values of the control variable in a do-statement and to the subscript expression of an array.
However, the conventional judgement is inaccurate for the following loops.
(1) A loop contains an if-statement, and the value of a variable is not obtained until the if-statement is executed.
(2) A loop contains a subroutine which may have the data-dependence relation.
(3) A subscript expression of an array in a loop includes computation which is not predetermined.
Thus, parallel processing of any of the above three kinds of loops is conventionally judged to be impossible even when it is logically possible.
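The gap between the conventional judgement and what is logically possible can be illustrated for case (3) by the following minimal Python sketch. It is a hypothetical illustration only: the loop x(idx(i)) = x(idx(i)) + 1, the function names, and the sample subscript values are assumptions, and the sketch is not the method of the present invention.

```python
def conservative_static_judgement(subscripts_known_at_compile_time):
    """Hypothetical stand-in for the conventional judgement.

    When the subscripts cannot be determined before execution, parallel
    processing is judged impossible.
    """
    return subscripts_known_at_compile_time


def logically_parallel(index_values):
    """True when every iteration of x(idx(i)) = x(idx(i)) + 1 touches a
    distinct element, so that no iteration depends on another."""
    return len(set(index_values)) == len(index_values)


# Subscript values that become known only during execution (hypothetical).
idx = [3, 1, 2]
print(conservative_static_judgement(False))  # judged not parallelizable
print(logically_parallel(idx))               # yet the iterations are independent
```

When the computed subscripts happen to be distinct, as in this sample, the iterations are independent even though the conservative judgement rejects the loop.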