Field of Applicable Technology
The present invention relates to a method of implementing a concurrentization type of compiler, for converting a source program written in non-concurrent form to an object program that can be executed by a parallel-processing multiprocessor computer.
FIG. 1 is a simple block diagram illustrating an example of a multiprocessor computer which is made up of a set of eight processor modules designated by reference numerals 1 to 8 respectively. Each of the processor modules includes a processor unit and a common memory, i.e. the first processor module 1 includes a processor unit 9 and a common memory 10, the second processor module 2 includes a processor unit 11 and a common memory 12, . . . and the eighth processor module 8 includes a processor unit 23 and a common memory 24. The processor modules 1 to 8 further include respective local address/data buses 26 to 33 as shown, each of which communicates with a common address/data bus 25, while the common address/data bus 25 also communicates directly with each of the common memories 10 to 24. Thus, data or addresses can be transferred within each processor module between the processor unit and the common memory of that processor module, over the local address/data bus of the processor module, and also between each processor unit and any of the processor units of other processor modules, via the common address/data bus 25.
FIG. 2 shows an example of a portion of a computer program written in the FORTRAN language for execution by a serial-operation computer, for repetitively obtaining the sum of a vector C and the product of a matrix A and a vector B, and storing the result as the vector C. The program portion is a nested loop, i.e. an outer DO loop designated by numeral 48 and an inner DO loop 49. The outer loop 48 has a total of 8 iterations (determined by the index I). With a serial-processing type of computer it is necessary to execute these iterations successively. However, it is possible, using the multiprocessor computer of FIG. 1 for example, to execute these iterations in parallel, i.e. by assigning the iterations of the outer loop 48 to be executed concurrently by respective ones of the processor modules 1 to 8 in FIG. 1. That is to say, each processor module would be assigned a specific one of the values 1 to 8 of the index I, and would execute a corresponding 8 iterations of the inner loop 49 (i.e. for successive values of the index J) using that value of the index I.
Such concurrent execution is illustrated in the sections (1) to (8) of FIG. 3, which are respectively executed by the processor modules 1 to 8 in FIG. 1. As shown, the processor module 1 executes the inner loop 49 of the program of FIG. 2 using the value 1 for the index I. In parallel with that, the processor modules 2 to 8 execute that inner loop 49, using the respective values 2 to 8 for the index I, so that the iterations of the outer loop 48 are executed in parallel.
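The assignment of outer-loop iterations to processors described above can be sketched as follows. This is an illustrative sketch only, written in Python rather than FORTRAN. The statement body C(I) = C(I) + A(I,J)*B(J) is assumed from the verbal description of FIG. 2, indices are zero-based, and the names run_serially, run_concurrently and inner_loop are hypothetical. A thread pool stands in for the eight processor modules.

```python
from concurrent.futures import ThreadPoolExecutor

N = 8  # 8 outer and 8 inner iterations, as stated in the text


def inner_loop(i, a, b, c):
    """Run the inner loop (index j) for one fixed value of the outer index i."""
    total = c[i]
    for j in range(N):
        total += a[i][j] * b[j]
    return total


def run_serially(a, b, c):
    """Serial form: the outer iterations are executed one after another."""
    return [inner_loop(i, a, b, c) for i in range(N)]


def run_concurrently(a, b, c):
    """Concurrent form: each outer iteration is handed to its own worker."""
    with ThreadPoolExecutor(max_workers=N) as pool:
        # pool.map returns results in the order of the submitted index values
        return list(pool.map(lambda i: inner_loop(i, a, b, c), range(N)))
```

Each worker reads only its own element of C and writes only its own result, which is why the outer iterations do not interfere with one another. Python threads are used here purely to show the division of work and are not expected to give a speed-up for this kind of computation.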
It should be noted that the term "parallel" as used herein, e.g. in referring to "parallel execution" or "parallel code" relates only to concurrent processing by the processors of a multiprocessor computer, and does not relate to vector computation.
In order to achieve such parallel operation, it is necessary to generate program code for each of the processor modules. It may be possible to do this using the same types of programming language that have been used for conventional serial computers. However, it has become possible to use a special high-level programming language, as illustrated for example in FIG. 4, whereby the user can directly specify that parallel execution is to be applied to particular portions of a program. That is to say, the "PARALLEL DO" instruction in this example designates that the iterations of the outer loop 50 (corresponding to the outer loop 48 of the program of FIG. 2) are to be assigned to respective processors of a multiprocessor computer, to be executed concurrently. Thus when the programmer decides that a portion of a program is suitable for parallel execution, such specialized statements can be inserted into the program as required, i.e. the programmer can manually concurrentize the program.
However, with such a method, the programmer must always consider, when writing the program, whether parallel execution may be possible, and must also take into consideration the architecture of the particular type of multiprocessor computer for which the program is being written. These factors make the process of writing a program for use with a multiprocessor computer more complex than is the case for a non-concurrent type of program, resulting in a low efficiency of software development, even when such a specialized high-level language for a parallel architecture computer is used.
In addition, very large amounts of software have been already developed for use with serial architecture computers, i.e. written in a high-level language such as conventional FORTRAN. If such software is to be concurrentized for use with a multiprocessor computer, then due to the very large volume of software it will not be practical to carry out such adaptation by means of programmers modifying each serial-architecture program.
The meaning of the term "compiling" is sometimes limited to signifying the conversion of a program written in a high-level language such as FORTRAN to machine language. However as used herein, the term "concurrentization compiling" signifies the conversion of a source program that is written in a form for execution by a serial-processing computer to an object program which can be executed by a multiprocessor computer, with concurrent execution of appropriate portions of the program by the multiprocessor computer. The term "object program which can be executed by a multiprocessor computer" as used herein can signify a program in machine code for a multiprocessor computer, or a program written in a high-level language which includes special concurrent-processing control statements (such as the FORTRAN "parallel-DO" statement), and which can be directly converted to machine code for processing by a multiprocessor computer by using a suitable compiler.
SUMMARY OF THE INVENTION
It is an objective of the present invention to overcome the disadvantages described hereinabove, by providing a concurrentization compiling method whereby an object program for execution by a multiprocessor computer can be automatically generated from a source program that has been written for execution by a serial architecture computer. Thus, the invention provides a method of implementing a compiler whereby it is not necessary for a programmer who prepares the source program to be aware that the object program which will be generated is to be executed by a multiprocessor computer.
To achieve the above objective, the present invention provides a compiler method for generating from a source program an object program to be executed by a multiprocessor computer, including successive steps of analyzing program elements of the source program, a step of analyzing the syntax of the source program based on results obtained from the step of analyzing program elements to thereby generate intermediate code, a parallelism detection step of analyzing the intermediate code to detect program loops and to judge whether concurrent execution of a detected program loop is possible, and a coding step for generating code of the object program in accordance with results obtained in the parallelism detection step, in which the parallelism detection step comprises successive steps of:
(a) judging whether a detected program loop is a nested loop, and, if a nested loop is found;
(b) analyzing data dependence relationships within the nested loop to judge if parallel execution of iterations of an outermost loop of the nested loop is possible;
(c) if parallel processing is found to be impossible, analyzing data dependence relationships within an inner loop of the nested loop to determine if the concurrent execution would be made possible by interchange of the inner loop and outermost loop; and
(d) if it is judged that parallel processing would be made possible by the loop interchanging, analyzing data dependence relationships of the inner loop to judge whether it is possible to perform the loop interchanging, and if so, performing the loop interchanging.
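The decision sequence of steps (b) to (d) can be sketched for a doubly nested loop as follows. The sketch assumes that each data dependence has already been summarized as a constant distance vector (outer distance, inner distance), which is a common textbook representation and not necessarily the one used by the method described here. The function names are hypothetical.

```python
def lex_nonnegative(vec):
    """True if the first nonzero entry of vec is positive, or vec is all zero."""
    for d in vec:
        if d != 0:
            return d > 0
    return True


def outer_parallel(distances):
    """The outermost loop carries no dependence if every vector starts with 0."""
    return all(v[0] == 0 for v in distances)


def interchange(distances):
    """Distance vectors as they would appear after swapping the two loops."""
    return [(v[1], v[0]) for v in distances]


def plan(distances):
    """Apply steps (b), (c) and (d) to the distance vectors of one nested loop."""
    if outer_parallel(distances):                      # step (b)
        return "parallel"
    swapped = interchange(distances)
    if (outer_parallel(swapped)                        # step (c): would it help?
            and all(lex_nonnegative(v) for v in swapped)):  # step (d): is it legal?
        return "interchange"
    return "serial"
```

For example, a statement of the form A(I,J) = A(I-1,J) + 1 gives the distance vector (1, 0), so the outer loop on I cannot be run concurrently as written. After interchange the vector becomes (0, 1), the dependence is carried by the inner loop, and the new outer loop on J can be run concurrently, so plan([(1, 0)]) yields "interchange". A vector such as (1, -1) becomes (-1, 1) after swapping, which would reverse the dependence, so the loop is left serial.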
The invention moreover provides a compiling method in which the step (b) of judging whether parallel execution is possible comprises successive steps of:
(e) judging whether parallel execution is made impossible due to the presence of any program statement between an initial loop control statement of an outermost loop of the nested loop and a succeeding loop control statement of an inner loop that is closest to the outermost loop;
(f) if parallel execution is judged to be possible in step (e), judging whether a condition of independence of data relationships between the inner loop and the outermost loop exists, which makes possible the parallel execution.
Moreover, if it is found as a result of any of the above steps (b), (c) and (d) that parallel execution is not possible, a step is executed of judging whether it is possible to divide an outermost loop of the nested loop into a plurality of adjacent loops, and if such dividing is judged to be possible, a step of dividing into the adjacent loops is executed, followed by executing each of the above steps (b), (c) and (d) for each of the resultant adjacent loops. That is to say, each loop is examined to judge whether concurrent execution is possible directly, or by modifying the structure of the loop.
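The effect of dividing a loop into adjacent loops can be illustrated with a small sketch. For brevity it uses a single loop rather than a nested one, and the two statements S1 and S2, the array names and the function names are all hypothetical. S2 is a recurrence that prevents concurrent execution of the original loop, while S1 on its own has no dependence between iterations.

```python
def fused(b, n):
    """Original loop: S1 and S2 share one loop, so the whole loop is serial."""
    a = [0] * n
    d = [0] * n
    for i in range(1, n):
        a[i] = b[i] + 1          # S1: independent across iterations
        d[i] = d[i - 1] + a[i]   # S2: uses d from the previous iteration
    return a, d


def split(b, n):
    """After division: two adjacent loops over the same index range."""
    a = [0] * n
    d = [0] * n
    for i in range(1, n):        # first loop: iterations could run concurrently
        a[i] = b[i] + 1
    for i in range(1, n):        # second loop: recurrence, remains serial
        d[i] = d[i - 1] + a[i]
    return a, d
```

The division is safe in this case because the only dependence between the two statements runs from S1 to S2, so every value of a that S2 reads has already been written when the second loop begins. Both functions therefore return the same arrays.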
Thus with the compiling method of the present invention, data dependence relationships within a program that is written in a serial-architecture programming language are analyzed, and for each loop which occurs in the program, processing is executed which may consist of one or more of the following:
(a) Loop interchanging, i.e. interchanging the outermost loop with an inner loop (in the case of a nested loop), and
(b) Loop fission, i.e. splitting the loop into a plurality of adjacent loops, referred to in the following as blocks.
Before such loop interchanging or loop fission is performed, a check is made as to whether that can be done without affecting the data dependence relationships within the loop. This enables concurrentization of portions of the program to be performed, where appropriate, without destruction of data dependence relationships. The invention thus enables such processing of a serial-architecture program to be performed automatically by the compiler, so that it is not necessary for a programmer to be aware that a program is to be written for concurrent multiprocessor computer use. Moreover, existing software that has been written for a serial architecture computer does not need to be modified by a programmer (e.g. using a special multiprocessor language as mentioned hereinabove) in order to obtain a source program to be executed by a multiprocessor computer, i.e. concurrentization of such existing software can be executed automatically simply by submitting the software to a compiler that operates in accordance with the method of the present invention.