1. Field of the Invention
This invention relates to a method of parallelizing a program, an apparatus for parallelization, and a recording medium storing the parallelization program, and particularly relates to a technology for converting a program prepared for serial processing computers or shared memory-type parallel computers into a program for distributed memory-type parallel computers.
2. Background Art
There are cases in which it is desired to execute calculations by allocating the individual elements of array data to arbitrary processors of a distributed memory-type parallel computer.
For example, in a simulation application that clarifies a physical phenomenon by calculating the interactions of particles moving freely in a space, the data corresponding to the individual particles are held as an array, and the calculation is executed by allocating each particle's data in the array to a suitable processor according to the particle's position in the simulation space. Japanese Patent Application, First Publication No. Hei 5-274277 shows an example of an apparatus which realizes such a simulation while reducing the communication cost between processors by allocating the same number of the particles distributed in the simulation space to each processor and by making the positional relationship of those processors equivalent to the positional relationship of the particles in the simulation space.
The conventional technique, as shown in Japanese Patent Application, First Publication No. 8-227405, is premised on the user writing a program for the distributed memory-type parallel computer. Thus, to allocate such irregularly distributed particles in a program, the user had to describe complicated procedures in the program: rearranging the particle data so that the particles to be allocated to the same processor occupy a range of successive subscripts in the array, and then dividing the array into those ranges. This rearrangement of the particle data is not required when the simulation program is written for a serial processing computer or a shared memory-type parallel computer, so it is unrelated to the essential problem the computer is asked to solve. Requiring a user of the distributed memory-type parallel computer to describe such complicated procedures not only degrades the consistency of the program, but also raises the cost of developing the program and reduces the convenience of the distributed memory-type parallel computer.
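The manual rearrangement described above can be sketched as follows. This is only an illustration in Python, not code from the cited publications; the function name `rearrange_by_owner`, the sample particle labels, and the `owner` list are hypothetical.

```python
def rearrange_by_owner(particles, owner, num_procs):
    """Reorder particles so that each processor's share is contiguous.

    Returns the reordered list and, for each processor, the half-open
    subscript range (start, stop) it owns in the reordered list.
    """
    # Sort the original subscripts by owning processor (sorted() is stable,
    # so particles of one processor keep their relative order).
    order = sorted(range(len(particles)), key=lambda i: owner[i])
    reordered = [particles[i] for i in order]

    # Count how many particles each processor owns.
    counts = [0] * num_procs
    for p in owner:
        counts[p] += 1

    # Turn the counts into ranges of successive subscripts.
    ranges = []
    start = 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return reordered, ranges


# Hypothetical data: six particles and the processor chosen for each one.
particles = ["p0", "p1", "p2", "p3", "p4", "p5"]
owner = [1, 0, 1, 2, 0, 1]
reordered, ranges = rearrange_by_owner(particles, owner, 3)
```

None of this bookkeeping appears in the serial version of the program, which is why it is a burden on the user.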
In order to perform allocation of such an irregularly distributed array effectively, it is clear and preferable if the user can designate division of the array in the program by using a particular array called a “mapping array” which maintains the corresponding relationship between individual array elements and the processor to which these elements will be allocated. This method is proposed, for example, in a document by G. Fox, et al., entitled “Fortran D Language Specification, CRPC-TR90079”, Department of Computer Science, Rice University, April 1991. The designation of divided arrays to each processor by such a mapping array is called “indirect division”.
However, no method is known for converting a program that includes an array designated to be divided by indirect division into a program that can be executed effectively in parallel. An example of a technique for converting a program prepared for a serial processing computer into a program for a distributed memory-type parallel computer is disclosed in Japanese Patent Application, First Publication No. Hei 6-139212. This technique is premised on regular divisions, such as row division, line division, and a combination of row and line division. Therefore, the above document does not disclose a method of irregular division, that is, a method of dividing data irregularly. In particular, it does not disclose a method of converting a program containing an array designated to undergo indirect division by a mapping array into a program to be executed in parallel.
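The difference between a regular division and indirect division by a mapping array can be illustrated with a short sketch. It is written in Python for illustration only; the block division formula, the function names, and the sample mapping array are assumptions and are not taken from the cited documents.

```python
def block_owner(i, n, num_procs):
    """Regular (block) division: the owner follows from a formula."""
    block = -(-n // num_procs)  # ceiling of n / num_procs
    return i // block


def indirect_owner(i, mapping):
    """Indirect division: the owner is looked up in the mapping array."""
    return mapping[i]


n, num_procs = 6, 3
mapping = [1, 0, 1, 2, 0, 1]  # hypothetical mapping array, one entry per element

regular = [block_owner(i, n, num_procs) for i in range(n)]
indirect = [indirect_owner(i, mapping) for i in range(n)]
```

With a regular division the elements of one processor always form a contiguous block, whereas a mapping array can scatter them anywhere in the array, so the owner of an element is known only once the mapping array has been filled in at run time.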
It is therefore the object of the present invention to provide a method of converting a program containing an array designated to undergo indirect division into a program that can be executed effectively in parallel, so that the user can, in practice, designate indirect division of the array, and thereby to improve the convenience of the distributed memory-type parallel computer.
A method for parallelizing a program according to the present invention comprises the steps of: changing a declaration of an array to be subjected to indirect division; inserting a declaration of a subscript conversion array; inserting a statement to calculate the size of the array after indirect division; inserting statements to preserve and to release an area for the array to be divided by indirect division and for the subscript conversion array; inserting a statement to calculate the content of the subscript conversion array; changing a control range of a loop; and changing the control variable reference.
In more detail, the present method comprises:
the first step for inputting a program to be parallelized;
the second step for changing a declaration of an array, whose division is designated by the mapping array in the input program and which is to be divided by indirect division, into a declaration of an allocatable array;
the third step for inserting a declaration of an allocatable subscript conversion array for each mapping array in the input program;
the fourth step for inserting a statement for calculating the size of an array after being divided by indirect division for each mapping array in the input program;
the fifth step for inserting a statement to preserve, during processing, an area for the array to be divided by indirect division and for said subscript conversion array, according to the size calculated during processing by the statement inserted in the fourth step, and a statement to release said area;
the sixth step for inserting a statement to dynamically calculate, during processing, the content of said subscript conversion array based on the content of the mapping array;
the seventh step for changing the control range of all parallelizable loops containing a reference to the array to be divided by indirect division;
the eighth step for changing, within those loops, the references to the control variable in arrays other than the indirectly divided array into references to the subscript conversion array, in conformity with the change of the control range in the seventh step; and
the ninth step for outputting a program obtained by processing from the second step to the eighth step as the parallelized program.
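The effect of the fourth to eighth steps on a single loop can be sketched as follows. This is a minimal simulation in Python under several assumptions and is not the program the method itself would output: the loop body `a[i] = a[i] + b[i]`, all function names, the direction of the subscript conversion array (local subscript to original subscript), and the treatment of `b` as an undivided array visible to every processor are illustrative choices. The declaration changes of the second and third steps have no direct Python counterpart and are omitted.

```python
def serial_loop(a, b):
    """The original loop, as written for a serial computer."""
    a = list(a)
    for i in range(len(a)):
        a[i] = a[i] + b[i]
    return a


def converted_loop_on(proc, a, b, mapping):
    """What the converted program might do on one processor (assumed form)."""
    # Fourth step: calculate the size of the local part at run time.
    local_n = sum(1 for p in mapping if p == proc)

    # Fifth step: preserve areas for the divided array and the conversion
    # array (Python releases them automatically when they go out of scope).
    a_local = [0] * local_n
    conv = [0] * local_n

    # Sixth step: fill the conversion array from the mapping array.
    k = 0
    for i, p in enumerate(mapping):
        if p == proc:
            conv[k] = i
            k += 1

    # Initial distribution of the divided array, simulated here by copying.
    for k in range(local_n):
        a_local[k] = a[conv[k]]

    # Seventh step: the loop runs over the local range only.
    # Eighth step: the undivided array b is referenced through conv.
    for k in range(local_n):
        a_local[k] = a_local[k] + b[conv[k]]
    return conv, a_local


def run_converted(a, b, mapping, num_procs):
    """Run every processor's share in turn and gather the results."""
    result = list(a)
    for proc in range(num_procs):
        conv, a_local = converted_loop_on(proc, a, b, mapping)
        for k, i in enumerate(conv):
            result[i] = a_local[k]
    return result


a = [10, 20, 30, 40, 50, 60]
b = [1, 2, 3, 4, 5, 6]
mapping = [1, 0, 1, 2, 0, 1]  # hypothetical mapping array
```

In this sketch, `run_converted(a, b, mapping, 3)` gives the same values as `serial_loop(a, b)`, which is the property the conversion has to preserve.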
A program parallelization apparatus of the present invention comprises: a means to change the declaration of the array to be divided by indirect division; a means to insert the declaration of the subscript conversion array; a means to insert a statement to calculate the size, after division, of the array to be divided by indirect division; a means to insert statements to preserve and to release the area for the array to be divided by indirect division and the subscript conversion array; a means to insert a statement to calculate the content of the subscript conversion array; a means to change the control range of the loop; and a means to change the control variable reference in the loop.
In more detail, the program parallelization apparatus comprises:
the first means for inputting the program to be parallelized;
the second means for changing a declaration of an array, which is to be subjected to indirect division, and which is designated to be divided by the use of the mapping array in the input program;
the third means for inserting the declaration of the allocatable subscript conversion array for each mapping array in the input program;
the fourth means for inserting a statement to calculate, during processing, the size of the array after indirect division for each mapping array in the input program;
the fifth means for inserting statements to preserve and to release the area for the array to be subjected to indirect division and said subscript conversion array, in response to the size of the array obtained during processing by the statement inserted by the fourth means;
the sixth means for inserting a statement to dynamically calculate the content of said subscript conversion array based on the content of the mapping array;
the seventh means for changing the control ranges of every loop which is parallelizable and which includes references of arrays to be subjected to indirect division in the input program;
the eighth means for changing the reference of the control variables of arrays, except those subjected to indirect division, to the reference of said subscript conversion array, together with the change of the control range of the loop by said seventh means; and
the ninth means for outputting the parallelized program obtained by the processing of said second to eighth means.
The action of the apparatus for parallelizing a program can be summarized as follows. By means of the second means for changing the declaration of the array to be divided by indirect division, the fourth means for inserting a statement to calculate the size of the array after indirect division, and the fifth means for inserting statements to preserve and to release the area that stores the array to be divided by indirect division and the subscript conversion array, the present apparatus becomes capable of distributing the array to undergo indirect division, in groups, to the local memories belonging to the respective processors of a distributed memory-type parallel computer. By means of the third means for inserting the declaration of the subscript conversion array, the fifth means for inserting the statements to preserve and to release the area for the subscript conversion array, and the sixth means for inserting a statement to calculate the content of the subscript conversion array, the apparatus further becomes capable of matching the subscripts of the array after indirect division with those of the other arrays. In addition, the seventh means for changing the control range of the loop and the eighth means for changing the control variable references in the loops make it possible to convert the input program into a program that executes the loops in parallel.