1. Technical Field
The present invention generally relates to compiler optimization and, specifically, relates to an inline expansion method for compiler programming languages such as APL and FORTRAN 90 having array operation functions.
2. Description of the Background Art
In the case of general programming languages, an operation on data (having a single value) of an element is described irrespective of whether it is directed to a scalar variable, array, or structure. Languages such as APL and FORTRAN 90 can describe both an operation on an element of an array and an operation on the array itself. Scalar and array expressions of the same array operation in FORTRAN 90 are shown below.
Scalar expression of an array operation:
(Formula 1)
    DO I=10,90
       A(I) = B(I+1) + 1
    END DO
Array expression of the array operation:
(Formula 2)
    A(10:90) = B(11:91) + 1
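The equivalence of Formula 1 and Formula 2 can be checked with a small executable model. The following is a minimal Python sketch and is not part of the original disclosure; the function names `scalar_form` and `array_form`, the test data, and the convention of leaving list index 0 unused (so that indices match FORTRAN's 1-based arrays) are assumptions made purely for illustration.

```python
# Illustrative model of Formulas 1 and 2 (all names are hypothetical).
# Index 0 of each list is unused padding so that a[i] corresponds to FORTRAN's A(I).

def scalar_form(a, b):
    """Formula 1: element-by-element loop, DO I=10,90."""
    a = list(a)                       # work on a copy
    for i in range(10, 91):           # I = 10 .. 90 inclusive
        a[i] = b[i + 1] + 1
    return a

def array_form(a, b):
    """Formula 2: A(10:90) = B(11:91) + 1 as a single section assignment."""
    a = list(a)
    # FORTRAN sections are inclusive at both ends; Python slices exclude the end.
    a[10:91] = [x + 1 for x in b[11:92]]
    return a

b = list(range(0, 200, 2))            # arbitrary test data: b[i] = 2*i, indices 0..99
a = [0] * 100
```

Both functions leave the elements outside the section 10:90 untouched, which is the behavior the two FORTRAN 90 forms share.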
The use of an array expression has the following two advantages:
    An operation can be described simply and logically.
    Parallelism associated with an operation or process can be described naturally.
The first advantage enables a program to be described and understood more easily. The second advantage provides more opportunities to express parallelism in a program for computers capable of parallel processing, such as vector and parallel computers.
General scalar processors, which do not have parallel processing functions, cannot execute a program in parallel. If a program written in array representation is not converted for scalar processing, processing efficiency is low in some cases. The conversion from a program written in array representation to a program that a scalar processor can execute efficiently is called "scalarizing."
There are three methods of scalarizing optimization:
    (1) Inline expansion of intrinsic functions;
    (2) Reduced number of array temporaries; and
    (3) Improved loop efficiency.
In the inline expansion of intrinsic functions method, which is also carried out in existing compilers, the processing speed is increased by executing inline-expanded code instead of calling a subroutine.
In the method of reducing the number of array temporaries, the starting point is that scalarizing an array expression requires an array temporary to store the array values of the right-hand side of an assignment. The number of array temporaries is reduced by analyzing dependencies and inverting the direction of loops. An example is given below.
Original array expression:
(Formula 3)
    A(10:90) = A(1:81) - B(10:90)
Code obtained by simple inline expansion:
(Formula 4)
    DO I=10,90
       T(I) = A(I-9) - B(I)    ! Substitution into an array temporary
    END DO
    DO I=10,90
       A(I) = T(I)             ! Substitution into a true array
    END DO
Reduction of the number of array temporaries by loop inversion:
(Formula 5)
    DO I=90,10,-1
       A(I) = A(I-9) - B(I)
    END DO
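The reason the temporary is needed, and why inverting the loop removes it, can be illustrated with a small executable model. The following Python sketch is not part of the original disclosure; the function names, the test data, and the unused index 0 (so that indices match FORTRAN's 1-based arrays) are assumptions for illustration. It compares the temporary-based code, an ascending loop without a temporary, and the descending loop of Formula 5.

```python
# Illustrative model of Formulas 3 to 5 (all names are hypothetical).
# Index 0 is unused padding so that a[i] corresponds to FORTRAN's A(I).

def with_temporary(a, b):
    """Formula 4: evaluate the whole right-hand side into T, then copy T into A."""
    a = list(a)
    t = [0] * len(a)
    for i in range(10, 91):
        t[i] = a[i - 9] - b[i]        # reads only the original values of A
    for i in range(10, 91):
        a[i] = t[i]
    return a

def forward_no_temporary(a, b):
    """Incorrect shortcut: an ascending loop overwrites A(I-9) before it is read."""
    a = list(a)
    for i in range(10, 91):
        a[i] = a[i - 9] - b[i]
    return a

def inverted_loop(a, b):
    """Formula 5: descending loop, DO I=90,10,-1, with no temporary."""
    a = list(a)
    for i in range(90, 9, -1):
        a[i] = a[i - 9] - b[i]        # A(I-9) has not been overwritten yet
    return a

a0 = [0] + list(range(1, 91))         # A(I) = I for I = 1..90
b0 = [0] + [1] * 90                   # B(I) = 1
```

With this data the ascending loop first goes wrong at I=19, where it reads the already updated A(10). The descending loop reads each A(I-9) before any iteration writes it, so it reproduces the result of the temporary-based code.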
In the improved loop efficiency method, loop overhead is reduced by executing a plurality of array expressions by a single loop.
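As a rough illustration of executing several array expressions in a single loop, consider two array statements that are not taken from the source and are assumed here only as an example: X(1:N) = Y(1:N) + 1 followed by Z(1:N) = X(1:N) * 2. The Python sketch below, with hypothetical function names, contrasts one loop per statement with a single combined loop. Combining is valid in this example because the second statement reads X only at the index written in the same iteration.

```python
# Illustrative sketch of combining two array statements into one loop.
# The statements X = Y + 1 and Z = X * 2 are assumed examples, not from the source.

def separate_loops(y):
    """One loop per array statement: two passes, two sets of loop overhead."""
    n = len(y)
    x = [0] * n
    z = [0] * n
    for i in range(n):                # loop for X(1:N) = Y(1:N) + 1
        x[i] = y[i] + 1
    for i in range(n):                # loop for Z(1:N) = X(1:N) * 2
        z[i] = x[i] * 2
    return x, z

def single_loop(y):
    """Both array statements executed by a single loop."""
    n = len(y)
    x = [0] * n
    z = [0] * n
    for i in range(n):
        x[i] = y[i] + 1
        z[i] = x[i] * 2               # uses the X value produced in this iteration
    return x, z
```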
Of the above three optimization methods, inline expansion of intrinsic functions is most effective. In languages such as FORTRAN 90 that support array expressions, transformational intrinsic functions (TIFs), reduction intrinsic functions (RIFs), etc., must be inline-expanded in addition to ordinary intrinsic functions. TIFs do array transformations such as shifting, rotation, transposition, enlargement, merging, and packing. RIFs receive an array and return scalar values such as maximums, minimums, and counts.
An example of inline expansion for an intrinsic function is shown below.
In the program below, array B is spread along the third dimension to generate a 10×100×200 array (Formula 6), array C is spread along the second dimension to generate a three-dimensional array of the same size, and the two arrays thus generated are then added together (Formula 7). The elements are summed along the first dimension to generate a 100×200 array, which is then substituted into array A.
To convert this program simply, three three-dimensional temporary arrays are needed. An optimally inline-expanded program includes no such temporary, which drastically reduces the amount of memory used. Since copying of a very large amount of data is eliminated, processing speed should, in turn, increase.
Source program:
(Formula 6)
    A(1:100,1:200) = SUM( SPREAD( B(1:10,1:100), 3, 200 ) + SPREAD( C(1:10,1:200), 2, 100 ), 1 )
Program obtained by simple conversion:
(Formula 7)
    T1(1:10,1:100,1:200) = SPREAD( B(1:10,1:100), 3, 200 )
    T2(1:10,1:100,1:200) = SPREAD( C(1:10,1:200), 2, 100 )
    T3(1:10,1:100,1:200) = T1(1:10,1:100,1:200) + T2(1:10,1:100,1:200)
    A(1:100,1:200) = SUM( T3(1:10,1:100,1:200), 1 )
Program obtained by optimum inline expansion:
(Formula 8)
    DO I=1,100
       DO J=1,200
          S = 0
          DO K=1,10
             S = S + B(K,I) + C(K,J)
          END DO
          A(I,J) = S
       END DO
    END DO
In the above program, SUM and SPREAD are intrinsic FORTRAN 90 functions and have the following meanings:
SPREAD has the format SPREAD(SOURCE, DIM, NCOPIES) and generates an array with one additional dimension by spreading the array designated by SOURCE along the DIM-th dimension, copying its elements NCOPIES times. In the above example, the statement T1(1:10,1:100,1:200) = SPREAD(B(1:10,1:100), 3, 200) generates the array T1(1:10,1:100,1:200) by spreading the array B(1:10,1:100) along the third dimension, copying its elements 200 times. As a result, T1(i,j,k) has the same value as B(i,j) over the ranges i=1 to 10, j=1 to 100, and k=1 to 200.
Similarly, the statement T2(1:10,1:100,1:200) = SPREAD(C(1:10,1:200), 2, 100) generates the array T2(1:10,1:100,1:200) by spreading the array C(1:10,1:200) along the second dimension, copying its elements 100 times. T2(i,j,k) thus has the same value as C(i,k) over the ranges i=1 to 10, j=1 to 100, and k=1 to 200.
SUM has the format SUM(ARRAY, DIM, MASK) and generates an array with one fewer dimension by summing, along the dimension indicated by DIM, those elements of the target array indicated by ARRAY whose corresponding elements of MASK are true. DIM and MASK are optional. If they are not specified, all elements are added together and a scalar value is returned. In the above example, A(1:100,1:200) = SUM(T3(1:10,1:100,1:200), 1) means summing the elements of the array T3(1:10,1:100,1:200) along the first dimension. As a result, the values T3(1,i,j) + T3(2,i,j) + ... + T3(10,i,j) are substituted into A(i,j) over the ranges i=1 to 100 and j=1 to 200.
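The agreement between the simple conversion (Formula 7) and the inline-expanded loop nest (Formula 8) can be checked on reduced array sizes. The following Python sketch is not part of the original disclosure. The helper names `spread_dim3`, `spread_dim2`, and `sum_dim1` are hypothetical models that cover only the specific uses of SPREAD and SUM described above (no MASK, fixed DIM), and the sizes 3, 4, and 5 stand in for 10, 100, and 200.

```python
# Illustrative model of Formulas 6 to 8 with reduced sizes (all names are hypothetical).
# Arrays are 0-based nested lists; t[k][i][j] models T(k+1, i+1, j+1).
NK, NI, NJ = 3, 4, 5                  # stand-ins for 10, 100, 200

def spread_dim3(b, ncopies):
    """Model of SPREAD(B, 3, NCOPIES) for 2-D B: result[k][i][j] = b[k][i]."""
    return [[[b[k][i] for _ in range(ncopies)] for i in range(len(b[0]))]
            for k in range(len(b))]

def spread_dim2(c, ncopies):
    """Model of SPREAD(C, 2, NCOPIES) for 2-D C: result[k][i][j] = c[k][j]."""
    return [[[c[k][j] for j in range(len(c[0]))] for _ in range(ncopies)]
            for k in range(len(c))]

def sum_dim1(t):
    """Model of SUM(T, 1) for 3-D T: result[i][j] = sum over k of t[k][i][j]."""
    return [[sum(t[k][i][j] for k in range(len(t))) for j in range(len(t[0][0]))]
            for i in range(len(t[0]))]

def simple_conversion(b, c):
    """Formula 7: three three-dimensional temporaries T1, T2, T3."""
    t1 = spread_dim3(b, NJ)
    t2 = spread_dim2(c, NI)
    t3 = [[[t1[k][i][j] + t2[k][i][j] for j in range(NJ)] for i in range(NI)]
          for k in range(NK)]
    return sum_dim1(t3)

def inline_expanded(b, c):
    """Formula 8: one loop nest with a scalar accumulator and no array temporaries."""
    a = [[0] * NJ for _ in range(NI)]
    for i in range(NI):
        for j in range(NJ):
            s = 0
            for k in range(NK):
                s = s + b[k][i] + c[k][j]
            a[i][j] = s
    return a

b = [[10 * k + i for i in range(NI)] for k in range(NK)]     # shape (NK, NI)
c = [[100 * k + j for j in range(NJ)] for k in range(NK)]    # shape (NK, NJ)
```

In this model the simple conversion allocates three NK×NI×NJ lists, whereas the expanded form needs only the scalar `s`, which mirrors the memory saving described above.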
Most conventional inline expansion methods are based on idiom recognition (pattern matching). In this scheme, matching templates and their expansion forms are prepared and a description that matches one of the templates is replaced with the corresponding expansion form. This pattern matching scheme has a problem in that inline expansion code cannot be generated for a description not registered as a pattern or a description that combines registered patterns. It is practically impossible to register, as patterns, all combinations of TIFs and operations on arrays.
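The limitation of template-based idiom recognition can be illustrated with a toy model. The following Python sketch is an assumption-laden illustration and does not describe any actual compiler: the tuple encoding of expressions, the `TEMPLATES` table, and the expansion-form names are all hypothetical, and scalar arguments such as DIM and NCOPIES are omitted for brevity.

```python
# Toy model of idiom recognition (encoding and template table are hypothetical).
# An expression is a variable name (str) or a tuple (function_name, operand, ...).

def shape(expr):
    """Reduce an expression to the nesting pattern of its function names."""
    if isinstance(expr, str):
        return "VAR"
    return (expr[0],) + tuple(shape(arg) for arg in expr[1:])

# Registered templates: nesting pattern -> name of a prepared expansion form.
TEMPLATES = {
    ("SUM", "VAR"): "expand_sum_of_variable",
    ("SPREAD", "VAR"): "expand_spread_of_variable",
    ("ADD", "VAR", "VAR"): "expand_elementwise_add",
}

def recognize(expr):
    """Return the expansion form registered for expr, or None if no template matches."""
    return TEMPLATES.get(shape(expr))

simple = ("SUM", "B")
# Same nesting structure as Formula 6: SUM( SPREAD(B) + SPREAD(C) )
combined = ("SUM", ("ADD", ("SPREAD", "B"), ("SPREAD", "C")))
```

Here `recognize(simple)` finds a registered template, while `recognize(combined)` returns `None` even though every function in it is registered individually. Covering such cases would require a separate template for each combination.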
Macro expansion alleviates the above problem to some extent. According to this scheme, a matching template is described flexibly, and an expansion form can be designated by a program. Although this scheme is more powerful in describing patterns than simple pattern matching, it does not address the essential problem that all patterns must be prepared in advance.
Published Unexamined Patent Publication No. 58-149569 discloses a parallelized processing scheme by using scalar expressions for arrays. Specifically, this publication discloses a scheme in which, in a compiler for generating an object program from a given source program for a vector processor having a plurality of parallel arithmetic units, loops each having a simple variable are classified by checking the busy state of a simple variable for each loop at the entrance and exit of the loop and processing for parallelizing the respective loops is done based on the classification thus obtained. However, this publication does not disclose anything about the expansion of array operations.