As is known to those skilled in the art, there are a number of techniques employed by compiler programs to improve the execution efficiency of program sequences. A description of some of these procedures can be found in chapter 10 of "Compilers", Aho et al., Addison Wesley (1986). Principal methods for improvement of execution efficiency of program routines involve actions with respect to loops that are present therein. Hereafter, some background will be presented with respect to loop operations.
A loop refers to a repeated execution of a sequence of one or more computer instructions. The sequence of instructions is called the loop body. The number of times the loop body is to be repeated is called the number of iterations. Most programming languages provide one or more forms of loop construct. An example of a FORTRAN DO-Loop is shown below:
______________________________________
DO I = 1, 3, 1
   A(I) = I
ENDDO
______________________________________
where, in the DO statement, I is the DO-variable, the first 1 is the initial expression, 3 is the final expression, and the trailing 1 is the increment expression.
A DO-Loop Contains Three Parts
1. A DO statement which signifies the beginning of a DO-Loop construct and contains a DO-variable, an initial expression, a final expression, and an increment expression. Before the loop body is executed, the DO-variable is assigned the value of the initial expression. At the end of each successive iteration of the loop body, the DO-variable is incremented by the value of the increment expression and its value is compared against the final expression. If the value of the DO-variable is less than or equal to that of the final expression, then the loop body is executed; otherwise the iteration is terminated.
In the FORTRAN example above, the number of iterations is 3 because at the end of the third iteration the value of the DO-variable becomes 4 which is greater than that of the final expression. When the value of the increment is 1 it can be omitted from the DO-statement.
2. A loop body which contains one or more statements to be repeatedly executed on each iteration of the loop.
3. An ENDDO statement which signifies the end of a DO-Loop construct.
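By way of illustration only, the DO-variable bookkeeping described above can be mimicked in a few lines of Python. This is a minimal sketch, not FORTRAN semantics in full: the function name `do_loop_values` is hypothetical, only a positive increment is handled, and the comparison against the final expression is made before every iteration, including the first, which is an assumption that goes slightly beyond the description above.

```python
def do_loop_values(initial, final, increment=1):
    """Return the values taken by the DO-variable, one per iteration.

    Hypothetical helper for illustration; assumes a positive increment.
    """
    if increment <= 0:
        raise ValueError("this sketch only handles a positive increment")
    values = []
    i = initial              # DO-variable is assigned the initial expression
    while i <= final:        # keep iterating while it does not exceed the final expression
        values.append(i)     # stands in for one execution of the loop body
        i += increment       # step the DO-variable by the increment expression
    return values


print(do_loop_values(1, 3, 1))    # DO I = 1, 3, 1  -> [1, 2, 3]
print(len(do_loop_values(1, 3)))  # increment omitted (defaults to 1) -> 3
```

After the third iteration the DO-variable reaches 4, which exceeds the final expression 3, so the sketch stops after three iterations, in line with the count given for the FORTRAN example.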
The loop body often contains a computation action which involves an array. An array is a group of consecutive memory locations that are referenced by the same name. Each individual memory location is referred to as an element of the array and is specified by a name plus a subscript expression inside a pair of parentheses.
In the example above, an array having the name "A" is used. The notation A(I) refers to the Ith element of the array. For instance, when the value of I is 1, A(I) is the same as A(1) and both refer to the first element of array A. Therefore the computation performed by the DO-Loop is to assign the current value of the DO-variable I to the Ith element of array A. After the DO-Loop is executed the first three elements of array A contain the values 1, 2, 3 respectively:
A = (1, 2, 3)
The notation (1, 2, 3) is used to indicate the values of array elements A(1), A(2), and A(3), in that order.
Loop peeling is a procedure by which the first or final iteration is removed from a loop and the number of iterations of the loop is reduced accordingly by adjusting the value of the initial or the final expression, depending on whether the first or last iteration is removed. An example of loop peeling is given below, where the first iteration is removed from the loop, the initial expression is changed from 1 to 2, and the number of iterations of the loop is reduced from 3 to 2.
______________________________________
DO I = 1, 3              A(1) = 1
   A(I) = I      -->     DO I = 2, 3
ENDDO                       A(I) = I
                         ENDDO
______________________________________
It is also possible to remove more than 1 iteration from either or both ends of a loop.
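A minimal Python sketch of peeling, offered only as an illustration, is given below. The names `run_original` and `run_peeled` and the parameters `front` and `back` are hypothetical, and a dictionary keyed from 1 stands in for the FORTRAN array A. The sketch assumes `front + back` does not exceed the number of iterations.

```python
def run_original(n):
    """DO I = 1, n / A(I) = I / ENDDO"""
    a = {}
    for i in range(1, n + 1):
        a[i] = i
    return a


def run_peeled(n, front=1, back=0):
    """Same computation with `front` leading and `back` trailing iterations peeled.

    Hypothetical helper; assumes front + back <= n.
    """
    a = {}
    for i in range(1, front + 1):             # peeled leading iterations
        a[i] = i
    for i in range(front + 1, n - back + 1):  # remaining loop with adjusted bounds
        a[i] = i
    for i in range(n - back + 1, n + 1):      # peeled trailing iterations
        a[i] = i
    return a


print(run_peeled(3) == run_original(3))                    # True
print(run_peeled(3, front=1, back=1) == run_original(3))   # True
```

With `front=1` and `n=3` the remaining loop runs over I = 2, 3, which corresponds to changing the initial expression from 1 to 2.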
Loop reversal is a further transformation used to improve loop execution, in which the direction in which the loop iterations are performed is reversed. An example of loop reversal is shown below:
______________________________________
DO I = 1, 3              DO I = 1, 3
   A(I) = I      -->        A(4-I) = 4-I
ENDDO                    ENDDO
______________________________________
To illustrate the effect of a loop reversal, the example below shows the computations performed by a loop before and after loop reversal, during each iteration of the loop, respectively:
______________________________________
          Original Loop    Reversed Loop
______________________________________
Iter 1:   A(1) = 1         A(3) = 3
Iter 2:   A(2) = 2         A(2) = 2
Iter 3:   A(3) = 3         A(1) = 1
______________________________________
It can be seen that the order in which the elements of array A are computed is reversed by the reversed loop and yet the end results are the same, i.e. array A is assigned values (1, 2, 3).
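The iteration-by-iteration comparison above can be reproduced with a short Python sketch, given here purely as an illustration. The function names are hypothetical; each function records the (subscript, value) pair written on every iteration, and the subscript n + 1 - I generalizes the 4-I of the three-iteration example.

```python
def original_trace(n):
    """Writes performed by DO I = 1, n / A(I) = I, in execution order."""
    return [(i, i) for i in range(1, n + 1)]


def reversed_trace(n):
    """Writes performed by DO I = 1, n / A(n+1-I) = n+1-I, in execution order."""
    return [(n + 1 - i, n + 1 - i) for i in range(1, n + 1)]


print(original_trace(3))   # [(1, 1), (2, 2), (3, 3)]
print(reversed_trace(3))   # [(3, 3), (2, 2), (1, 1)]

# The order of the writes differs, but the final contents of A are the same.
print(dict(original_trace(3)) == dict(reversed_trace(3)))   # True
```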
Loop fusion is the combination of two or more adjacent loops, all with the same number of iterations, into a single loop. For example (L1, L2, and L3 are labels used to identify the loops):
______________________________________
L1: DO I = 1, 3              L3: DO I = 1, 3
       A(I) = I                     A(I) = I
    ENDDO            -->            B(I) = A(I)
L2: DO I = 1, 3                  ENDDO
       B(I) = A(I)
    ENDDO
______________________________________
After the loop fusion transformation shown above, the same computations performed by the two fusion candidate loops L1 and L2 are performed by the single fused loop L3, i.e. arrays A and B are assigned values (1, 2, 3).
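As an illustrative check only, the following Python sketch runs the two candidate loops and the fused loop and compares the results. The function names are hypothetical, and dictionaries keyed from 1 stand in for the FORTRAN arrays A and B.

```python
def separate_loops(n):
    """L1 followed by L2, each running over I = 1, n."""
    a, b = {}, {}
    for i in range(1, n + 1):   # L1
        a[i] = i
    for i in range(1, n + 1):   # L2
        b[i] = a[i]
    return a, b


def fused_loop(n):
    """L3: both loop bodies executed in a single pass over I = 1, n."""
    a, b = {}, {}
    for i in range(1, n + 1):
        a[i] = i
        b[i] = a[i]             # a[i] was written earlier in this same iteration
    return a, b


print(fused_loop(3))                        # ({1: 1, 2: 2, 3: 3}, {1: 1, 2: 2, 3: 3})
print(separate_loops(3) == fused_loop(3))   # True
```

The fused version performs one loop termination test per iteration instead of two, and reads A(I) immediately after writing it.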
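Loop fusion has potential benefits in reducing the run time of a program. The most important ones are:
1. It enables the exploitation of memory reuse across loops, which can have a significant positive impact on the program's run time performance because it can reduce, or even eliminate, cache and Translation Look-aside Buffer misses by bringing close together (in time) multiple accesses to the same or nearby memory locations.
2. It reduces the run time overhead of loop execution by cutting the number of termination tests and branching instructions needed to restart the iterations from that required by each of the original loops down to that required by the single fused loop.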
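A loop fusion can only "legally" be performed under the following conditions:
1. The candidate loops must be adjacent to each other in the program, i.e. there must be no other statements between the two loops.
2. The candidate loops must have identical numbers of iterations.
3. The fused loop must perform the same computation as the candidate loops.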
The procedure employed to test for the legality of a loop fusion operation is often called "data dependence analysis". See Zima et al., "Supercompilers for Parallel and Vector Computers" Addison Wesley, 1991. "Reuse" is a further test that is used to test whether a revised code sequence operates more efficiently (or "profitably") than the unrevised code sequence.
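To make the third legality condition concrete, the Python sketch below shows a pair of loops for which fusion changes the result. The second loop body, B(I) = A(I+1), is a hypothetical example that does not appear in the text above, as are the function names; the arrays are assumed to be zero-initialized, with one extra element so that A(n+1) can be read.

```python
def separate_loops(n):
    """L1: A(I) = I, then L2: B(I) = A(I+1), each over I = 1, n."""
    a = [0] * (n + 2)           # index 0 unused; a[n + 1] is never written
    b = [0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = i
    for i in range(1, n + 1):
        b[i] = a[i + 1]         # every a[i + 1] up to a[n] is already written
    return b[1:]


def fused_loop(n):
    """Naive fusion of the same two loop bodies."""
    a = [0] * (n + 2)
    b = [0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = i
        b[i] = a[i + 1]         # reads a[i + 1] before the next iteration writes it
    return b[1:]


print(separate_loops(3))   # [2, 3, 0]
print(fused_loop(3))       # [0, 0, 0]
```

Because the fused loop reads each element of A one iteration before it is written, the two versions disagree, and a data dependence test would be expected to reject this fusion.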
Recall that loop fusion can only be performed when the candidate loops have the same number of iterations and the fused loop performs the same computations as the individual candidate loops. When candidate loops do not have the same number of iterations, loop peeling has been applied to one of them to enable fusion. Consider the two loops L1 and L2 below. They cannot be fused directly because L1 has one more iteration than L2:
______________________________________
L1: DO I = 1, 3          L2: DO J = 2, 3
       A(I) = I                 B(J) = A(J)
    ENDDO                    ENDDO
______________________________________
However, if the first iteration of L1 is peeled from the loop, then the remainder of Loop L1 (i.e., L1') and Loop L2 can be fused to form loop L3 as shown below:
______________________________________
Peel:   A(1) = 1                       A(1) = 1
L1':    DO I = 2, 3               L3:  DO I = 2, 3
           A(I) = I                       A(I) = I
        ENDDO             -->             B(I) = A(I)
L2:     DO J = 2, 3                    ENDDO
           B(J) = A(J)
        ENDDO
______________________________________
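The peel-then-fuse sequence can be checked with a small Python sketch, offered as an illustration only. The function names are hypothetical, and dictionaries keyed from 1 stand in for the arrays A and B.

```python
def before_transformation():
    """Original loops: L1 over I = 1, 3 and L2 over J = 2, 3."""
    a, b = {}, {}
    for i in range(1, 4):       # L1: DO I = 1, 3
        a[i] = i
    for j in range(2, 4):       # L2: DO J = 2, 3
        b[j] = a[j]
    return a, b


def after_peel_and_fuse():
    """First iteration of L1 peeled, remainder L1' fused with L2 into L3."""
    a, b = {}, {}
    a[1] = 1                    # peeled iteration
    for i in range(2, 4):       # L3: DO I = 2, 3
        a[i] = i
        b[i] = a[i]
    return a, b


print(after_peel_and_fuse())                              # ({1: 1, 2: 2, 3: 3}, {2: 2, 3: 3})
print(before_transformation() == after_peel_and_fuse())   # True
```

Peeling leaves L1' and L2 with the same two iterations, I = 2, 3, which is what allows them to be combined into a single loop.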
The problem in practice is that neither loop peeling nor loop reversal is guaranteed to enable legal and profitable loop fusion. Furthermore, loop peeling or loop reversal by themselves may cause loop performance deterioration, as the former increases the size of the program and the latter changes the data dependencies of the loop, which may disable other profitable transformations. The prior art does teach the determination of the legality and profitability of loop fusion. See Carr et al., "Compiler Optimizations for Improving Data Locality", Proc. of ASPLOS VI, San Jose, Calif., October, 1994. Also the prior art has used loop reversal to enable loop permutation, but the use of loop reversal to enable legal and profitable loop fusion is, to Applicants' knowledge, not known to be in the prior art.
Notwithstanding the use of loop peeling to enable subsequent loop fusion, it often occurs that a data dependency test on a pair of loops will indicate that fusing the loops would be illegal. In such a case, a fusion action on the loops is inhibited. Nevertheless, the operational efficiencies that can be achieved through loop fusion are important enough to justify exploring other loop manipulations that enable subsequent fusion actions. Accordingly, there is a need for an improved compiler method for the generation of fused loop sequences.