Loop interchange is a program transformation that reorders the nesting of loops in a loop "nest". A loop interchange is illustrated by the following example:
______________________________________
     Before                 After
______________________________________
a)   DO I = 0,9             DO J = 0,9
b)     DO J = 0,9     =>      DO I = 0,9
c)       A(I,J) = 0             A(I,J) = 0
               Example 1
______________________________________
To paraphrase the steps of the "Before" code: step a) indicates that I is stepped through positions 0-9 of an I×J memory matrix; step b) indicates that, for every value of I, J is stepped through positions 0-9 of the matrix; and step c) indicates that the element A(I,J) is set to zero on each iteration. One skilled in the art will realize that an interchange of the loops of Example 1 may help performance, since the memory accesses will become more sequential.
This can be understood by reference to FIG. 1, wherein a 10×10 memory matrix is schematically illustrated. If the "Before" code listing above is executed, the following I,J addresses are accessed in sequence: 0,0; 0,1; 0,2; 0,3; . . . Because consecutive values of I occupy adjacent memory positions in this matrix, this addressing requires memory positions 0, 10, 20, 30, etc., to be accessed in sequence, so that each access requires an address increment of multiple memory positions (here, ten).
By performing a loop interchange between the I loop and the J loop to reach the "After" code listing in Example 1, the resulting J-outer, I-inner addressing accesses memory positions 0, 1, 2, 3, . . . in sequence, thus achieving a single-step or "stride one" incrementing of address positions (i.e., a more efficient and faster-executing procedure).
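The stride difference described above can be checked with a short sketch. It is written in Python rather than Fortran only so that it can be run as-is, and it assumes Fortran-style column-major storage, i.e. that element A(I,J) of the 10×10 matrix sits at linear position I + 10*J. The function names are illustrative and do not come from the source.

```python
ROWS, COLS = 10, 10  # the 10x10 memory matrix of FIG. 1

def position(i, j, rows=ROWS):
    """Linear memory position of A(i, j), assuming column-major storage."""
    return i + rows * j

def before_order():
    """Positions touched by the "Before" nest: I outer, J inner."""
    return [position(i, j) for i in range(ROWS) for j in range(COLS)]

def after_order():
    """Positions touched by the "After" nest: J outer, I inner."""
    return [position(i, j) for j in range(COLS) for i in range(ROWS)]

def strides(seq):
    """Differences between consecutive positions in an access sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]

print(before_order()[:4])  # [0, 10, 20, 30]
print(after_order()[:4])   # [0, 1, 2, 3]
```

Both orders touch the same 100 positions; only the sequence differs. Under the stated layout assumption, every step of the "After" order has stride one, while the "Before" order jumps ten positions at a time within each I iteration.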
A loop interchange requires a perfect loop nest, that is, one in which no code is present within the body of the outer loop other than one or more inner loop(s). A set of loops that does not form a perfect loop nest is referred to as an imperfect loop nest. One strategy for interchanging imperfect loop nests is first to apply an "enabling" transformation, such as loop distribution, which takes an outer loop that surrounds multiple components and converts it into multiple loops, one around each individual component. For example:
______________________________________
Before                   After
______________________________________
DO I = 0,9               DO I = 0,9
  X(I) = 3         =>      X(I) = 3
  DO J = 0,9             END DO
    A(I,J) = 0           DO I = 0,9
  END DO                   DO J = 0,9
END DO                       A(I,J) = 0
               Example 2
______________________________________
The loop distribution illustrated in the "After" code of Example 2 splits the DO I outer loop into two loops and allows isolation of the imperfection (i.e., X(I)=3). Thereafter, the second loop nest can be interchanged, as shown in Example 1. Further examples of both loop interchange and loop distribution can be found in Zima and Chapman, Supercompilers for Parallel and Vector Computers, ACM Press, 1991.
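As a minimal sketch of the two-step route just described, the following Python code runs the imperfect nest of Example 2 and then the distributed and interchanged version, and checks that both leave X and A in the same state. Python stands in for Fortran here only so the sketch is runnable; the function names and the use of nested lists are assumptions made for illustration.

```python
N = 10  # loop bounds 0..9, as in Example 2

def original():
    """Imperfect nest from the "Before" column of Example 2."""
    x = [None] * N
    a = [[None] * N for _ in range(N)]
    for i in range(N):
        x[i] = 3              # the imperfection: a statement outside the J loop
        for j in range(N):
            a[i][j] = 0
    return x, a

def distributed_then_interchanged():
    """Distribute the I loop, then interchange the second, now perfect, nest."""
    x = [None] * N
    a = [[None] * N for _ in range(N)]
    for i in range(N):        # first distributed loop: only X(I) = 3
        x[i] = 3
    for j in range(N):        # second distributed loop, with J moved outermost
        for i in range(N):
            a[i][j] = 0
    return x, a

# Both versions must produce identical results for the transformation to be valid.
assert original() == distributed_then_interchanged()
```

The equivalence holds here because the two statements touch different arrays and neither reads a value the other writes, so no dependence constrains the order in which they run.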
The practical difficulties of such program manipulations are: (i) neither a loop interchange nor a loop distribution is always legal, and (ii) loop distribution, by itself, may be a detrimental transformation, as it raises loop overhead and, more importantly, may lower memory reuse. Accordingly, an algorithm is needed that performs distribution-enabled interchanges such as the one above only when they are legal, and that avoids detrimental distributions when no subsequent interchange is enabled.
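To illustrate point (i), the sketch below shows a loop body for which interchange is not legal. The statement A(I,J) = A(I-1,J+1) + 1 is a hypothetical example chosen for this illustration and does not appear in the source; it is a standard case in which each iteration reads a value written by an earlier I iteration at a later J index, so swapping the loops reverses the order of the write and the read. Python is again used only so the sketch can be run.

```python
N = 4  # small bounds keep the example easy to trace by hand

def run(order):
    """Execute A(I,J) = A(I-1,J+1) + 1 with the given loop order.

    order is "IJ" (I outer, J inner) or "JI" (interchanged).
    The array has a zero border so that A(I-1,J+1) is always in range.
    """
    a = [[0] * (N + 2) for _ in range(N + 2)]

    def body(i, j):
        a[i][j] = a[i - 1][j + 1] + 1

    if order == "IJ":
        for i in range(1, N + 1):
            for j in range(1, N + 1):
                body(i, j)
    else:
        for j in range(1, N + 1):
            for i in range(1, N + 1):
                body(i, j)
    return a

# The two loop orders compute different values, so this interchange is illegal.
assert run("IJ") != run("JI")
```

In the original order, A(2,1) reads A(1,2) after it has been set to 1 and so becomes 2; after interchange, A(2,1) is computed before A(1,2) is written and becomes 1.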
For simple cases, such manipulations have been achieved essentially by speculatively distributing loops, checking whether profitable loop interchanges result, and, if not, abandoning the distribution and restoring the original code. See: McKinley et al., "Improving Data Locality with Loop Transformations", ACM Transactions on Programming Languages and Systems, July 1996, Vol. 18, No. 4.
To summarize, the prior art recognizes that loop interchanges, standing alone, may improve program performance by improving the sequence of memory accesses. Further, it is known that starting with a loop distribution may enable a subsequent loop interchange that will improve program execution. However, to Applicant's knowledge, it has not been realized that, where a potential loop distribution is illegal, an enabling loop interchange may open a route for further program code improvements.