1. Field of the Invention
The present invention relates in general to software optimization that may be used by a programmer or an automatic program optimizer, and more particularly to a technique for generating compact code for the unrolling transformation applied to a perfect nest of loops.
2. Description of the Related Art
Loop unrolling is a well-known program transformation used by programmers and program optimizers to improve instruction-level parallelism and register locality and to decrease the branching overhead of program loops. These benefits arise because creating multiple copies of the loop body provides more opportunities for program optimization. Most optimizing compilers employ the loop unrolling transformation to some degree. In addition, many software packages, especially those for matrix computations, contain library routines in which loops have been hand-unrolled for improved performance.
Historically, the unrolling transformation has been defined for a single loop. More recently, it has been observed that further benefits may be obtained by unrolling multiple nested loops and fusing or jamming together the unrolled copies of the inner loops. This "unroll-and-jam" transformation has a multiplicative effect on the unroll factors of the individual loops, such that the number of copies in the unrolled loop body equals the product of the unroll factors of the individual loops.
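The multiplicative effect can be illustrated with a minimal sketch, which is not taken from the invention itself. It assumes a two-deep perfect nest, an unroll factor of 2 for each loop, and trip counts that are even, so that no remainder handling is needed. The names nest_original, nest_unroll_and_jam, and body are hypothetical.

```python
def nest_original(n, m, body):
    # Original perfect nest: one copy of the loop body.
    for i in range(n):
        for j in range(m):
            body(i, j)


def nest_unroll_and_jam(n, m, body):
    # Assumption for this sketch: both trip counts are multiples of 2,
    # so no remainder loops are required.
    assert n % 2 == 0 and m % 2 == 0
    for i in range(0, n, 2):          # outer loop unrolled by 2
        for j in range(0, m, 2):      # inner loop unrolled by 2, copies jammed
            body(i, j)
            body(i, j + 1)
            body(i + 1, j)
            body(i + 1, j + 1)        # 2 x 2 = 4 copies of the body
```

Both versions visit the same set of (i, j) iterations, but in a different order, so the transformation is only applicable when the dependences of the loop body permit the reordering.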
A general concern with the aggressive use of unrolling is the size of the code generated by the transformation, especially when unrolling multiple perfectly nested loops. Apart from creating a larger unrolled loop body, additional loops have to be introduced to correctly handle cases where the unroll factor does not evenly divide the number of iterations. These "remainder" loops substantially increase the compile time for the transformed code and the size of the final object code, even though only a small fraction of the program's execution time is spent in these remainder loops.
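As a minimal sketch of why a remainder loop is needed, consider a single loop unrolled by a factor of 4 when the iteration count is not known in advance. The function name sum_unrolled_by_4 and the summation body are hypothetical and chosen only for illustration.

```python
def sum_unrolled_by_4(values):
    n = len(values)
    total = 0
    # Largest multiple of 4 that does not exceed n.
    main_end = n - (n % 4)

    # Main unrolled loop: 4 copies of the body per iteration.
    for i in range(0, main_end, 4):
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]

    # Remainder loop: 1 more copy of the body, executed at most 3 times.
    for i in range(main_end, n):
        total += values[i]

    return total
```

The unrolled code therefore holds 4 + 1 = 5 copies of the body, although the fifth copy runs for at most three iterations regardless of how long the input is.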
To quantify the substantial code size increase that may result from applying the unroll-and-jam transformation to multiple loops, let UF[1], . . . , UF[k] be the unroll factors for a perfect nest of k loops numbered from outermost to innermost. After the conventional unroll-and-jam transformation is applied, the unrolled loop will contain $(UF[1]+\mathrm{mod}(1,UF[1])) \times (UF[2]+\mathrm{mod}(1,UF[2])) \times \cdots \times (UF[k]+\mathrm{mod}(1,UF[k]))$ copies of the loop body for the general and common case of loop bounds that are unknown at compile time, where "mod(1, UF[i])" is a function that equals zero when UF[i]=1 and equals one when UF[i]>1. For example, if all k loops have the same unroll factor m, where UF[1]=UF[2]= . . . =UF[k]=m>1, then the conventional unroll-and-jam transformation will generate $C' = (m+1)^k$ copies of the loop body in the unrolled code. More specifically, if the conventional unroll-and-jam transformation is applied to the example code of Table A with m=4 and k=3, then 125 copies will be generated, $C' = (4+1)^3 = 125$. This generated code is shown in Table C. Note that only $4^3 = 64$ copies are in the main unrolled loop body, and that the remaining 125 - 64 = 61 copies are in lower execution frequency remainder loops. Thus, this example clearly illustrates the high proportion of code comprising lower execution frequency remainder loops: the prior art places nearly half of the generated copies in lower execution frequency remainder loops, almost as many as it places in the higher execution frequency main unrolled loop body.
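The copy count above can be checked with a short sketch that evaluates the product directly. The function names mod1, conventional_copies, and main_body_copies are hypothetical; the function mod(1, UF[i]) is assumed to be the ordinary remainder of 1 divided by UF[i], which is 0 for UF[i]=1 and 1 for UF[i]>1.

```python
from math import prod


def mod1(uf):
    # mod(1, UF[i]): 0 when the loop is not unrolled, 1 otherwise.
    return 1 % uf


def conventional_copies(unroll_factors):
    # Product over all loops of UF[i] + mod(1, UF[i]).
    return prod(uf + mod1(uf) for uf in unroll_factors)


def main_body_copies(unroll_factors):
    # Copies in the main unrolled loop body: product of the unroll factors.
    return prod(unroll_factors)


factors = [4, 4, 4]                       # m = 4, k = 3
total = conventional_copies(factors)      # 125
main = main_body_copies(factors)          # 64
remainder = total - main                  # 61
```

For m=4 and k=3 this gives 125 copies in total, of which 64 are in the main unrolled loop body and 125 - 64 = 61 are in remainder loops.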
Thus, conventional techniques of loop unrolling, in generating remainder loops, substantially increase the compile time, the size of the object code, and the proportion of the object code comprising lower execution frequency remainder loops. As such, there is a need for a method of, apparatus for, and article of manufacture for generating smaller, more compact code for the unrolling transformation of multiple nested loops.