Generating computer code that is efficiently processed (i.e., “optimized”) is one of the most important goals in software design and execution. Computer code which performs the desired function accurately and reliably but too slowly (i.e., code which is not optimized) is often discarded or unused by the computer users.
As those of ordinary skill in the art are aware, most source code (i.e., code in a human readable form) is typically converted into object code, and thereafter into an executable application, by use of a compiler and a linker. The executable application is in a form and language that is machine readable (i.e., capable of being interpreted and executed by a computer). Source code in other languages, such as Java available from Sun Microsystems, Inc. of California, USA, may instead be transformed on execution into a form understood by a computer system, which then executes the transformed instructions. In any case, the source code, when transformed into a form capable of being understood and executed by a computer system, is frequently optimized. That is, a transformation is performed such that the instructions are performed more efficiently and without undue delay.
One common structure found in source code is a loop. Nested loops (a loop within another loop) are also common in the art. Loops are used to repeat one or more operations or instructions. For example, an array may be used to store the purchase price of individual articles (e.g., where the ith element in the array A is denoted, in Fortran, as A(i); similar notations are used in other languages). To total the purchases, a programmer could write a single instruction that adds each of the purchase prices together (e.g., sum=A(1)+A(2)+ . . . +A(n)). This, however, would take the programmer some time to code and is not easily adapted to the situation where the programmer does not know, at development time, the number of articles in the array, that is, when the number of elements in the array can only be determined at run time (i.e., during execution). Accordingly, the loop was developed to repeat an operation (e.g., sum=sum+A(i)) where the induction variable, i, is changed for each iteration. Other forms of loops are known and are equally applicable. However, when the instructions of a loop are transformed into machine readable code (e.g., executable code), the executed instructions may not be processed efficiently. For the example above, some computer systems may require that the processor fetch the various elements of the array “A” from memory, rather than from a register or cache memory. Fetching data from memory requires the processor to wait while the data is retrieved. Also, while loops may be an efficient way to write certain repetitive source code operations, a loop does insert additional operations that would not be present if the repetitive operations were replicated. These additional operations (e.g., branching operations) are considered to be the loop “overhead”.
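By way of illustration only, the purchase price example above may be sketched as follows. The sketch is written in Python rather than Fortran, so the array is indexed from zero, and the names total_price and prices are hypothetical names chosen for this example.

```python
def total_price(prices):
    """Sum the purchase prices with a loop.

    The number of articles need only be known at run time.
    """
    total = 0.0
    i = 0  # induction variable, changed on each iteration
    while i < len(prices):  # loop overhead: a compare and a branch per iteration
        total = total + prices[i]  # the repeated operation, sum = sum + A(i)
        i += 1
    return total


prices = [2.0, 3.5, 4.25]

# Hand-written single instruction: usable only when the number of
# articles is fixed at development time (three, in this sketch).
fixed_total = prices[0] + prices[1] + prices[2]
```

The loop form carries the compare and branch operations on every iteration, which is the overhead referred to above; the single instruction carries none but cannot adapt to an array of a different length.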
To address some of the inefficiencies in processing loops, various optimization techniques have been created and applied. For example, one optimization technique is to unroll portions of the loop (hereinafter “unrolling”), replicate the portions and then insert the replicated portions into the code (also known as “jamming”). Typically, when the unroll and jam loop transformation technique is applied to the outer loop of a nested loop pair, the outer loop's induction variable (e.g., “i”) is advanced only a few times during the unrolling portion of this optimization technique, rather than completely. The number of times is governed by a parameter referred to as the unroll factor (UF). During the jamming portion of this technique, the inner loop body is replicated “UF” times. Persons of ordinary skill in the art will appreciate that the replicated loop bodies are not identical but only similar. In the replicated loop bodies, the portions which use the induction variable of the outer loop are advanced as required (e.g., if the loop body included a reference to array element A(i), where “i” is the outer loop induction variable, a replicated loop body would include a reference to the next required array element, A(i+1)). The unroll and jam technique effectively reorders the calculations being performed in the nested loop.
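A minimal sketch of the unroll and jam transformation, assuming an unroll factor of two and an outer loop trip count that is an even number, is given below. The matrix-vector product and the function names matvec_baseline and matvec_unroll_jam_2 are hypothetical and are chosen only to make the replicated loop bodies visible; the sketch uses zero-based Python indexing.

```python
def matvec_baseline(a, b):
    """Original nested loop pair: c[i] = sum over j of a[i][j] * b[j]."""
    n, m = len(a), len(b)
    c = [0] * n
    for i in range(n):          # outer loop, induction variable i
        for j in range(m):      # inner loop, induction variable j
            c[i] += a[i][j] * b[j]
    return c


def matvec_unroll_jam_2(a, b):
    """Same computation after unroll and jam with UF = 2.

    Assumption for this sketch: the number of rows is even.
    """
    n, m = len(a), len(b)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of rows")
    c = [0] * n
    for i in range(0, n, 2):    # unrolling: i advances by UF each iteration
        for j in range(m):      # jamming: both copies share one inner loop
            c[i] += a[i][j] * b[j]          # original body, uses i
            c[i + 1] += a[i + 1][j] * b[j]  # replicated body, uses i + 1
    return c
```

In the transformed form, each value b[j] is used for two rows per inner iteration, which is one way the reordering of calculations may reduce fetches from memory.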
The “unroll and jam” technique does offer some advantages but also has some disadvantages.
One disadvantage of the unroll and jam technique is that residues are created. A residue is the portion of a loop that would not be executed when the loop is unrolled by a fixed factor (the unroll factor). That is, since the controlling induction variable of the unrolled outer loop is advanced a fixed number of times in every iteration, if the upper bound does not divide evenly by the unroll factor (i.e., when the upper bound of the outer loop induction variable “i” modulo the unroll factor is not zero), then code must be generated to address the remaining portion, namely the residue. Code generated to handle these residues may add overhead and inefficiencies that can result in performance degradation.
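The residue may be illustrated by the following sketch, which assumes an unroll factor of three and a hypothetical row-summing computation. The names UF, row_sums_unroll_jam and main_end are chosen for this example only. For five rows, the main nest covers rows 0 to 2 and the residue loop covers the remaining 5 mod 3 = 2 rows.

```python
UF = 3  # unroll factor, chosen arbitrarily for this sketch


def row_sums_unroll_jam(a):
    """Sum each row of a rectangular table using unroll and jam with UF = 3."""
    n = len(a)
    m = len(a[0]) if n else 0
    sums = [0] * n
    main_end = n - (n % UF)  # largest multiple of UF not exceeding n

    # Main unrolled and jammed nest: i advances by UF on every iteration.
    for i in range(0, main_end, UF):
        for j in range(m):
            sums[i] += a[i][j]
            sums[i + 1] += a[i + 1][j]
            sums[i + 2] += a[i + 2][j]

    # Residue: the n % UF outer iterations that the main nest cannot cover.
    # This extra loop nest is the additional generated code discussed above.
    for i in range(main_end, n):
        for j in range(m):
            sums[i] += a[i][j]
    return sums
```

When the number of rows divides evenly by the unroll factor, the residue loop executes zero iterations, but the code for it must still be generated whenever the bound is not known until run time.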
The unroll and jam technique, as a result of the creation of code to address the residue problem, introduces some significant disadvantages. Notable amongst these disadvantages is that the creation of the residue causes perfect triangular nested loops (i.e., nested loops where the inner loop induction variable—“j”—is bounded on the upper end by the value of the outer loop induction variable “i”) to no longer be “perfect”. As a result, other optimization techniques which are only applicable to perfect loop nests cannot be additionally applied. Therefore, using the unroll and jam technique eliminates use of many further optimization techniques.
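One possible shape of a triangular nested loop before and after unroll and jam is sketched below, assuming an unroll factor of two. The sketch is an illustration under stated assumptions and is not asserted to be the form any particular compiler generates; the lower-triangle summation and the function names are hypothetical. In zero-based Python indexing, the inner bound range(i + 1) corresponds to “j” running up to the value of “i”.

```python
def lower_triangle_sum(a):
    """Perfect triangular nest: every statement sits in the innermost loop."""
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i + 1):  # inner bound depends on outer variable i
            total += a[i][j]
    return total


def lower_triangle_sum_unroll_jam_2(a):
    """Same sum after unroll and jam with UF = 2 (illustrative shape only)."""
    n = len(a)
    total = 0
    main_end = n - (n % 2)
    for i in range(0, main_end, 2):
        for j in range(i + 1):           # jammed part shared by rows i and i + 1
            total += a[i][j]
            total += a[i + 1][j]
        # Row i + 1 has one more element than row i. This statement sits
        # between the loops, so the nest is no longer perfect.
        total += a[i + 1][i + 1]
    # Outer residue: at most one leftover row when n is odd.
    for i in range(main_end, n):
        for j in range(i + 1):
            total += a[i][j]
    return total
```

The statement placed between the inner and outer loops, together with the trailing residue nest, is what prevents the result from being treated as a single perfect loop nest by later optimizations.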
Other optimization techniques known to those skilled in the art do not scale well. That is, the optimization techniques may provide some benefit when applied to a nested loop pair (i.e., only two dimensions). However, such techniques are not known to the inventors of the present invention to be applicable, or easily applicable, to nested loops of three or more dimensions.
Accordingly, an optimization technique which addresses at least some of these shortcomings would be desirable.