1. Technical Field
The present invention relates to methods for optimizing computer code, and in particular, to methods for software pipelining nested loops.
2. Background Art
Loops are software structures that allow programmers to perform repeated operations using a single set of instructions. A typical source code loop begins with a loop instruction, e.g., a "Do", "While" or equivalent statement, followed by the set of instructions ("loop body") to be repeated. Arguments associated with the loop instruction control the repetition of the loop body. These arguments include a test for terminating the loop ("loop test"). The loop test is typically a logical function of a variable that is modified by the loop. It controls a branch instruction that either exits (terminates) the loop or returns to the first instruction of the loop body, depending on whether the test is true or false, respectively. In counted loops, the loop variable is an index that is incremented each time the loop body is executed, and the loop test compares the index with a maximum value.
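As a minimal sketch (C is used only for illustration, and the function names are hypothetical), the counted loop below is written first with a structured loop instruction and then lowered by hand into an explicit loop test and branch, roughly the shape a compiler might emit. The lowered form uses a termination test: when it is true the loop exits, and when it is false control returns to the first instruction of the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Counted loop written with a structured loop instruction. */
long sum_structured(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {  /* index i compared with maximum n */
        sum += a[i];                  /* loop body */
    }
    return sum;
}

/* The same loop lowered by hand to an explicit test and branches. */
long sum_lowered(const long *a, size_t n) {
    long sum = 0;
    size_t i = 0;
    if (n == 0)
        return sum;          /* zero-trip guard: never enter the body */
body:
    sum += a[i];             /* loop body */
    i++;                     /* increment the loop index */
    if (i >= n)
        return sum;          /* loop test true: exit (terminate) the loop */
    goto body;               /* loop test false: back to first body instruction */
}
```

Both functions compute the same sum; the second simply makes visible the branch that the loop test controls on every iteration.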
Loops are nested when the body of one loop (the "outer loop") includes another loop (the "inner loop"). Perfectly nested loops are those in which the outer loop body contains no instructions other than the inner loop. Imperfectly nested loops are those in which the outer loop body includes instructions in addition to the inner loop. In either case, each iteration of the outer loop executes the instructions of its loop body, including the inner loop; that is, the inner loop is fully executed on each repetition of the outer loop. The number of times the inner loop body is executed on each iteration of the outer loop is determined by the inner loop test and the loop variable it tests.
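The distinction between perfect and imperfect nesting can be illustrated with two hypothetical counting functions. In the imperfect nest, the outer body has an instruction of its own, and the inner loop test compares j with the outer index i, so the inner body runs a different number of times on each outer iteration.

```c
#include <assert.h>

/* Perfectly nested: the outer loop body is nothing but the inner loop.
 * Returns how many times the inner loop body executes (n_outer * n_inner). */
long count_perfect(long n_outer, long n_inner) {
    long inner_runs = 0;
    for (long i = 0; i < n_outer; i++)
        for (long j = 0; j < n_inner; j++)
            inner_runs++;                  /* inner loop body */
    return inner_runs;
}

/* Imperfectly nested: the outer body also contains its own instruction.
 * The inner test (j < i) depends on the outer index, so the inner body
 * runs 0, 1, ..., n - 1 times on successive outer iterations. */
void count_imperfect(long n, long *outer_runs, long *inner_runs) {
    *outer_runs = 0;
    *inner_runs = 0;
    for (long i = 0; i < n; i++) {
        (*outer_runs)++;                   /* outer-loop-only instruction */
        for (long j = 0; j < i; j++)
            (*inner_runs)++;               /* inner loop body */
    }
}
```

For example, count_perfect(3, 4) returns 12, while count_imperfect(5, ...) reports 5 executions of the outer-only instruction and 0 + 1 + 2 + 3 + 4 = 10 executions of the inner body.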
Depending on how they are implemented, loops can have a significant impact on the performance of a program. For example, the loop test is a branch condition which, if mispredicted, requires the processor to flush the current instructions from its pipeline, retrieve instructions from the correct branch path, and load these instructions into the pipeline. Misprediction is likely in loops because the branch is taken on all but the final iteration, so history-based branch prediction algorithms will predict the branch taken on the final iteration as well. The resulting misprediction is repeated every time the loop is entered. For nested loops, the inner loop is entered on each iteration of the outer loop, so the performance penalty attributable to mispredictions can be significant.
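To make the misprediction argument concrete, the sketch below models the inner loop's backward branch with a single 2-bit saturating counter. This is one simple history-based scheme, chosen here as an assumption; it does not describe any particular processor, and the names are hypothetical. Because the branch is taken on every inner iteration except the last, the counter stays in a "taken" state and mispredicts once per entry into the inner loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a 2-bit saturating counter (values 0..3);
 * values 2 and 3 predict "taken". */
typedef struct { int counter; } Predictor2Bit;

static bool predict(const Predictor2Bit *p) { return p->counter >= 2; }

static void update(Predictor2Bit *p, bool taken) {
    if (taken && p->counter < 3) p->counter++;
    if (!taken && p->counter > 0) p->counter--;
}

/* Count mispredictions of the inner loop's backward branch over the
 * whole nest. The branch is taken on every inner iteration but the last. */
long inner_mispredictions(long n_outer, long n_inner) {
    Predictor2Bit p = { .counter = 2 };       /* assumed "weakly taken" start */
    long misses = 0;
    for (long i = 0; i < n_outer; i++) {
        for (long j = 0; j < n_inner; j++) {
            bool taken = (j + 1 < n_inner);   /* branch back, or fall out */
            if (predict(&p) != taken)
                misses++;
            update(&p, taken);
        }
    }
    return misses;
}
```

Under this model, a nest of 100 outer iterations with 8 inner iterations each yields 100 mispredictions, one per entry into the inner loop, so the cost scales with the outer trip count.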
Program performance can also be degraded by the overhead needed to set up and terminate each loop. For nested loops, the inner loop's overhead is multiplied, since it is incurred on every iteration of the outer loop: if the outer loop repeats 100 times, the overhead for the inner loop is incurred 100 times. The smaller the loop body relative to this overhead, the larger the share of execution time the overhead consumes.
A number of methods have been developed to improve the efficiency with which loops (nested or otherwise) are implemented. For example, software pipelining takes advantage of the fact that the loop body instructions are repeated on each iteration of the loop by executing instructions from different iterations in parallel. In a loop body of three instructions, the first instruction may operate on variables for the i-th pass through the loop ("iteration"), while the second and third instructions operate on variables from the (i-1)-th and (i-2)-th iterations, respectively.
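The three-instruction example can be sketched in C as follows, assuming a hypothetical loop body of a load (stage S1), a multiply (S2) and a store (S3). After a short prologue, each pass of the pipelined kernel performs S3 for iteration i-2, S2 for iteration i-1 and S1 for iteration i, and an epilogue drains the last two iterations. C executes these statements sequentially; the point is that each kernel statement consumes a value produced on the previous pass rather than the current one, which is what lets a compiler schedule the three to issue together.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: each iteration runs load, multiply, store in order. */
void scale_plain(const long *a, long *b, size_t n, long k) {
    for (size_t i = 0; i < n; i++) {
        long loaded = a[i];          /* S1: load */
        long product = loaded * k;   /* S2: multiply */
        b[i] = product;              /* S3: store */
    }
}

/* Software-pipelined version with a three-stage schedule. */
void scale_pipelined(const long *a, long *b, size_t n, long k) {
    if (n < 2) {                     /* too short to fill the pipeline */
        scale_plain(a, b, n, k);
        return;
    }
    /* Prologue: start iterations 0 and 1. */
    long loaded = a[0];              /* S1(0) */
    long product = loaded * k;       /* S2(0) */
    loaded = a[1];                   /* S1(1) */

    /* Kernel: stages from three different iterations on each pass. */
    for (size_t i = 2; i < n; i++) {
        b[i - 2] = product;          /* S3(i-2) */
        product = loaded * k;        /* S2(i-1) */
        loaded = a[i];               /* S1(i)   */
    }

    /* Epilogue: finish iterations n-2 and n-1. */
    b[n - 2] = product;              /* S3(n-2) */
    product = loaded * k;            /* S2(n-1) */
    b[n - 1] = product;              /* S3(n-1) */
}
```

Both versions write the same values to b; only the order in which stages of different iterations are interleaved differs.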
Under certain circumstances, the overhead cost of nested loops may be mitigated somewhat by "unrolling and jamming" the outer loop. Here, the outer loop body is replicated for several sequential iterations, and the copies are combined so that a single iteration of a modified outer loop index performs the work of several iterations of the original index; the resulting copies of the inner loop are then fused ("jammed") into a single inner loop. Each iteration of the modified outer loop thus executes the instructions, including the inner loop instructions, for multiple sequential values of the original loop index. In addition, the outer loop instructions may be rearranged within the expanded loop body, instruction dependencies permitting, to further streamline execution of the loop.
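A minimal sketch of unrolling and jamming, using a hypothetical imperfect nest that computes y = A x for a row-major matrix: the outer loop is unrolled by a factor of two (its index now advances by 2), the two copies of the inner loop are jammed into one, and the outer-loop instructions (zeroing and storing the row sums) are duplicated around it. A remainder loop handles an odd row count. The unroll factor of two is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Original imperfect nest: outer instructions surround the inner loop. */
void matvec(const double *A, const double *x, double *y,
            size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        double s = 0.0;                          /* outer-loop instruction */
        for (size_t j = 0; j < cols; j++)
            s += A[i * cols + j] * x[j];         /* inner loop body */
        y[i] = s;                                /* outer-loop instruction */
    }
}

/* Outer loop unrolled by 2, with the two inner-loop copies jammed. */
void matvec_unroll_jam(const double *A, const double *x, double *y,
                       size_t rows, size_t cols) {
    size_t i = 0;
    for (; i + 1 < rows; i += 2) {               /* modified outer index */
        double s0 = 0.0, s1 = 0.0;
        for (size_t j = 0; j < cols; j++) {      /* single jammed inner loop */
            double xj = x[j];                    /* loaded once, used twice */
            s0 += A[i * cols + j] * xj;
            s1 += A[(i + 1) * cols + j] * xj;
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
    for (; i < rows; i++) {                      /* remainder row, if any */
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += A[i * cols + j] * x[j];
        y[i] = s;
    }
}
```

Besides roughly halving the number of times the inner loop is entered (and with it the setup and exit overhead), the jammed body reuses each x[j] for two rows, an example of the rearrangement across original iterations that the paragraph describes.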
These methods, where applicable, increase the size of the loop body. The size of the loop body determines the number of instructions (the scope) that a compiler can consider together when applying an optimization. To the extent that these techniques increase the number of instructions in the loop body, they may enable additional compiler optimizations.
Despite their potential advantages, the above-described techniques for handling loops are typically limited. For example, loop overhead is reduced only to the extent that an outer loop can be unrolled, and this may be limited by dependencies between the inner and outer loop instructions. In addition, it is often practical to apply loop unrolling and similar techniques only to the two innermost loops of a set of nested loops. Some of these limitations do not apply to perfectly nested loops, but imperfectly nested loops are very common and are subject to most of them.