1. Technical Field
The present invention relates to mechanisms for optimizing computer code, and in particular, to mechanisms for improving the performance of software-pipelined loops.
2. Background Art
Software pipelining is a method for scheduling non-dependent instructions from different logical iterations of a program loop to execute concurrently. Overlapping instructions from different logical iterations of the loop increases the amount of instruction-level parallelism (ILP) in the program code. Code having high levels of ILP uses the execution resources available on modern superscalar processors more effectively.
A loop is software-pipelined by organizing the instructions of the loop body into stages of one or more instructions each. These stages form a software-pipeline having a pipeline depth equal to the number of stages (the “stage count” or “SC”) of the loop body. The instructions for a given loop iteration enter the software-pipeline stage by stage, on successive initiation intervals (II), and new loop iterations begin on successive initiation intervals until all iterations of the loop have been started. Each loop iteration is thus processed in stages through the software-pipeline in much the same way that an instruction is processed in stages through a processor pipeline. When the software-pipeline is full, stages from SC sequential loop iterations are in process concurrently, and one loop iteration completes every initiation interval.
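By way of illustration only, the staging described above can be sketched in Python. The function name `pipeline_occupancy` and its parameters are hypothetical and are not taken from any cited reference. The sketch assumes that each stage occupies exactly one initiation interval and that the number of iterations is known in advance.

```python
def pipeline_occupancy(num_iterations, stage_count):
    """Return, for each initiation interval, the (iteration, stage) pairs in flight.

    Illustrative model only: iteration i enters stage 0 at interval i and
    advances one stage per initiation interval.
    """
    if num_iterations <= 0 or stage_count <= 0:
        return []
    total_intervals = num_iterations + stage_count - 1
    timeline = []
    for interval in range(total_intervals):
        active = []
        for iteration in range(num_iterations):
            stage = interval - iteration  # stages completed so far by this iteration
            if 0 <= stage < stage_count:
                active.append((iteration, stage))
        timeline.append(active)
    return timeline
```

Under these assumptions, calling `pipeline_occupancy(5, 3)` yields seven intervals. Intervals 2 through 4 each hold SC = 3 iterations in flight, and from interval 2 onward exactly one iteration occupies the last stage in every interval.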
Various methods for implementing software-pipelined loops are discussed, for example, in B. R. Rau, M. S. Schlansker, P. P. Tirumalai, Code Generation Schema for Modulo Scheduled Loops, IEEE MICRO Conference 1992 (Portland, Oreg.), and in B. R. Rau, M. Lee, P. P. Tirumalai, M. S. Schlansker, Register Allocation for Software-pipelined Loops, Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation (San Francisco, 1992).
The initiation interval (II) represents the number of processor clock cycles (“cycles”) between the start of successive iterations in a software-pipelined loop. The minimum II for a loop is the larger of a resource II (RSII) and a recurrence II (RCII) for the loop. The RSII is determined by the availability of execution units for the different instructions of the loop. For example, a loop that includes three integer instructions has an RSII of at least two cycles on a processor that provides only two integer execution units. The RCII reflects cross-iteration or loop-carried dependencies among the instructions of the loop and their execution latencies. If the three integer instructions of the above example have one-cycle latencies and depend on each other as follows, inst1→inst2→inst3→inst1, the RCII is at least three cycles.
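The arithmetic of the foregoing example can be sketched as follows. This is a minimal illustration rather than a complete scheduler: the function names are hypothetical, each dependence cycle is assumed to be supplied explicitly as a (total latency, iteration distance) pair instead of being discovered from a dependence graph, and the bounds are taken as the usual ceiling ratios.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def resource_ii(uses_per_class, units_per_class):
    """Lower bound on II from execution-unit availability (RSII)."""
    return max(
        ceil_div(uses, units_per_class[cls])
        for cls, uses in uses_per_class.items()
    )


def recurrence_ii(cycles):
    """Lower bound on II from loop-carried dependence cycles (RCII).

    `cycles` holds (total_latency, iteration_distance) pairs; an empty
    list means the loop has no recurrence, so the bound is 0.
    """
    return max((ceil_div(lat, dist) for lat, dist in cycles), default=0)


def minimum_ii(uses_per_class, units_per_class, cycles):
    """The minimum II is the larger of RSII and RCII."""
    return max(resource_ii(uses_per_class, units_per_class),
               recurrence_ii(cycles))


# Example from the text: three integer instructions, two integer units,
# and one dependence cycle of three one-cycle instructions spanning one iteration.
rsii = resource_ii({"int": 3}, {"int": 2})
rcii = recurrence_ii([(3, 1)])
mii = minimum_ii({"int": 3}, {"int": 2}, [(3, 1)])
```

For the example values, `rsii` is 2, `rcii` is 3, and `mii` is 3, so the recurrence rather than the execution resources limits the loop.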
A software-pipelined loop has its maximum ILP when its RCII is less than or equal to its RSII. Various optimization techniques may be applied to the loop to reduce its RCII. The efficacy of these optimizations may be greatly enhanced by allowing instructions to be executed speculatively. An instruction is executed speculatively (“speculated”) if it is executed before the processor determines that the instruction needs to be executed. In software-pipelined loops, instructions from multiple loop iterations execute in parallel. Instructions from later iterations that are executed concurrently with instructions from a current iteration may be speculated. That is, their execution may be unnecessary if the loop terminates with the current iteration.
One problem created by allowing speculative execution within a software-pipelined “while” loop is that a speculatively executed instruction may modify (“clobber”) values that are provided as input to the loop (“live-in values”) before the input values are used. This happens in “while” loops because the loop condition is determined by instructions within the loop body, and cannot be used to activate (or gate) a speculatively executed instruction in the prolog phase. If the speculative instruction is in the first stage of the software pipeline, no problem arises because the instruction executes as soon as the loop begins. However, if the speculatively executed instruction is scheduled in a later stage of the software pipeline, the loop control mechanism provides no simple way to gate execution of the speculative instruction at the appropriate time. Data corruption can result if the speculated instruction executes prematurely and overwrites a live-in value before the value is used.
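The hazard can be made concrete with a toy simulation. Everything in it is an assumption made for illustration: a two-stage pipeline, a register `r` that carries the live-in value, a second-stage speculative instruction S that writes `r` early in each pass through the loop body, and a first-stage instruction R that reads `r` later in the same pass. The names `GARBAGE` and `values_read_by_first_stage` are hypothetical.

```python
GARBAGE = -1  # stands in for whatever S computes before its inputs are valid


def values_read_by_first_stage(gate_speculative, intervals=3, live_in=100):
    """Return the values that R observes in `r`, one per initiation interval."""
    r = live_in            # live-in value set up before the loop is entered
    reads = []
    for t in range(intervals):
        # S sits in the second stage, so iteration 0 only reaches it at t == 1.
        s_has_real_iteration = t >= 1
        if s_has_real_iteration:
            r = reads[t - 1] + 1   # S for iteration t-1 produces the next value
        elif not gate_speculative:
            r = GARBAGE            # premature speculative write clobbers the live-in
        reads.append(r)            # R for iteration t reads r
    return reads


gated = values_read_by_first_stage(gate_speculative=True)
ungated = values_read_by_first_stage(gate_speculative=False)
```

In this model `gated` is `[100, 101, 102]`, so the first iteration sees the live-in value, whereas `ungated` is `[-1, 0, 1]`: the premature write in the first interval destroys the live-in value, and every later iteration inherits the corrupted result.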
One way to implement speculative execution within a software pipeline without clobbering a live-in value is through an explicit prolog. In an explicit prolog, some or all of the instructions that would otherwise execute during the prolog phase of the software-pipelined loop are scheduled for execution before the loop begins. The loop is then initiated at a later stage of the prolog phase or with the software pipeline full, i.e., at the start of the kernel phase. This eliminates the risk of clobbering live-in values with speculatively executed instructions, but it also expands the size of a code segment because instructions of the prolog are duplicated (they appear in the loop body and one or more times in the explicit prolog). The resulting code expansion can be significant, particularly if the loop body includes a large number of instructions and multiple prolog stages need to be explicitly coded.
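A rough estimate of this code expansion can be sketched as follows. The model is an assumption made for illustration only: every instruction of a stage is copied into each explicitly coded prolog interval in which that stage would be active, whereas a compiler may copy only some of them. The function name and parameters are hypothetical.

```python
def code_size_with_explicit_prolog(stage_sizes, explicit_intervals):
    """Estimate static instruction count for a loop with an explicit prolog.

    stage_sizes[s] is the number of instructions in stage s of the loop body.
    explicit_intervals is the number of prolog intervals coded ahead of the
    loop (0 means no explicit prolog; len(stage_sizes) - 1 means the loop is
    entered with the pipeline full).
    """
    stage_count = len(stage_sizes)
    if not 0 <= explicit_intervals <= max(stage_count - 1, 0):
        raise ValueError("explicit_intervals must lie between 0 and SC - 1")
    loop_body = sum(stage_sizes)
    # Prolog interval t runs stages 0..t, so stage s is copied in
    # intervals s..explicit_intervals-1, i.e. (explicit_intervals - s) times.
    prolog = sum(
        size * max(0, explicit_intervals - s)
        for s, size in enumerate(stage_sizes)
    )
    return loop_body + prolog
```

Under this model, a three-stage loop with four instructions per stage occupies 12 instructions with no explicit prolog and 24 when both prolog intervals are coded explicitly, which doubles the code for the loop.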
The present invention addresses these and other problems associated with software pipelining of loops.