This invention relates to the field of computer software optimization. More particularly, the present invention relates to a method of constructing and unrolling speculatively counted loops. BACKGROUND OF THE INVENTION
Computer programs are generally created as source code using high-level languages such as C, C++, Java, FORTRAN, or PASCAL. However, computers are not able to directly understand such languages and the computer programs have to be translated or compiled into a machine language that a computer can understand. The step of translating or compiling source code into object code process is performed by a compiler. Optimizations are mechanisms that provide the compiler with equivalent ways of generating code. Even though optimizations are not necessary in order. for a compiler to generate code correctly, object code may be optimized to execute faster than code generated by straight forward compiling algorithm if code improving transformations are used during code compilation. Loop unrolling is one such optimization that can be used in a compiler.
For the purpose of loop unrolling, loops are categorized as follows. A loop is counted if the number of iterations that the loop will execute is determined once execution reaches the loop. Counted loops are also referred to as xe2x80x9cforxe2x80x9d loops. Conversely, a loop has data dependent exit, loosely called a xe2x80x9cwhilexe2x80x9d loop, if the number of iterations is determined during the execution of the loop. Counted loops are further classified. If the compiler can determine the number of iterations that the loop will execute at compile time, then the number of iterations is a compile time constant. Otherwise, the number of loop iterations is variable.
A compiler can optimize counted loops better than xe2x80x9cwhilexe2x80x9d loops. Applying the counted loop optimization to a xe2x80x9cwhilexe2x80x9d loop will cause the compiler to generate incorrect code. Therefore, in order to ensure the generation of correct code, the compiler""s default assumption must be that all loops are xe2x80x9cwhilexe2x80x9d loops. Then the compiler may later try to prove that a loop is a counted loop so that more optimizations become possible. Similarly, the compiler can optimize a compile time constant counted loop to execute more efficiently than a variable counted loop. Furthermore, applying compile time constant loop optimizations to a variable counted loop will generate incorrect code. The compiler""s default assumption has to be that all loops are variable, and only if the compiler succeeds in proving that a counted loop is a compile time constant counted loop, can the compiler proceed to apply further optimizations.
For example, here are two possible optimizations that a compiler can apply only to compile time constant counted loops. In one possible optimization, the compiler may unroll the loop entirely. Typically, compilers will unroll a loop entirely if the trip count is determined to be small, e.g. eight or less. A second optimization that a compiler may apply to loops with large compile time constant trip counts is to chose an unrolling factor that divide the trip count evenly. If the loop is variable or if such an optimal factor can not be found (e.g. if the trip count is a large prime number), then a cleanup loop must be generated after the unrolled loop to execute the remainder of the iterations.
Compilers can also optimize counted loop to execute more efficiently than xe2x80x9cwhilexe2x80x9d loops. In the context of loop unrolling, when the compiler unrolls a xe2x80x9cwhilexe2x80x9d loop, the compiler has to simply copy the whole loop as many times as given by the unrolling factor chosen. This copy step includes the loop overhead. To illustrate, consider the following scheme:
LOOP:
BODY(I)
I=SOME_NEW_VALUE(I)
If (CONDITION(I))
GOTO LOOP
ELSE
GOTO LOOP_END
LOOP_END:
BODY(I) is the useful part of the loop that does the real work in an iterative way. CONDITION is some test statement involving a variable xe2x80x9cIxe2x80x9d that changes in at least some of the loop iterations and that determines whether the loop terminates or continues execution. Unrolling this xe2x80x9cwhilexe2x80x9d loop by an unrolling factor of three yields the following construct:
LOOP:
BODY(I)
I=SOME_NEW_VALUE(I)
If (NOT CONDITION(I))
GOTO LOOP_EXIT
BODY(I)
I=SOME_NEW_VALUE(I)
If (NOT CONDITION(I))
GOTO LOOP_EXIT
BODY(I)
I=SOME_NEW_VALUE(I)
IF (CONDITION(I))
GOTO LOOP
ELSE
GOTO LOOP_EXIT
LOOP_EXIT:
When a compiler unrolls a counted loop, the compiler can save the loop overhead. The compiler can generate loop overhead code only once in each new iteration that corresponds to several original iterations. Consider the following counted loop construct:
I=0;
N=some_unknown_value;
LOOP:
BODY(I)
I=I+1
If (I less than N)
GOTO LOOP
ELSE
GOTO LOOP_EXIT
LOOP_EXIT
Assume that the compiler decided to unroll this loop by an unrolling factor of three. The compiler has to generate code that will verify, at execution time, that the loop is about to execute at least three iterations. Also, the upper bound in the unrolled loop must now be reduced to Nxe2x88x922, and a cleanup loop must be generated to execute the remainder of the iterations. The resulting code will look like:
I=0
N=some_unknown_value
If (N less than 3) GOTO IN_BETWEEN
LOOP:
BODY(I)
BODY(I+1)
BODY(I+2)
If (I less than Nxe2x88x922) GOTO LOOP
ELSE GOTO IN_BETWEEN
IN_BETWEEN:
IF (I greater than =N) GOTO LOOP_EXIT
CLEANUP:
BODY(I)
I=I+1
IF(I less than N)
GOTO CLEANUP
ELSE
GOTO LOOP_EXIT
LOOP_EXIT
If the value of xe2x80x98Nxe2x80x99 is large enough, most of the execution time will be spent in the unrolled loop. The added control around the loop has a negligible effect on performance. Significant performance is gained from not having to execute the loop overhead. Hence the compiler""s ability to prove that a given loop is counted is a key in achieving this performance gain.
A method of constructing and unrolling speculatively counted loops is described. The method of the present invention first locates a memory load instruction within the loop body of a loop. An advance load instruction is inserted into the preheader of the: loop. The memory load instruction is replaced with an advanced load check instruction. The loop body is unrolled. A cleanup block is generated for said loop.
Other features and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description that follow below.