Typically, compiler strength reduction algorithms focus on integer address computations within loops that can be expressed as a simple linear function of a loop induction variable.
Prior approaches to strength reduction have generally adopted one of the following techniques:
a. Simple bit-vector based data-flow analysis to identify induction variables and region constants within loops, followed by identification of simple inductive multiplications within loops and replacement of those multiplications with temporaries that are updated within loop bodies in parallel with updates to the induction variable. See Aho, et al., "Induction Variables and Reduction in Strength".
The drawbacks of this approach are that: (1) the bit-vector data-flow analysis precludes deep analysis of region constants used in strength reduction candidate expressions; (2) profitability issues are not considered; (3) reassociation opportunities are not exploited; and (4) machine-specific issues (specifically, predication and segmented address computations) are not considered.
b. Simple bit-vector based data-flow analysis to identify induction variables and region constants within loops, followed by symbolic analysis of strength reduction candidate address expressions within loops and replacement of those expressions with temporaries that are updated within loop bodies in parallel with updates to the induction variable. Vatsa Santhanam, HP Journal 1992, "Register Reassociation in PA-RISC Compilers".
The drawbacks of this approach are that: (1) the bit-vector data-flow analysis precludes deep analysis of region constants used in strength reduction candidate expressions; (2) profitability in terms of optimal placement of temporary updates is not considered; (3) strength reduction and reassociation are not applied to expressions that do not compute an address value; and (4) machine-specific issues (specifically, predication and segmented address computations) are not considered.
c. SSA-based data-flow analysis to identify induction variables and region constants within loops, followed by symbolic analysis of strength reduction candidate address expressions within loops and replacement of those expressions with temporaries that are updated within loop bodies in parallel with updates to the induction variable. P. Markstein, et al., December 1992 "Strength Reduction chapter of an unpublished text book on program optimization".
The drawbacks of this approach are that: (1) profitability in terms of optimal placement of temporary updates, net path-length reduction, and register pressure are not considered; (2) machine-specific issues (specifically, predication and segmented address computations) are not considered; and (3) only computations that are involved in computing address values are strength reduced.
d. A PRE-based data-flow analysis of arbitrary control-flow structures to identify multiplications that can be reduced in strength by replacement with temporaries that are updated at optimal points in the flow-graph to preserve the multiplication result.
The drawbacks of this approach are that: (1) reassociation opportunities are not exploited; and (2) machine-specific issues (specifically, predication and segmented address computations) are not considered.
Often the strength reduction algorithm, and notably the reassociation transformation, is applied only to integer expressions that compute the address used by a memory access instruction. Moreover, strength reduction algorithms typically do not factor in the profitability of the code transformations in terms of register pressure impact and net path-length reduction within loop bodies.
An architecture that supports "Predication" poses some new challenges with regard to identifying and exploiting strength reduction opportunities within loop bodies. In particular, machine instructions that either update loop induction variables or are themselves candidates for strength reduction may be guarded by predicate registers. This poses a complication for strength reduction algorithms. In addition, on a 64-bit segmented architecture, the result of a 32-bit address computation needs to be converted into a 64-bit address value through an explicit conversion. Strength reduction of such address conversion operations can be quite important to achieve high performance within integer loops, but requires careful analysis to avoid compromising correctness--particularly in the presence of pointer values that may refer to different "segments".
FIG. 1 illustrates the strength reduction transformation at a very high level. The strength reduction transformation is an algorithm, typically implemented in an optimizing compiler, that is focused on code that executes within loops. The basic idea behind the strength reduction algorithm is to look for expensive multiply operations within the loop involving a pair of terms, one of which does not vary during the course of the loop's execution, and the other of which is a variable that progresses through a linear sequence of values as the loop iterates. The goal of strength reduction is to identify such multiplications and replace them with cheaper operations, namely additions. This allows the loop to execute faster on processors, because multiply operations are generally more expensive than addition operations.
FIG. 1A depicts a code fragment of a high-level C source program 10. There is a variable called g 11 of type "int". It is an integer typed variable and is global in scope, in that it is declared outside the function main. Main 12 is the function where program execution typically starts in a C program. Inside of this function main, there is a local variable i 13 declared to be an integer, and a loop 14 whose execution is controlled by the variable i. The variable i is assigned the value 0 by the for statement before entering the loop, and it is incremented by 1 each time through the loop, as indicated by the i++ expression at the end of that for statement. On each iteration of the loop, i is checked to determine whether i is less than 10, as shown by the condition between the two semi-colons in the for statement. If that condition is true, then execution of the loop continues.
In other words, this for loop iterates ten times, where i assumes the values 0 through 9 in steps of 1; when i reaches 10, execution of the loop is terminated and the code that follows the for loop is executed. Now inside the for loop, there is an assignment to the global variable g of the expression 20*i+5 15. Thus, on the first iteration when i is 0, g gets the value 5, on the next iteration 25, and so forth. Therefore, there is an expensive multiply operation, namely 20*i; the 20 is a value that remains unchanged through the course of the loop's execution, and i is a variable that is incremented in steps of 1.
There is thus a strength reduction opportunity in this code fragment 10, and FIG. 1B illustrates how the strength reduction transformation, that is, the compiler algorithm that performs this transformation, would optimize this program. Specifically, the for loop is strength reduced so that the multiply operation is eliminated and the loop involves only additions.
FIG. 1B shows that the compiler has effectively created a new temporary variable called t 16 of type integer and assigned it the value 5 outside of the loop. Thus, the temporary variable is initialized to 5, and the assignment to g that involved the multiplication of 20 with i and the addition of 5 has been replaced with a simple assignment to g of t 17. The compiler has also generated code to increment the temporary variable t by 20 at the end of the for loop 18. Thus, there is no multiplication inside the loop body; instead, there is an addition, namely t equals t+20. Therefore, the variable g is assigned the same values as in the original program: in the first iteration g gets the value 5, in the second iteration it gets 25, and so on, until in the final iteration g has the value 185. The " . . . g . . . " notation after the for loop indicates that there is a use of g outside the loop that would receive the value 185 in both cases.
FIG. 2 illustrates the general structure of a typical compilation environment 20 wherein there are multiple source files 21 that comprise a program written by some user in some high level language. Each file is processed by the compiler 22 into an object file 23, which typically consists of a sequence of machine instructions that are the result of translating the high level source statements in the source file. These object files are then processed by a linker program 24, which combines the different object files and produces an executable program 26. The executable program is then eligible for direct execution on a computer 27. Thus, the program reads some input 25 and does some processing on it and generates some output. The strength reduction algorithm is typically implemented as part of the compiler program shown in FIG. 2.
FIG. 3 depicts a view of the internal structure of an optimizing version of the compiler 22 of FIG. 2. This type of compiler not only translates source files into object files 23, but also attempts to improve the run-time performance of the object file created. The compiler begins with the source file 21. The source file is read in and checked for syntax errors or semantic errors by a component of the compiler known as the front end 31. Assuming that there are no errors, the compilation proceeds with the front end 31 generating a high level intermediate representation 32. This representation is an internal abbreviated description of the source file that, in many optimizing compilers, is optionally digested by a high level optimizer 33 that attempts to improve the structure of that high level intermediate representation and thereby increase run-time performance.
The high level optimizer performs transformations that would allow the code to be executed faster when the code is subsequently processed by other downstream phases of the compiler. The high level intermediate representation is converted into a low level intermediate representation 35 that is much closer to the sequence of instructions that a computer would actually be able to execute. The conversion process is carried out by a code generator 34. The low level intermediate representation is optionally subject to optimization by a low level optimizer 36, and once that is done, the final step involves generating the object file 23, which is typically done by the very back end of the compiler, the object file generator 37.
FIG. 4 depicts the internal structure of a typical low level optimizer 36 that was shown in FIG. 3. The optimizer begins with the unoptimized low level intermediate representation 35 for each procedure being compiled and generates the optimized low level intermediate representation 40 for each procedure. The main phases that comprise the low level optimizer are shown in FIG. 4. The first phase is the control flow analysis phase 41, and the task of this phase is to create a graph representation of the low level intermediate representation, where the nodes of the graph are referred to as basic blocks. These blocks are sequences of instructions or low level intermediate operations that are to be executed without a change in the control flow. The edges of the control flow graph would correspond to possible transfers of control between the nodes, depending on conditional checks. For instance, an if statement in the program would correspond to a basic block that is terminated with control flow edges to the then clause and the else clause.
The second phase is the local optimization phase 42, which focuses on the individual nodes of the control flow graph, that is the individual basic blocks or stretches of instructions without intervening breaks in control flow. This is the scope of the local optimizer and the kinds of code transformations that are performed are simple things like recognizing duplicate computations and eliminating such redundancies. Another example is constant propagation. Operations involving register values are replaced, where possible, with operations involving constant values.
The third phase is interval analysis 43, where the task is to recognize the loop structure of the procedure that is being compiled. For example, in FIG. 1A, there was a loop 14 in the program, and the interval analysis phase 43 would recognize that there is a repetitive control flow structure that may involve multiple basic blocks, which constitute a loop. This is discussed further with regards to the description of FIG. 6.
The fourth phase is global data flow analysis 44, which determines how data values that are computed in the different basic blocks flow from one basic block to another as the procedure executes, in a global sense. For instance, if the value 10 is computed and assigned to a variable i in basic block 1, the analysis determines whether the value assigned to the variable i propagates to other basic blocks downstream from basic block 1 in the control flow graph. This phase also determines how data is transmitted through the different edges in the control flow graph. That is critical for the global optimization phase 45. Most global optimization phases rely on preceding global data flow analysis, and in the global optimization phase several classic optimizations take place, such as global constant propagation or global common sub-expression elimination.
The sixth phase is the loop optimizations 46, which is described in FIG. 5. As the name implies, these optimizations are focused on improving the performance of loops. That is, instructions that are found within loop structures identified by the earlier interval analysis phase are transformed by the phase.
The remaining phases illustrated in FIG. 4 are typical of most modern optimizing compilers. Next is a global instruction scheduling phase 47 that reorders instructions to improve hardware pipeline efficiency as the program executes on the hardware. After that is a register allocation phase 48 that assigns physical registers to different virtual register resources in the program. For instance, if there is a variable i that is declared in a source level program, the register allocator may decide to maintain the variable i in register R20 and it may decide to maintain another variable j in register R15. These are the typical kinds of decisions made by the register allocator. Register allocation is a necessary step in order to get a functional object file. Finally, the post register allocation optimization phase 49 includes sundry things like peephole optimizations and local instruction scheduling.
Thus, one of the classic optimization phases in a low level optimizer is a loop optimization phase that incorporates a strength reduction algorithm. FIG. 5 depicts the loop optimization phase 46 of FIG. 4 and illustrates the typical subphases of the loop optimization phase. This phase begins with unoptimized intermediate code 50 and performs region constant identification 51, followed by loop invariant code motion 52, induction variable analysis 53, strength reduction candidate identification 54, strength reduction application 55, linear function test replacement 56, and finally, dead code elimination 57 to produce optimized intermediate code 58.
Steps 51-56 are repeated for each loop that occurs in the source procedure, where each loop is considered in an order that reflects the reverse of the loop nesting order. In other words, the inner nested loops are optimized before the outer loops. The reason for this progression is to expose any code that migrates out of inner loops to outer loops for additional optimization opportunities in outer loops.
FIG. 6 depicts a portion of the control flow graph 60 that corresponds to a loop. The rectangles or squares denote the basic blocks 61, which are sequences of operations without breaks of control flow. Instructions in a basic block typically are executed sequentially, one after the other without any interruption. These basic blocks are a compiler internal data structure to help analyze the code stream, and the hardware does not perceive their existence. The edges 62 between the basic blocks represent possible transfers of control flow between different basic blocks.
Basic block B1 of FIG. 6 has a sequence of instructions that is terminated by an if check 63 that may correspond directly to a source level if statement 64. And depending on the outcome of that conditional evaluation, i.e. whether the if condition evaluates to true or false, control may transfer either to basic block B2 or basic block B3. These possible transfers of control flow are what is reflected in the edges that connect the basic blocks.
FIG. 6 includes a loop, or repeatedly executed control flow construct, 14. The loop involves three basic blocks: B1, B2, and B3. Initially, the variable i 13 is assigned the value 0 outside of the loop, before basic block B0. As the loop is entered, some operations are performed in basic block B1, and the if test determines whether the instructions in basic block B2 or the instructions in basic block B3 will be executed. In basic block B3, the variable i is incremented by one, and if the variable i is not equal to 10, then execution jumps back to basic block B1 and the instructions are executed again. Thus, the loop consisting of basic blocks B1, B2, and B3 is executed multiple times. In fact, this loop will iterate ten times, and the value of the variable i will progress from 0 through 9 until it reaches 10 in basic block B3, at which point the loop is exited without going back to basic block B1.
The basic block B0, which appears before the loop body of basic blocks, B1, B2, and B3, is typically referred to as a loop preheader basic block. This block is artificially created by the compiler to assist in further loop optimizations. In particular, when instructions are moved out of loop bodies, they are moved into a loop preheader. Basic block B3 is referred to as the loop back edge source basic block. This is the basic block from which the edge that causes the execution to repeat emanates. The arrow that goes from B3 to B1 is the back edge of the loop 67. Loops are distinguished by a unique back edge. Basic block B1 is the first basic block in the loop body, and is executed when the loop is entered from the preheader, and thus, is referred to as a loop header basic block.
The variable i, as previously discussed, is incremented in steps of one each time through the loop. Because this variable progresses through such a regular progression of values, it is referred to as an induction variable, meaning that there is an inductive relationship from the value of the variable in one iteration to the value of the variable in the next iteration. Each value induces a new value in the next iteration of the loop. Actually, there is more than one induction variable in this loop: variable j 65, which is assigned the value i+2 in basic block B1, is also an induction variable, because it is just offset from the variable i by 2. So, if i goes through a linear progression, so does j, and therefore j is an induction variable as well.
In basic block B1, the second instruction, which is an assignment to l 66 of k*j, involves a multiplication. The variable k 69, on the right hand side of that assignment, refers to a value that is assigned outside of the loop body, and k is not modified within the loop body. In other words, k remains at the value that it had before the loop body was entered. Thus, within the context of this loop, it is effectively unchanging. Variables of this nature are called region constants (RCs). Simple constants like the numbers 2 and 3 that appear in the loop body are also region constants, in that they do not vary within the loop.
The value of l+3 computed by the third instruction in basic block B1, and assigned to m 68, is a strength reduction candidate. The instruction l=k*j is also a strength reduction candidate, but a better candidate in this example is the assignment to m. Variables i and j are both induction variables. Variable i is a special kind of induction variable, called a basic induction variable (BIV), because it has a value that is assigned to it outside the loop, which is then incremented within the loop. Variable j is first assigned within the loop. It does not have a value (and may not even be defined) before the loop body is entered. Thus, variable j is a secondary induction variable that is derived from the basic induction variable, in this case by adding 2. So, basic induction variables are variables that may give rise to other secondary induction variables.
FIG. 7 is a chart 70 depicting the progression of the key variables in the loop of FIG. 6, in particular, the values of the four variables, i 13, j 65, l 66, and m 68, relative to basic block B1. So as the loop iterates, variable i progresses from 0 through 9, in steps of one. Variable j progresses from 2 to 11 in steps of one. Variable l, which tracks k*j, progresses in steps of k 69, starting with 2*k, and moving up to 11*k, and variable m progresses from 2k+3, and assumes the values 3k+3, 4k+3 to the final value of 11k+3.
As mentioned earlier, the value assigned to m is a good strength reduction candidate. FIG. 8 depicts the structure of the expanded expression for the variable m of FIG. 6, which is called an induction expression 80. An induction expression (IX) can be represented as some unchanging value, multiplied by a basic induction variable (BIV), plus some other unchanging value. In examining the values assumed by the variable m, a pattern can be perceived. The pattern can be expressed by the induction expression, k*i+2k+3. If the values of i 13 for each iteration of the loop, i.e., 0 to 9, are inserted into the induction expression, then the induction expression evaluates to the values that are assigned to the variable m on each loop iteration. This is a classic induction expression, and has a basic induction variable (BIV) (here variable i) that is multiplied by a loop invariant coefficient, which is called the BIV coefficient (here the simple variable k 69). The BIV coefficient could in general be an arbitrary polynomial that involves only region constants (RC). And to the product of the BIV and the BIV coefficient, another loop unchanging value is added, in this case, 2*k+3, and this is referred to as the addend 81 of the induction expression.
The strength reduction algorithm focuses on induction expressions that occur inside of loops, and strength reduces them. FIG. 9 depicts the strength reduction transformation 90 as it pertains to the example in FIG. 6. Thus, in strength reducing the induction expression k*i+2k+3, when i 13 has the value 0, that is, when it is assigned the value 0 outside of the loop, the value computed by the induction expression is 2k+3 81. The variable that is used to maintain the value of the strength reduced induction expression is called a strength reduction temporary variable, which is shown as t.sub.1 91 in FIG. 9. This strength reduction temporary variable t.sub.1 is incremented by k 69, corresponding to each time the original basic induction variable i is incremented by 1. In general, the strength reduction temporary variable is incremented by the basic induction variable increment multiplied by the BIV coefficient of the induction expression. Thus, this is how the strength reduction transformation is applied to the example in FIG. 6.
Once the strength reduction has been performed, another optimization opportunity called linear function test replacement arises. This involves replacing conditional checks involving the induction variable in the loop by equivalent checks involving strength reduced temporary variables. For instance, in the loop of FIG. 6, the final conditional instruction (if i not equal to 10, then return to basic block B1) can be recast to be an if check on the strength reduced temporary variable t.sub.1 of the form, if t.sub.1 is not equal to t.sub.2 92, go to B1. The variable t.sub.2 is a new temporary variable that is created and initialized to the value of 12k+3, which corresponds to i=10. Therefore, the execution count of the loop governed by the original if statement, if i not equal to 10, go to B1, remains the same when expressed as if t.sub.1 not equal to t.sub.2, go to B1. By performing this transformation, all uses of the original basic induction variable are removed, and hence more instructions may be eliminated from within the loop.
FIG. 10 depicts the results of applying strength reduction and the linear function test replacement transformations on the loop shown in FIG. 6. As shown in FIG. 10, m 68 is assigned the value of t.sub.1 91, the strength reduced temporary variable. t.sub.1 as shown in FIG. 9, is initialized to the value 2k+3 outside the loop 14, and this is done by inserting the initialization in the loop preheader block. Where i 13 was being incremented by 1, the strength reduction temporary variable t.sub.1 is incremented by k 69, and this is shown in block B3. FIG. 10 also illustrates the results of the linear function test replacement. The if check that was at the bottom of basic block B3, is now expressed in terms of an if check 101 between the strength reduction temporary variable t.sub.1 and a new temporary variable t.sub.2, that is initialized to 12k+3 in the loop preheader B0. Since the value t.sub.2 does not change in the loop, it is also a region constant.
The advantage of performing these two transformations is that it allows the elimination of some instructions from the loop. In FIG. 10, the instructions that compute j 65 and l 66 are no longer needed in the loop 14, because the variable m 68 that they were originally used to compute is now assigned the value of the strength reduction temporary variable, t.sub.1 91. Thus, there are no more uses for the computations of l 66 and j 65, and so these are dead instructions that can be altogether eliminated from the loop body. Effectively, the strength reduction transformation 100 has eliminated the multiply operation, specifically the multiplication k*j, and replaced it with an addition operation in basic block B3. Note that an addition operation has been eliminated as well, since the assignment to j 65 has also been eliminated. The linear function test replacement allows the elimination of the increment of the variable i 13. Had this replacement not been performed, the increment of i 13 in basic block B3 would have had to be retained because it would have been used in the if check 63. By performing this linear function test replacement, the last remaining real use of i 13 has been eliminated.
FIG. 11 illustrates a loop 14, with basic blocks 61 and edges 62 that relate to the loop, wherein the edges of this control flow graph 60 are annotated with expected execution frequencies. The annotated numbers indicate how often each block is executed during a typical run of the program. From this information, the profitability of the strength reduction transformation can be determined. In this example, the loop is entered ten times, and control passes through the loop preheader block B0 and into the loop header block B1 ten times. B6 is the back edge source basic block of this loop, and assume that in this example, it is terminated by an if check that causes this loop to iterate on average 99 times. So if this loop is entered ten times, then control is going to be transferred from basic block B6 to B1 990 times (99 times for each entry into the loop). Therefore, basic block B1 effectively is going to execute 1,000 times: 10 times when the loop is entered from the preheader, and 990 times when it is executed by jumping to it from basic block B6.
Basic block B1 is terminated by an if check where, for instance, the if condition is such that 5 times out of 1,000 control is transferred to basic block B3, and the remaining 995 times control is transferred to B2. The code in the respective blocks is executed, and control is transferred to a common basic block B4, which again is executed 1,000 times like B1. B4 is terminated by an if check, where this if condition causes control to be transferred 600 times to basic block B5 and the remaining 400 times directly to B6. B6 is going to execute 1,000 times again, and because the loop was entered ten times, the loop must exit ten times.
Assume that this loop body contains a strength reduction candidate induction expression, shown here in basic block B3 as 100*i+20. Also, assume that there is a basic induction variable which is updated at points within the loop body, shown here in basic blocks B5 and B6. In both cases, the variable is incremented by a region constant amount, either by 1 in B6 or by the variable k 69 in B5. k is assigned a value outside the loop body and remains unchanged throughout this loop body. Thus, strength reducing the induction expression in B3 would result in the loop 14 as shown in FIG. 12.
In FIG. 12 the strength reduced induction expression has been replaced with a strength reduction temporary variable, t.sub.1 91, which is initialized to the value 20 in the preheader. This is because originally the basic induction variable, i, started with the value 0, and so the expression 100*i+20 should effectively start with the value 20. At each of the two places in the loop body 14 where the induction variable is incremented, B5 and B6, t.sub.1 is incremented correspondingly by a scaled amount. So, in B6 where i is being incremented by one, t.sub.1 is incremented by 100, because that is the BIV coefficient multiplied by the BIV increment amount. In basic block B5, the variable t.sub.1 91 is incremented by t.sub.2 92, a region constant value. As shown in basic block B0, the preheader block, t.sub.2 92 is initialized to the value 100 times k 69 and remains invariant throughout the loop. Now, let us see whether this transformation is, in fact, profitable in terms of realizing an actual run-time performance benefit.
Assume that the annotations on the edges correspond to the typical execution frequencies of these control flow transfers. FIGS. 13A and 13B indicate the relative performance costs 130 both before and after the strength reduction transformation, respectively. Let it be assumed that a multiplication operation costs some number of cycles on the hardware, called multiply cost 131 which, for example may be four processor clock cycles. Assume that an addition operation costs some number of cycles on the hardware, called add cost 132, which for example may be one processor clock cycle. The cost of an induction expression computed in B3 can be represented as the execution frequency of basic block B3, multiplied by the sum of those two costs. Given that the execution frequency of the incoming arc was 5, since the induction expression in B3 involves both a multiply and an add, the total cost 133 is equal to 25 cycles over the entire execution of this loop, as shown in FIG. 13A. Now let us compare the 25 cycles worth of time that the induction expression evaluation would have taken to the time required to execute the strength reduced code.
As a result of strength reduction, new instructions have been inserted into basic block B0, basic block B5, and basic block B6. In basic block B0, a multiply and a simple assignment were added; the assignment is typically accomplished through a copy operation on most hardware, which has an associated cost called copy cost 134, presumed here to be one processor clock cycle. In B5 and B6, an addition has been inserted into each basic block. The time that the hardware will take to execute these instructions thus equals B0's execution frequency times the sum of the multiply and copy costs, plus B5's execution frequency times the add cost 132, plus B6's execution frequency times the add cost 132. Assuming that the cost of performing an add operation is only one machine clock cycle, then given the execution frequencies that are annotated on the edges of the control flow graph, the total cost 133 for the transformed loop is 1,641 cycles. Thus, it takes 1,641 cycles to execute the new instructions that are introduced as a result of strength reduction. So, while strength reduction saved the computation of the induction expression, and therefore the 25 cycles, the transformed loop forces the hardware to spend far more cycles than it saves. This illustrates that the strength reduction transformation can, in fact, be very unprofitable. The net impact of strength reduction for this example has been an increase in the loop execution time of 1,616 cycles.
There is also another impact on this loop as a result of the strength reduction transformation. In reviewing FIG. 12, it can clearly be seen that two new variables, t.sub.1 91 and t.sub.2 92, have been introduced in order to carry out the strength reduction transformation. t.sub.1 is the strength reduced temporary variable, and t.sub.2 is the increment amount for the temporary variable that is used in B5. These two variables will need to live in registers, and the register allocator will have to assign two distinct registers. Thus, two new registers need to be dedicated, at least in the context of this loop, to hold the values of t.sub.1 and t.sub.2. This means that the register pressure within the loop has increased by two registers. In particular, if the source program were being compiled for a machine that has a limited number of registers, this increase in register pressure within the loop might just exceed the threshold of available registers, causing the introduction of register spill code within the loop. That is, values may need to be transferred back into memory and reloaded from memory as and when they are needed, due to scarcity of register resources. Introducing spill code into the loop is generally unprofitable, because the integer computations eliminated by strength reduction (though they may involve multiplies and adds) are typically not as expensive as the memory operations that are introduced by spill code. This is because memory operations entail cache accesses that can take multiple cycles, or, if the data is not available in the cache, multiples of tens of cycles. So, in general, spill code is something to be avoided, and this is another negative impact of strength reduction for this example. Neither the net cycle count impact nor the register pressure impact is an area that prior strength reduction algorithms really address.
Static single assignment (SSA) form is a convenient compiler renaming technique, wherein every resource definition site is assigned a unique, internal compiler name, in order to facilitate data flow analysis. There is available literature on static single assignment form; the seminal paper on this topic is by Ron Cytron, et al., entitled "Efficiently Computing Static Single Assignment Form and the Control Dependence Graph". SSA form facilitates global data flow analysis, which is an underpinning of the strength reduction transformation. In SSA form, in addition to renaming every definition and use of a resource, special .phi. operators are introduced at control flow confluence points in the flow graph to merge multiple reaching definitions of the same resource. The main advantage of using the SSA form is that it is easier to identify the reaching definitions of resources at use sites.
In SSA form, basic induction variables can be recognized as being chained together, in a data flow sense, around the loop. FIG. 14A depicts a classic loop not in SSA form 140. There are variables called l 66, i 13, j 65 and k 69 that are referred to in the loop 140, with variable i being a basic induction variable. The variable i begins with the value of 0 and is incremented in B361 within the loop body. There are two assignments to the variable k inside the loop. In Basic Block B161, k is assigned the value j+2, and in Basic Block B261, k is assigned the value j+4. The variable j, in turn, is assigned some offset from the basic induction variable. Thus, the variable j is a secondary induction variable.
FIG. 14B is the same code as in FIG. 14A, just reexpressed in SSA form 141. New names have been given for each assignment location. For instance, where l was being assigned outside the loop 14, it is now named l.sub.0 142. The new names could be unique identifiers, but for the purposes of illustration, the new names use a numeric subscript. So l.sub.0 is the assignment outside the loop to the original variable l, and i.sub.0 143 is assigned the value 0, corresponding to the assignment of 0 to i outside the loop.
The two assignments to the variable k in Basic Blocks B161 and B261 are renamed in the SSA form to k.sub.0 144 and k.sub.1 145. The definition of k in Basic Block B1 now defines k.sub.0, and the definition of k 69 in Basic Block B2 now defines k.sub.1. In the original loop, there was a use of the variable k in Basic Block B3, which occurs at the confluence point, or merge point, of the two control flow transfers from B1 and B2. As mentioned earlier, in the SSA form, this kind of merging of reaching data flow values results in a .phi. operation 146 being inserted.
The .phi. operation can be thought of as a switch or gating operation, where, depending on which control flow edge 62 is taken into the merge point, the appropriate renamed incoming definition is used to assign a value to the left hand side of the .phi. operation. So in B3, for instance, there is a .phi. operation that merges the value k.sub.0 from Basic Block B1 with the value of k.sub.1 coming out of Basic Block B2. What is created or assigned by the .phi. operation is the variable k.sub.2 147. Thus, subsequent uses of the original variable k will now refer to the variable named k.sub.2. This is the case in Basic Block B3, where a fairly simple .phi. operation is used to merge the two values coming in from two different basic blocks. Similarly, there is a .phi. operation in Basic Block B1. There are two incoming arcs to that basic block, one from Basic Block B0 and one from Basic Block B3. Naturally, this gives rise to a .phi. operation 146 being inserted, in this case for variable i, because there is a definition of variable i that reaches from B0, namely the definition of the variable i.sub.0 in SSA form, and there is a definition of variable i that reaches Basic Block B161 around the back edge of the loop 67 from B3. In this example, that reaching definition is the variable named i.sub.2 148: the definition of variable i in B3 was renamed i.sub.2. The .phi. operation thus merges i.sub.0 with i.sub.2 in Basic Block B1 and creates a new variable called i.sub.1 149, so that subsequent uses of the variable i in the loop will refer to the variable i.sub.1. In particular, the variable j in Basic Block B1 has been renamed j.sub.0 157 and is assigned the value i.sub.1 -1. Note that i.sub.1 is the nearest reaching definition of the variable i, and is defined by the .phi. operation.
This process of renaming variables is applied to all variable definitions whether they are occurring inside loops or outside loops, and the resulting code for the example here is shown in FIG. 14B.
One point to note here is that in the SSA form, the basic induction variable i now has three different names, specifically, i.sub.0, i.sub.1 and i.sub.2, each corresponding to a different definition site. The loop body definitions of i.sub.1 and i.sub.2 comprise a basic induction variable group (BIVG). They are related to each other in that the flow of data values amongst the basic induction variable group members forms a cycle around the back edge of the loop that goes through the loop .phi. node. For instance, the i.sub.1 definition in Basic Block B1 sources the i.sub.2 definition in Basic Block B3, which in turn sources the same .phi. definition in basic block B1. Thus, the .phi. operation in basic block B1 and the increment operation in basic block B3 form a data flow cycle: one defines a value that the other needs. This is very typical of basic induction variables in SSA form. i.sub.1 and i.sub.2 are the basic induction variables that are part of the basic induction variable group (BIVG), and i.sub.0 is the initial value of that basic induction variable group.
Beginning with an intermediate representation cast in SSA form, strength reduction would comprise the following steps at a very high level. First, identify the basic induction variable groups, and then identify the strength reduction candidate expressions. Next, insert new definitions of strength reduction temporary variables that mirror, or parallel, the definitions of the BIVG members, and then finally replace the strength reduced induction expression computations with the appropriate strength reduction temporary variable names. In FIG. 14B, there are two possible strength reduction induction expression candidates, the value computed into k.sub.0 and the value computed into k.sub.1. The value assigned to k.sub.0 can be expressed as the induction expression 2*i.sub.1 -2, and the induction expression value assigned to k.sub.1 can be expressed as 4*i.sub.1 +4*l.sub.0. Each expression has a BIV, and in this case they both happen to be the same BIV, namely, i.sub.1. The BIV is multiplied by the BIV coefficient, which is a region constant. The loop invariant addends for the two IXs are -2 and 4*l.sub.0, respectively.
Assume that new strength reduction temporary variables called x and y are introduced for k.sub.0 and k.sub.1, respectively. The loop of FIG. 14B will be transformed as a result of strength reduction into the code fragment 150 shown in FIG. 15. To preserve the integrity of the SSA form, .phi. operations 146 have to be introduced for x and y, much like the original .phi. operation for variable i. The assignments to k.sub.0 144 and k.sub.1 145 have been replaced with assignments from x.sub.1 151 and y.sub.1 152, respectively. Where the original BIVG members were being updated, there are corresponding updates to the strength reduction temporary variables x and y, which have also been suitably renamed. So, there are x.sub.0 153 and y.sub.0 154 beginning with values corresponding to the original BIVG initial value i.sub.0, and these assignments are placed in the loop preheader. Where the BIVG member i.sub.2 148 was being assigned i.sub.1 +1, definitions of x.sub.2 155 and y.sub.2 156 as x.sub.1 +2 and y.sub.1 +4 are inserted, respectively.
Because there is now a confluence point for x and y in Basic Block B1, .phi. operations need to be introduced, much like the .phi. operation originally introduced for i, for merging the values of x.sub.0 and x.sub.2 into x.sub.1, as well as merging the values of y.sub.0 and y.sub.2 into y.sub.1. What is depicted in FIG. 15 is the result of this strength reduction transformation. The advantage here is that two instructions have been rendered dead, namely the assignment to j.sub.0 157 and the assignment to j.sub.1 158, neither of which has any remaining uses. The multiply operations in the loop have been replaced as a result of applying strength reduction.
FIGS. 16A and 16B illustrate further transformations to the loop 14 shown in FIG. 15. In particular, FIG. 16A illustrates the resulting code fragment 160 after dead code elimination (DCE) has been performed, where the instructions assigning j.sub.0 and j.sub.1 in basic blocks B161 and B261 have been eliminated, since they are no longer needed. The SSA form is then exited, and the variables return to their original names. The result of these two steps is shown in FIG. 16A. This is not very different from the strength reduction transformation results shown earlier without the use of the SSA form.
After the program is in the form shown in FIG. 16A, register allocation is performed. This is a process of binding variables to specific hardware register locations. For instance, variable l 66 may be assigned to register R1161, where there would be an R1 definition outside the loop corresponding to the definition of l. The use of l in the loop preheader, as shown in FIG. 16A, is replaced with the use of R1. Similarly, other corresponding changes would take place for the different variables, for instance, the variable x 165 is assigned R3162, the variable y 166 is assigned R4163, the original variable i 13 is assigned to register R2164 and so on.
There are three main issues with the prior art. The first issue with SSA based strength reduction is that the algorithm sometimes tends to insert more instructions to update strength reduction temporary variables than are actually needed.
The second issue with SSA based strength reduction is that previous algorithms focus solely on strength reducing induction expressions that compute memory address values. P. Markstein, et al., "Strength Reduction" chapter of an unpublished text book on program optimization, December, 1992. That is, values that are used to refer to memory locations accessed by load and store instructions. Other strength reduction opportunities are ignored; thus, previous SSA-based strength reduction algorithms would have performed the transformations shown in FIGS. 1-16B only if the candidate induction expressions were computing memory addresses that feed into load or store instructions. Strength reduction candidate induction expressions certainly abound in the realm of memory address expressions, but there are induction expressions that are strength reduction candidates that arise in other contexts and do not necessarily feed directly into a load or store.
Finally, the third issue is that the prior art does not discuss or apply to strength reduction of predicated code. Predication is a hardware feature that will be further described later on.
These issues are further described as follows.
The first issue is that strength reduction may insert more instructions into the loop than are necessary in order to update temporary variables. FIG. 17A depicts the original code fragment 170 for a loop with an induction variable i 13 which is incremented in two locations, basic block B161 and B261. The computation of 4*i that is assigned to the variable k 69 in basic block B361 is a strength reduction candidate. The conversion of the code of FIG. 17A into SSA form results in variables being renamed as shown in FIG. 17B, where it can be noted that a couple of .phi.'s 146 have been introduced for the original basic induction variable, one at basic block B161 and the other at basic block B461, because values of the renamed instances of the induction variable are being merged both at B1 and at B4. B4 is a merge point between B261 and B361, whereas B1 is a merge point between B061 and B4. The .phi. in B1 merges the value i.sub.0 143, which is the renamed instance of i as sourced from basic block B0, and i.sub.4 171, which is the value that flows from basic block B4 into basic block B1. Thus, the input operands for this .phi. are i.sub.0 and i.sub.4. The .phi. at B4 merges the value of i.sub.3 172 from B2 and the value of i.sub.2 148 from B1. Thus, the input operands for this .phi. are i.sub.2 and i.sub.3. Note that if control is transferred from B1 to B3 and then on to B4, the basic induction variable i is not incremented, and so the value that is sourced along the B3, B4 edge 62 is in fact the same value that was computed in B1; that is why the .phi. in B4 sources i.sub.2 directly, because that is the reaching definition from the B3 side. Applying strength reduction would result in the code shown in FIG. 17C. A variable x has been introduced, along with instructions that update it in parallel with the definitions of i.sub.1, i.sub.2, i.sub.3 and i.sub.4.
Thus, corresponding to the two .phi.'s 146 for the original basic induction variable i, there are two .phi.'s 146 for the strength reduction temporary variable x, one in B161 and one in B461. Corresponding to the two increment points of the original basic induction variable i, there are two increment points for the variable x, suitably renamed. Just as i.sub.0 143 through i.sub.4 171 are names corresponding to the different definition sites for i, so x.sub.0 173 through x.sub.4 174 are names for the strength reduction temporary variable x that has been introduced. The multiplication in B3 has been replaced by a simple assignment of x.sub.2 to k.sub.0. If control passes from B1 to B2 to B4, the strength reduced temporary variable x gets incremented twice: the first time by four in B1, and a second time by four in B2. This parallels the two increment points for the original basic induction variable i, but it is inefficient, since the strength reduction variable is being incremented twice.
What is preferred is to perform only one increment of the variable x in each of basic blocks B2 and B3, as shown in FIG. 17D. The inventive method would produce the code depicted in FIG. 17D from the example of FIGS. 17A to 17C. This avoids incrementing x multiple times when executing basic block B1 followed by B2 and B4. Thus, the code shown in FIG. 17D executes a single instruction to increment x by eight if control passes from B1 to B2 to B4, and executes only the increment by four if control passes from B1 to B3 to B4. There is a net dynamic path length savings when taking the left hand limb of that if check, even though there is no saving on the right hand limb. Overall, it is beneficial to generate the code fragment that is shown in FIG. 17D.
This is a feature that the traditional strength reduction algorithm does not perform, but that the current invention will perform. There are a few other features of FIG. 17D that should be mentioned. In FIG. 17C, k.sub.0 144 was assigned x.sub.2 155, but in FIG. 17D, k.sub.0 144 is assigned x.sub.1 151, which is defined by the .phi.. The initial value of x, namely x.sub.0 173, which is the value sourced from basic block B061 by the .phi. in B1, is four instead of zero. This skews the strength reduction temporary variable by four to effect the transformation. This preserves the correctness of the loop and, at the same time, realizes an improvement in the dynamic path length through this loop, especially as it relates to the B1, B2, B4 path. If it turns out that B1, B2, B4 is executed more frequently than B1, B3, B4, then this could lead to some measurable gains.
The second issue, that the SSA based strength reduction algorithm focuses solely on strength reducing induction expressions that compute memory address values, is illustrated in FIGS. 18A-18C. FIG. 18A depicts a C source code 180 listing of routine main 12, where execution typically starts for any C program. Inside this routine, there is a loop coded as a do-while loop 181 involving the local variable i 13 that is declared to be of type "int". The value zero is assigned to i 13 outside of the do-while loop 181. Inside of the loop, a value is assigned to an array A 183, which is a global array that is declared outside the scope of the function; the array has 10 integer elements, as can be seen from the declaration.
Each time through this do-while loop 181, a different element of array A 183 is assigned a value. This is very typical of array assignments in loops where the array variable is indexed by an induction variable, in this case the variable i 13, so that through the course of executing this loop, different values are stored to each element of the array. The variable i is incremented by one, which is the effect of the i++ statement that follows the assignment to the array variable A. The loop exit condition is governed by the while check: while i is less than 10, the body of this do-while loop will be executed, for a total of 10 times, where i takes on the values 0 through 9. After the last iteration, when i reaches the value 10, the while condition is no longer true (10 is not less than 10), and so the do-while loop terminates execution. Thus, a value is assigned to the 10 elements of the array variable A that correspond to indices 0 through 9, and what is being stored is an induction expression value, 8*i+2. The translated code for this source program is shown in FIG. 18B, which illustrates the low level intermediate representation before strength reduction is performed. For the purposes of this discussion, SSA form will be ignored to clarify the main point.
In basic block B161, which is the only basic block in the loop body, there is a sequence of instructions to carry out the statement A[i]=8*i+2, the increment of i, and the while condition check. The array A has been declared to be of an integer type, as shown in FIG. 18A. Typically, on many machines, an integer data type occupies 4 bytes of memory space. Thus, in order to store values to the consecutive elements of the array A, one needs to scale the index value by 4.
In examining the instructions in basic block B1, the right hand side, 8*i+2, is evaluated first, by first multiplying i by 8 and then adding 2 to the result. The variables t.sub.1, t.sub.2, t.sub.3 and t.sub.4 are merely internal compiler temporary variables; they do not necessarily reflect the SSA renaming discussed previously. t.sub.1 is assigned 8*i and t.sub.2 is assigned t.sub.1 +2. Thus, t.sub.2 computes the right hand side of the statement A[i]=8*i+2. Next, t.sub.2 is assigned to an element of array variable A. This is done through a store instruction. Now, the store instruction needs to specify a memory address, in particular, the address of the element A[i], and this is done by scaling i by 4. The scaling is by 4 because each element of the array is a 4 byte integer quantity, according to the assumption. The scaled value is added to the base address of the array variable A. If t.sub.0 corresponds to the base address of array variable A, what must be done is to sum that with the scaled index value to produce the effective address where the value of t.sub.2 will be stored. The variable t.sub.0, which corresponds to the base address of array variable A, has been initialized in basic block B061.
In the C language, array elements start with an index of 0 and progress onwards. The very first element of the array variable A is referred to as A[0], and so the address of that element is what is assigned to t.sub.0 and is summed with the scaled value t.sub.3 in producing the effective address t.sub.4 within the loop.
The rest of the instructions in B1 are fairly self-explanatory. There is an increment of i and a check against 10 to determine whether further iteration is required. FIG. 18C illustrates the transformed loop 182, assuming that only the address induction expressions are strength reduced. In FIG. 18B, there are two induction expressions, the one that is assigned to t.sub.2 and the one that is assigned to t.sub.4. The induction expression that is assigned to t.sub.4 or computed by t.sub.4 is sourced directly by a store instruction as the base address value. That refers to the location where a value needs to be stored in memory. That is considered an address induction expression (AIX).
The induction expression that is computed by t.sub.2, is an ordinary induction expression (OIX), in that it does not feed into a base address value. It is sourced by the store, but it is in fact the value that will be stored in memory, not the address to which it will be stored. Assuming that strength reduction is only applied to address induction expressions, then this would result in the code as shown in FIG. 18C, where t.sub.4, the address induction expression, has been strength reduced and is assigned a strength reduced temporary variable x, that is initialized in the preheader and incremented in steps of 4 within the loop. Note that the computation of t.sub.3 has been rendered dead and can be removed. There is an advantage in strength reducing the AIX, but the OIX computed by t.sub.2 has not been strength reduced.
This is a feature that the SSA based strength reduction algorithm described by Markstein, et al. does not perform, but that the current invention will perform. Had this been done, the resulting code would be as shown in FIG. 18D, where both t.sub.2 and t.sub.4 are strength reduced using strength reduction temporary variables x and y. Strength reducing both the OIX and the AIX allows the elimination of two instructions as dead, the computation of t.sub.1 and the computation of t.sub.3. Moreover, since the only use of the original basic induction variable i is in the if check, linear function test replacement can be performed, and then the increment of i can be eliminated from the loop. This discussion again ignores the SSA form to simplify the explanation, but it applies in the SSA form as well.
The third issue is that of supporting predicated code. Predication is a technique that eliminates conditional branches in a program by guarding the execution of the instructions that are to be conditionally executed with a boolean hardware flag that is initialized with the result of the original branch condition. This is illustrated in FIGS. 28A-28C. FIG. 28A depicts an if statement 280, where, if i is less than 10, j is set to i minus 1 and k is set to j times 2, and k is used subsequently. FIG. 28B is the intermediate representation corresponding to that code fragment, where basic block B061 performs the if check with the use of a machine opcode called compare and branch (COMB). This opcode compares the value in register Ri, which contains the value i, against 10, checking for a greater than or equal to relationship. If that relationship holds, then control is transferred to basic block B261; otherwise, control passes to the next sequential block of code in basic block B161. Note that the COMB checks the inverse of the original if condition, because the semantics are that if the condition holds, a branch operation to B2 is performed, which skips around the instructions that assign values to j and k in basic block B1.
The same block of code, if it were predicated, would appear as shown in FIG. 28C. There is a compare operation that checks a "less than" relationship between Ri and 10. If that condition happens to be true, it sets a one bit predicate register to one, and if not, it sets it to zero. The P.sub.1 predicate register is used to guard the execution of the instructions that compute j and k. This is shown by prefixing the instructions that compute Rj and Rk with P.sub.1 enclosed in parentheses. The convention is that P.sub.1 is a qualifying predicate which is queried before executing the guarded instruction. If the predicate register has the value 1, then the guarded instruction is executed; if it has the value 0, then the guarded instruction is not executed. Thus, the two code sequences of FIGS. 28B and 28C are equivalent.
The code sequence involving the conditional branch instruction and the code sequence involving the predicated instruction sequence are in fact equivalent, in that if i is less than 10 (i.e., the contents of Ri is less than 10), then the assignments to j and k are executed and if not, they are omitted. However, branch instructions are worse than predication in terms of high performance instruction pipeline processing. They can cause interruptions in pipeline sequencing and incur performance penalties. Predication, on the other hand, is generally easier to process and does not incur pipeline penalties. Predication is a feature provided by hardware and is used in the translation of conditional constructs into straight line constructs, as shown in FIGS. 28B and 28C, where there originally were basic blocks B0, B1, and B2 and through predication only a single basic block 61 remains. Effectively, what has happened is a conversion of a control dependence relationship into a data dependence relationship. While the branch at the end of basic block B0 was originally controlling the execution of B1, the execution of the instructions in basic block B1 are now dependent on the predicate data value P.sub.1 that is produced by the compare.
There are several problems that predication causes for strength reduction. First, there is the issue of recasting predicated code into SSA form and that is an issue that needs to be addressed for all SSA based optimizations, not necessarily just SSA based strength reduction. Another problem introduced by predication as regards to strength reduction is the recognition of BIVGs in predicated code. That is, how are basic induction variable groups recognized in predicated code? Other problems include the identification of strength reduction candidate induction expressions in predicated code, and updating the predicated code to reflect the effects of strength reduction while preserving integrity of the SSA form.
FIG. 29 illustrates the problems introduced by predicated code. FIG. 29A depicts a loop 290 involving multiple basic blocks 61 in its body. There is an if check 291 within the loop body that causes a basic induction variable i 13 to be either incremented by one or decremented by two, depending on the if condition result. There are three induction expressions that are strength reduction candidates. The first computes eight times i, which is assigned to j. The second computes four times i, which is assigned to k. The third computes two times i, which is assigned to l. The equivalent predicated code sequence for the loop body is shown in FIG. 29B, where there are two compares 292: one compare that evaluates condition C.sub.1, the condition that the if statement was originally checking, and a second compare that checks the inverse of that relationship, namely C(bar).sub.1. In other words, P.sub.1 and P.sub.2 are predicate registers that are complementary; if one of those predicates is true, the other predicate is false, where true corresponds to the predicate register having the value one, and false, the predicate register having the value zero. So, the statements that appear on the right hand side of the original if-then-else construct, i=i+1 and k=4*i, are guarded by the true predicate P.sub.1, and the instructions that are on the left hand side of that if-then-else construct are guarded by the complementary predicate register P.sub.2. Depending on the result of the compare, either i=i+1 and k=4*i are executed or i=i-2 and j=8*i are executed. Then, regardless of how the comparison turns out, the l=2*i and i=i+1 instructions are executed.
It is possible for the code shown in FIG. 29B to have been reordered by an earlier code optimization phase, one that operates ahead of strength reduction. For illustration purposes, assume that instruction scheduling has reordered these instructions to improve pipeline efficiency, with the result shown in FIG. 29C. The compare instructions have been inverted in order. More importantly, the pairs of instructions that were each guarded by predicate P.sub.1 or predicate P.sub.2 have been swapped. The assignment that decrements i by 2, guarded by predicate P.sub.2, appears first, followed by the increment of i under the complementary predicate, and then by the assignments to j and k. Looking at this sequence of code, ascertaining which value of i is actually being multiplied by 8 and assigned to j is not immediately possible. It may be that i was decremented by 2, or it may be that the value of i was incremented by 1.
The arrows with circled i's in FIG. 29C indicate the true reaching values. Note, for instance, that the use of i in the 2*i computation could get its value from either one of the previous assignments to variable i. The value of i sourced by the 8*i statement, which is guarded by predicate P.sub.2, is the value of i decremented by 2 from its previous value, and not the value of i incremented by one from its previous value. This can only be deduced after establishing that P.sub.1 and P.sub.2 are complementary, that is, that if one is true, then the other is false. However, in general, that is not always the case. There may be predicates guarding instructions that are not complementary, and so when one is true, it is not necessarily the case that the other is false. Great care must be taken in doing this kind of analysis. This sort of reaching definition analysis is quite important in order to carry out strength reduction optimizations.