1. Field of the Invention
This invention generally relates to computer software compilers, and more particularly to optimizers in computer software compilers that perform an optimization called partial redundancy elimination (PRE).
2. Related Art
The Static Single Assignment Form (SSA) has become a popular program representation in optimizing compilers, because it provides accurate use-definition (use-def) relationships among the program variables in a concise form. Before proceeding further, it may be useful to briefly describe SSA.
In SSA form, each definition of a variable is given a unique version, and different versions of the same variable can be regarded as different program variables. Each use of a variable version can only refer to a single reaching definition. When several definitions of a variable, a.sub.1, a.sub.2, . . . , a.sub.m, reach a common node (called a merging node) in the control flow graph of the program, a .phi. function assignment statement, a.sub.n =.phi.(a.sub.1, a.sub.2, . . . , a.sub.m), is inserted to merge the variables into the definition of a new variable version a.sub.n. Thus, the semantics of single reaching definitions are maintained.
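The renaming described above can be illustrated with a minimal sketch in Python. It handles only a four-block diamond (entry, two branches, one merging node) and uses a toy statement representation of (target, operand list) pairs; the function name to_ssa_diamond and the data layout are assumptions made for illustration, not part of any described compiler.

```python
def to_ssa_diamond(entry, then_branch, else_branch, merge):
    """Rename a diamond CFG (entry -> then/else -> merge) into SSA form.

    Each block is a list of (target, operands) assignments, where operands
    is a list of variable names. Hypothetical toy representation.
    """
    counters = {}  # variable name -> last version number handed out

    def fresh(var):
        counters[var] = counters.get(var, 0) + 1
        return f"{var}{counters[var]}"

    def rename_block(block, reaching):
        reaching = dict(reaching)  # variable -> version reaching this point
        out = []
        for target, operands in block:
            # Each use refers to the single reaching definition, if any.
            uses = [reaching.get(v, v) for v in operands]
            new_name = fresh(target)  # each definition gets a unique version
            reaching[target] = new_name
            out.append((new_name, uses))
        return out, reaching

    entry_out, env = rename_block(entry, {})
    then_out, then_env = rename_block(then_branch, env)
    else_out, else_env = rename_block(else_branch, env)

    # At the merging node, insert a phi for every variable whose reaching
    # version differs between the two incoming paths.
    merged, phis = dict(env), []
    for var in sorted(set(then_env) | set(else_env)):
        a, b = then_env.get(var, var), else_env.get(var, var)
        if a != b:
            new_name = fresh(var)
            phis.append((new_name, [a, b]))
            merged[var] = new_name
        else:
            merged[var] = a
    merge_out, _ = rename_block(merge, merged)
    return entry_out, then_out, else_out, phis, merge_out


# a = x; if (...) a = a else a = y; c = a
entry_out, then_out, else_out, phis, merge_out = to_ssa_diamond(
    entry=[("a", ["x"])],
    then_branch=[("a", ["a"])],
    else_branch=[("a", ["y"])],
    merge=[("c", ["a"])],
)
```

In this example the two branch definitions become versions a2 and a3, the merging node receives a4 = phi(a2, a3), and the use of a in the merging node refers to a4 alone. A general SSA construction must also place phi functions for arbitrary control flow graphs, which this sketch does not attempt.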
Many efficient global optimization algorithms have been developed based on SSA. Among these optimizations are dead store elimination, constant propagation, value numbering, induction variable analysis, live range computation, and global code motion. However, all these uses of SSA have been restricted to solving problems based on program variables, since the concept of use-def does not readily apply to expressions. Noticeably missing among SSA-based optimizations is partial redundancy elimination.
Partial redundancy elimination (PRE) is a powerful optimization algorithm. PRE was first described in E. Morel and C. Renvoise, "Global optimization by suppression of partial redundancies," Comm. ACM, 22(2):96-103, February 1979. PRE targets partially redundant computations in a program; in doing so, it removes global common subexpressions and moves invariant computations out of loops. PRE has since become the most important component in many global optimizers.
PRE shall now be generally described with reference to FIGS. 10A and 10B. FIG. 10A illustrates a program control flow graph having basic blocks 1002, 1004, 1006. Basic blocks 1004 and 1006 contain an expression a+b. There are two paths through this control flow graph: basic block 1002 to basic block 1006, and basic block 1004 to basic block 1006. When the path from basic block 1002 to basic block 1006 is taken, the expression a+b is performed only once. However, when the path from basic block 1004 to basic block 1006 is taken, the expression a+b is redundantly performed twice. Accordingly, the scenario shown in FIG. 10A is an example of partial redundancy of the expression a+b.
For performance purposes, it would be desirable to eliminate the expression a+b in basic block 1006, since its performance in basic block 1006 is redundant to its performance in basic block 1004. However, the expression a+b is not performed in basic block 1002. Accordingly, without additional modification, the expression a+b cannot be eliminated from basic block 1006.
PRE works as shown in FIG. 10B. According to PRE, the result of expression a+b is stored in a variable t in basic block 1004. This computation from basic block 1004 is also inserted in basic block 1002, thereby making the expression in basic block 1006 fully redundant. Then, the expression a+b is eliminated from basic block 1006, and all references to it are replaced by the variable t.
Knoop et al. formulated an alternative placement strategy called lazy code motion that improves on Morel and Renvoise's results by avoiding unnecessary code movements, and by removing the bidirectional nature of the original PRE data flow equations. The result of lazy code motion is optimal: the number of computations cannot be further reduced by safe code motion, and the lifetimes of the temporaries introduced are minimized. See J. Knoop, O. Ruthing, and B. Steffen, "Lazy code motion," Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 224-234, June 1992; J. Knoop, O. Ruthing, and B. Steffen, "Optimal code motion: Theory and practice," ACM Trans. on Programming Languages and Systems 16(4):1117-1155, October 1994.
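The rewrite of FIGS. 10A and 10B can be sketched in Python as follows. This is only an illustration under strong simplifying assumptions that are not taken from the figures: statements are (target, expression) pairs, the operands of the expression are never reassigned, and inserting a computation at the end of a predecessor block is safe because that block's only successor is the merging block. The function name pre_at_merge and the empty contents of basic block 1002 are hypothetical.

```python
def pre_at_merge(blocks, preds, merge, expr, temp="t"):
    """Remove a partially redundant `expr` from the block `merge`.

    blocks: block name -> list of (target, expression) statements
    preds:  block name -> list of predecessor block names
    Assumes the operands of `expr` are never reassigned (no kills).
    """
    def computes(name):
        return any(rhs == expr for _, rhs in blocks[name])

    # Only act if `merge` computes expr and at least one predecessor
    # already computes it (i.e. expr is at least partially redundant).
    if not computes(merge) or not any(computes(p) for p in preds[merge]):
        return blocks

    result = {}
    for name, stmts in blocks.items():
        if name == merge:
            # Replace the redundant computation by a use of the temporary.
            result[name] = [(lhs, temp if rhs == expr else rhs)
                            for lhs, rhs in stmts]
        elif name in preds[merge]:
            new_stmts = []
            for lhs, rhs in stmts:
                if rhs == expr:
                    # Save the computed value in the temporary.
                    new_stmts += [(temp, expr), (lhs, temp)]
                else:
                    new_stmts.append((lhs, rhs))
            if not computes(name):
                # Insert the computation on the path that lacked it.
                new_stmts.append((temp, expr))
            result[name] = new_stmts
        else:
            result[name] = list(stmts)
    return result


blocks = {
    "1002": [],                # hypothetical: no computation of a+b
    "1004": [("y", "a+b")],
    "1006": [("z", "a+b")],
}
preds = {"1006": ["1002", "1004"]}
optimized = pre_at_merge(blocks, preds, merge="1006", expr="a+b")
```

After the call, each path into basic block 1006 evaluates a+b exactly once, and basic block 1006 only reads t. A real PRE implementation must additionally establish that each insertion is safe and profitable, which is the role of the data flow analyses discussed in this section.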
Drechsler and Stadel gave a simpler version of the lazy code motion algorithm that inserts computations on edges rather than in nodes. See K. Drechsler and M. Stadel, "A variation of Knoop, Ruthing and Steffen's lazy code motion," SIGPLAN Notices, 28(5):29-38, May 1993. It should be noted that the above published algorithms do not utilize SSA.
Optimizations based on SSA all share the common characteristic that they do not require traditional iterative data flow analysis in their solutions. They all take advantage of the sparse representation of SSA.
In a sparse form, information associated with an object is represented only at places where it changes, or when the object actually occurs in the program. Because it does not replicate information over the entire program, a sparse representation conserves memory space. Information can be propagated through the sparse representation in a smaller number of steps, speeding up most algorithms.
To get the full benefit of sparseness, one must typically give up operating on all elements in the program in parallel, as in traditional bit-vector-based data flow analysis. But operating on each element separately allows optimization decisions to be customized for each object.
There is another advantage of using SSA to perform global optimization. Traditional non-SSA optimization techniques often implement two separate versions of the same optimization: a global version that uses bit vectors in each basic block, and a simpler and faster local version that performs the same optimization within a basic block. SSA-based optimization algorithms do not need to distinguish between global and local optimizations. The same algorithm can handle both global and local versions of an optimization simultaneously. The amount of effort required to implement each optimization can be correspondingly reduced.
Prior to the present invention, a PRE algorithm based on SSA did not exist. As was hinted in D. Dhamdhere, B. Rosen, and K. Zadeck, "How to analyze large programs efficiently and informatively," Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 212-223, June 1992, developing a PRE algorithm based on SSA is difficult because an expression E can be redundant as the result of many different computations, at different places, of the same expression E', E", . . . whose operands have different SSA versions from the operands of E. This is illustrated in FIG. 3A, where the expression E is generally represented by a+b.
In such a situation, the use-def chain of SSA does little to help in recognizing that E is partially redundant (see basic blocks 302 and 308). It also does not help in effecting the movement of computations. Lacking an SSA-based PRE algorithm, optimizers that use SSA have to switch to bit-vector algorithms in performing PRE. To apply subsequent SSA-based optimizations, it is necessary to convert the results of PRE back into SSA form, and such incremental updates based on arbitrary modifications to the program are expensive.
Accordingly, what is required is a compiler that performs partial redundancy elimination (PRE) using the SSA form.
Before proceeding further, it may be useful to consider work aimed at improving the efficiency of data flow analysis and PRE.
By generalizing SSA form, Choi et al. derived Sparse Evaluation Graphs as reduced forms of the original flow graph for monotone data flow problems related to variables. The technique must construct a separate sparse graph per variable for each data flow problem, before solving the data flow problem for the variable based on the sparse graph. Thus, it cannot practically be applied to PRE, which requires the solution of several different data flow problems. See J. Choi, R. Cytron, and J. Ferrante, "Automatic construction of sparse data flow evaluation graphs," Conference Record of the Eighteenth ACM Symposium on Principles of Programming Languages, pages 55-66, January 1991.
Dhamdhere et al. observed that in solving for a monotone data flow problem, it suffices to examine only the places in the problem where the answer might be different from the trivial default answer .perp.. There are only three possible transfer functions for a node: raise to the top element, lower to .perp., or identity (propagate unchanged). They proposed slotwise analysis. For nodes with the identity transfer function, those that are reached by any node whose answer is .perp. will have .perp. as their answer. By performing the propagation slotwise, the method can arrive at the solution for each variable in one pass over the control flow graph. Slotwise analysis is not sparse, because it still performs the propagation with respect to the control flow graph of the program. The approach can be used in place of the iterative solution of a monotone data flow problem as formulated. It can be used to speed up the data flow analyses in PRE. See D. Dhamdhere, B. Rosen, and K. Zadeck, "How to analyze large programs efficiently and informatively," Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 212-223, June 1992.
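The propagation rule described above can be sketched for a single slot as follows. This is a loose illustration rather than the published algorithm: it assumes a forward problem over a two-point lattice (a top value and the default .perp. value), treats the answer on entry to the program as .perp., and uses hypothetical names (solve_slot, succs, transfer) chosen for this example.

```python
from collections import deque

TOP, BOTTOM = True, False  # two-point lattice; BOTTOM is the default answer


def solve_slot(succs, transfer, entry):
    """Solve one slot of a forward data flow problem in a single pass.

    succs:    node -> list of successor nodes
    transfer: node -> "raise" | "lower" | "identity"
    Returns the answer at the exit of every node.
    """
    # Start optimistically: only "lower" nodes are known to yield BOTTOM.
    out = {n: (BOTTOM if transfer[n] == "lower" else TOP) for n in succs}
    if transfer[entry] == "identity":
        out[entry] = BOTTOM  # assumed: the answer entering the program is BOTTOM

    # Push BOTTOM forward through identity nodes; each node is lowered
    # at most once, so the work is bounded by the size of the flow graph.
    work = deque(n for n in succs if out[n] is BOTTOM)
    while work:
        n = work.popleft()
        for s in succs[n]:
            if transfer[s] == "identity" and out[s] is TOP:
                out[s] = BOTTOM
                work.append(s)
    return out


succs = {"entry": ["A", "B"], "A": ["M"], "B": ["M"], "M": []}
transfer = {"entry": "identity", "A": "raise", "B": "identity", "M": "identity"}
answer = solve_slot(succs, transfer, "entry")
```

In this example node M receives .perp. because it is reached through B, an identity node that is itself reached from the entry. Note that the propagation still walks the edges of the control flow graph, which is why the approach is not sparse.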
Johnson proposed the use of Dependence Flow Graphs (DFG) as a sparse approach to speed up data flow analysis. The DFG of a variable can be viewed as its SSA graph with additional "merge" operators imposed to identify single-entry single-exit (SESE) regions for the variable. By identifying SESE regions with the identity transfer function, the technique can short-circuit propagation through them. Johnson showed how to apply his techniques to the data flow systems in Drechsler and Stadel's variation of Knoop et al.'s lazy code motion. See R. Johnson, "Efficient program analysis using dependence flow graphs," Technical Report (PhD Thesis), Dept. of Computer Science, Cornell University, August 1994.
Researchers at Rice University have done work aimed at improving the effectiveness of PRE. The work involves the application of some SSA-based transformation techniques to prepare the program for optimization by PRE. Their techniques enhance the results of PRE. Their implementation of PRE was based on Drechsler and Stadel's variation of Knoop et al.'s lazy code motion, and was unrelated to SSA. See P. Briggs and K. Cooper, "Effective partial redundancy elimination," Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 159-170, June 1994; and K. Cooper and T. Simpson, "Value-driven code motion," Technical Report CRPC-TR95637-S, Dept. of Computer Science, Rice University, October 1995.
All prior work related to PRE (including that described above) has modeled the problem as systems of data flow equations. Regardless of how efficiently the systems of data flow equations can be solved, a substantial amount of time needs to be spent in scanning the contents of each basic block in the program to initialize the local data flow attributes that serve as input to the data flow equations. Experience has shown that this often takes more time than the solution of the data flow equations, so a fundamentally new approach to PRE that does not require the dense initialization of data flow information is highly desirable. A PRE algorithm based on SSA would solve this problem, since SSA is sparse. However, prior to the present invention, a PRE algorithm based on SSA did not exist.