1. Field of the Invention
The present invention relates generally to computer program (i.e., software) compilers and more particularly to optimizers in computer program compilers that perform an optimization known as partial redundancy elimination (PRE).
2. Related Art
The Static Single Assignment Form (SSA) is a popular program representation in optimizing compilers, because it provides accurate use-definition (use-def) relationships among the program variables in a concise form. SSA is described in detail in R Cytron et al., Efficiently Computing Static Single Assignment Form and the Control Dependence Graph, ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991, which is incorporated herein by reference in its entirety.
The SSA form can be briefly described as a form where each definition of a variable is given a unique version, and different versions of the same variable can be regarded as different program variables. Each use of a variable version can only refer to a single reaching definition. When several definitions of a variable, a.sub.1, a.sub.2, . . . , a.sub.m, reach a common node (called a merging node) in the control flow graph of the program, a .phi. function assignment statement, a.sub.n =.phi.(a.sub.1, a.sub.2, . . . , a.sub.m), is inserted to merge the variables into the definition of a new variable version a.sub.n. Thus, the semantics of single reaching definitions are maintained.
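As a minimal sketch of this versioning, the Python below renames the definitions of a variable a on the two arms of a hypothetical if/else and merges them with a phi assignment at the merging node. The function name `ssa_diamond` and the toy program are illustrative assumptions and do not come from the cited paper.

```python
import itertools


def ssa_diamond(var="a"):
    """Return the SSA form of:  if c: a = 1  else: a = 2;  use(a).

    Every definition receives a fresh version (a1, a2, ...). The two
    versions reaching the merging node are combined by a phi assignment,
    so each use still refers to exactly one reaching definition.
    """
    fresh = (f"{var}{i}" for i in itertools.count(1))
    then_def = next(fresh)   # version defined on the "then" arm
    else_def = next(fresh)   # version defined on the "else" arm
    merged = next(fresh)     # version defined by the phi at the merge
    return {
        "then": [f"{then_def} = 1"],
        "else": [f"{else_def} = 2"],
        "merge": [
            f"{merged} = phi({then_def}, {else_def})",
            f"use({merged})",
        ],
    }
```

The use at the merging node refers only to the phi result, which is what keeps the single-reaching-definition property intact.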
Optimizations based on SSA all share the common characteristic that they do not require traditional iterative data flow analysis in their solutions. They all take advantage of the sparse representation of SSA. In a sparse form, information associated with an object is represented only at places where it changes, or when the object actually occurs in the program. Because it does not replicate information over the entire program, a sparse representation conserves memory space. Moreover, information can be propagated through the sparse representation in a smaller number of steps, speeding up most algorithms.
Many efficient global optimization algorithms have been developed based on SSA. Among these optimizations are dead store elimination, constant propagation, value numbering, induction variable analysis, live range computation, and global code motion. However, partial redundancy elimination (PRE), a powerful optimization algorithm, was noticeably missing among SSA-based optimizations. PRE was first described in E. Morel and C. Renvoise, Global Optimization by Suppression of Partial Redundancies, Comm. ACM, 22(2):96-103, February 1979, which is incorporated herein by reference in its entirety. PRE, which has since become an important component in many global optimizers, targets partially redundant computations in a program, and removes global common sub-expressions and moves invariant computations out of loops. In other words, by performing data flow analysis on a computation, it determines where in the program to insert the computation. These insertions in turn cause partially redundant computations to become fully redundant, and therefore safe to delete.
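The effect of PRE can be illustrated with a small sketch. In the hypothetical functions below (all names are assumptions made for illustration), the computation a + b after the merge is redundant only on the path where `cond` is true. Inserting the computation on the other path makes the later occurrence fully redundant, so it can be replaced by a reuse of the temporary `t`. The counter records how many additions execute on each path.

```python
def before_pre(cond, a, b, counter):
    if cond:
        x = a + b
        counter["adds"] += 1
    else:
        x = 0
    y = a + b                # partially redundant: redundant only if cond
    counter["adds"] += 1
    return x, y


def after_pre(cond, a, b, counter):
    if cond:
        t = a + b
        counter["adds"] += 1
        x = t
    else:
        x = 0
        t = a + b            # inserted computation
        counter["adds"] += 1
    y = t                    # fully redundant occurrence replaced by t
    return x, y


def count_adds(fn, cond):
    """Run fn on fixed inputs and return (result, additions executed)."""
    counter = {"adds": 0}
    result = fn(cond, 3, 4, counter)
    return result, counter["adds"]
```

In this sketch the path with `cond` true executes one addition instead of two, while the other path still executes exactly one, so no path becomes more expensive.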
Because PRE was not among the SSA-based optimizations, a method for performing SSA-based PRE of expressions, known as SSAPRE, was disclosed and described in detail, along with a discussion of SSA and PRE, in a commonly-owned, co-pending application entitled "System, Method, and Computer Program Product for Partial Redundancy Elimination Based on Static Single Assignment Form During Compilation" having application Ser. No. 08/873,895 (Attorney Docket No. 15-4-479.00), filed Jun. 13, 1997, now allowed, which is incorporated herein by reference in its entirety.
SSAPRE can be briefly described as a six-step method that allows partial redundancy elimination (PRE) to be done directly on a static single assignment (SSA) representation of a computer program during compilation. SSAPRE may be considered sparse because it does not require collecting traditional local data flow attributes over the program and it does not require any form of iterative data flow analysis to arrive at its solution.
First, SSAPRE processing involves a ".PHI.-insertion" step that inserts .PHI. functions for expressions where different values of the expressions reach common points in the computer program. The result of each of the .PHI. functions is assigned to a hypothetical variable h.
Second, SSAPRE performs a "renaming" step where SSA versions are assigned to hypothetical variables h in the computer program. In one embodiment, the renaming step may involve a delayed renaming approach.
Third, SSAPRE further performs a "down safety" step of determining whether each .PHI. function in the computer program is down safe.
Fourth, SSAPRE performs a "will be available" step that accurately predicts whether each expression in the computer program will be available at each .PHI. function following eventual insertion of code into the computer program for purposes of partial redundancy elimination.
Fifth, SSAPRE additionally performs a "finalize" step of transforming the SSA representation of the computer program having hypothetical variables h to an SSA graph that includes some insertion information reflecting eventual insertions of code into the computer program for purposes of partial redundancy elimination.
Sixth, SSAPRE performs a "code motion" step of updating the SSA representation of the program based on the insertion information to introduce real temporary variables t for the hypothetical variables h.
SSAPRE optionally performs a "collect-occurrences" step of scanning the SSA representation of the computer program to create a work list of expressions in the computer program that need to be considered during optimization.
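The optional collect-occurrences step can be pictured with the following simplified sketch. It assumes, purely for illustration, that statements are given as (target, operator, left operand, right operand) tuples and that expressions are grouped when they are lexically identical; the actual step operates on the SSA representation of the program, and the function name `collect_occurrences` is hypothetical.

```python
def collect_occurrences(statements):
    """Scan statements once and build a work list of expressions.

    Each key is an expression (op, left, right); each value lists the
    statement indices at which that expression occurs, in program order.
    """
    worklist = {}
    for index, (_target, op, left, right) in enumerate(statements):
        worklist.setdefault((op, left, right), []).append(index)
    return worklist
```

Each entry of the resulting work list could then be handed to the six steps above, one expression at a time.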
Despite the development of SSAPRE, there still exists room for improvements in other areas of compiler optimization. For example, register allocation is among the most important functions performed by an optimizing compiler. Prior to register allocation, it is necessary to identify the data items in the program that are candidates for register allocation. To represent register allocation candidates, compilers commonly use an unlimited number of pseudo-registers. Pseudo-registers are also called symbolic registers or virtual registers, to distinguish them from real or physical registers. Pseudo-registers have no alias, and the process of assigning them to real registers involves only renaming them. Thus, using pseudo-registers simplifies the register allocator's job.
Optimization phases generate pseudo-registers to hold the values of computations that can be reused later, like common sub-expressions and loop-invariant expressions. Variables declared, for example, with the "register" attribute in the C programming language, together with local variables determined by the compiler to have no alias, can be directly represented as pseudo-registers. All remaining register allocation candidates have to be assigned pseudo-registers through the process of register promotion.
Register promotion identifies sections of code in which it is safe to place the value of a data object in a pseudo-register. Register promotion is regarded as an optimization because instructions generated to access a data object in a register are more efficient than if it is not in a register. If later register allocation cannot find a real register to map to a pseudo-register, it can either spill the pseudo-register to memory or re-materialize it, depending on the nature of the data object.
Register promotion can be performed once an earlier alias analysis, during compilation, has identified the points of aliasing in the program and these aliases are accurately characterized in the SSA representation of the program. The register promotion phase (of a program optimization) inserts efficient code that sets up data objects in pseudo-registers, and rewrites the program code to operate on them. The pseudo-registers introduced by register promotion are maintained in valid SSA form. Targets for register promotion typically include scalar variables, indirectly accessed memory locations and program constants.
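The following sketch models what register promotion achieves for a scalar that is updated in a loop. The `Memory` class, the cell name `total`, and both functions are hypothetical and exist only to count memory accesses; the sketch assumes that nothing in the loop aliases the promoted cell.

```python
class Memory:
    """Toy memory that counts every load and store it services."""

    def __init__(self, **cells):
        self.cells = dict(cells)
        self.loads = 0
        self.stores = 0

    def load(self, name):
        self.loads += 1
        return self.cells[name]

    def store(self, name, value):
        self.stores += 1
        self.cells[name] = value


def sum_unpromoted(mem, values):
    # Every iteration reads and writes the memory cell directly.
    for v in values:
        mem.store("total", mem.load("total") + v)


def sum_promoted(mem, values):
    r = mem.load("total")    # load into a pseudo-register before the loop
    for v in values:
        r = r + v            # loop body operates on the pseudo-register
    mem.store("total", r)    # store back once after the loop
```

For three input values, the unpromoted version performs three loads and three stores, while the promoted version performs one of each and leaves the same value in memory.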
Different approaches have been used in the past to perform register promotion. F. Chow and J. Hennessy, The Priority-based Coloring Approach to Register Allocation, ACM Trans. on Programming Languages and Systems, 12(4):501-536, October 1990, disclosed the use of data flow analysis to identify the live ranges where a register allocation candidate can be safely promoted. Because global register allocation was performed relatively early, at the end of global optimization, a separate register promotion phase was not required. Instead, register promotion was integrated into the global register allocator, and profitable placement of loads and stores was performed only if a candidate was assigned to a real register. In optimizing the placement of loads and stores, a simplified and symbolic version of PRE was used that made use of the fact that the blocks that make up each live range must be contiguous.
In K. Cooper and J. Lu, Register Promotion in C Programs, Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation, pp. 308-319, June 1997, an approach that is entirely loop-based was disclosed. By scanning the contents of the blocks comprising each loop, candidates that can be safely promoted to a register over the full extent of the loop were identified. The load to a pseudo-register was generated at the entry to the outermost loop where the candidate was promotable. The store, if needed, was generated at the exit of the same loop. The algorithm presented handled both scalar variables and pointer-based memory accesses where the base was loop-invariant. The approach was "all-or-nothing" in the sense that if only one part of a loop contained an aliased reference, the candidate would not be promoted for the entire loop. It did not handle straight-line code, relying instead on the PRE phase to achieve the effects of promotion outside loops, but it was not clear whether the algorithm's PRE phase could handle stores appropriately.
In D. Dhamdhere, Register Assignment Using Code Placement Techniques, Journal of Computer Languages, 15(2):83-94, 1990, the recognition that register promotion can be modeled as a problem of code placement for loads and stores, thereby benefitting from the established results of PRE, was first made. The Load-Store Insertion Algorithm (LSIA) disclosed was an adaptation of Morel and Renvoise's PRE algorithm for load and store placement optimization. LSIA solved for the placements of both loads and stores simultaneously.
As recognized by the Inventors, the PRE of stores in the context of register promotion, specifically, can be viewed as another approach to partial dead store elimination (PDE), for which numerous algorithms have also been described. In F. Chow, A Portable Machine-independent Global Optimizer--Design and Measurements, Technical Report 83-254 (Ph.D. Thesis), Computer Systems Laboratory, Stanford University, December 1983, the dual of Morel and Renvoise's PRE algorithm was applied to the optimization of store statements. After solution of the data flow equations in bit vector form, an insertion pass identified the latest insertion point for each store statement, taking into account any possible modification of the right hand side expression. In Knoop et al., Partial Dead Code Elimination, Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pp. 147-158, June 1994, an algorithm, also PRE-based, was presented. The Knoop algorithm, however, was separated into an elimination step and a sinking step, and iterated exhaustively so as to cover second order effects. The algorithm was thus more expensive than straight PRE. To additionally cover "faint code elimination" (a store is faint if it is dead or becomes dead after some other dead stores have been deleted), a "slotwise" solution of the data flow equations was used as described in D. Dhamdhere et al., How to Analyze Large Programs Efficiently and Informatively, Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pp. 212-223, June 1992.
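A partially dead store, and the effect of sinking it, can be sketched as follows. The two hypothetical functions below model a store to x that is overwritten on one path before any use; the names and the store counter are assumptions made for illustration and are not taken from any of the cited algorithms.

```python
def run_original(cond):
    """Return (final value of x, number of stores executed)."""
    stores = 0
    x = 1                # store S1: dead on the path where cond is true
    stores += 1
    if cond:
        x = 2            # store S2 overwrites S1 before any use
        stores += 1
    return x, stores


def run_after_pde(cond):
    """Same program after S1 is sunk to the only path where it is live."""
    stores = 0
    if cond:
        x = 2
        stores += 1
    else:
        x = 1            # S1 sunk into the path that does not overwrite it
        stores += 1
    return x, stores
```

Both versions leave the same value in x on every path, but the transformed version executes a single store on the path where `cond` is true instead of two.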
The PRE-based approaches to PDE did not modify the control flow structure of the program, thus limiting the partial dead stores that may be removed. Non-PRE-based PDE algorithms may remove additional partial dead stores by modifying the control flow. In L. Fiegen et al., The Revival Transformation, Conference Record of the Twenty First ACM Symposium on Principles of Programming Languages, pp. 147-158, January 1994, a revival transformation was disclosed where a partially dead statement is detached from its original place in the program and reattached at a later point at which it is minimally dead. In cases where movement of a single store was not possible, the transformation moved a superstructure that included other statements and branches. However, the coverage of the revival transformation was limited because it could not be applied across loop boundaries. The algorithm as presented also did not consider situations that required multiple re-attachment points to remove a partially dead store.
A PDE approach using slicing transformations was recently proposed in R. Bodik and R. Gupta, Partial Dead Code Elimination using Slicing Transformation, Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation, pp. 159-170, June 1997. Instead of moving partially dead statements, the approach of predicating them was taken. The predication embedded the partially dead statement in a control flow structure, determined through program slicing, such that the statement was executed only if the result of the statement was eventually used. A separate branch deletion phase restructured and simplified the flow graph. Bodik and Gupta showed that for acyclic code, all partially dead statements may be eliminated. Their algorithm worked on one partially dead statement at a time. Since the size of the code may grow after the PDE of each statement, complete PDE may take exponential time and result in massive code restructuring. The vastly different code shape can cause additional variation in program performance.
Another PDE algorithm, described in Gupta et al., Path Profile Guided Partial Dead Code Elimination Using Predication, Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, pp. 102-112, November 1997, used predication to enable code sinking in removing partial dead stores. The technique used path profiling information to target only statements in frequently executed paths. A cost-benefit data flow analysis technique determined the profitability of sinking, taking into account the frequencies of each path considered. The same approach was used in Gupta et al., Resource Sensitive Profile-Directed Data Flow Analysis for Code Optimization, Proceedings of the 30th Annual International Symposium on Microarchitectures, pp. 358-368, December 1997, to speculatively hoist computations in PRE. Decisions to speculate were made locally at individual merge or split points based on the affected paths. Acyclic and cyclic code were treated by different versions of the algorithm.
What is needed is a method, system, and computer program product for deriving new efficient and flexible methods for partial redundancy elimination of many types of computations (e.g., expressions, loads, stores, assignments, and the like) and code placement directions (e.g., forward or backward). Further, what is needed is an efficient method, system, and computer program product for performing register promotion via load and store placement optimization within an optimizing compiler.