Solutions to data flow problems are needed in most optimizing and parallelizing compilers and software development environments. Compiler optimization problems are typically formulated as data flow frameworks, in which the solution of a given problem at a given program point is related to the solution at other points (Rosen, B. K., Data Flow Analysis for Procedural Languages, J. ACM 26, (2) (April 1979), pg. 322-344; Tarjan, R., Journal of the Association for Computing Machinery 28, (3) (1981), pg. 594-614). Such problems can be solved by iterating over the nodes of a control flow graph until the solution (over all nodes) converges. The quality and speed of evaluating these frameworks are well understood, and data flow methods are understandably prevalent in most optimizing compilers.
Traditional methods for solving data flow problems fall into one of two categories: bit-vectoring and direct connections. Bit-vectoring methods propagate the solution at a given node of a control flow graph to the successors or predecessors of that node (Kildall, G., Conference Record of First ACM Symposium on Principles of Programming Languages, 194-206 (January 1973)). Compiler writers generally acknowledge that bit vectors consume excessive space. Moreover, propagation occurs throughout the graph, sometimes in regions that neither affect nor care about the global solution.
The other prevalent solution uses direct connections (i.e., data flow chains: def-def, def-use, and use-def chains) that shorten the propagation distance between nodes that generate and use data flow information. Such solutions are typically based on def-use chains (see Aho et al., Compilers: Principles, Techniques, and Tools, Addison-Wesley (1986)). Def-use chains omit nodes of the flow graph that need not participate in the evaluation. Unfortunately, direct connections often require combining the same information at each use of a particular variable, rather than just once; in the worst case, a quadratic number of "meets" can occur where a linear number would suffice. Once established, direct connections allow propagation directly from sites that generate information to sites that use it. Although information does not propagate unnecessarily through the graph, the same information may be combined many times, whereas combining it earlier would be more efficient.
Optimizing compilers typically gather compile-time invariant information about a program by posing (and solving) a series of problems, such as Common Subexpression Elimination, Invariant Detection, Constant Propagation, and Dead Code Elimination.
Consider Constant Propagation as an example of a data flow problem. Referring to the flow graph fragment shown in FIG. 1, Constant Propagation would like to prove that the assignments to w, y, and z store the constant 5. Constant Propagation must therefore prove that x is constant when each of these variables is assigned. Thus, information associated with the two definitions of x must propagate through the flow graph. At node A, the two values for x are combined. In this example, x is the constant 5 at node A. When this information reaches the assignments to w, y, and z, each assignment receives the constant 5.
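The combining step at node A can be sketched as a "meet" over a simple constant lattice. The following is a minimal illustrative sketch, not taken from the original; the names `TOP`, `BOTTOM`, and `meet` are assumptions introduced here.

```python
# Hypothetical sketch of the constant-propagation "meet" for FIG. 1.
# Lattice values are TOP (no information yet), a concrete constant,
# or BOTTOM (known not to be constant). At merge node A, the values
# of x arriving along the two incoming edges are combined.

TOP, BOTTOM = object(), object()

def meet(a, b):
    """Combine two lattice values at a merge point."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    if a == b:
        return a      # the same constant arrives on both paths
    return BOTTOM     # conflicting constants: x is not constant here

# Both definitions of x assign 5, so the meet at node A yields 5,
# and the assignments to w, y, and z all receive the constant 5.
x_at_A = meet(5, 5)
```

Had the two paths assigned different constants, the meet would yield `BOTTOM` and none of the three assignments could be proven constant.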
An apparent inefficiency with such propagation is that information about the variable x must be propagated through nodes of the graph that do not "care" about the value of x. In fact, most implementations compute and propagate solutions by reserving a "bit" for each variable in the program. Thus, a program with 25 assignments and 100 nodes requires 100 bit vectors, each with a length of 25 bits. Since most nodes assign at most one variable, much of this information is wasted.
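The space cost described above can be made concrete with a small sketch, assuming the figures from the text (25 variables, 100 nodes); the variable and node indices below are illustrative only.

```python
# Sketch of the per-node bit-vector encoding the text describes:
# one bit per variable, one vector per node. With 25 variables and
# 100 nodes, 100 vectors of 25 bits each are kept, even though most
# nodes define at most one variable.

NUM_VARS, NUM_NODES = 25, 100

# Python ints serve as bit vectors; bit i set at a node means
# "variable i's information holds at this node".
vectors = [0] * NUM_NODES

def set_var(node, var):
    vectors[node] |= (1 << var)

def has_var(node, var):
    return bool(vectors[node] & (1 << var))

set_var(0, 3)     # e.g., node 0 records information about variable 3

# The space is NUM_NODES * NUM_VARS bits (2500 here) regardless of
# how sparse the definitions actually are -- most bits stay zero.
total_bits = NUM_NODES * NUM_VARS
```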
Consequently, many optimizing compilers construct def-use chains, which directly connect assignments to a variable with uses of the variable's value. The graph in FIG. 2 shows how such chains are constructed for our example. Now, information about the variable x can be forwarded directly to the assignments to w, y, and z, bypassing all other nodes.
Unfortunately, the def-use representation shown in FIG. 2 requires that the information be combined three times (once at each node), rather than just once at node A. Such redundancy can increase analysis time by an order of magnitude, as shown in FIG. 3a. There are nine def-use chains between references to x. If the information for the definitions were combined at the merge node (A), then there would be only three def-use chains to the merge node and one chain from the merge node to each of the uses of x. In general, such an example could contain O(n.sup.2) chains, where n is the number of nodes in the graph, without combining at the merge node, whereas combining at the merge node yields O(n) chains. Experiments have shown that such behavior is especially noticeable for arrays, aliased variables, and variables modified at procedure call sites. Moreover, def-use chains are themselves computed by solving two data flow problems, Reaching Definitions and Live Variables, so at some phase of analysis bit vectors would still be required.
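The chain-count argument above can be sketched directly; the two counting functions below are illustrative helpers introduced here, not part of the original.

```python
# Sketch of the chain counts for FIG. 3a. With d definitions and
# u uses of x and no combining at a merge node, every definition is
# chained to every use, giving d * u chains (quadratic as d and u
# grow with the size of the graph). Combining at a merge node needs
# only d chains into the merge plus u chains out of it.

def chains_without_merge(defs, uses):
    return defs * uses    # one chain per (definition, use) pair

def chains_with_merge(defs, uses):
    return defs + uses    # defs -> merge node, merge node -> uses

# FIG. 3a: three definitions and three uses of x.
nine = chains_without_merge(3, 3)   # the nine chains in FIG. 3a
six = chains_with_merge(3, 3)       # 3 chains in + 3 chains out
```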
Recently, Static Single Assignment (SSA) form has yielded more efficient and powerful solutions for data flow problems (see Cytron et al., An Efficient Method for Computing Static Single Assignment Form, Sixteenth Annual ACM Symposium on Principles of Programming Languages, 25-35 (January 1989)). SSA form is a direct-connection structure that combines the best of the two previous mechanisms. Characteristic problems solved have been constant propagation (Wegman et al., Conf. Rec. Twelfth ACM Symposium on Principles of Programming Languages, pgs. 291-299 (January 1985)), global value numbering (Alpern et al., Fifteenth ACM Principles of Programming Languages Symposium, pgs. 1-11, San Diego, Calif. (January 1988)), and invariance detection (Cytron et al., Conf. Rec. of the ACM Symp. on Principles of Compiler Construction (1986)). Once programs are cast into SSA form, data flow solutions for these problems have the following advantages: (1) information is combined as early as possible, (2) information is forwarded directly to where it is needed, and (3) useless information is not represented. These advantages follow from the way definitions are connected to uses in a program. For example, FIG. 3b shows an example of how SSA form reduces def-use chains (compared with FIG. 3a) with a special feature called .phi.-functions (described in greater detail below).
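The effect of a .phi.-function can be sketched as follows; this is a hypothetical rendering of the idea behind FIG. 3b, and the names x1, x2, x3 and the edge labels are illustrative, not from the original.

```python
# Sketch of a .phi.-function at a merge node in SSA form. Each
# definition of x receives a distinct name (x1, x2), and the
# .phi.-function at the merge selects the operand corresponding to
# the control flow edge actually taken. The meet thus happens once,
# at the merge, and every use refers to the single name x3.

def phi(operands, incoming_edge):
    """Select the operand matching the edge along which control arrived."""
    return operands[incoming_edge]

x1 = 5    # definition of x on the first path
x2 = 5    # definition of x on the second path

# At the merge node: x3 = phi(x1, x2). Information is combined once
# here and then forwarded directly to every use of x3.
x3 = phi({'left': x1, 'right': x2}, 'left')
```

Because each use names exactly one definition (here, x3), no use ever needs to repeat the combining work; this is how SSA form achieves the early, single meet that plain def-use chains lack.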
However, SSA form does have a variety of disadvantages, such as requiring renaming or modification of portions of the program text, and an inability to handle certain types of definitions, called preserving definitions, efficiently. In addition, SSA form cannot handle certain types of data flow chains, such as def-def, use-def, and use-use chains. These data flow chains are extremely important for parallelizing compilers.