Optimizing and parallelizing compilers perform data flow analysis to ensure the correctness of their program transformations. Software development environments also use data flow analysis. The input to data flow analysis is a data flow framework as described in Marlowe, T. J., Data Flow Analysis and Incremental Iteration, Rutgers University (October 1989). The data flow framework includes a flow graph and a formal basis for describing the behavior and interaction of flow graph nodes (FIG. 1). The behavior of each node is formalized by its transfer function (FIG. 2), which describes how the node affects the solution as a function of the behavior of other nodes. Taken as a whole, the node transfer functions form a system of simultaneous equations whose maximum fixed point (MFP) global evaluation provides the best computable solution at all edges or nodes of the flow graph. In other words, every other correct solution is either uncomputable or less precise.
A data flow framework D is defined in terms of three components: $D = \langle FG, L, F \rangle$. The flow graph $FG = (V, E, r)$ consists of a finite set V of nodes that includes a distinguished start node r (shown as node V1 in FIG. 1) and a finite set E of edges (shown as e1, e2, e3, and e4 in FIG. 1). An edge is an ordered pair (v, w) of nodes; v is the source of the edge and w its target. For example, in FIG. 1, V1, V2, V3, and V4 are nodes, with V1 being the start node r. The set of edges E comprises e1, e2, e3, and e4. The source of e2 is V2 and its target is V3. Because each edge is designated by the ordered pair (v, w) of its source and target nodes, e1 = (V1, V2), e2 = (V2, V3), e3 = (V2, V4), and e4 = (V4, V2). Where the edge (v, w) is in E, we say that v is a predecessor of w and w a successor of v. For example, in FIG. 1, V2 is a predecessor of V3 and of V4, and also a successor of V1 and of V4.
A sequence of edges $(v_1, v_2), (v_2, v_3), \ldots, (v_{n-1}, v_n)$ in FG is a path from $v_1$ to $v_n$. For example, in FIG. 1, e1, e2 is a path from V1 to V3, and e3, e4, e2 is a path from V2 to V3. If there is a path from $v_i$ to $v_j$, we say that $v_i$ reaches $v_j$, or that $v_j$ is reachable from $v_i$. Every node in FG is reachable from r, and r is not the target of any edge in E. A cycle is a path for which $v_1 = v_n$. For example, in FIG. 1, the path e3, e4 forms a cycle.
A meet semilattice is a set of elements together with a partial ordering of those elements that is defined by a meet operator ($\wedge$). More specifically, the meet semilattice is $L = \langle A, \mathrm{TOP}, \mathrm{BOTTOM}, \le, \wedge \rangle$, where A is a set whose elements form the domain of the data flow problem (i.e., the inputs and outputs associated with the flow graph nodes); TOP and BOTTOM are distinguished elements of A, symbolizing the best and the worst possible solution to the optimization problem, respectively; $\le$ is a reflexive partial order; and $\wedge$ is the associative and commutative meet operator, such that for any a, b in A:
$$a \le b \iff a \wedge b = a$$
$$a \wedge a = a$$
$$a \wedge b \le a$$
$$a \wedge \mathrm{TOP} = a$$
$$a \wedge \mathrm{BOTTOM} = \mathrm{BOTTOM}$$
Where the elements of the domain are sets, examples of meet operators are intersection and union. Where the operator is union, TOP would typically be the empty set and BOTTOM the universal set. Where the operator is intersection, TOP would typically be the universal set and BOTTOM the empty set. Intuitively, higher points in the lattice correspond to higher degrees of information.
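As a quick sanity check of the semilattice laws for set-valued domains, the following Python sketch brute-forces them over every subset of a hypothetical three-element universe, once with intersection and once with union as the meet. The ordering is derived from the meet itself (a ≤ b exactly when meet(a, b) == a), so the first law holds by construction; the universe and the helper names `powerset` and `check_semilattice` are illustrative assumptions, not part of the framework.

```python
from itertools import chain, combinations

UNIVERSE = frozenset({"a", "b", "c"})  # hypothetical small domain


def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(c) for c in subsets]


def check_semilattice(meet, top, bottom, elements):
    """Assert the meet-semilattice laws; a <= b is defined as meet(a, b) == a."""
    def leq(a, b):
        return meet(a, b) == a

    for a in elements:
        assert meet(a, a) == a              # idempotence
        assert meet(a, top) == a            # TOP is the identity of meet
        assert meet(a, bottom) == bottom    # BOTTOM absorbs everything
        for b in elements:
            assert meet(a, b) == meet(b, a)   # commutativity
            assert leq(meet(a, b), a)         # a meet b <= a
            for c in elements:
                assert meet(meet(a, b), c) == meet(a, meet(b, c))  # associativity
    return True


ELEMENTS = powerset(UNIVERSE)
# Intersection as meet: TOP is the universal set, BOTTOM the empty set.
check_semilattice(lambda x, y: x & y, UNIVERSE, frozenset(), ELEMENTS)
# Union as meet: TOP is the empty set, BOTTOM the universal set.
check_semilattice(lambda x, y: x | y, frozenset(), UNIVERSE, ELEMENTS)
```

Under union, the induced order makes a ≤ b exactly when a ⊇ b, so the empty set sits at the top of the lattice, matching the choice of TOP described above.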
The input and output of a node Y are elements of A. A transfer function (FIG. 2) operates on the input to node Y to determine the output of node Y. More specifically, F is a set of transfer functions such that $F \subseteq \{ f : A \to A \}$; that is, every function in F has A as both its domain and its range. This set includes the identity function i (which, applied to the input of a node, produces output identical to the input), and it is closed under composition and meet. The data flow effect of node Y is described by its transfer function $f_Y$ in F, which captures the local properties of Y: $\mathrm{OUT}_Y = f_Y(\mathrm{IN}_Y)$, where $\mathrm{IN}_Y$ and $\mathrm{OUT}_Y$ are in A. After a framework has been globally evaluated, each node Y has a solution $\mathrm{OUT}_Y$ that is consistent with the transfer functions at every node. In general, the best computable solution for a data flow framework is the maximum fixed point of the equations:
$$\mathrm{OUT}_{\mathrm{root}} = \mathrm{TOP}$$
$$\mathrm{IN}_Y = \bigwedge_{X \in \mathrm{Preds}(Y)} \mathrm{OUT}_X$$
$$\mathrm{OUT}_Y = f_Y(\mathrm{IN}_Y)$$
where Preds(Y) is the set of predecessors of node Y. The solution to these equations is called the Maximum Fixed Point (MFP) solution. During an evaluation, the flow graph nodes are visited iteratively until no node output changes. During such an evaluation, $\mathrm{IN}_Y$ moves down the lattice from TOP to the element that represents the best computable solution prior to Y, regardless of the flow path taken.
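The iterative evaluation just described can be sketched in Python as a round-robin solver over the FIG. 1 flow graph (edges e1 through e4 as listed earlier). The domain {a, b, c}, the use of intersection as the meet, and the gen/kill-style transfer functions at V2 and V4 are hypothetical choices made only for illustration; `solve_mfp` and `gen_kill` are illustrative names, not an established API.

```python
from typing import Callable, Dict, FrozenSet

Fact = FrozenSet[str]

# FIG. 1: e1=(V1,V2), e2=(V2,V3), e3=(V2,V4), e4=(V4,V2); V1 is the start node r.
NODES = ["V1", "V2", "V3", "V4"]
EDGES = [("V1", "V2"), ("V2", "V3"), ("V2", "V4"), ("V4", "V2")]
ROOT = "V1"

# Hypothetical domain: meet is intersection, so TOP is the universal set.
TOP: Fact = frozenset({"a", "b", "c"})


def gen_kill(gen: set, kill: set) -> Callable[[Fact], Fact]:
    """Hypothetical transfer function f(x) = (x - kill) | gen."""
    return lambda x: (x - kill) | gen


TRANSFER: Dict[str, Callable[[Fact], Fact]] = {
    "V2": gen_kill(set(), {"a"}),
    "V3": lambda x: x,              # the identity function i
    "V4": gen_kill({"c"}, {"b"}),
}


def solve_mfp(nodes, edges, root, transfer, top):
    preds = {n: [v for (v, w) in edges if w == n] for n in nodes}
    out = {n: top for n in nodes}   # every OUT starts at TOP
    inn: Dict[str, Fact] = {}
    changed = True
    while changed:                  # repeat until no OUT_Y changes
        changed = False
        for y in nodes:
            if y == root:
                continue            # OUT_root = TOP stays fixed
            in_y = top              # TOP is the identity of the meet
            for x in preds[y]:
                in_y = in_y & out[x]           # IN_Y = meet of OUT_X over Preds(Y)
            inn[y] = in_y
            new_out = transfer[y](in_y)        # OUT_Y = f_Y(IN_Y)
            if new_out != out[y]:
                out[y] = new_out
                changed = True
    return inn, out


IN, OUT = solve_mfp(NODES, EDGES, ROOT, TRANSFER, TOP)
for n in NODES:
    print(n, sorted(OUT[n]))
```

On the first pass V2 produces {b, c}; once the back edge e4 carries V4's output {c}, $\mathrm{IN}_{V2}$ drops to {c}, and a third pass confirms that nothing changes. This illustrates the downward movement through the lattice that the text describes.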
In a forward data flow problem, for each node Y, $\mathrm{IN}_Y$ is defined in terms of the predecessors of Y (as in the equations above). In a backward data flow problem, for each node Y, $\mathrm{IN}_Y$ is defined in terms of the successors of Y. A data flow problem that is either forward or backward is unidirectional. A data flow problem in which $\mathrm{IN}_Y$ for each node Y depends on both the predecessors and the successors of Y is bidirectional.
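The three directions differ only in which neighbors $\mathrm{IN}_Y$ draws on. The following Python sketch makes that explicit for the FIG. 1 edges. The function `in_dependencies` and its string-valued `direction` parameter are hypothetical names chosen for illustration.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]

# FIG. 1 edges e1, e2, e3, e4 as (source, target) pairs.
EDGES: List[Edge] = [("V1", "V2"), ("V2", "V3"), ("V2", "V4"), ("V4", "V2")]


def preds(edges: List[Edge], y: str) -> Set[str]:
    """Sources of edges whose target is y."""
    return {v for (v, w) in edges if w == y}


def succs(edges: List[Edge], y: str) -> Set[str]:
    """Targets of edges whose source is y."""
    return {w for (v, w) in edges if v == y}


def in_dependencies(edges: List[Edge], y: str, direction: str) -> Set[str]:
    """Nodes in terms of which IN_Y is defined, for the given problem direction."""
    if direction == "forward":
        return preds(edges, y)
    if direction == "backward":
        return succs(edges, y)
    if direction == "bidirectional":
        return preds(edges, y) | succs(edges, y)
    raise ValueError(f"unknown direction: {direction!r}")


for d in ("forward", "backward", "bidirectional"):
    print(d, sorted(in_dependencies(EDGES, "V2", d)))
```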
The prior art describes a program in terms of a general program model that is also used by this disclosure. This program model consists of a set of one or more external procedures, where an external procedure is one that is not contained (declared) within another procedure but may contain internal procedures nested within it. One of the external procedures is the main procedure. Recursion is allowed: A procedure may directly or indirectly invoke itself.
The containment relationships among the procedures in a program P may be represented as a forest of trees $F_P$, where the nodes of the trees represent procedures/routines. For each external procedure/routine, there is a tree in $F_P$ whose root node represents that external procedure/routine. The variables declared directly within a procedure/routine are local to it, while the variables declared in the ancestors of a procedure/routine in $F_P$ are global to it. The set of variables global to procedure P is denoted GLOBAL(P). Among the local variables of a procedure P are zero or more formal parameters; the set of such variables in P is denoted FORMAL(P). A variable that is either local or global with respect to a procedure P is known to P. An external variable is one that is global to all the procedures of a program. The local variables of a procedure are visible to it, as are those of its global variables that are not hidden from it. The specific mechanism for hiding is irrelevant to our method. One such mechanism is the declaration, in an internal procedure, of a local variable with the same name as a global variable.
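To illustrate the distinction between global and visible variables, here is a Python sketch over a hypothetical containment forest $F_P$ with two trees (MAIN with nested A and B, and a separate external procedure LIB). It models only the same-name-declaration hiding mechanism mentioned above; the procedure names, variables, and helper functions are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Var = Tuple[str, str]  # (declaring procedure, variable name)

# Hypothetical forest F_P: each procedure maps to the procedure that directly
# contains it, or None for an external procedure (the root of a tree).
PARENT: Dict[str, Optional[str]] = {"MAIN": None, "A": "MAIN", "B": "A", "LIB": None}

# Names declared directly within each procedure (formal parameters included).
DECLARED: Dict[str, List[str]] = {
    "MAIN": ["x", "y"],
    "A": ["y", "p"],  # p is a formal parameter of A; A's y hides MAIN's y
    "B": ["q"],
    "LIB": ["z"],
}


def ancestors(proc: str) -> List[str]:
    """Proper ancestors of proc in F_P, nearest first."""
    chain: List[str] = []
    parent = PARENT[proc]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def local_vars(proc: str) -> Set[Var]:
    return {(proc, name) for name in DECLARED[proc]}


def global_vars(proc: str) -> Set[Var]:
    """GLOBAL(proc): every variable declared in an ancestor, hidden or not."""
    return {v for anc in ancestors(proc) for v in local_vars(anc)}


def visible_vars(proc: str) -> Set[Var]:
    """Locals plus globals not hidden by a nearer declaration of the same name."""
    seen: Set[str] = set()
    visible: Set[Var] = set()
    for scope in [proc] + ancestors(proc):  # innermost scope first
        for name in DECLARED[scope]:
            if name not in seen:
                seen.add(name)
                visible.add((scope, name))
    return visible


print(sorted(global_vars("B")))
print(sorted(visible_vars("B")))
```

For B, MAIN's y belongs to GLOBAL(B) (it is declared in an ancestor) but is not visible, because A's y hides it.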
The prior art includes a model of procedural interaction that is also used in this disclosure. In this model, a statement in a program that invokes a procedure is referred to as a call site. A call site designates a called procedure, which must be visible to the procedure containing the call site (the calling procedure). For each formal parameter of the called procedure, the call site must designate an argument that is associated with it. An argument may be a reference argument, which is a variable that is visible to the calling procedure and is passed by reference to its corresponding formal parameter. When the call site is invoked, a formal parameter that is associated with a reference argument assumes the same address in memory as the argument. Procedures interact at call sites through reference arguments and also through variables that are global to the called procedure. Accordingly, a call site s is said to pass a variable X to a variable Y if and only if either Y is the same variable as X and is global to the called procedure, or X is passed by reference to Y.
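The "passes" relation can be read directly as a predicate. The Python sketch below encodes it for a hypothetical call site in MAIN that invokes A, binding the reference argument MAIN.x to A's formal parameter A.p. The `CallSite` record, its field names, and the dotted variable names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CallSite:
    """Hypothetical representation of a call site s in the calling procedure."""
    caller: str
    callee: str
    # (reference argument, formal parameter) pairs bound at this call site
    ref_bindings: FrozenSet[Tuple[str, str]]
    # GLOBAL(callee): the variables global to the called procedure
    callee_globals: FrozenSet[str]


def passes(s: CallSite, x: str, y: str) -> bool:
    """True iff call site s passes variable x to variable y."""
    same_global = x == y and y in s.callee_globals
    by_reference = (x, y) in s.ref_bindings
    return same_global or by_reference


# MAIN invokes A(MAIN.x); A's formal parameter is A.p.
s1 = CallSite(
    caller="MAIN",
    callee="A",
    ref_bindings=frozenset({("MAIN.x", "A.p")}),
    callee_globals=frozenset({"MAIN.x", "MAIN.y"}),
)
print(passes(s1, "MAIN.x", "A.p"), passes(s1, "MAIN.y", "A.p"))
```

Note that in this example MAIN.x reaches A both as the global MAIN.x and through the formal A.p, so s1 passes MAIN.x to each of them.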
See FIG. 3. The interprocedural structure of a program 350 is represented by a Program Call Graph (PCG) 300, a flow graph for which each procedure is uniquely represented by a single node (301-304) and each call site by a unique edge (311-314). The start node 304 represents the main procedure. The node representing a given procedure/routine P shall be referred to as node P. The edge (P,Q) represents a call site in P that invokes Q. By the definition of a flow graph, it is assumed that every node in the call graph is reachable from the main procedure 304.
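A Program Call Graph can be sketched in Python by making each procedure a node and each call site a separately identified edge, so that two call sites from P to Q remain two distinct edges. The procedures and call sites below are hypothetical (they are not the contents of FIG. 3), and the final check enforces the flow-graph requirement that every node be reachable from the main procedure.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical program: procedure names and (call-site id, caller, callee) triples.
PROCEDURES = ["MAIN", "A", "B", "C"]
CALL_SITES: List[Tuple[str, str, str]] = [
    ("s1", "MAIN", "A"),
    ("s2", "MAIN", "B"),
    ("s3", "A", "C"),
    ("s4", "B", "C"),
    ("s5", "C", "C"),  # direct recursion: C invokes itself
]


def build_pcg(procedures, call_sites):
    """One node per procedure; one edge (P, Q) per call site in P that invokes Q."""
    nodes: Set[str] = set(procedures)
    edges: Dict[str, Tuple[str, str]] = {site: (p, q) for site, p, q in call_sites}
    return nodes, edges


def reachable_from(start: str, edges: Dict[str, Tuple[str, str]]) -> Set[str]:
    """Breadth-first search over the call edges."""
    succ: Dict[str, List[str]] = {}
    for p, q in edges.values():
        succ.setdefault(p, []).append(q)
    seen = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in succ.get(p, []):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen


nodes, edges = build_pcg(PROCEDURES, CALL_SITES)
assert reachable_from("MAIN", edges) == nodes, "every node must be reachable from main"
```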
In the presence of procedure calls, data flow analysis must make worst-case assumptions about the data flow effect of each call unless the analysis is interprocedural, i.e., performed across procedure boundaries. Worst-case assumptions about interprocedural information inhibit program transformations for optimization or parallelization. Interprocedural data flow analysis algorithms have been developed for various interprocedural problems (Banning, J., Sixth Annual ACM Symposium on Principles of Programming Languages, 29-41 (January 1979); Cooper et al., SIGPLAN '88 Conference on Programming Language Design and Implementation, 57-66 (June 1988)).
Interprocedural data flow analysis may be either flow-sensitive or flow-insensitive. A flow-sensitive analysis makes use of the intraprocedural control flow information associated with individual procedures. A flow-insensitive analysis makes no use of intraprocedural control flow information. In ignoring control flow information, such an analysis does not have to consider the possible paths through a procedure, reducing the cost of the analysis in both space and time. In general, a flow-sensitive algorithm is more precise (i.e., higher in the semilattice) but less efficient in time and space than a flow-insensitive algorithm for the same problem.
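To make the precision trade-off concrete, here is a small Python sketch built on a hypothetical summary problem: the set of variables whose values on entry to a procedure may be used (upward-exposed uses). It is restricted to straight-line code so that there is exactly one path. The flow-insensitive version ignores statement order; the flow-sensitive version walks the path. The problem choice, the statement encoding, and all names are illustrative assumptions.

```python
from typing import List, Set, Tuple

Stmt = Tuple[str, str]  # (kind, variable), where kind is "def" or "use"

# Hypothetical straight-line procedure body.
BODY: List[Stmt] = [("def", "x"), ("use", "x"), ("use", "y"), ("def", "y")]


def exposed_uses_insensitive(body: List[Stmt]) -> Set[str]:
    """Ignore control flow: any variable used anywhere may be used before being defined."""
    return {v for kind, v in body if kind == "use"}


def exposed_uses_sensitive(body: List[Stmt]) -> Set[str]:
    """Follow the single path: a use counts only if no earlier def precedes it."""
    defined: Set[str] = set()
    exposed: Set[str] = set()
    for kind, v in body:
        if kind == "use" and v not in defined:
            exposed.add(v)
        elif kind == "def":
            defined.add(v)
    return exposed


print(sorted(exposed_uses_insensitive(BODY)), sorted(exposed_uses_sensitive(BODY)))
```

With a union meet, the smaller set {y} is higher in the semilattice, so the flow-sensitive answer is the more precise one. The flow-insensitive answer {x, y} is still safe, only more conservative, and it was obtained without examining any path.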