Optimizing and parallelizing compilers perform data flow analysis to ensure the correctness of their program transformations. Software development environments also utilize data flow analysis. The input to data flow analysis is a data flow framework as described in Marlowe, T. J., Data Flow Analysis and Incremental Iteration, Rutgers University (October 1989). The data flow framework includes a flow graph and a formal basis for describing the behavior and interaction of flow graph nodes (FIG. 1). The behavior of each node is formalized by its transfer function (FIG. 2), which describes how a node affects the solution as a function of the behavior of other nodes. Taken together, the node transfer functions form a set of simultaneous equations whose maximum fixed point (MFP) global evaluation provides the best computable solution at all edges or nodes of the flow graph. In other words, all other correct solutions are either uncomputable or not as precise.
A data flow framework D is defined in terms of three components. That is, $D = \langle FG, L, F \rangle$, where a flow graph $FG = (V, E, r)$ consists of a finite set V of nodes that includes a distinguished start node r (shown as node V1 in FIG. 1), and a finite set E of edges (shown as e1, e2, e3, and e4 in FIG. 1). An edge is an ordered pair (v,w) of nodes; v is the source of the edge and w its target. For example, in FIG. 1, V1, V2, V3, and V4 are nodes with V1 being the start node r. The set of edges, E, comprises e1, e2, e3, and e4. The source of e2 is V2 and its target is V3. The edges are designated by their respective ordered pairs of source and target nodes, i.e., (v,w); therefore, e1=(V1, V2); e2=(V2, V3); e3=(V2, V4); and e4=(V4, V2). Where the edge (v,w) is in E, we say that v is a predecessor of w and w a successor of v. For example, in FIG. 1, V2 is a predecessor of V3 and of V4, and also a successor of V4. A sequence of edges $(v_1,v_2),(v_2,v_3),\ldots,(v_{n-1},v_n)$ in FG is a path from $v_1$ to $v_n$. For example, in FIG. 1, e1, e2 is a path from V1 to V3 and e3, e4, e2 is a path from V2 to V3. If there is a path from $v_i$ to $v_j$, we say that $v_i$ reaches $v_j$ or that $v_j$ is reachable from $v_i$. Every node in FG is reachable from r, and r is not the target node of any edge in E. A cycle is a path for which $v_1 = v_n$. For example, in FIG. 1, the path e3, e4 forms a cycle.
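The definitions above can be illustrated with a minimal Python sketch of the flow graph of FIG. 1. The dictionary layout and the helper names (`successors`, `predecessors`, `reachable_from`, `on_cycle`) are assumptions made for illustration only and are not part of the framework itself.

```python
from collections import defaultdict

# Flow graph of FIG. 1: start node V1 and edges e1..e4 as (source, target) pairs.
START = "V1"
EDGES = {
    "e1": ("V1", "V2"),
    "e2": ("V2", "V3"),
    "e3": ("V2", "V4"),
    "e4": ("V4", "V2"),
}


def successors(edges):
    """Map each node to the set of targets of its outgoing edges."""
    succ = defaultdict(set)
    for source, target in edges.values():
        succ[source].add(target)
    return succ


def predecessors(edges):
    """Map each node to the set of sources of its incoming edges."""
    pred = defaultdict(set)
    for source, target in edges.values():
        pred[target].add(source)
    return pred


def reachable_from(node, edges):
    """Nodes reachable from `node` by a path of one or more edges."""
    succ = successors(edges)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for nxt in succ[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def on_cycle(node, edges):
    """A node lies on a cycle if some non-empty path leads back to it."""
    return node in reachable_from(node, edges)
```

Under these assumptions, `predecessors(EDGES)["V2"]` contains V1 and V4, every other node is reachable from V1, and `on_cycle("V2", EDGES)` holds because of the cycle e3, e4.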
A "meet semilattice" is a set of elements and a partial ordering of those elements which is defined by a "meet" ($\wedge$) operator. More specifically, the meet semilattice $L = \langle A, \mathrm{TOP}, \mathrm{BOTTOM}, \le, \wedge \rangle$, where A is a set whose elements form the domain of the data flow problem (i.e., the inputs and outputs associated with the flow graph nodes), TOP and BOTTOM are distinguished elements of A (symbolizing the best and the worst possible solution to the optimization problem, respectively), $\le$ is a reflexive partial order, and $\wedge$ is the associative and commutative "meet" operator, such that for any a,b in A,
$$
\begin{aligned}
a \le b &\iff a \wedge b = a \\
a \wedge a &= a \\
a \wedge b &\le a \\
a \wedge \mathrm{TOP} &= a \\
a \wedge \mathrm{BOTTOM} &= \mathrm{BOTTOM}
\end{aligned}
$$
Where the elements of the domain are sets, examples of meet operators are intersection and union. Where the operator is union, TOP would typically be the empty set and BOTTOM the universal set. Where the operator is intersection, TOP would typically be the universal set and BOTTOM the empty set. Intuitively, higher points in the lattice correspond to higher degrees of information.
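As a hedged illustration of the two set-based lattices just described, the following Python sketch checks the meet identities exhaustively over the subsets of a small universe. The universe `{"x", "y", "z"}` and the helper names `powerset` and `satisfies_meet_laws` are hypothetical and chosen only for this example.

```python
from itertools import combinations

UNIVERSE = frozenset({"x", "y", "z"})  # hypothetical domain of facts


def powerset(universe):
    """All subsets of `universe`, each as a frozenset."""
    items = sorted(universe)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def satisfies_meet_laws(meet, top, bottom, elements):
    """Check the meet identities for every pair of elements."""

    def leq(a, b):
        # a <= b  if and only if  meet(a, b) == a
        return meet(a, b) == a

    for a in elements:
        if meet(a, a) != a:
            return False
        if meet(a, top) != a:
            return False
        if meet(a, bottom) != bottom:
            return False
        for b in elements:
            if not leq(meet(a, b), a):
                return False
    return True


def intersection(a, b):
    return a & b


def union(a, b):
    return a | b


SUBSETS = powerset(UNIVERSE)
# Intersection as meet: TOP is the universal set, BOTTOM the empty set.
INTERSECTION_OK = satisfies_meet_laws(intersection, UNIVERSE, frozenset(), SUBSETS)
# Union as meet: TOP is the empty set, BOTTOM the universal set.
UNION_OK = satisfies_meet_laws(union, frozenset(), UNIVERSE, SUBSETS)
```

Swapping TOP and BOTTOM for either operator makes the check fail, which is one way to see why the two choices of meet pair with opposite distinguished elements.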
The input to and the output of a node Y are elements of A. A transfer function (FIG. 2) operates on the input to a node Y to determine the output of the node Y. More specifically, F is a set of transfer functions such that F is a subset of $\{f : A \to A\}$. That is, any function in F has A as its domain and its range. This set includes the identity function i (which, applied to the input of a node, produces output identical to the input), and the set is closed under composition and meet. The data flow effect of node Y is described by its transfer function $f_Y$ in F. The local properties of Y are captured by its transfer function: $\mathrm{OUT}_Y = f_Y(\mathrm{IN}_Y)$, where $\mathrm{IN}_Y$ and $\mathrm{OUT}_Y$ are in A. After a framework has been globally evaluated, each node Y has a solution $\mathrm{OUT}_Y$ that is consistent with the transfer functions at every node. In general, the best computable solution for a data flow framework is the maximum fixed point to which the following equations converge:

$$
\begin{aligned}
\mathrm{OUT}_{root} &= \mathrm{TOP} \\
\mathrm{IN}_Y &= \bigwedge_{X \in \mathrm{Preds}(Y)} \mathrm{OUT}_X \\
\mathrm{OUT}_Y &= f_Y(\mathrm{IN}_Y)
\end{aligned}
$$
where Preds(Y) is the set of predecessors of node Y. The solution to the above equations is called the Maximum Fixed Point (MFP) solution. During an evaluation, iterations over the flow graph nodes take place until all node outputs remain unchanged. During such evaluation, $\mathrm{IN}_Y$ travels down the lattice from TOP to the element that represents the best computable solution prior to Y, regardless of the flow path taken.
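The iteration described above can be sketched in Python as follows. This is a minimal round-robin evaluation over the flow graph of FIG. 1, assuming set intersection as the meet; the fact names `"a"` and `"b"`, the kill/gen form of the transfer functions, and the name `solve_mfp` are hypothetical choices for illustration and are not taken from the cited work.

```python
FACTS = frozenset({"a", "b"})  # hypothetical domain; TOP is the universal set
TOP = FACTS

# Predecessors in the flow graph of FIG. 1 (V1 is the root).
PREDS = {"V1": [], "V2": ["V1", "V4"], "V3": ["V2"], "V4": ["V2"]}
ROOT = "V1"


def make_transfer(kill=frozenset(), gen=frozenset()):
    """Hypothetical transfer function: remove `kill`, then add `gen`."""
    return lambda value: (value - kill) | gen


TRANSFER = {
    "V2": make_transfer(),                     # identity
    "V3": make_transfer(),                     # identity
    "V4": make_transfer(kill=frozenset({"a"})),
}


def solve_mfp(preds, transfer, root, top):
    """Iterate IN/OUT over all nodes until no OUT value changes."""
    out = {node: top for node in preds}   # every OUT starts at TOP
    inp = {node: top for node in preds}
    changed = True
    while changed:
        changed = False
        for node in preds:
            if node == root:
                continue                  # OUT_root stays TOP
            value = top
            for pred in preds[node]:
                value = value & out[pred]  # meet over predecessors
            inp[node] = value
            new_out = transfer[node](value)
            if new_out != out[node]:
                out[node] = new_out
                changed = True
    return inp, out


IN, OUT = solve_mfp(PREDS, TRANSFER, ROOT, TOP)
```

In this example the input of V2 starts at TOP and descends to the set containing only `"b"` once the effect of V4 has propagated around the cycle e3, e4, after which a further pass changes nothing and the iteration stops.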
In a forward data flow problem, for each node Y, $\mathrm{IN}_Y$ is defined in terms of the predecessors of Y (as in the equations above). In a backward data flow problem, for each node Y, $\mathrm{IN}_Y$ is defined in terms of the successors of Y. A data flow problem which is either forward or backward is unidirectional. A data flow problem for which $\mathrm{IN}_Y$ for each node Y depends on both the predecessors and successors of Y is bidirectional.
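The distinction between the three kinds of problem comes down to which neighbors feed the input of a node. The short Python sketch below makes that explicit for the flow graph of FIG. 1; the function name `contributing_nodes` and the direction strings are assumptions introduced for this example.

```python
# Edges of FIG. 1 as (source, target) pairs.
EDGES = [("V1", "V2"), ("V2", "V3"), ("V2", "V4"), ("V4", "V2")]


def contributing_nodes(node, edges, direction):
    """Nodes whose outputs determine the input of `node`."""
    preds = {source for source, target in edges if target == node}
    succs = {target for source, target in edges if source == node}
    if direction == "forward":
        return preds
    if direction == "backward":
        return succs
    if direction == "bidirectional":
        return preds | succs
    raise ValueError(f"unknown direction: {direction}")
```

For node V2, the forward case uses V1 and V4, the backward case uses V3 and V4, and the bidirectional case uses all three.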
The prior art describes a program in terms of a general program model that is also used by this disclosure. This program model consists of a set of one or more external procedures, where an external procedure is one that is not contained (declared) within another procedure but may contain internal procedures nested within it. One of the external procedures is the main procedure. Recursion is allowed: A procedure may directly or indirectly invoke itself.
The containment relationships among the procedures in a program P may be represented as a forest of trees $F_P$, where the nodes of the trees represent procedures/routines. For each external procedure/routine, there is a tree in $F_P$ whose root node represents the external procedure/routine. The variables declared directly within a procedure/routine are local to the procedure/routine, while the variables declared in the ancestors of a procedure/routine in $F_P$ are global to it. The set of variables global to procedure P is denoted GLOBAL(P). Among the local variables of a procedure P are zero or more formal parameters. The set of such variables in P is denoted FORMAL(P). A variable that is either local or global with respect to a procedure P is known to P. An external variable is one that is global to all the procedures of a program. The local variables of a procedure are visible to it; its global variables that are not hidden from it are also visible. The specific mechanism for hiding is irrelevant to our method. One mechanism provided for hiding a global variable is the declaration of a local variable of the same name in an internal procedure.
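The scoping terms above can be made concrete with a small Python sketch. The procedure names, the declared variables, and the helper names are all hypothetical; a variable is modeled as a (declaring procedure, name) pair so that same-named variables in different procedures remain distinct, and hiding is modeled by redeclaration of a name, which is only one possible mechanism.

```python
# Hypothetical containment forest: each procedure maps to its parent.
PARENT = {"main": None, "helper": None, "inner": "main", "innermost": "inner"}

# Hypothetical variables declared directly within each procedure.
DECLARED = {
    "main": {"a", "b"},
    "helper": {"h"},
    "inner": {"b", "c"},   # redeclares b, hiding main's b
    "innermost": {"d"},
}


def ancestors(proc):
    """Enclosing procedures of `proc`, nearest first."""
    chain = []
    parent = PARENT[proc]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def local_vars(proc):
    return {(proc, name) for name in DECLARED[proc]}


def global_vars(proc):
    """GLOBAL(proc): variables declared in the ancestors of proc."""
    return {(anc, name) for anc in ancestors(proc) for name in DECLARED[anc]}


def known_vars(proc):
    return local_vars(proc) | global_vars(proc)


def visible_vars(proc):
    """Known variables that are not hidden by a nearer same-named one."""
    visible = {}
    # Walk from the outermost scope inward so nearer declarations win.
    for scope in reversed([proc] + ancestors(proc)):
        for name in DECLARED[scope]:
            visible[name] = (scope, name)
    return set(visible.values())
```

In this example the variable `b` of `main` is known to `innermost` but is not visible to it, because `inner` declares a local variable of the same name.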
The prior art includes a model for procedural interaction which is also used in this disclosure. In the model, a statement in a program that invokes a procedure is referred to as a call site. It designates a called procedure, which must be visible to the procedure containing the call site (the calling procedure). For each formal parameter of the called procedure, the call site must designate an argument that is associated with it. An argument may be a reference argument, which is a variable that is visible to the calling procedure and is passed-by-reference to its corresponding formal parameter. When the call site is invoked, a formal parameter that is associated with a reference argument assumes the same address in memory as the argument. Procedures interact at call sites through reference arguments and also through variables that are global to the called procedure. Thus a call site s is said to pass a variable X to a variable Y if and only if variable Y is the same variable as X and is global to the called procedure, or X is passed-by-reference to Y.
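The "passes" relation defined above translates almost directly into code. The following Python sketch assumes a hypothetical `CallSite` record that maps each formal parameter of the called procedure to its reference argument; the class, its fields, and the function name `passes` are illustrative and not part of the model itself.

```python
from dataclasses import dataclass, field


@dataclass
class CallSite:
    caller: str
    callee: str
    # Hypothetical layout: formal parameter name -> reference argument name.
    reference_args: dict = field(default_factory=dict)


def passes(site, x, y, callee_globals):
    """True if call site `site` passes variable x to variable y."""
    # Case 1: y is the same variable as x and is global to the callee.
    same_global = x == y and y in callee_globals
    # Case 2: x is passed by reference to the formal parameter y.
    by_reference = site.reference_args.get(y) == x
    return same_global or by_reference
```

For a call site that binds argument `a` to formal `f` of a callee whose only global is `g`, the relation holds for the pairs (a, f) and (g, g) and for no other pair.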
See FIG. 3. The interprocedural structure of a program 350 is represented by a Program Call Graph (PCG) 300, a flow graph in which each procedure is uniquely represented by a single node (301-304) and each call site by a unique edge (311-314). The start node 304 represents the main procedure. The node representing a given procedure/routine P shall be referred to as node P. The edge (P,Q) represents a call site in P that invokes Q. By the definition of a flow graph, it is assumed that every node in the call graph is reachable from the main procedure 304.
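A program call graph of this kind can be sketched in Python as below. The procedure names and call sites are hypothetical and do not reproduce the contents of FIG. 3; each call site receives its own edge identifier, so two call sites from the same caller to the same callee remain distinct edges, and a recursive call appears as a self-edge.

```python
# Hypothetical call sites as (calling procedure, called procedure) pairs.
CALL_SITES = [
    ("main", "read"),
    ("main", "solve"),
    ("solve", "solve"),  # direct recursion
    ("solve", "read"),
]


def build_pcg(call_sites):
    """One node per procedure, one uniquely named edge per call site."""
    nodes = set()
    edges = {}
    for index, (caller, callee) in enumerate(call_sites):
        nodes.update((caller, callee))
        edges[f"s{index}"] = (caller, callee)
    return nodes, edges


def reachable(start, edges):
    """Procedures reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for caller, callee in edges.values():
            if caller == current and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


NODES, PCG_EDGES = build_pcg(CALL_SITES)
ALL_REACHABLE = reachable("main", PCG_EDGES) == NODES
```

The final comparison checks the assumption stated above, namely that every node of the call graph is reachable from the main procedure.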
The data flow analysis of a procedure is interprocedural if it is performed across procedure boundaries. Interprocedural data flow analysis algorithms have been developed for various interprocedural problems.
Aliases occur when two or more access paths refer to the same storage location. An access path is an l-value expression which is constructed from variables, pointer indirection operators, and field select operators. Static aliases occur due to the FORTRAN EQUIVALENCE or C union construct and are constant for the duration of the program execution. Static alias information is typically determined during the semantic phase of compilation, and is not further considered here. Dynamic aliases arise during program execution. Program constructs such as the FORTRAN reference parameter mechanism and pointers induce dynamic aliasing. Two access paths are may-aliases at a point p in a program if they refer to the same storage location in some execution instances of p. This section describes the determination of may-aliases. May-aliases are referred to as aliases, whenever the meaning is clear from context.
A dynamically allocated storage location is frequently referred to as an anonymous object. The term named object is used to refer to a memory location associated with a name. The naming of memory locations (including those that are dynamically allocated) is required for the correctness of data flow analysis. A pointer expression is always associated with at least one named object to which it is aliased. The named object, instead of the access path itself, is used for data flow analysis.
Alias information can be regarded as exhaustive when it contains all explicit alias relations holding at each statement. However, exhaustive alias information holding at each statement is rarely needed. In most cases for a statement, alias relations of only those access paths referenced at that statement are needed, and having exhaustive information for all the access paths at each statement incurs unnecessary time and space cost.
Pointer-induced alias relations determine a directed graph. Each named object corresponds to a unique node in the directed graph. An alias <*p,a> implies that there is a (de-referencing) edge from node p to node a. Likewise, <**q,b> implies that there exists an object c such that there is an edge from q to c and one from c to b.
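The correspondence between alias pairs and graph edges can be sketched in Python as follows. The object names mirror those used in the text (p, a, q, c, b); the dictionary layout and the function name `dereference` are assumptions made for this illustration.

```python
# De-referencing edges: p -> a, q -> c, c -> b.
POINTS_TO = {"p": {"a"}, "q": {"c"}, "c": {"b"}}


def dereference(name, depth, points_to):
    """Named objects reached from `name` by following exactly `depth` edges."""
    current = {name}
    for _ in range(depth):
        current = {
            target
            for node in current
            for target in points_to.get(node, set())
        }
    return current
```

With these edges, one level of dereferencing from p reaches a, which corresponds to the alias <*p,a>, and two levels from q reach b through c, which corresponds to <**q,b>.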