Computer software compilers take programs written in high-level programming languages (such as C and Fortran) and translate them into machine language. It is important that the compiler optimize this machine code so that it will run more efficiently. Hence, in parallel computers, compilers often also serve as a mechanism to schedule and organize a computer program so that it may be run at improved efficiency.
In order to accomplish this goal, a compiler often needs to know precise information about variables throughout the program. For example, in a for-do loop, the fact that a variable always has a constant value inside the loop would be important information for the compiler to know, as then it could reorder the placement of the loop in the schedule of statements without fear of disrupting a use of the variable later in the program.
A definition of a variable can be said to reach a use of the variable if there is a path in the control flow graph from the definition to its use that does not contain any other definitions of the variable. A compiler can find all the reaching definitions at each use by utilizing a data-flow analysis. One common technique to track this information is to create what is known as use-def chains, which are chains linking reaching definitions to each use. Creation of use-def chains is known in the art and thus will not be discussed in great detail in the present document.
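By way of illustration only, the following Python sketch shows one conventional way the reaching definitions at each node may be computed by iterating to a fixed point, and how a use-def chain may then be read off for a given use. The function names, the dictionary-based graph representation, and the restriction to at most one definition per node are assumptions made for this example and are not taken from any particular compiler.

```python
from collections import defaultdict

def reaching_definitions(succ, defs):
    """succ maps each node to its successors; defs maps each node to the
    variable it defines (or None). Returns the set of defining nodes that
    reach the entry of each node. Assumes at most one definition per node."""
    nodes = list(succ)
    pred = defaultdict(list)
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    reach_in = {n: set() for n in nodes}
    reach_out = {n: set() for n in nodes}
    changed = True
    while changed:  # iterate until no set changes
        changed = False
        for n in nodes:
            new_in = set()
            for p in pred[n]:
                new_in |= reach_out[p]
            var = defs.get(n)
            if var is None:
                new_out = set(new_in)
            else:
                # A definition at n kills other definitions of the same variable.
                new_out = {d for d in new_in if defs[d] != var} | {n}
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in

def use_def_chain(reach_in, defs, node, var):
    """Definitions of var that reach a use of var at node."""
    return sorted(d for d in reach_in[node] if defs[d] == var)

# Hypothetical example: node 1 defines x, node 2 branches to 3 or 4,
# node 3 redefines x, and node 5 uses x after the merge.
succ = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
defs = {1: "x", 2: None, 3: "x", 4: None, 5: None}
chain = use_def_chain(reaching_definitions(succ, defs), defs, 5, "x")
```

In this example the use at node 5 is reached by the definitions at nodes 1 and 3, since the path through node 4 does not redefine x.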
Several problems, however, can occur with use-def chains and reaching definitions. First, they are not very space efficient. Reaching-definition bit-vectors can use d bits at each node in the control flow graph, wherein d is the number of definitions in the program. Additionally, use-def chains often contain redundant information. Second, the resulting information is not as precise as it could be. This is especially true when conditionals are used in the program: if a conditional were known to be always false, for example, that information would not be tracked anywhere in the use-def chain.
In order to solve this problem, the concept of factored use-def chains (FUD chains) was introduced. FUD chains have two important properties. First, each use of a variable is reached by a single definition. Second, control-flow merge points are handled in a special way. Merge points exist where multiple reaching definitions exist in the original program. At merge points, special merge operators called φ-terms are inserted into the program where there are multiple reaching definitions. The φ-term serves as the reaching definition for any uses after the control-flow merge, at which point it factors the reaching definitions.
Creating factored use-def chains is a three-part process. First, a dominator tree is created for the program. A node X may be said to dominate node Y if all paths from Entry (the point at which control enters the block of nodes) to Y include X. This may be written as X DOM Y. A dominator tree is simply a convenient way to represent the DOM relation of a control flow graph. The dominator tree is rooted at Entry, with an edge from X to Y if X is the immediate dominator of Y. An immediate dominator is the closest strict dominator, wherein X strictly dominates Y if X DOM Y and X≠Y.
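As a minimal sketch of the DOM relation and the dominator tree, the following Python code computes dominator sets by a simple iterative method and then selects the immediate dominator of each node. The function names and the example graph are hypothetical, every node is assumed to be reachable from Entry, and production compilers commonly use faster algorithms than the one shown.

```python
def dominators(succ, entry):
    """Return dom, where dom[n] is the set of nodes X such that X DOM n.
    Assumes every node is reachable from entry."""
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            # n is dominated by itself plus whatever dominates all predecessors.
            new = set.intersection(*(dom[p] for p in pred[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def immediate_dominators(dom, entry):
    """The strict dominators of a node form a chain, so the closest one
    is the strict dominator that itself has the most dominators."""
    idom = {}
    for n, ds in dom.items():
        if n != entry:
            idom[n] = max(ds - {n}, key=lambda d: len(dom[d]))
    return idom

# Hypothetical diamond-shaped control flow graph.
succ = {"Entry": ["A"], "A": ["B", "C"], "B": ["D"],
        "C": ["D"], "D": ["Exit"], "Exit": []}
dom = dominators(succ, "Entry")
idom = immediate_dominators(dom, "Entry")
```

Here neither B nor C dominates D, because D can be reached through either branch, so the immediate dominator of D is A. The edges of the dominator tree are the pairs (idom[Y], Y).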
The second part of creating factored use-def chains involves placement of the φ-terms. This requires the compiler to identify the control flow graph nodes that have assignments to each variable. In addition, the Entry node is considered to have an assignment to each variable in the program. Furthermore, a slicing edge from Entry to Exit adds a φ-term at Exit for each variable that is assigned in the program.
In order to accomplish this, the compiler may execute an algorithm. This algorithm assumes the following data structures are available:
1. DF(X) is the dominance frontier for the control flow graph node X (a dominance frontier of node X is the set of nodes Z such that X dominates some predecessors of Z, but not all).
2. D(M) is the set of control flow graph nodes that contain assignments or definitions to variable M.
3. Symbols is the set of symbols or variables in the program.
Additionally, the algorithm uses the following data structures:
1. WorkList is a work list of control flow graph nodes; each node that contains an assignment or φ-term will be added to the work list.
2. Added(X) is used to determine whether a φ-term for the current variable has already been inserted at node X.
3. InWork(X) is used to determine whether node X has already been added to WorkList for the current variable.
The algorithm may be as follows:
for X ∈ V do
    InWork(X) = ⊥
    Added(X) = ⊥
endfor
WorkList = Ø
for M ∈ Symbols do
    for X ∈ D(M) do
        WorkList = WorkList ∪ {X}
        InWork(X) = M
    endfor
    while WorkList ≠ Ø do
        remove some node X from WorkList
        for W ∈ DF(X) do
            if Added(W) ≠ M then
                add φ-term for M at W
                Added(W) = M
                if InWork(W) ≠ M then
                    WorkList = WorkList ∪ {W}
                    InWork(W) = M
                endif
            endif
        endfor
    endwhile
endfor
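For illustration, the placement step may be sketched in Python as follows. The sketch assumes that the dominance frontiers DF(X) and the definition sets D(M) have already been computed and are supplied as dictionaries. The function name place_phi_terms, the use of None for ⊥, and the example graph (which omits the slicing edge from Entry to Exit for brevity) are assumptions of this example.

```python
def place_phi_terms(nodes, df, d, symbols):
    """nodes: all control flow graph nodes (V).
    df[x]: dominance frontier DF(x), as a set of nodes.
    d[m]: set of nodes containing an assignment to variable m, D(m).
    Returns phi, where phi[w] lists the variables needing a phi-term at w."""
    in_work = {x: None for x in nodes}   # InWork(X); None plays the role of ⊥
    added = {x: None for x in nodes}     # Added(X)
    phi = {x: [] for x in nodes}
    work_list = []
    for m in symbols:
        for x in d[m]:
            work_list.append(x)
            in_work[x] = m
        while work_list:
            x = work_list.pop()
            for w in df[x]:
                if added[w] != m:
                    phi[w].append(m)     # add phi-term for m at w
                    added[w] = m
                    if in_work[w] != m:
                        # The phi-term is itself a definition of m, so w is
                        # queued and w is the node marked as queued.
                        work_list.append(w)
                        in_work[w] = m
    return phi

# Hypothetical diamond: Entry -> A -> {B, C} -> D -> Exit.
nodes = ["Entry", "A", "B", "C", "D", "Exit"]
df = {"Entry": set(), "A": set(), "B": {"D"}, "C": {"D"},
      "D": set(), "Exit": set()}
# Entry is treated as assigning every variable; x is also assigned in B and C.
d = {"x": {"Entry", "B", "C"}, "y": {"Entry", "A"}}
phi = place_phi_terms(nodes, df, d, ["x", "y"])
```

In this example a single φ-term for x is placed at the merge node D, and no φ-term is needed for y, which is assigned only on the path that dominates every later node.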
The third part of creating factored use-def chains is the creation of the chains themselves. This may be accomplished through a depth-first traversal of the dominator tree, starting at Entry. The algorithm assumes the following data structures or functions are available:
1. Child(X) is the set of dominator children of node X.
2. SUCC(X) is the set of control flow graph successors of X.
3. WhichPred(X→Y) is an index telling which predecessor of Y corresponds to the control flow graph edge from X.
Additionally, the algorithm uses the following data structures:
1. CurrDef(M) is a link from the symbol table entry for variable M to the “current” definition of that variable.
2. Chain(R) is a link from a use of a variable at reference R to the reaching definition or φ-term.
3. φ-Chain(R)[J] is a vector of links from a φ-term at reference R to the reaching definitions along each control flow graph predecessor.
4. SaveChain(R) is a temporary placeholder to save the old reaching definition when a new definition or φ-term is reached.
The algorithm may be as follows.
for M ∈ Symbols do
    CurrDef(M) = ⊥
endfor
Search(Entry)

procedure Search(X)
    for each variable use or def or φ-term R ∈ X do
        let M be the variable referenced at R
        if R is a use then
            Chain(R) = CurrDef(M)
        else if R is a def or φ-term then
            SaveChain(R) = CurrDef(M)
            CurrDef(M) = R
        endif
    endfor
    for Y ∈ SUCC(X) do
        J = WhichPred(X → Y)
        for each φ-term R ∈ Y do
            let M be the variable referenced at R
            φ-Chain(R)[J] = CurrDef(M)
        endfor
    endfor
    for Y ∈ Child(X) do
        Search(Y)
    endfor
    for each variable use or def or φ-term R ∈ X in reverse order do
        let M be the variable referenced at R
        if R is a def or a φ-term then
            CurrDef(M) = SaveChain(R)
        endif
    endfor
end Search
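The chain-building traversal may likewise be sketched in Python. The Ref class, its field names, and the dictionary representation of SUCC, Child, and WhichPred are assumptions made for this example. The sketch also assumes that φ-terms have already been placed and appear first in the reference list of their node.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Ref:
    kind: str                      # "use", "def", or "phi"
    var: str
    chain: object = None           # Chain(R), set for uses
    save_chain: object = None      # SaveChain(R), set for defs and phi-terms
    phi_chain: dict = field(default_factory=dict)  # phi-Chain(R)[J]

def build_fud_chains(entry, refs, succ, child, which_pred, symbols):
    curr_def = {m: None for m in symbols}   # CurrDef(M); None plays ⊥

    def search(x):
        for r in refs[x]:
            if r.kind == "use":
                r.chain = curr_def[r.var]
            else:
                r.save_chain = curr_def[r.var]
                curr_def[r.var] = r
        # Fill in the phi-term operand for the edge x -> y.
        for y in succ[x]:
            j = which_pred[(x, y)]
            for r in refs[y]:
                if r.kind == "phi":
                    r.phi_chain[j] = curr_def[r.var]
        for y in child[x]:
            search(y)
        # Restore CurrDef so that sibling subtrees see the definitions
        # that were current before x was visited.
        for r in reversed(refs[x]):
            if r.kind != "use":
                curr_def[r.var] = r.save_chain

    search(entry)

# Hypothetical diamond: Entry -> A -> {B, C} -> D -> Exit.
entry_def, b_def, c_def = Ref("def", "x"), Ref("def", "x"), Ref("def", "x")
phi, use = Ref("phi", "x"), Ref("use", "x")
refs = {"Entry": [entry_def], "A": [], "B": [b_def], "C": [c_def],
        "D": [phi, use], "Exit": []}
succ = {"Entry": ["A"], "A": ["B", "C"], "B": ["D"],
        "C": ["D"], "D": ["Exit"], "Exit": []}
child = {"Entry": ["A"], "A": ["B", "C", "D"], "B": [], "C": [],
         "D": ["Exit"], "Exit": []}
which_pred = {("Entry", "A"): 0, ("A", "B"): 0, ("A", "C"): 0,
              ("B", "D"): 0, ("C", "D"): 1, ("D", "Exit"): 0}
build_fud_chains("Entry", refs, succ, child, which_pred, ["x"])
```

After the traversal, the use of x at D links to the single φ-term at D, and the φ-term links to the definitions in B and C along predecessor indices 0 and 1 respectively. The recursion follows the dominator tree, so a very deep tree may call for an explicit stack instead.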
The problem with these prior art algorithms is that they do not handle assert statements, or other statements in the code that contain information regarding variables, for example, their values. An assert statement is generally a statement inserted into the code that identifies known information regarding a variable at a specific point in the program. Essentially, such statements make explicit what is normally only implicit in a program. They could be inserted either by the user or by the compiler. This information could be quite helpful for the compiler to access and utilize, but currently there is no technique available to factor or otherwise organize this assert information.
What is needed is a solution that allows for a compiler to establish and handle assert statements.