Register allocation is a classic problem in computer science. Assigning registers to variables is difficult because only a limited number of fast-access registers are available. Variables can instead be stored in much slower memory, but doing so typically degrades the run-time performance of the program. Hence optimal register allocation is considered a very difficult computer science problem, and computing it can take a very long time.
For example, assume that a computer has two registers and a user writes a program using three variables, a, b, and c. Suppose the lifetimes of variables a and c do not overlap, that is, the lifetime for a has no instructions in common with the lifetime for c. Suppose that the lifetime of b overlaps the lifetimes of both a and c. It is clear that b cannot be assigned the same register assigned to either a or c but that a and c can be assigned the same register because a and c do not conflict.
Hence, the lifetimes of variables are an important concept when allocating registers efficiently. Registers can be allocated to variable lifetimes by: (1) building a conflict graph which identifies which variables cannot be assigned to the same register simultaneously; (2) assigning registers to variables until the registers are used up; (3) splitting the lifetime of some of the variables by “spilling” the contents of the registers into main memory so that registers can continue to be assigned to variables; and (4) returning to step (2). The process of assigning registers to variables is called register coloring.
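The three-variable example above can be sketched in code. The following is a minimal illustration, not the allocator of any particular compiler: the lifetimes (sets of instruction indices), the register names r0 and r1, and the helper names build_conflict_graph and color are all assumptions made for the example. The sketch only reports variables that would need to be spilled; it does not split lifetimes or repeat the assignment step.

```python
from itertools import combinations

def build_conflict_graph(lifetimes):
    """Two variables conflict when their lifetimes share an instruction."""
    graph = {var: set() for var in lifetimes}
    for u, v in combinations(lifetimes, 2):
        if lifetimes[u] & lifetimes[v]:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def color(graph, registers):
    """Greedy coloring; a variable with no free register is marked for spilling."""
    assignment, spilled = {}, []
    for var in sorted(graph):
        # Registers already held by conflicting neighbours are unavailable.
        taken = {assignment[n] for n in graph[var] if n in assignment}
        free = [r for r in registers if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled

# Hypothetical lifetimes: a and c do not overlap, b overlaps both.
lifetimes = {"a": {0, 1, 2}, "b": {1, 2, 3, 4}, "c": {3, 4, 5}}
graph = build_conflict_graph(lifetimes)
assignment, spilled = color(graph, ["r0", "r1"])
print(assignment, spilled)
```

With these assumed lifetimes, a and c end up sharing one register and b receives the other, so nothing is spilled.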
Another problem encountered in an optimizing compiler is acquiring information about the structure and nature of the program. This problem has been addressed by associating every use of a variable in a program with a single unique definition for the variable by constructing a Static Single Assignment (SSA) form of the program. When a program has been put in SSA form, each use of a variable in the program will have a pointer associated with it that points to the single unique definition of the variable. These pointers are typically called “use-def links” or “use-def edges.”
For example, given the following simple program, Program 1:
x=
y=
z=x+y
z=z+x
the SSA form might be represented as illustrated in FIG. 1.
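For straight-line code such as Program 1, the renaming can be sketched as follows. This is an illustrative sketch only: the tuple representation of statements and the function name to_ssa are assumptions, and the right-hand sides of x= and y= are left empty because the program does not give them.

```python
def to_ssa(statements):
    """Rename straight-line code so that each variable is defined exactly once."""
    version = {}   # variable -> number of definitions seen so far
    current = {}   # variable -> SSA name of its most recent definition
    result = []
    for target, operands in statements:
        # Each use links to the single definition that reaches it (a use-def link).
        uses = [current[name] for name in operands]
        version[target] = version.get(target, 0) + 1
        current[target] = f"{target}{version[target]}"
        result.append((current[target], uses))
    return result

# Program 1 as (defined variable, used variables) pairs.
program_1 = [("x", []), ("y", []), ("z", ["x", "y"]), ("z", ["z", "x"])]
ssa = to_ssa(program_1)
for name, uses in ssa:
    print(name, "=", " + ".join(uses))
```

The two definitions of z become z1 and z2, and the use of z in the last statement links to z1.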
When more than one definition for a given variable exists, phi functions are inserted. For example, suppose the following simple program, Program 2, exists:
If (some conditional C)
{
then x=
}
else
{
x=
}
=x
endif
This program can be represented in flowgraph form, Flowgraph 1, as shown in FIG. 2.
In Flowgraph 1, block 1 represents whatever code preceded the If statement. As a result of the If statement, control will branch to either block 2 or block 3. In block 2 there is a definition of x (x=?) and in block 3 there is another definition of x (x=?). As the program is written, then, we cannot draw an edge from the use of x in block 4 (=x) to its “single unique definition”. To overcome this problem, a phi function, x3=φ(x,x), is inserted in block 4. The x in block 2 is renamed x1 and the x in block 3 is renamed x2. In this way every x has associated with it just one definition. It will be seen that there are as many inputs to a phi function as there are control flow edges that allow entrance into a block. In this case, because there are two pathways into block 4 (via block 2 or via block 3), there are two inputs to the phi function. The end result (Flowgraph 2), showing the control flow edges as well as the use-def edges, is shown in FIG. 3.
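The rule that a phi function has one input per incoming control flow edge can be sketched as follows. The edge list is an assumption reconstructed from the description of Flowgraph 1, and the names predecessors, make_phi and reaching are hypothetical.

```python
# Assumed control flow edges of Flowgraph 1: block 1 branches to 2 or 3,
# and both rejoin at block 4.
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

def predecessors(edges, block):
    """Blocks that have a control flow edge going into `block`."""
    return [src for src, dst in edges if dst == block]

def make_phi(var, block, edges, reaching):
    """Build the operand list of a phi: one operand per incoming edge."""
    return [reaching[(var, pred)] for pred in predecessors(edges, block)]

# The definition of x that reaches the end of each predecessor of block 4.
reaching = {("x", 2): "x1", ("x", 3): "x2"}
phi_operands = make_phi("x", 4, edges, reaching)
print("x3 = phi(" + ", ".join(phi_operands) + ")")
```

Block 4 has two predecessors, so the phi function receives exactly two operands, x1 and x2.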
A dominance frontier is a property of a block in a flowgraph. For example, consider the following simple program, represented in flowgraph form in Flowgraph 3 shown in FIG. 4.
A block (e.g., block 1) is said to dominate another block (e.g., block 2) if and only if all possible paths from the entry block (e.g., block 0) to block 2 include block 1. In Flowgraph 3, block 1 dominates blocks 2, 3 and 4. By definition, a block also dominates itself. A block (e.g., block 1) is said to strictly dominate another block (e.g., block 2) if and only if block 1 dominates block 2 and block 1 is not block 2. Predecessors of a block are all the blocks that have edges going to it. For example, in Flowgraph 3, block 4's predecessors are blocks 2 and 3, block 1's predecessor is block 0, and so on. A dominance frontier of a block B (denoted df(B)) is the set of all blocks b such that B dominates a predecessor of b and B does not strictly dominate b, or:
df (B)={b:B dom pred (b) & B does not strictly dom b}.
In Flowgraph 3, block 4 is a member of the dominance frontier of block 3. In this case B=block 3 and b=block 4. Block 3 is a predecessor of block 4, and block 3 dominates itself (by definition), so block 3 dominates a predecessor of block 4. Block 3 does not strictly dominate block 4, because block 4 can be reached by going through block 2 (thereby circumventing block 3). Therefore block 4 is a member of the dominance frontier of block 3.
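The definitions of dominance and dominance frontier can be checked with a short sketch. The successor map below is an assumption reconstructed from the description of Flowgraph 3 (block 0 leads to block 1, which branches to blocks 2 and 3, which both lead to block 4). The sketch applies the set definition of df(B) directly and favors clarity over efficiency; the function names are hypothetical.

```python
def dominators(succ, entry):
    """Return (dom, preds), where dom[b] is the set of blocks dominating b.

    Assumes every block other than the entry has at least one predecessor.
    """
    blocks = set(succ)
    preds = {b: set() for b in blocks}
    for b, targets in succ.items():
        for t in targets:
            preds[t].add(b)
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for b in blocks - {entry}:
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom, preds

def dominance_frontier(succ, entry):
    """df(B) = {b : B dom pred(b) and B does not strictly dom b}."""
    dom, preds = dominators(succ, entry)
    df = {b: set() for b in succ}
    for big_b in succ:
        for b in succ:
            dominates_a_pred = any(big_b in dom[p] for p in preds[b])
            strictly_dominates = big_b in dom[b] and big_b != b
            if dominates_a_pred and not strictly_dominates:
                df[big_b].add(b)
    return df

# Assumed shape of Flowgraph 3.
flowgraph_3 = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
dom, _ = dominators(flowgraph_3, 0)
df = dominance_frontier(flowgraph_3, 0)
print(df[3])
```

Under this assumed graph, block 4 is in the dominance frontier of both block 2 and block 3, while the frontier of block 1 is empty because block 1 strictly dominates every block it can reach.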
Dominance frontiers are useful to indicate where phi functions should be inserted. For example, if there were a definition (e.g., x=) in a block (e.g., block 3) in Flowgraph 3, a phi function (e.g., x=φ(x,x)) should be placed in the blocks associated with the dominance frontier of block 3 (e.g., a phi function x=φ(x,x) should be placed in block 4). Flowgraph 4, shown in FIG. 5, illustrates Flowgraph 3 with the addition of definitions of x and the inserted phi function.
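Placing phi functions from dominance frontiers can be sketched with a worklist. The frontier map below is written out by hand for Flowgraph 3 and is an assumption, as is the function name place_phis. Because an inserted phi function is itself a definition, the sketch also follows the frontier of each block that receives one, which is what makes the frontier "iterated".

```python
def place_phis(df, def_blocks):
    """Return the blocks that need a phi function for one variable."""
    phi_blocks = set()
    worklist = list(def_blocks)
    while worklist:
        block = worklist.pop()
        for frontier_block in df[block]:
            if frontier_block not in phi_blocks:
                phi_blocks.add(frontier_block)
                # A phi is a new definition, so its own frontier matters too.
                worklist.append(frontier_block)
    return phi_blocks

# Hand-written dominance frontiers for the assumed shape of Flowgraph 3.
df_flowgraph_3 = {0: set(), 1: set(), 2: {4}, 3: {4}, 4: set()}

# A definition of x in block 3 calls for a phi function in block 4.
phi_blocks = place_phis(df_flowgraph_3, {3})
print(phi_blocks)
```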
To construct the SSA form, first the dominance frontier is constructed for all the blocks (referred to as generating an iterated dominance frontier). Then the phi functions are placed by inspecting the locations of all the definitions. Finally, the control flowgraph edges are added. To accomplish this, the flowgraph is traversed in depth-first order, as illustrated for Flowgraph 5 in FIG. 6.
To traverse Flowgraph 5 in depth-first order, first block 0 is traversed, and then block 0's descendant, block 1, is traversed. Block 1 has two descendants, block 2 and block 5. One of the descendants is arbitrarily selected (e.g., block 2), then one of block 2's descendants is selected (e.g., block 3), and then one of block 3's descendants is selected (e.g., block 4). At this point there are no more descendants, so traversal continues at the predecessor block (e.g., block 3), but block 3 has no more unvisited descendants. Traversal continues at block 3's predecessor, block 2, which also has no more unvisited descendants. Traversal continues at block 2's predecessor (e.g., block 1). Block 1 has another descendant (e.g., block 5) which has not been traversed yet, so block 5 is traversed. Block 5 has no unvisited descendants, so the traversal returns to block 1. Block 1 has no unvisited descendants, so traversal returns to block 0.
During this traversal, whenever a definition of a variable is encountered, the definition is pushed onto the renaming stack associated with that variable. When block 1 is traversed, a definition is found (labeled x1) and is placed onto the renaming stack (currently empty). Block 2 has no definitions, but has a use (=x). By definition, the use refers to the definition currently on top of the stack, so an edge is added (e.g., edge a). Block 3 contains another definition of x (a phi function, labeled x2). Because block 3 contains a phi function, and only one definition of x (x1) has been encountered so far, the first phi function parameter is wired up to the x in block 1 (edge b). The phi function, labeled x2, is then placed onto the renaming stack. Block 4 contains neither a definition nor a use. Traversal proceeds upwards to block 1 and then down to block 5. In block 5 a new definition is encountered (e.g., x3), which is pushed onto the stack. Also in block 5 a use (=x) is found and is wired up (edge c) to the definition of x on the top of the stack (x3).
Upon returning to block 3, the second x in the phi function is wired up to x3 (edge d).
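The renaming walk can be sketched for a single variable x as follows. Several things here are assumptions rather than details given above: the successor map for Flowgraph 5, the hand-written dominator tree, and the encoding of block contents as "def", "use" and "phi" markers. The sketch also uses a common formulation that walks the dominator tree and fills in phi operands when leaving each predecessor, which may differ in detail from the traversal described above, although for this graph it produces the same names and edges.

```python
# Assumed shape of Flowgraph 5 and its dominator tree.
succ = {0: [1], 1: [2, 5], 2: [3], 3: [4], 4: [], 5: [3]}
dom_children = {0: [1], 1: [2, 3, 5], 2: [], 3: [4], 4: [], 5: []}
# Contents of each block, restricted to events involving x.
code = {0: [], 1: ["def"], 2: ["use"], 3: ["phi"], 4: [], 5: ["def", "use"]}

def rename(succ, dom_children, code, entry=0):
    stack = []      # renaming stack for x
    counter = 0     # number of SSA names handed out so far
    names = {}      # (block, index) -> name given to a def or phi
    use_def = {}    # (block, index) -> definition the use links to
    phi_args = {}   # phi block -> {predecessor block: operand name}

    def walk(block):
        nonlocal counter
        pushed = 0
        for i, kind in enumerate(code[block]):
            if kind == "use":
                use_def[(block, i)] = stack[-1]   # top of stack reaches the use
            else:                                 # "def" and "phi" both define x
                counter += 1
                names[(block, i)] = f"x{counter}"
                stack.append(names[(block, i)])
                pushed += 1
        for s in succ[block]:
            if "phi" in code[s] and stack:
                # The operand for this incoming edge is the current top of stack.
                phi_args.setdefault(s, {})[block] = stack[-1]
        for child in dom_children[block]:
            walk(child)
        for _ in range(pushed):                   # undo this block's definitions
            stack.pop()

    walk(entry)
    return names, use_def, phi_args

names, use_def, phi_args = rename(succ, dom_children, code)
print(names, use_def, phi_args)
```

Under these assumptions the phi function in block 3 becomes x2 with operands x1 (from block 2) and x3 (from block 5), the use in block 2 links to x1, and the use in block 5 links to x3, corresponding to edges a through d.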
Hence, SSA construction is typically broken down into the following sequence of steps: first an Iterated Dominance Frontier (IDF) is constructed; then the IDF is used to inject phi-functions where necessary into the instruction stream; and finally, the variables in the program are renamed with an array of renaming stacks while a pre-order depth-first recursive walk of the flow graph is performed. Because each block is processed once, the amount of work performed is a function of the number of blocks, so that SSA construction takes place in linear time.
In traditional compilers, the time expended to transform user code into machine code is not critical. Typically, it is more important to emit the best possible resultant code (e.g., making the resultant code run 10% faster) than to emit the code as fast as possible. With just-in-time (JIT) compilers, however, such as those in the JAVA run-time, IBM's run-time, the .NET run-time and so on, compilation occurs while a user is running an application or program. Hence trade-offs have to be made between the amount of time spent performing analysis and optimization and the expected improvement in run-time performance resulting from the transformation. It would be helpful if multiple phases of compilation, such as register allocation and SSA construction, could be combined into a single phase without adding complexity so that, ideally, more could be accomplished in less time, thus enabling an increase in both analysis/optimization throughput and run-time performance.