1. Field of the Invention
The present invention relates to optimizing compilers, and in particular, to an approach using register allocation feedback in the process of allocating registers when generating optimized code.
2. Description of Related Art
One important goal of optimizing compilers is to efficiently allocate physical registers to be used when code generated by an optimizing compiler is executed. The physical registers are actual hardware registers supported by the particular platform on which the code is executed. For the case where the code can be executed in the available physical registers without conflict, the code can be directly assigned to the hardware registers, making the task of efficiently allocating physical registers quite simple. However, in many situations, the number of physical registers is insufficient to execute the code directly. In these situations, the task of efficiently allocating registers becomes more difficult.
Another important goal of optimizing compilers is to improve throughput by increasing parallelism. Parallelism refers to the degree to which instructions generated by the compiler may be executed in parallel. In general, increasing parallelism of code increases the number of physical registers needed to execute the code. Thus, the goal of increasing parallelism is at odds with the goal of efficiently allocating physical registers. To demonstrate this point, an illustration of the effect of increasing parallelism upon one approach for the allocation of registers is provided.
Contemporary optimizing compilers sometimes use a multi-pass approach to allocate physical registers. In one pass, virtual or "symbolic" registers are assigned to code. Virtual registers are sometimes considered to be infinite in number, but for practical reasons, are usually limited to some fairly large number.
During a subsequent pass, the virtual registers are assigned to the available physical registers. For situations when the number of virtual registers is less than or equal to the number of physical registers, assigning the virtual registers to the physical registers can be relatively simple. However, when the number of virtual registers exceeds the number of available physical registers, then the virtual registers must be mapped to the physical registers. In the context of compilers, mapping of virtual registers to physical registers refers to determining an assignment of virtual registers to physical registers which allows all of the computations to be performed in physical registers.
One approach for mapping a set of virtual registers to a set of physical registers is known as the graph coloring approach, such as that described in U.S. Pat. No. 4,571,678 issued to Chaitin on Feb. 18, 1986. Generally, the graph coloring approach involves constructing and analyzing a register interference graph for each portion of the code. The register interference graph includes a number of nodes which represent the virtual registers. Pairs of nodes in the graph are connected by lines when two intermediate values (e.g. variables, intermediate computations) represented by nodes cannot simultaneously share a register for some portion of the program, effectively representing a conflict between the two nodes. Two intermediate values cannot simultaneously share a register when, for example, their lifetimes overlap.
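The construction of a register interference graph from overlapping lifetimes can be sketched as follows. This is a minimal illustration only, not the method of the cited patent: it assumes each virtual register has a single lifetime expressed as a half-open interval of instruction positions, and the names build_interference_graph, live_ranges, v1, v2, and v3 are hypothetical.

```python
from itertools import combinations

def build_interference_graph(live_ranges):
    """Connect two virtual registers when their lifetimes overlap.

    live_ranges maps a register name to a half-open (start, end) interval
    of instruction positions (an assumed, simplified lifetime model).
    """
    graph = {reg: set() for reg in live_ranges}
    for a, b in combinations(live_ranges, 2):
        a_start, a_end = live_ranges[a]
        b_start, b_end = live_ranges[b]
        # Two half-open intervals overlap when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical lifetimes: v1 overlaps v2, v2 overlaps v3, v1 and v3 are disjoint.
example_ranges = {"v1": (0, 3), "v2": (2, 5), "v3": (4, 6)}
example_graph = build_interference_graph(example_ranges)
```

In this sketch v1 and v3 are not connected, so they could share one physical register, while v2 conflicts with both.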
The register interference graph is then analyzed and nodes with fewer connections to other nodes than the number of available physical registers are then removed from the graph. If all of the nodes can be removed from the graph, then a coloring can be determined. That is, all of the virtual registers can be mapped to physical registers.
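The node removal described above, and the way a coloring follows from it, can be sketched as follows. This is a simplified illustration under stated assumptions: the graph is a dictionary mapping each node to the set of its neighbors, colors are integers standing for physical registers, and the names simplify and assign_colors are hypothetical.

```python
def simplify(graph, num_physical):
    """Repeatedly remove nodes with fewer than num_physical neighbors.

    Returns (removal_order, remaining_nodes). An empty remaining set
    means every node could be removed.
    """
    remaining = {n: set(adj) for n, adj in graph.items()}
    order = []
    progress = True
    while progress:
        progress = False
        for node in sorted(remaining):
            if len(remaining[node]) < num_physical:
                # Removing a node lowers the connection count of its neighbors.
                for nb in remaining[node]:
                    remaining[nb].discard(node)
                del remaining[node]
                order.append(node)
                progress = True
                break
    return order, set(remaining)

def assign_colors(graph, order, num_physical):
    """Color nodes in reverse removal order; a free color always exists."""
    colors = {}
    for node in reversed(order):
        used = {colors[nb] for nb in graph[node] if nb in colors}
        colors[node] = next(c for c in range(num_physical) if c not in used)
    return colors

# Hypothetical graphs, each analyzed with two physical registers.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}

chain_order, chain_left = simplify(chain, 2)
chain_colors = assign_colors(chain, chain_order, 2)
triangle_order, triangle_left = simplify(triangle, 2)
```

Every node of the chain can be removed, so a coloring with two registers is found. No node of the triangle can be removed, because each has two connections, which is not fewer than the number of registers.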
However, sometimes the register interference graph contains one or more nodes that cannot be removed because each has at least as many connections to other nodes as there are available physical registers. Consequently, the code for this routine cannot be executed in the available hardware registers without some of the intermediate values contained in the hardware registers being temporarily stored in memory to free up one or more hardware registers.
The process of temporarily storing data to a memory location is referred to as "spilling." Spilling data involves performing a spill operation, in which the specified data, typically a variable, is written to a temporary memory location, followed by one or more reload operations, which reload the specified data into a hardware register as the specified data is needed in the execution of the code. In terms of the register interference graph, the spilling of data is reflected in the graph and then the graph is rebuilt and analyzed again. This process is then repeated until a mapping of the virtual registers to the physical registers can be obtained.
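The insertion of spill and reload operations can be sketched as follows. The instruction representation is an assumption made for illustration: each instruction is a (dest, op, sources) tuple, and the operation names "spill" and "reload", the slot_ prefix, and the function insert_spill_code are hypothetical rather than taken from any particular compiler.

```python
def insert_spill_code(instructions, var):
    """Return a new instruction list in which var lives in memory.

    A spill is inserted after each definition of var, and a reload into a
    fresh temporary is inserted before each instruction that uses var.
    """
    result = []
    reload_count = 0
    for dest, op, sources in instructions:
        if var in sources:
            reload_count += 1
            temp = f"{var}_r{reload_count}"
            # Reload the value from its memory slot just before the use.
            result.append((temp, "reload", (f"slot_{var}",)))
            sources = tuple(temp if s == var else s for s in sources)
        result.append((dest, op, sources))
        if dest == var:
            # Write the freshly computed value to its memory slot.
            result.append((None, "spill", (var,)))
    return result

# Hypothetical three-instruction sequence in which c is spilled.
code = [
    ("c", "add", ("a", "b")),
    ("x", "mul", ("p", "q")),
    ("d", "add", ("c", "x")),
]
spilled_code = insert_spill_code(code, "c")
```

After the rewrite, c occupies a register only briefly at its definition, and the reloaded copy c_r1 occupies one only briefly at its use. That is why the conflicts in the rebuilt graph are reduced, at the cost of two extra memory operations.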
The high level approach for mapping a set of virtual registers to a set of physical registers according to the graph coloring approach is illustrated by the flow chart of FIG. 1. After starting in step 100, a register interference graph is built in step 102. Then in step 104, the register interference graph is analyzed. As previously described, analyzing the register interference graph involves removing any nodes which have fewer connections to other nodes than the number of available physical registers.
In step 106, a determination is made as to whether the register interference graph can be colored. As previously discussed, if all of the nodes can be removed from the graph, indicating that there are no conflicts, then the graph can be colored. If so, then the process is complete in step 108. On the other hand, if in step 106 the graph cannot be colored, then in step 110, one of the virtual registers is spilled, which eliminates the spilled register as a conflicted node in the graph. In step 112, the register interference graph is rebuilt and then steps 104 through 112 are repeated until the register graph is colored.
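The loop of FIG. 1 can be sketched as follows, with comments keyed to the step numbers. Several simplifications are assumed. Spilling is modeled only as deleting the spilled node from the graph, whereas an actual allocator would insert spill and reload code and rebuild the graph from the modified code. The choice of which register to spill is not specified above, so the sketch arbitrarily picks the unremoved node with the most connections. The names peel and allocate are hypothetical.

```python
def peel(graph, k):
    """Step 104: remove nodes with fewer than k neighbors; return what is left."""
    work = {n: set(adj) for n, adj in graph.items()}
    changed = True
    while changed:
        changed = False
        for n in list(work):
            if len(work[n]) < k:
                for nb in work[n]:
                    work[nb].discard(n)
                del work[n]
                changed = True
    return work

def allocate(graph, k):
    """Return the list of spilled registers needed before the graph colors."""
    graph = {n: set(adj) for n, adj in graph.items()}  # step 102: build graph
    spilled = []
    while True:
        stuck = peel(graph, k)                         # step 104: analyze
        if not stuck:                                  # step 106: colorable?
            return spilled                             # step 108: done
        # Step 110: spill one register (assumed heuristic: most connections).
        victim = max(sorted(stuck), key=lambda n: len(stuck[n]))
        spilled.append(victim)
        # Step 112: rebuild the graph without the spilled node.
        for nb in graph.pop(victim):
            graph[nb].discard(victim)

# Hypothetical graph: a triangle x, y, z plus w connected to x; two registers.
example = {"x": {"y", "z", "w"}, "y": {"x", "z"}, "z": {"x", "y"}, "w": {"x"}}
example_spills = allocate(example, 2)
```

With two physical registers, one spill is enough here: once x is removed, the remaining nodes each have fewer than two connections.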
Although spilling one or more virtual registers allows a set of virtual registers to be mapped to a set of physical registers, the approach does have some disadvantages. One disadvantage to spilling a virtual register is that additional instructions must be executed to perform the spill and reload operations. The time required to execute the additional instructions increases the overall time required to process a sequence of instructions which provide for the spilling of data. In addition, write and read operations to secondary storage mediums, such as runtime stacks, often take more time to execute than write and read operations to central processing unit (CPU) registers. Clearly, one aim of efficiently allocating registers is to reduce spilling.
Consider the example illustrated by FIG. 2A and FIG. 2B. FIG. 2A shows region 210 and the high level code associated with region 210. Region 210 is used as an example both to illustrate the graph coloring approach to the allocating of physical registers shown in FIG. 1, and to illustrate the effects of increasing parallelism upon the allocation of physical registers. Assume, for purposes of illustration, that the number of physical registers available is two.
In step 102, register interference graphs are built. In this example, FIG. 2B shows register interference graphs generated for region of code 210.
In step 104, the register interference graph is analyzed. In this example, each of the nodes in the register interference graphs shown in FIG. 2B has fewer than two connections, two being the number of available physical registers. Consequently, every node can be removed. At step 106, it is determined that the register interference graph can be colored because every node can be removed.
The Problems
As mentioned before, increasing parallelism increases the number of registers needed to execute code. Increasing the number of registers needed to execute code generally leads to increased spilling. In terms of the performance characteristics of code, the cost of increased spilling can often outweigh any benefit derived through increased parallelism. The remainder of the example illustrates this point.
Referring to FIG. 2C, assume that region 210 has now been modified to increase parallelism by an optimizing compiler. FIG. 2C represents region 210 after being modified by an optimizing compiler.
Note that code 216 and code 218 have been moved in front of code 214 to shift the order of execution. Code 216 and code 218 can be shifted in front of code 214 because neither depends on the execution of code 214. Also note that code 216, code 218, and code 212 do not depend on the execution of each other. When processors capable of pipelining encounter such a sequence of code, the code may be executed in pipelined parallel fashion.
Re-arranging of the execution of code so that the code may be executed in parallel is one method of increasing parallelism referred to as "scheduling." Approaches for optimizing, including methods that involve the re-arranging of code such as scheduling methods, are well known to those skilled in the art.
Referring to FIG. 1, in step 102 the register interference graph is built. FIG. 2D shows the register interference graph generated. In step 104, the register interference graph in FIG. 2D is analyzed. After removing nodes with fewer connections than the number of physical registers (i.e. two), nodes x, y, l, and m remain. Because not every node can be removed, at step 106 it is determined that the register interference graph cannot be colored. Therefore, control passes to step 110.
At step 110, the virtual register represented by node c is spilled. FIG. 2E shows region 210 after the region has been modified to spill the variable c. Specifically, code 213 is inserted to spill the data in variable c to memory. As previously mentioned, spilling a variable involves writing the variable from a physical register to a memory location, such as a runtime stack.
In addition, code 219 has been inserted immediately before code 214 which causes the variable c to be reloaded into a hardware register as c'. Code 214, which depends on the value of variable c generated by code 212, then uses the variable c'.
In step 112, the register interference graph is rebuilt. FIG. 2F shows the rebuilt register interference graph. At step 104, all the nodes may be removed because every node has fewer connections than the number of physical registers. At step 106, it is determined that the register interference graph may be colored because all the nodes have been removed in step 104. Execution of the steps ceases.
The above example demonstrates that modifications made to increase parallelism may lead to spilling. The spilling of variable c causes the execution of several more operations involving accesses to memory, degrading the performance characteristics of the code. The benefit gained from executing code 216 and code 218 in parallel with code 212 is probably far less than the cost of spilling variable c in terms of performance. It should be apparent that modifications made to increase parallelism can cause vastly more spilling than that just demonstrated. In these cases, the performance characteristics of code modified to increase parallelism can be far worse than that of code left unmodified.
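The effect of re-arranging code on the number of registers needed can be sketched with a simple count of simultaneously live values. The sketch does not reproduce the code of FIG. 2A through FIG. 2F. It uses a hypothetical straight-line sequence in which each instruction is a (dest, sources) pair, and the function name max_live is an assumption made for illustration.

```python
def max_live(instructions):
    """Return the peak number of values that are live at the same time.

    instructions is a list of (dest, sources) pairs; dest may be None for
    an instruction, such as a store, that produces no register value.
    """
    live = set()
    peak = 0
    # Walk backward: a value is live from its definition to its last use.
    for dest, sources in reversed(instructions):
        live.discard(dest)
        live.update(sources)
        peak = max(peak, len(live))
    return peak

# Hypothetical original order: each value is used right after it is produced.
original = [
    ("c", ()), (None, ("c",)),
    ("x", ()), (None, ("x",)),
    ("y", ()), (None, ("y",)),
]
# Hypothetical scheduled order: the independent producers are grouped first.
scheduled = [
    ("c", ()), ("x", ()), ("y", ()),
    (None, ("c",)), (None, ("x",)), (None, ("y",)),
]
original_pressure = max_live(original)
scheduled_pressure = max_live(scheduled)
```

Under these assumptions the original order never holds more than one value at a time, while the scheduled order holds three. With only two physical registers available, the scheduled order would require spilling.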
One conventional approach to avoiding spilling whose cost may outweigh the benefit derived from code optimizations is the "selective optimization" approach. In the selective optimization approach, code optimizations are limited to situations that, according to general rules drawn from experience, generally lead to improved performance. For example, under the selective optimization approach, code optimizations may be limited to loops. Because code optimizations are applied on the basis of general rules, code optimizations may be generated in situations where they in fact cause excessive spilling, thus degrading performance.
Based on the foregoing, it is clearly desirable to provide a mechanism that prevents optimizations to code that in fact degrade performance.