1. Field of the Invention
The present invention generally relates to computer programming and, more particularly, to a method and device for optimizing a compiler to more efficiently handle large procedures, by providing cooperation between global and local register allocators.
2. Description of the Prior Art
Computer processors, whether designed for small personal computers or large mainframes, have various hardware locations referred to as registers, for storing data and computer code while a computer program is running. After a computer program is written, it must be converted from human-readable source code into machine code which can be executed by the processor. The manner in which the machine code is created, or compiled, depends among other things upon the number and types of hardware registers used in the particular processor.
"Register allocation" refers to the process within an optimizing compiler of deciding how given data items are assigned to machine registers at various times during the compiled program's execution. In the type of register allocator described herein, all eligible data items are labeled as "symbolic" registers; the compiler assumes an infinite supply of these is available. The register allocator's job is to map these symbolic registers to the finite number of registers actually available on the physical hardware. Strong register allocators that generate fast code are important for machines with expensive memory references and for code with many redundant register copy instructions.
Registers can be assigned for use both locally and globally. A basic block is a sequence of instructions which is guaranteed to execute consecutively--in other words, a sequence of instructions that control can enter only at the first instruction and leave only at the last instruction. By this definition, a program can be broken down into its constituent basic blocks. A symbolic register is considered to be local if the register is first defined and then used within only a single basic block. A symbolic register is considered global if a use of the register in one basic block is reached by a definition in another basic block (there is a flow of execution from the definition to the use along which the register is not redefined).
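The local/global distinction above can be sketched as follows. This is a minimal illustration, not the allocator of the invention: the instruction format (a pair of defined and used register sets), the function names, and the sample blocks are all hypothetical. It also assumes a well-formed program, so that a use with no earlier definition in its own block must be reached by a definition from some other block.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical instruction format: (registers defined, registers used).
Instr = Tuple[Set[str], Set[str]]

def upward_exposed_uses(block: List[Instr]) -> Set[str]:
    """Registers read in the block before any definition in the same block."""
    defined: Set[str] = set()
    exposed: Set[str] = set()
    for defs, uses in block:
        exposed |= uses - defined  # a use with no earlier local definition
        defined |= defs
    return exposed

def classify(blocks: Dict[str, List[Instr]]) -> Dict[str, str]:
    """Label each symbolic register 'local' or 'global' (simplified test)."""
    all_regs: Set[str] = set()
    global_regs: Set[str] = set()
    for block in blocks.values():
        for defs, uses in block:
            all_regs |= defs | uses
        global_regs |= upward_exposed_uses(block)
    return {r: ("global" if r in global_regs else "local") for r in sorted(all_regs)}

# B1 defines t1 and x; B2 uses x (defined in B1) and defines and uses t2.
blocks = {
    "B1": [({"t1"}, set()), ({"x"}, {"t1"})],
    "B2": [({"t2"}, {"x"}), (set(), {"t2"})],
}
kinds = classify(blocks)
```

Here t1 and t2 are each defined and used inside one block, so they are classified as local, while x is defined in B1 and used in B2, so it is classified as global.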
A local register allocator is one that assigns hardware registers to symbolic registers without analyzing the possible execution flows between basic blocks. For registers which have global lifetimes, a local allocator must store the register to memory and load it from memory at points where the lifetime spans more than one block--storing after definitions of the register, and loading the register prior to uses in blocks that are reached by definitions outside the block, even though these stores and loads may be unnecessary.
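The conservative store and load placement just described might look like the sketch below. It is only an illustration under assumed conventions: the three-field instruction tuples, the textual "load" and "store" pseudo-instructions, and the function name are hypothetical, and the set of global registers is taken as given rather than computed.

```python
from typing import List, Set, Tuple

# Hypothetical instruction format: (registers defined, registers used, text).
Instr = Tuple[Set[str], Set[str], str]

def insert_memory_traffic(block: List[Instr], global_regs: Set[str]) -> List[str]:
    """Emit the block with a load before each use of a global register that has
    no earlier definition or load in the block, and a store after each
    definition of a global register."""
    out: List[str] = []
    available: Set[str] = set()  # globals already defined or loaded in this block
    for defs, uses, text in block:
        for r in sorted(uses):
            if r in global_regs and r not in available:
                out.append(f"load {r}")
                available.add(r)
        out.append(text)
        for r in sorted(defs):
            if r in global_regs:
                out.append(f"store {r}")
                available.add(r)
    return out

block = [
    ({"t"}, {"x"}, "t = x + 1"),
    ({"x"}, {"t"}, "x = t * 2"),
]
emitted = insert_memory_traffic(block, global_regs={"x"})
```

For this block the sketch emits a load of x before the first instruction and a store of x after the second, whether or not another block actually needs the stored value.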
A global register allocator performs some form of liveness analysis to determine the exact extent of each register's lifetime--finding the portions of the program in which a symbolic register must be assigned to a hardware register. Armed with this information, a global allocator can avoid some or all of the loads and stores that local allocators have to conservatively insert. Global register allocators typically require more computation time and space to complete register allocation than do local register allocators, due primarily to the added complexity of performing liveness analysis.
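One common form of liveness analysis is an iterative backward dataflow computation over the basic blocks. The sketch below shows that textbook formulation; it is not asserted to be the analysis used by any particular allocator discussed here. The block names, the per-block use and def sets, and the successor map are hypothetical inputs.

```python
from typing import Dict, List, Set, Tuple

def liveness(
    use: Dict[str, Set[str]],    # registers read before being written in each block
    defs: Dict[str, Set[str]],   # registers written in each block
    succ: Dict[str, List[str]],  # control-flow successors of each block
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Iterate live_in/live_out to a fixed point."""
    live_in: Dict[str, Set[str]] = {b: set() for b in use}
    live_out: Dict[str, Set[str]] = {b: set() for b in use}
    changed = True
    while changed:
        changed = False
        for b in use:
            # Live on exit: anything live on entry to some successor.
            out: Set[str] = set()
            for s in succ[b]:
                out |= live_in[s]
            # Live on entry: used here, or live on exit and not redefined here.
            inn = use[b] | (out - defs[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

# Diamond-shaped flow graph: B1 branches to B2 or B3, which both reach B4.
use = {"B1": set(), "B2": {"a"}, "B3": set(), "B4": {"c"}}
defs = {"B1": {"a"}, "B2": {"c"}, "B3": {"c"}, "B4": set()}
succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
live_in, live_out = liveness(use, defs, succ)
```

In this example a is live on exit from B1 and c is live on exit from both B2 and B3, so those are the only block boundaries across which a value must be preserved.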
There are various methods of optimizing register allocators. One problem with optimizing compilers is that it is sometimes not practical for the global allocator to handle procedures with arbitrarily large numbers of symbolic registers. It may be desirable to limit the number of symbolic registers that can be handled by the global allocator either because of restrictions on the size of the data structures used by the allocator in its operation, where the size of the data structures is a function of the number of symbolic registers, or because of compile time considerations. For routines that require a number of symbolic registers which is beyond a predetermined limit, a simpler and faster local allocator may be used, but this may generate slower code. Alternatively, it may be that the routine must be compiled without optimization.
A live range, as used herein, refers to a span of a program during which a particular data item is needed. The data item is defined whenever it is loaded or computed into a register, and used when the value in the register is used to compute another value or when the value in the register is stored to memory. A live range contains one or more definitions of the data item and all the intervening code between those definitions and the uses of the data item that are reached by them. For example, consider "c" in the following code segment: ##EQU1##
The variable "c" is defined along both legs of the conditional statement, and then printed. Both definitions reach the use, so all three mentions of "c" are in the same live range. Lifetime analysis is a global dataflow technique for finding the live ranges in a program.
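The grouping of definitions and uses into live ranges can be sketched with a small union-find over reference sites. This is an illustrative sketch only: the site labels and the reaching-definitions map are hypothetical, and the second, later definition and use of "c" are an added assumption meant to show that one variable can have more than one live range.

```python
from typing import Dict, List, Set

def live_ranges(reaching: Dict[str, Set[str]]) -> List[List[str]]:
    """reaching maps each use site to the definition sites that reach it.
    Sites connected through a common use end up in the same live range."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for use_site, def_sites in reaching.items():
        find(use_site)
        for d in def_sites:
            parent[find(d)] = find(use_site)

    groups: Dict[str, Set[str]] = {}
    for site in list(parent):
        groups.setdefault(find(site), set()).add(site)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical program shape:
#   if a: c = 1 (def_then) else: c = 2 (def_else)
#   print(c)               (use_print)
#   c = 3                  (def_later)
#   print(c)               (use_later)
reaching = {
    "use_print": {"def_then", "def_else"},
    "use_later": {"def_later"},
}
ranges = live_ranges(reaching)
```

Both conditional definitions reach the first print, so those three sites form one live range; the later definition and its use form a second, separate live range of the same variable.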
Two live ranges are said to interfere if it would be illegal to assign both of the respective data items to the same register (because the live ranges overlap). Two live ranges that overlap only at a register copy instruction do not interfere and can possibly be combined into a single live range. The number of live ranges that overlap at any given program point is called the register pressure at that point. The standard register allocation technique (due to Chaitin) is to build a so-called interference graph, in which each node represents a live range and there is an edge between two nodes if and only if the corresponding live ranges interfere. The process of assigning live ranges to registers in a machine with K registers is then equivalent to coloring the nodes of the interference graph using at most K colors and in a manner where no two adjacent nodes have the same color. Various heuristic techniques are used to accomplish this. See, e.g., "Register Allocation by Priority-Based Coloring" (ACM SIGPLAN 1984 Symposium proceedings, pp. 222-232), which uses priority assignments for node-coloring in global register allocation.
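The interference graph, register pressure, and K-coloring described above can be sketched as follows. This is a simplified heuristic in the spirit of graph-coloring allocation, not a faithful reproduction of Chaitin's allocator or of the priority-based method cited: it removes the lowest-degree node first and colors in reverse removal order, it does not model the register copy exception, and the input format (one set of simultaneously live ranges per program point) and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def build_interference(live_sets: List[Set[str]]) -> Dict[str, Set[str]]:
    """Add an edge between every pair of live ranges live at the same point."""
    graph: Dict[str, Set[str]] = {}
    for live in live_sets:
        for r in live:
            graph.setdefault(r, set())
        for a, b in combinations(sorted(live), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def color(graph: Dict[str, Set[str]], k: int) -> Tuple[Dict[str, int], List[str]]:
    """Try to color the graph with k colors; return (assignment, uncolored)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack: List[str] = []
    while work:
        # Remove the lowest-degree node; if its degree is below k it is
        # guaranteed to find a free color when it is popped later.
        node = min(work, key=lambda n: (len(work[n]), n))
        stack.append(node)
        for m in work.pop(node):
            work[m].discard(node)
    assignment: Dict[str, int] = {}
    uncolored: List[str] = []
    while stack:
        node = stack.pop()
        taken = {assignment[m] for m in graph[node] if m in assignment}
        free = [c for c in range(k) if c not in taken]
        if free:
            assignment[node] = free[0]
        else:
            uncolored.append(node)  # candidate for spilling
    return assignment, uncolored

live_sets = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"c", "d"}]
pressure = max(len(s) for s in live_sets)  # register pressure at the worst point
graph = build_interference(live_sets)
assign3, uncolored3 = color(graph, k=3)
assign2, uncolored2 = color(graph, k=2)
```

With three registers every live range receives a color, since the peak register pressure is three. With two registers the mutually interfering ranges a, b, and c cannot all be colored, so at least one is left over as a spill candidate.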
If the register pressure at a given program point exceeds the number of physical registers in the machine, not all live ranges can be assigned to physical registers. Therefore, one or more live ranges must be "spilled," i.e., stored in a memory location instead of a register. Spilling one live range results in two or more smaller live ranges composed of subsets of the definitions and uses of the original live range. Because of this, if any spilling occurs, the live ranges must be recomputed and the interference graph rebuilt and recolored. This entire process can greatly increase compile time.
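The spill-and-rebuild cycle can be sketched on straight-line code, where each live range is approximated by an interval of instruction indices. Everything here is a simplifying assumption made for illustration: the interval model, the rule that a spilled range shrinks to one short piece per reference, the choice of the longest range as the spill victim, and the fact that inserted loads and stores do not shift instruction indices.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end) of instruction indices

def max_pressure(ranges: List[Range]) -> int:
    """Largest number of ranges overlapping at any point (checked at starts)."""
    points = {s for s, _ in ranges}
    return max((sum(1 for s, e in ranges if s <= p < e) for p in points), default=0)

def spill_pieces(refs: List[int]) -> List[Range]:
    """A spilled value occupies a register only around each single reference."""
    return [(r, r + 1) for r in refs]

def allocate(live: Dict[str, List[int]], k: int) -> Tuple[List[str], int]:
    """live maps a name to the sorted instruction indices that define or use it.
    Returns the spilled names and the number of rebuild rounds performed."""
    spilled: List[str] = []
    rounds = 0
    while True:
        rounds += 1  # each round stands for recomputing ranges and recoloring
        ranges: List[Range] = []
        for name, refs in live.items():
            if name in spilled:
                ranges += spill_pieces(refs)
            else:
                ranges.append((refs[0], refs[-1] + 1))
        candidates = [n for n in live if n not in spilled]
        if max_pressure(ranges) <= k or not candidates:
            return spilled, rounds
        # Hypothetical heuristic: spill the longest remaining live range.
        victim = max(candidates, key=lambda n: (live[n][-1] - live[n][0], n))
        spilled.append(victim)

live = {"a": [0, 9], "b": [1, 4], "c": [2, 5], "d": [6, 8]}
spilled, rounds = allocate(live, k=2)
```

In this example three ranges overlap while only two registers are available, so the long range a is spilled and split into two short pieces, and a second round is needed to confirm that the pressure now fits. Each extra round corresponds to the recomputation cost noted above.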
One alternative solution to this problem is to design a register allocator that minimizes the number of dynamic memory references, as discussed in the article, "Register Allocation Via Hierarchical Graph Coloring" (ACM SIGPLAN 1991 Conference proceedings, pp. 192-203). This technique results in an allocation that is sensitive to local usage patterns while retaining a global perspective. It is not clear how well this technique scales to large procedures.
It would, therefore, be desirable and advantageous to devise an optimizing compiler which can handle large procedures which have very large numbers of symbolic registers, while still generating good code with quick compile time.