1. Field of Invention
The present invention relates generally to methods and apparatus for improving the performance of software applications. More particularly, the present invention relates to methods and apparatus for allocating stack slots in substantially the same manner that is used to allocate registers.
2. Description of the Related Art
In an effort to increase the efficiency associated with the execution of computer programs, many computer programs are "optimized." Optimizing a computer program generally serves to eliminate portions of computer code which are essentially unused. In addition, optimizing a computer program may restructure computational operations to allow overall computations to be performed more efficiently, thereby consuming fewer computer resources.
An optimizer is arranged to effectively transform a computer program, e.g., a computer program written in a programming language such as C++, FORTRAN or Java bytecodes, into a faster program. The faster, or optimized, program generally includes substantially all the same observable behaviors as the original, or preconverted, computer program. Specifically, the optimized program includes the same mathematical behavior as its associated original program. However, the optimized program generally recreates the same mathematical behavior with fewer computations.
As will be appreciated by those skilled in the art, an optimizer generally includes a register allocator that is arranged to control the use of registers within an optimized or otherwise compiled, internal representation of a program. A register allocator allocates register space in which data associated with a program may be stored. A register is a location associated with a processor of a computer that may be accessed relatively quickly, as compared to the speed associated with accessing "regular" memory space, e.g., stack or heap space, associated with a computer.
The number of registers in a processor is fixed. As a result, when there is not enough register space available for the storage of data, "spill code" is identified. The spill code is code that moves data between stack slots and registers when all registers are full. A stack slot is a piece of a stack frame that an allocator uses to hold information when all registers are full. Typically, an optimizer includes a specialized stack slot allocator that is arranged to allocate stack slots for spill code as needed. Stack slots are also generally needed when passing more arguments than fit in the registers.
FIG. 1a is a diagrammatic representation of a segment of source code. Segment 104 of source code includes uses of variables. By way of example, an instruction 108 includes a use of a variable A which is stored in a register, e.g., register R1. Instruction 108 sets a variable B to equal the sum of variable A and an integer "1". Variable B may be stored into a register R2. In addition to being used in instruction 108, variable A is used in instruction 112 as well. Variable B, as shown, is used in instruction 114.
A live range for variable B, i.e., "B live range" 120, is defined as a range in segment 104 over which variable B must remain live. That is, B live range 120 is the "distance" over which a value for variable B needs to be maintained in a register, e.g., register R2. "A live range" 122, or the distance over which variable A must be maintained in a register, overlaps B live range 120. The overlapping live ranges 120, 122 indicate that both variable A and variable B are to remain in their respective registers simultaneously over a certain distance. As shown, a first "C live range" 124 indicates that variable C is live in a register only until variable D is set. Therefore, variable C and variable D may in some cases be assigned to the same register.
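The notion of overlapping live ranges can be illustrated with a minimal sketch. The instruction indices below are hypothetical and are not read from FIG. 1a; each live range is modeled as a half-open interval from the defining instruction to the last use, which is a simplifying assumption (real live ranges may have holes or span multiple basic blocks).

```python
from itertools import combinations

# Hypothetical live ranges as half-open [start, end) instruction indices.
# These numbers are illustrative only and are not taken from FIG. 1a.
live_ranges = {
    "A": (0, 5),
    "B": (1, 6),
    "C": (2, 3),  # C stops being live where D is set
    "D": (3, 6),
}

def overlaps(r1, r2):
    """Two half-open ranges overlap if each starts before the other ends."""
    return r1[0] < r2[1] and r2[0] < r1[1]

def interfering_pairs(ranges):
    """Return every pair of variables whose live ranges overlap."""
    return sorted(
        (a, b)
        for a, b in combinations(sorted(ranges), 2)
        if overlaps(ranges[a], ranges[b])
    )

pairs = interfering_pairs(live_ranges)
print(pairs)
```

Under these assumed ranges, A and B overlap and therefore need separate registers, while C and D never overlap and could share one.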
An interference graph associated with segment 104 may be colored in order to assign registers to segment 104 without conflicts, e.g., without interference. The coloring, and subsequent register allocation, may be performed using a variety of different processes including, but not limited to, a Chaitin coloring heuristic developed at International Business Machines, Inc., of Yorktown Heights, N.Y. and a Briggs-Chaitin coloring algorithm, described in Register Allocation via Graph Coloring, by Preston Briggs (PhD thesis, Rice University, 1992), which is incorporated herein by reference. FIG. 1b is a diagrammatic representation of an interference graph that is associated with segment 104 of FIG. 1a. An interference graph 132 includes nodes 134 that are associated with variables A, B, C, D.
Edges 138 are included between two nodes that need to be live at the same time. As shown, edge 138a is present between node A 134a and node D 134d, thereby indicating that variables A and D are live at the same time. Similarly, the edge between node B 134b and node C 134c indicates that variables B and C also need to be live at the same time.
Interference graph 132 is arranged such that when it is successfully colored, registers may be assigned to associated nodes 134 without conflicts. Hence, coloring interference graph 132 generally involves assigning colors, e.g., register numbers, to nodes 134 of interference graph 132. Interference graph 132 indicates that three registers are needed for segment 104 of source code as shown in FIG. 1a. Node A 134a and node B 134b each require individual registers, while node C 134c and node D 134d may share a register.
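By way of illustration, an interference graph and a simple coloring may be sketched as follows. The edge set is an assumption chosen to be consistent with the description (A and B each need their own register, C and D may share one); it is not read from FIG. 1b. The greedy coloring shown is a simplified stand-in and is not the Chaitin or Briggs-Chaitin process.

```python
def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def greedy_color(graph):
    """Give each node the lowest color not used by an already colored neighbor."""
    coloring = {}
    for node in sorted(graph):
        used = {coloring[n] for n in graph[node] if n in coloring}
        color = 0
        while color in used:
            color += 1
        coloring[node] = color
    return coloring

# Assumed edges: every pair interferes except C and D.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
graph = build_graph(edges)
coloring = greedy_color(graph)  # colors stand in for register numbers
print(coloring)
```

With this assumed edge set, three colors are used, and C and D receive the same color, i.e., they may share a register.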
In general, since interference graphs may not always be colored with as few colors as the CPU has registers, a spill will occur in which some data is spilled into stack slots. By way of example, a spill may occur when two variables or values attempt to occupy a single register at any given time. When two values attempt to substantially simultaneously occupy a single register, because a register allocator has reached a stage where it is not possible to guarantee each value its own register, one of the values must be spilled into a stack slot. The identification of a value that may be spilled into a stack slot is considered to be the identification of a spill candidate. The register allocator attempts to assign colors to the interference graph such that no two nodes connected by an edge have the same color. Further, the register allocator attempts to use no more than k colors, where k is the number of registers in the central processing unit (CPU), e.g., 8 on Intel 80×86 CPUs and 32 on most RISC CPUs. When a k coloring is not possible, or when the algorithm used to color the interference graph does not find a k coloring, then some live ranges must be spilled.
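The search for a k coloring, and the identification of spill candidates when none is found, may be sketched as follows. This is a simplified illustration in the spirit of simplify-and-select graph coloring, not a faithful implementation of the Chaitin or Briggs-Chaitin processes. The graph, the function name `color_or_spill`, and the choice of the highest-degree node as the spill candidate are all assumptions.

```python
def color_or_spill(graph, k):
    """Try to color `graph` with k colors; return (coloring, spill_candidates)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack, spills = [], []
    while work:
        # Simplify: a node with fewer than k remaining neighbors can always be colored.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            # No such node: pick the highest-degree node as a spill candidate.
            node = max(sorted(work), key=lambda n: len(work[n]))
            spills.append(node)
        else:
            stack.append(node)
        for neighbor in work.pop(node):
            work[neighbor].discard(node)
    # Select: color nodes in reverse removal order, skipping spilled nodes.
    coloring = {}
    for node in reversed(stack):
        used = {coloring[n] for n in graph[node] if n in coloring}
        coloring[node] = min(c for c in range(k) if c not in used)
    return coloring, spills

# Assumed interference graph: every pair interferes except C and D.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}
three_reg = color_or_spill(graph, 3)
two_reg = color_or_spill(graph, 2)
print(three_reg, two_reg)
```

For the assumed graph, three colors suffice and nothing is spilled, whereas with two colors at least one live range is reported as a spill candidate.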
For a hypothetical 2-register machine, interference graph 132 of FIG. 1b cannot be colored. For example, an assumption may be made that live ranges associated with variables A and B are identified as spill candidates. A register allocator inserts stores and loads around definitions and uses, as shown in FIG. 1c. At the same time, stack slots must be allocated for use in storing spill code. In this example, separate stack slots are used for spilling live range A and live range B, yet only one of those two live ranges is ever live at any given time. The interference graph for spilled program 104′ is given in FIG. 1d. Interference graph 180 of FIG. 1d may be colored using only 2 colors, e.g., machine registers.
The use of store and load instructions allows values to be stored and retrieved, as will be appreciated by those skilled in the art. Further, the use of store and load instructions is associated with the allocation of stack space, or, more specifically, stack slots. FIG. 2 is a process flow diagram which illustrates the steps associated with allocating stack space in response to coloring an interference graph. The process of allocating memory associated with a segment of source code begins at step 202 in which an interference graph, e.g., interference graph 132 of FIG. 1b, is constructed for the segment of source code.
After the interference graph is constructed, an attempt is made to color the interference graph in step 206. As previously discussed, a variety of different methods may be applied in an attempt to color the interference graph. Once the attempt is made to color the interference graph in step 206, a determination is made in step 210 as to whether the attempt to color the interference graph was successful. In other words, a determination is made regarding whether each variable associated with the interference graph was successfully assigned to a register without conflict.
If the determination is that the attempt to color was not successful, then the implication is that not enough registers are available for each variable in the segment of source code to be assigned a register without interference. Accordingly, process flow moves from step 210 to step 214 in which a list of live ranges is obtained as spill candidates. That is, variables that may be spilled into stack slots are identified.
Once spill candidates are identified, then in step 218, load instructions and store instructions are assigned around definitions and uses in the segment of source code. Specifically, a load instruction to load a variable is inserted before a use of the variable in the segment of source code, while a store instruction to store a variable is inserted after the variable is defined in the segment of source code. After the load instructions and store instructions, i.e., loads and stores, are assigned, a stack slot is allocated for each live range in step 222. In general, a stack slot allocator which is separate from a register allocator is used to allocate the stack slots. While a stack slot allocator is separate from a register allocator, it should be understood that both allocators might be included in an optimizer or a compiler. Allocating the stack slots allows spill candidates to be spilled into the stack slots. From step 222, process flow returns to step 202 where a new interference graph is constructed.
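Steps 218 and 222 may be pictured with the following minimal sketch. The instruction representation, a tuple of (destination, operation, sources), the `slot_` naming, and the sample program are all hypothetical. A production allocator would typically also rename each reloaded value to a fresh temporary, which is omitted here.

```python
def insert_spill_code(instrs, spilled):
    """Insert a load before each use and a store after each definition
    of every spilled variable. `instrs` is a list of (dest, op, srcs)."""
    out = []
    for dest, op, srcs in instrs:
        for src in dict.fromkeys(srcs):  # de-duplicate, keep order
            if src in spilled:
                out.append((src, "load", (f"slot_{src}",)))
        out.append((dest, op, srcs))
        if dest in spilled:
            out.append((f"slot_{dest}", "store", (dest,)))
    return out

def allocate_slots(spilled):
    """Conventional scheme: one dedicated stack slot per spilled live range."""
    return {f"slot_{v}": i for i, v in enumerate(sorted(spilled))}

# Hypothetical program; not the segment shown in FIG. 1a.
program = [
    ("A", "const", ()),
    ("B", "add", ("A", 1)),
    ("C", "add", ("A", "B")),
]
spilled = {"A"}
rewritten = insert_spill_code(program, spilled)
slots = allocate_slots(spilled)
for instr in rewritten:
    print(instr)
```

The rewritten program keeps A in a register only briefly around each definition and use, which shortens its live range before the interference graph is rebuilt.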
Returning to step 210, if the determination is that the attempt to color the interference graph was successful, then the implication is that each variable has successfully been associated with either a register or a stack slot. Hence, process flow moves to step 226 in which the stack containing stack slots is cleaned. Cleaning the stack slots includes a series of relatively simple steps, as will be understood by those of skill in the art. Such steps typically include converting stack slot references into actual offsets and placing the offsets into the associated spill instructions.
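The conversion of stack slot references into offsets may be sketched as follows. The 8-byte slot size, the `[sp+N]` operand notation, and the instruction tuples are assumptions made for illustration and do not correspond to any particular target machine.

```python
SLOT_SIZE = 8  # assumed slot size in bytes

def resolve_slots(instrs, slot_index, slot_size=SLOT_SIZE):
    """Replace symbolic slot names with stack-pointer-relative offsets."""
    def fix(operand):
        if isinstance(operand, str) and operand in slot_index:
            return f"[sp+{slot_index[operand] * slot_size}]"
        return operand
    return [
        (fix(dest), op, tuple(fix(s) for s in srcs))
        for dest, op, srcs in instrs
    ]

# Hypothetical spill instructions and slot numbering.
spill_instrs = [
    ("slot_A", "store", ("A",)),
    ("B", "load", ("slot_B",)),
]
slot_index = {"slot_A": 0, "slot_B": 1}
resolved = resolve_slots(spill_instrs, slot_index)
frame_bytes = len(slot_index) * SLOT_SIZE  # spill area of the frame
print(resolved, frame_bytes)
```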
If a register allocator simply assigns a single stack slot per spill candidate, it will generate stack frames that are lightly used. Large frames, such as those which are not dense, consume memory, as well as data cache, without significant gain. Large frames are also associated with problems on machines that cannot directly access large offsets from a stack pointer. By way of example, SPARC computers require a second instruction to access stack slots which are located more than 4096 bytes away. Typically, allocators attempt to reuse stack slots in order to minimize frame size. Heuristics that are often implemented to reuse stack slots generally behave in an unpredictable manner, thereby leading to unreliable, e.g., bug-filled, code, as will be appreciated by those skilled in the art.
The implementation of a stack slot allocator is often inefficient, leading to stack frames filled with stack slots which are generally unused over large portions of the program. This causes the stack frames to be unnecessarily large, requiring large amounts of memory and, as a result, slowing the execution of a program. Additionally, the heuristics associated with the implementation of a stack slot allocator, e.g., attempts to reuse stack slots, operate in an ad-hoc manner.
Therefore, what is desired is an efficient method for handling values that are stored in stack slots. Specifically, what is needed is an efficient method and apparatus for allocating and using stack space such that the allocation and the use of the stack space is substantially the same as the allocation and the use of register space.
The present invention relates to allocating and using stack space. According to one aspect of the present invention, a computer-implemented method for allocating stack space in an object-based system includes obtaining source code that is suitable for compilation and includes a definition associated with a variable. Once the source code is obtained, a first copy instruction is inserted into the source code sequentially after the definition associated with the variable. Then, a first stack slot is allocated for the first copy instruction, and the first stack slot is associated with a stack frame such that the size of the stack frame is determined. In one embodiment, the method further includes creating an interference graph associated with the source code, attempting to color the interference graph, and determining if the attempt to color the interference graph is successful. If the coloring attempt is not successful, then the first copy instruction is inserted in the source code.
By inserting a copy instruction, which may be associated with a load, a store, or a register-register copy, around definitions and uses of variables, stack slots may be allocated using the same mechanisms that are used to allocate registers. Using the same mechanisms to allocate registers and stack slots enables complications associated with assigning stack slot values using a generally complex, separate mechanism to be avoided. Hence, source code that uses stack slot allocation that is performed using the same mechanisms that are used to allocate registers may generally execute more efficiently and more reliably.
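One way to picture the general idea of treating stack slots like registers is to color spilled live ranges with slot numbers, so that ranges which are never live at the same time share a slot. The sketch below is only an illustration of that idea under assumed interval-shaped live ranges; the names and values are hypothetical, and it should not be read as the claimed method.

```python
def assign_slots(spill_ranges):
    """Give each spilled live range the lowest-numbered stack slot that is not
    held by an overlapping range. Ranges are half-open [start, end)."""
    slots = {}
    for name in sorted(spill_ranges, key=lambda n: spill_ranges[n]):
        start, end = spill_ranges[name]
        used = {
            slots[other]
            for other, (s, e) in spill_ranges.items()
            if other in slots and s < end and start < e
        }
        slot = 0
        while slot in used:
            slot += 1
        slots[name] = slot
    return slots

# Hypothetical spilled live ranges: A and B are never live together.
spill_ranges = {"A": (0, 4), "B": (4, 8), "E": (2, 6)}
slots = assign_slots(spill_ranges)
frame_slots = len(set(slots.values()))
print(slots, frame_slots)
```

Here A and B share a slot, so the frame needs two slots rather than the three that a one-slot-per-spill-candidate scheme would allocate.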
These and other advantages of the present invention will become apparent upon reading the following detailed descriptions and studying the various figures of the drawings.