1. Field of Invention
The present invention relates generally to methods and apparatus for improving the performance of software applications. More particularly, the present invention relates to methods and apparatus for enabling a register allocator to build calling convention prolog and epilog code for subroutine calls.
2. Description of the Related Art
In an effort to increase the efficiency associated with the execution of computer programs, many computer programs are “optimized” during a compilation process. Optimizing a computer program generally serves to eliminate portions of computer code which are essentially unused. In addition, optimizing a computer program as a part of a compilation process may restructure computational operations to allow overall computations to be performed more efficiently, thereby consuming fewer computer resources.
An optimizer is arranged to effectively transform or otherwise compile a computer program, e.g., a computer program written in a programming language such as C++, FORTRAN, or Java bytecodes, into a faster program. The faster, or optimized, program generally includes substantially all the same, observable behaviors as the original, or pre-converted, computer program. Specifically, the optimized program includes the same mathematical behavior as its associated original program. However, the optimized program generally recreates the same mathematical behavior with fewer computations.
As will be appreciated by those skilled in the art, an optimizer generally includes a register allocator which is arranged to control the use of registers within an optimized, or otherwise compiled, internal representation of a program. A register allocator allocates register space in which data associated with a program may be stored. A register is a location associated with a processor of a computer that may be accessed relatively quickly, as compared to the speed associated with accessing “regular” memory space, e.g., stack space which is partitioned into stack slots, associated with a computer.
Prior to a register allocation process, a set of values, i.e., incoming arguments, are known to a compiler, and are in fixed locations as specified by a calling convention. A calling convention, as will be appreciated by those skilled in the art, is generally a convention by which calls to a subroutine are made. A calling convention typically specifies where arguments are passed, i.e., which register or stack slot each argument appears in. In addition, a calling convention may specify which registers must be preserved across the subroutine, i.e., callee-save registers. If callee-save registers are used in the subroutine, the callee-save registers generally need to be saved and restored. The calling convention may also specify whether some registers are unused or used for special purposes. Saving and restoring registers, along with any other special handling, typically occurs at the entry and exit of subroutines, and is called prolog and epilog code. Additional information is available after the register allocation process is completed. Such additional information includes, but is not limited to, the stack frame size associated with the subroutine and a set of registers, which is to be saved and restored.
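By way of illustration only, the role of a calling convention in conventional prolog and epilog generation may be sketched as follows. The sketch assumes a hypothetical machine with registers named r0 through r5; the class CallingConvention, the function prolog_epilog, and the save-area labels such as save_r4 are names invented for this example and do not appear in the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallingConvention:
    """Hypothetical description of a calling convention."""
    arg_locations: tuple    # register or stack slot of each incoming argument
    callee_save: frozenset  # registers that must be preserved across the subroutine


def prolog_epilog(conv, used_registers):
    """Build save/restore code for the callee-save registers actually used.

    This can only run after register allocation, because the set of used
    registers is not known earlier.
    """
    to_save = sorted(conv.callee_save & set(used_registers))
    prolog = [f"store {r} -> [save_{r}]" for r in to_save]
    # Restore in the reverse order of saving.
    epilog = [f"load [save_{r}] -> {r}" for r in reversed(to_save)]
    return prolog, epilog


conv = CallingConvention(
    arg_locations=("r0", "r1", "stack[0]"),
    callee_save=frozenset({"r4", "r5"}),
)
prolog, epilog = prolog_epilog(conv, used_registers={"r0", "r4"})
```

In this sketch only r4 is both callee-save and used, so only r4 is saved in the prolog and restored in the epilog.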
FIG. 1 is a diagrammatic representation of a compiler which includes a register allocator and a calling convention code generator. Source code 102 is provided as input to a compiler 106, which may be an optimizing compiler. Typically, source code 102 includes a call 108 to a subroutine 110, as well as incoming arguments 112 associated with call 108. Specifically, the location of incoming arguments 112 is specified with respect to call 108.
A register allocator 116, which is included in compiler 106, is arranged to allocate memory space for use by source code 102. After register allocator 116 performs a register allocation, a calling convention code generator 118 generates prolog and epilog code associated with source code 102. By way of example, if any callee-save register is used in any part of the allocation, then code which is used to save and to restore the callee-save register is inserted into the prolog and epilog code. Prolog and epilog code is included in an internal representation 120 of source code 102. Once internal representation 120 is generated, compiler 106 creates machine instructions 124 from internal representation 120.
Internal representation 120 includes copy, load, and store instructions that are associated with definitions and uses of variables, in addition to a calling convention for a subroutine. As shown, variables, or values, “c” and “d” are stored on a stack. Variable “d” must be spilled across the subroutine call, as will be appreciated by those skilled in the art. Hence, variable “d” is reloaded from a stack after the subroutine call to “foo.”
With reference to FIG. 2, a process of generating machine instructions from source code which includes calling conventions will be described. The process 202 generally involves the conversion of “virtual” registers into “real” registers, as will be appreciated by those skilled in the art. Prior to allocation, the compiler assumes that it has an unlimited number of “virtual” registers to work with. It is the job of the allocator to map the unlimited virtual registers into the very limited set of real registers that the overall machine has. Process 202 begins at step 204 in which calling convention code is inserted into source code obtained by a compiler.
Typically, after the compiler inserts calling convention code, or code associated with a convention by which a subroutine call may be made, the compiler studies the calling convention in step 206. Specifically, the compiler studies an incoming argument associated with the calling convention. In step 208, a determination is made as to whether the incoming value, or argument, is associated with a register or a stack location, e.g., a stack slot. When it is determined that the incoming argument is stored in a register, then process flow moves to step 216 where the incoming value is copied to a virtual register. Typically, the copying is performed using a register-to-register copy command.
Once the incoming value is copied into a virtual register, then in step 212, register allocation is performed. The steps associated with performing a register allocation will be discussed below with respect to FIG. 3. A register allocation process generates allocation choices. That is, an overall register allocation process may be used to determine how different values are assigned to registers, i.e., xe2x80x9crealxe2x80x9d registers, and stack slots. After the register allocation process is completed, then the allocation choices generated by the register allocation process are converted into machine instructions by the compiler in step 214. It should be appreciated that turning allocation choices into machine instructions includes building prolog and epilog code using information obtained during the register allocation process. The process of creating machine instructions is completed once allocation choices have been converted.
Returning to step 208 and the determination of whether an incoming value is associated with a register or a stack location, when it is determined that the incoming value is stored in a stack location, then in step 210, the incoming value is loaded into a virtual register. From step 210, process flow proceeds to step 212 where a register allocation is performed.
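By way of illustration only, the decision made in step 208, and the resulting copy or load into a virtual register, may be sketched as follows. The textual instruction format, the location strings such as "stack[0]", and the function name lower_incoming_args are assumptions made for this example.

```python
from itertools import count


def lower_incoming_args(arg_locations):
    """Move each incoming argument into a fresh virtual register.

    A register argument is moved with a register-to-register copy; a
    stack argument is loaded from its stack slot.
    """
    fresh = count()
    code = []
    for loc in arg_locations:
        vreg = f"v{next(fresh)}"
        if loc.startswith("stack"):
            code.append(f"load {loc} -> {vreg}")
        else:
            code.append(f"copy {loc} -> {vreg}")
    return code


code = lower_incoming_args(["r0", "stack[0]"])
```

Register allocation then operates on the virtual registers v0 and v1 rather than on the fixed locations named by the calling convention.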
FIG. 3 is a process flow diagram which illustrates the steps associated with performing a register allocation, i.e., step 212 of FIG. 2, by coloring an interference graph. The process 212 of performing a register allocation for a segment of source code begins at step 302 in which an interference graph is constructed for the segment of source code. After the interference graph is constructed, an attempt is made to color the interference graph in step 306. Typically, a variety of different methods may be applied in an attempt to color the interference graph. Once the attempt is made to color the interference graph in step 306, a determination is made in step 310 as to whether the attempt to color the interference graph was successful. In other words, a determination is made regarding whether each variable associated with the interference graph was successfully assigned to a register without conflict.
If the determination is that the attempt to color was not successful, then the implication is that not enough registers are available for each variable in the segment of source code to be assigned a register without interference. Since the number of registers in a processor is fixed, when there is no register space available for the storage of data, “spill code” is identified. The spill code is code that moves data to and from stack slots in an effort to reduce the number of registers that are simultaneously required, as will be understood by those skilled in the art. A stack slot is a piece of a stack frame which an allocator uses to hold information when all registers are full. Typically, an optimizer includes a specialized stack slot allocator that is arranged to allocate stack slots for spill code as needed. Stack slots are also generally needed when arguments beyond those which fit in the registers are passed on a stack.
If the determination in step 310 is that the attempt to color the interference graph was not successful, process flow moves from step 310 to step 314 in which a list of live ranges is obtained as spill candidates. That is, variables which may be spilled into stack slots are identified.
Once spill candidates are identified, then in step 318, load instructions and store instructions are assigned around definitions and uses in the segment of source code. Specifically, a load command to load a variable is inserted before a use of the variable in the segment of source code, while a store instruction to store a variable is inserted after the variable is defined in the segment of source code.
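By way of illustration only, the placement of loads and stores around uses and definitions in step 318 may be sketched as follows. The instruction representation, a tuple of defined variable, opcode, and list of used variables, together with the slot names such as slot_d and the function name insert_spill_code, are assumptions made for this example.

```python
def insert_spill_code(instructions, spilled):
    """Insert a load before each use and a store after each definition
    of every spilled variable.

    instructions: list of (defined_var_or_None, opcode, [used_vars]).
    spilled: set of variable names chosen as spill candidates.
    """
    out = []
    for dest, op, uses in instructions:
        # dict.fromkeys removes duplicate uses while keeping their order.
        for var in dict.fromkeys(uses):
            if var in spilled:
                out.append((var, "load", [f"slot_{var}"]))
        out.append((dest, op, uses))
        if dest in spilled:
            out.append((None, "store", [dest, f"slot_{dest}"]))
    return out


code = [
    ("d", "const", []),        # definition of d
    (None, "call", ["foo"]),   # subroutine call that d must survive
    ("e", "add", ["d", "c"]),  # use of d after the call
]
out = insert_spill_code(code, spilled={"d"})
```

In this sketch the store follows the definition of d and the load precedes its use, so d does not need to occupy a register across the call.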
After the load instructions and store instructions, i.e., loads and stores, are assigned, a stack slot is allocated for each load and store in step 322. In general, a stack slot allocator which is separate from a register allocator is used to allocate the stack slots. While a stack slot allocator is separate from a register allocator, it should be understood that both allocators may be included in an optimizer or a compiler. Allocating the stack slots allows spill candidates to be spilled into the stack slots. From step 322, process flow returns to step 302 where an interference graph is constructed.
Returning to step 310, if the determination is that the attempt to color the interference graph was successful, then the implication is that each variable has successfully been associated with either a register or a stack slot. Hence, process flow moves to step 326 in which the allocation is cleaned up, or finalized. During the cleaning of an allocation, stack slot numbers are converted to offsets into the stack frame, copies are manifested as loads or stores as required, actual register numbers are inserted into the machine instructions, and other house-cleaning chores are attended to, as will be appreciated by those skilled in the art.
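By way of illustration only, the loop of FIG. 3 (construct, attempt to color, spill, repeat) may be sketched as follows. The sketch makes two simplifying assumptions that are not taken from the figures: coloring uses a simple greedy heuristic, which is only one of the many methods that may be applied, and a spilled variable is modeled by removing it from the interference graph rather than by inserting loads and stores and reconstructing the graph. The function names try_color and allocate are hypothetical.

```python
def try_color(graph, num_registers):
    """Greedily assign a register number to each node of the graph.

    graph: dict mapping each variable to the set of variables it
    interferes with (assumed symmetric).
    Returns (assignment, list of nodes that could not be colored).
    """
    assignment, failed = {}, []
    # Visit the most constrained variables first.
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        taken = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[node] = free[0]
        else:
            failed.append(node)
    return assignment, failed


def allocate(graph, num_registers):
    """Repeat coloring attempts, spilling one variable per failed attempt."""
    graph = {n: set(adj) for n, adj in graph.items()}  # work on a copy
    stack_slots = {}
    while True:
        assignment, failed = try_color(graph, num_registers)
        if not failed:
            return assignment, stack_slots
        victim = failed[0]                     # spill candidate
        stack_slots[victim] = len(stack_slots)
        # Simplification: drop the spilled variable from the graph.
        for neighbor in graph.pop(victim):
            graph[neighbor].discard(victim)


# Three mutually interfering variables, but only two registers.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
registers, stack_slots = allocate(triangle, num_registers=2)
```

Because each failed attempt removes one variable from the graph, the loop in this sketch always terminates.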
The requirement of having to complete a register allocation process before a calling convention may be built, e.g., before machine instructions for a calling convention may be generated, has several shortcomings. For example, in order to generate prolog and epilog code, a special piece of code arranged to generate the prolog and epilog code must be used. Such code could contain bugs and, at a minimum, requires debugging. Further, such code is also often machine-dependent, thereby decreasing the portability of the code.
Therefore, what is desired is a method and an apparatus for efficiently generating machine instructions for a calling convention such that the machine instructions may be readily ported between different computing systems. Such a method and apparatus would further allow the spill code heuristics to choose whether or not to spill a callee-save register and remove the need for specialized prolog and epilog code generation. Specifically, what is needed is a method and an apparatus for enabling a register allocator to essentially build a calling convention.
The present invention relates to the use of a register allocator in creating a calling convention. According to one aspect of the present invention, a computer-implemented method for generating code associated with a calling convention includes obtaining compilable source code, and identifying at least one argument associated with the calling convention. The location of the argument with respect to memory space is described by a register mask. The method also includes performing a register allocation using a register allocator that is arranged to allocate registers. During the register allocation, code associated with the calling convention is produced.
In accordance with another aspect of the present invention, a computer-implemented method for building a calling convention associated with a call to a subroutine in an object-based system includes obtaining source code that is suitable for compilation, creating a plurality of register masks each having an associated variable with an associated live range, and determining an intersection of the plurality of register masks. A register allocation is performed using the intersection. The register allocation, in addition to allocating registers, generates code associated with the calling convention. In one embodiment, the method further includes converting the code associated with the calling convention into machine instructions, the machine instructions being suitable for execution by a computing system.
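By way of illustration only, determining an intersection of register masks may be sketched as follows. The sketch assumes that a register mask is represented as an integer bit set with one bit per machine register; this encoding, the register names R0 through R3, and the function name intersect_masks are assumptions made for this example and are not stated above.

```python
from functools import reduce
from operator import and_

# Hypothetical four-register machine, one bit per register.
R0, R1, R2, R3 = 1 << 0, 1 << 1, 1 << 2, 1 << 3
ANY_REGISTER = R0 | R1 | R2 | R3


def intersect_masks(masks):
    """Return the registers permitted by every mask on a live range."""
    return reduce(and_, masks, ANY_REGISTER)


# A live range whose definition may use any register, but whose use
# as an argument is pinned to R0 by the calling convention.
allowed = intersect_masks([ANY_REGISTER, R0])

# Two uses pinned to different registers share no common register.
conflict = intersect_masks([R0, R1])
```

In this sketch an empty result indicates that no single register satisfies every constraint placed on the live range.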
By allowing a calling convention to be built during a register allocation process, i.e., when a register allocator substantially automatically generates calling convention code, the calling convention may be readily characterized. The allocator may be arranged to efficiently perform an allocation. In addition, when a register allocator generates calling convention code, the source code from which the calling convention code is generated may be readily ported between different platforms.
These and other advantages of the present invention will become apparent upon reading the following detailed descriptions and studying the various figures of the drawings.