The present invention is directed to compiling computer programs. It particularly concerns so-called inlining of virtual methods.
FIG. 1 depicts a typical computer system 10. A microprocessor 12 receives data, and instructions for operating on them, from on-board cache memory or further cache memory 18, possibly through the mediation of a cache controller 20, which can in turn receive such data from system read/write memory ("RAM") 22 through a RAM controller 24, or from various peripheral devices through a system bus 26.
The RAM 22's data and instruction contents will ordinarily have been loaded from peripheral devices such as a system disk 27. Other sources include communications interface 28, which can receive instructions and data from other computer systems.
The instructions that the microprocessor executes are machine instructions. Those instructions are ultimately determined by a programmer, but it is a rare programmer who is familiar with the specific machine instructions in which his efforts eventually result. More typically, the programmer writes higher-level-language "source code" from which a computer software-configured to do so generates those machine instructions, or "object code."
FIG. 2 represents this sequence. FIG. 2's block 30 represents a compiler process that a computer performs under the direction of compiler object code. That object code is typically stored on the system disk 27 or some other machine-readable medium and by transmission of electrical signals is loaded into the system memory 22 to configure the computer system to act as a compiler. But the compiler object code's persistent storage may instead be in a server system remote from the machine that performs the compiling. The electrical signals that carry the digital data by which the computer systems exchange the code are exemplary forms of carrier waves transporting the information.
The compiler converts source code into further object code, which it places in machine-readable storage such as RAM 22 or disk 27. A computer will follow that object code's instructions in performing an application 32 that typically generates output from input. The compiler 30 is itself an application, one in which the input is source code and the output is object code, but the computer that executes the application 32 is not necessarily the same as the one that performs the compiler process.
The source code need not have been written by a human programmer directly. Integrated development environments often automate the source-code-writing process to the extent that for many applications very little of the source code is produced "manually." Also, it will become apparent that the term compiler is used broadly in the discussions that follow, extending to conversions of low-level code, such as the byte-code input to the Java™ virtual machine, that programmers almost never write directly. (Sun, the Sun Logo, Sun Microsystems, and Java are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries.) Moreover, although FIG. 2 may appear to suggest a batch process, in which all of an application's object code is produced before any of it is executed, the same processor may both compile and execute the code, in which case the processor may execute its compiler application concurrently with--and, indeed, in a way that can be dependent upon--its execution of the compiler's output object code.
The various instruction and data sources depicted in FIG. 1 constitute a speed hierarchy. Microprocessors achieve much of their speed by "pipelining" instruction execution: earlier stages of some instructions are executed simultaneously with later stages of previous ones. To keep the pipeline supplied at the resultant speed, very fast on-board registers supply the operands and offset values expected to be used most frequently. Other data and instructions likely to be used are kept in the on-board cache, to which access is also fast. Signal distance to cache memory 18 is greater than for on-board cache, so access to it, although very fast, is not as rapid as for on-board cache.
Next in the speed hierarchy is the system RAM 22, which is usually relatively large and therefore usually consists of relatively inexpensive, dynamic memory, which tends to be significantly slower than the more-expensive, static memory used for caches. Even within such memory, though, access to locations in the same "page" is relatively fast in comparison with access to locations on different pages. Considerably slower than either is obtaining data from the disk controller, but that source ordinarily is not nearly as slow as downloading data through a communications link 28 can be.
The speed differences among these various sources may span over four orders of magnitude, so compiler designers direct considerable effort to having compilers so organize their output instructions that they maximize high-speed-resource use and avoid the slowest resources as much as possible. This effort is complicated by the common programming technique of dividing a program up into a set of procedures directed to respective specific tasks.
Much of that complication results from procedures that invoke other, lower-level procedures to accomplish their work. That is, various "caller" procedures transfer control to a common, "callee" procedure in such a way that when the callee exits it returns control to whatever procedure called it. From the programmer's viewpoint, this organization is advantageous because it makes code writing more modular and thus more manageable. It also provides for code re-use: a common procedure need not be copied into each site at which it is to be used. This means that any revisions do not have to be replicated at numerous places. But such an organization also adds overhead: the caller's state must be stored, cache misses and page faults can occur, and processor pipelines often have to be flushed. In other words, the system must descend the speed hierarchy.
So optimizing compilers often "inline" short or frequently used procedures: they copy the procedure's body--without the procedure prolog and epilog--into each site at which the source code calls it. In other words, the compiler may sacrifice re-use for performance. But the programmer still benefits from code-writing modularity. Inlining has an additional important benefit: compiling the inlined procedure in a specific calling context exposes more information to an optimizing compiler and thereby allows the optimizer to generate more-efficient machine code.
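The effect of inlining can be illustrated at the source level, although an actual compiler performs the transformation on its internal representation or object code. The following is a minimal sketch only; the class InliningSketch and the procedures area, totalWithCalls, and totalInlined are hypothetical names introduced for illustration and do not appear in the drawings.

```java
// Hypothetical illustration; none of these names come from the figures.
class InliningSketch {
    // Callee: a short procedure, and therefore a natural inlining candidate.
    static float area(float h, float w) {
        return h * w;
    }

    // Caller as the programmer writes it: every iteration performs a call.
    static float totalWithCalls(float[] hs, float[] ws) {
        float total = 0;
        for (int i = 0; i < hs.length; i++) {
            total += area(hs[i], ws[i]);
        }
        return total;
    }

    // Source-level equivalent of what the compiler produces after inlining:
    // the callee's body replaces the call, so no call overhead remains and
    // the multiplication is exposed to optimization within the loop.
    static float totalInlined(float[] hs, float[] ws) {
        float total = 0;
        for (int i = 0; i < hs.length; i++) {
            total += hs[i] * ws[i];
        }
        return total;
    }

    public static void main(String[] args) {
        float[] hs = {1f, 3f};
        float[] ws = {2f, 4f};
        System.out.println(totalWithCalls(hs, ws) == totalInlined(hs, ws));
    }
}
```

The programmer continues to write and maintain only the first form; the second form exists only in the compiler's output.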
Certain of the more-modern programming languages complicate the inlining process. To appreciate this, recall the basic features of object-oriented languages. In such languages, of which the Java programming language and C++ are examples, the code is written in terms of "objects," which are instances of "classes." A class's definition lists the "members" of any object that is an instance of the class. A member can be a variable. Or it can be a procedure, which in this context is typically called a "method."
FIG. 3 illustrates a way in which one may employ the Java programming language to define classes. Its first code segment defines objects of a class A as including, among other things, respective floating-point-variable members h and w and a method member m1 that (in the example) operates on the object's member variables to return a floating-point value representing their product. Every instance of class A will have its respective member variables h and w, and there will also be a method whose name is m1 that can be called on that instance (although, as will be discussed below, that method may not perform the same operation for all instances).
FIG. 4 illustrates m1's use. Being a class member, method m1 can be invoked only by being "called on" an instance of that class. So the first statement of FIG. 4's left code fragment declares variable a to contain a reference to an object of class A. It also allocates memory to an object of that class and initializes variable a with a reference to the newly allocated class A object. The two statements after that place values in that object's two member variables. The last statement passes the object reference in variable a to a procedure X.foo.
This object reference can be passed to that procedure because, as the right code fragment indicates, foo was defined as having a parameter of type A, and variable a belongs to that class. As that code indicates, foo's definition calls method m1 on its parameter o. This is legal because a method of that name is a member of class A. So when the object reference in variable a is passed to procedure foo, method m1 is performed on that object's values of variables h and w.
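Because the drawings are not reproduced in this text, the following sketch reconstructs, from the foregoing description alone, code of the kind that FIGS. 3 and 4 are described as containing. The names A, h, w, m1, X, foo, a, and o come from the description; foo's return type, the particular values assigned, and the class Fig4Sketch are assumptions made only so that the example is complete.

```java
// Reconstruction from the prose; the actual figures may differ in detail.
class A {
    float h, w;                 // floating-point member variables

    float m1() {                // method member: product of h and w
        return h * w;
    }
}

class X {
    // Return type is an assumption; the description says only that foo
    // has a parameter o of type A and calls m1 on it.
    static float foo(A o) {
        return o.m1();
    }
}

class Fig4Sketch {
    public static void main(String[] args) {
        A a = new A();          // declare a, allocate an A object, store its reference
        a.h = 2.0f;             // illustrative values (assumed)
        a.w = 3.0f;
        System.out.println(X.foo(a));   // m1 operates on that object's h and w
    }
}
```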
A central feature of such object-oriented languages is "inheritance." One can declare in a new class definition that the new class is a "child" of another, "parent" class. The extends keyword in FIG. 3's definition of class B declares that class's child relationship to class A. This means that all objects of class B are considered also to be objects of the parent class, although the reverse is not necessarily true. Since they also belong to class A, all objects of class B will include respective values of member variables h and w and can have method m1 called on them. This is true even though class B's definition does not explicitly list those members: class B inherits them from class A. So the compiler will permit an object b of class B to be passed to foo even though foo's signature requires that its parameter be a reference to an object of class A.
As described so far, the inheritance mechanism does not particularly complicate the inlining process. The compiler simply copies into foo's object code the object code that results from method m1's definition in class A, and that inlined code can be used even though foo is sometimes passed references to objects of class B.
But now consider FIG. 3's definition of class C. That definition "overrides" inherited method m1. Because it is a child of class A, class C necessarily includes a member method m1, but class C gives that method a definition different from its definition for other objects of class A. So when foo is passed an object c of class C--as is legal since class C is class A's child--the call of method m1 on object o requires code different from the code required when foo is passed an object whose class is A or B. Methods permitted to be overridden are called "virtual," and virtual methods that have been overridden are called "polymorphic." Calls to such methods are called "virtual calls," which are distinguished by the fact that the location of the called method must be computed at run time. Consequently, inlining one form of a polymorphic method can yield incorrect results.
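The distinction can be made concrete with a sketch of the three classes. The class names and the extends relationships follow the description of FIG. 3; the body given here for class C's version of m1 is purely an assumption, since the description says only that it differs from class A's, and the class OverrideSketch is hypothetical.

```java
class A {
    float h, w;
    float m1() { return h * w; }            // class A's definition
}

class B extends A {
    // No m1 here: B inherits h, w, and m1 from A.
}

class C extends A {
    @Override
    float m1() { return 2 * (h + w); }      // assumed body; overrides A's m1
}

class X {
    static float foo(A o) {
        return o.m1();                      // virtual call: target chosen at run time
    }
}

class OverrideSketch {
    public static void main(String[] args) {
        B b = new B(); b.h = 2f; b.w = 3f;
        C c = new C(); c.h = 2f; c.w = 3f;
        System.out.println(X.foo(b));       // runs A's m1
        System.out.println(X.foo(c));       // runs C's m1
    }
}
```

If class A's body of m1 had simply been copied into foo, the call with the class C object would have produced class A's result rather than class C's.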
Aggressive optimizing compilers nonetheless inline polymorphic methods in some instances. They avoid incorrect results by "guarding" the inlined code with a test to determine whether the inlined form of the method is consistent with the specific class of the "receiver" object on which the method is called. If so, a virtual call is avoided.
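A guarded inlining of this kind can be sketched at the source level as follows. This is an illustration under stated assumptions, not the code that any particular compiler emits: the exact-class comparison is only one possible form of guard, the body of class C's m1 is assumed, and the class GuardedInlineSketch and its counter virtualCalls are hypothetical.

```java
class A {
    float h, w;
    float m1() { return h * w; }
}

class B extends A { }

class C extends A {
    @Override
    float m1() { return 2 * (h + w); }      // assumed overriding body
}

class GuardedInlineSketch {
    static int virtualCalls = 0;            // illustration only: counts slow-path uses

    // Source-level picture of foo after guarded inlining of A's m1.
    static float foo(A o) {
        if (o.getClass() == A.class) {      // guard: is the receiver exactly an A?
            return o.h * o.w;               // inlined body of A.m1, no call
        }
        virtualCalls++;
        return o.m1();                      // guard failed: ordinary virtual call
    }
}
```

With this particular guard, a class B receiver also takes the virtual-call path even though it would run the same code as a class A receiver; a guard that compared the method to be invoked rather than the receiver's exact class would avoid that, at the cost of a different test.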
Still, it would be better if the called method could be inlined directly, i.e., without guard code that may direct execution to a virtual call, because such guarding both exacts a cost and deprives the compiler of certain optimization opportunities. So an optimizing compiler may search the code to determine whether there is any real possibility that a form of the callee method other than the candidate for inlining would ever be called at the call site where inlining is being considered and then inline the callee if not. Unfortunately, the compiler cannot make this determination conclusively in a dynamic-compilation environment, in which a class that overrides the callee method may be loaded after the caller is compiled and execution of the resultant code has begun.
So some workers have proposed to perform the inlining tentatively, i.e., to compile the caller under the assumption that the callee has not been overridden and then recompile the caller later if a later-loaded class overrides the callee. This is simple enough if the caller is not being executed when the recompilation occurs: the new compilation replaces the old, and the next invocation of the caller gets the corrected code. But things are more complicated if the caller is currently being executed when the event that necessitates the recompilation occurs. In this case, such systems have to be able to change an execution state corresponding to one compilation of a method to the "equivalent" execution state for another compilation of that method. This process has been termed "on-stack replacement."
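The simple case, in which the caller is not executing when the assumption is violated, can be modeled by the following toy sketch. It is not a virtual-machine interface: the "compiled code" for foo is represented merely by a replaceable function object, and the names TentativeInlining, compiledFoo, and onOverridingClassLoaded, as well as the body of class C's m1, are hypothetical. The sketch does not attempt on-stack replacement.

```java
import java.util.function.Function;

class A {
    float h, w;
    float m1() { return h * w; }
}

class C extends A {
    @Override
    float m1() { return 2 * (h + w); }      // assumed overriding body
}

class TentativeInlining {
    // Assumption under which foo was first "compiled".
    static boolean m1AssumedNotOverridden = true;

    // Current compilation of foo: A.m1 inlined directly, with no guard.
    static Function<A, Float> compiledFoo = o -> o.h * o.w;

    // Invoked by the (simulated) class loader when a class that
    // overrides m1, such as C, is loaded.
    static void onOverridingClassLoaded() {
        if (m1AssumedNotOverridden) {
            m1AssumedNotOverridden = false;
            compiledFoo = o -> o.m1();      // "recompile" foo with a virtual call
        }
    }

    // Each invocation of foo uses whatever compilation is current.
    static float foo(A o) {
        return compiledFoo.apply(o);
    }
}
```

Because the replacement takes effect only at the next invocation, this model gives no answer for an activation of foo that is already in progress when the overriding class is loaded, which is the case that calls for on-stack replacement.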
The Self language was perhaps the first programming system to implement on-stack replacement. In that language, the compiler creates structures associated with various "deoptimization points" in the compiled code for a method. These structures contain information that enables a method's "source state," i.e., the state of the method variables as defined by the source code's interpretation, to be recovered from the "machine state" of the compiled code at the associated deoptimization point. When further compilation later results in invalidating an assumption on which the compilation depends, such as the callee's not having been overridden, that compilation must occur at such a deoptimization point. The Self system then recovers the source state, recompiles the method without the violated assumption, computes the new compilation's corresponding machine state from the source state, and replaces register values and entries in the method's stack frame to make them consistent with the new machine state.
It can be readily appreciated that providing such an on-stack-replacement capability is quite complex. Moreover, the data structures required to support it can become alarmingly voluminous. Also, maintaining deoptimization points prevents a code scheduler from re-ordering code that such deoptimization points separate.