The present invention is directed to compiling computer programs. It particularly concerns so-called inlining of virtual methods.
FIG. 1 depicts a typical computer system 10. A microprocessor 12 receives data, and instructions for operating on them, from on-board cache memory or further cache memory 18, possibly through the mediation of a cache controller 20, which can in turn receive such data from system read/write memory ("RAM") 22 through a RAM controller 24, or from various peripheral devices through a system bus 26.
The RAM 22's data and instruction contents will ordinarily have been loaded from peripheral devices such as a system disk 27. Other sources include communications interface 28, which can receive instructions and data from other computer systems.
The instructions that the microprocessor executes are machine instructions. Those instructions are ultimately determined by a programmer, but it is a rare programmer who is familiar with the specific machine instructions in which his efforts eventually result. More typically, the programmer writes higher-level-language "source code," from which a computer that software has configured to do so generates those machine instructions, or "object code."
FIG. 2 represents this sequence. FIG. 2's block 30 represents a compiler process that a computer performs under the direction of compiler object code. That object code is typically stored on the system disk 27 or some other machine-readable medium and by transmission of electrical signals is loaded into the system memory 22 to configure the computer system to act as a compiler. But the compiler object code's persistent storage may instead be in a server system remote from the machine that performs the compiling. The electrical signals that carry the digital data by which the computer systems exchange the code are exemplary forms of carrier waves transporting the information.
The compiler converts source code into further object code, which it places in machine-readable storage such as RAM 22 or disk 27. A computer will follow that object code's instructions in performing an application 32 that typically generates output from input. The compiler 30 is itself an application, one in which the input is source code and the output is object code, but the computer that executes the application 32 is not necessarily the same as the one that performs the compiler process.
The source code need not have been written by a human programmer directly. Integrated development environments often automate the source-code-writing process to the extent that for many applications very little of the source code is produced "manually." Also, it will become apparent that the term compiler is used broadly in the discussions that follow, extending to conversions of low-level code, such as the byte-code input to the Java virtual machine, that programmers almost never write directly. (Sun, the Sun Logo, Sun Microsystems, and Java are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries. All SPARC trademarks are used under license and are trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.) Moreover, although FIG. 2 may appear to suggest a batch process, in which all of an application's object code is produced before any of it is executed, the same processor may both compile and execute the code, in which case the processor may execute its compiler application concurrently with, and indeed in a way that can depend upon, its execution of the compiler's output object code.
The various instruction and data sources depicted in FIG. 1 constitute a speed hierarchy. Microprocessors achieve a great degree of their speed by "pipelining" instruction execution: earlier stages of some instructions are executed simultaneously with later stages of previous ones. To keep the pipeline supplied at the resultant speed, very fast on-board registers supply the operands and offset values expected to be used most frequently. Other data and instructions likely to be used are kept in the on-board cache, to which access is also fast. Signal distance to cache memory 18 is greater than for on-board cache, so access to it, although very fast, is not as rapid as for on-board cache.
Next in the speed hierarchy is the system RAM 22, which is usually relatively large and therefore usually consists of relatively inexpensive, dynamic memory, which tends to be significantly slower than the more-expensive, static memory used for caches. Even within such memory, though, access to locations in the same "page" is relatively fast in comparison with access to locations on different pages. Considerably slower than either is obtaining data from the disk controller, but that source ordinarily is not nearly as slow as downloading data through the communications interface 28 can be.
The speed differences among these various sources may span over four orders of magnitude, so compiler designers direct considerable effort to having compilers so organize their output instructions that they maximize high-speed-resource use and avoid the slowest resources as much as possible. This effort is complicated by the common programming technique of dividing a program up into a set of procedures directed to respective specific tasks.
Much of that complication results from procedures that invoke other, lower-level procedures to accomplish their work. That is, various "caller" procedures transfer control to a common, "callee" procedure in such a way that when the callee exits it returns control to whatever procedure called it. From the programmer's viewpoint, this organization is advantageous because it makes code writing more modular and thus more manageable. It also provides for code re-use: a common procedure need not be copied into each site at which it is to be used. This means that any revisions do not have to be replicated at numerous places. But such an organization also adds overhead: the caller's state must be stored, cache misses and page faults can occur, and processor pipelines often have to be flushed. In other words, the system must descend the speed hierarchy.
So optimizing compilers often "inline" short or frequently used procedures: they copy the procedure's body--without the procedure prolog and epilog--into each site at which the source code calls it. In other words, the compiler may sacrifice re-use for performance. But the programmer still benefits from code-writing modularity. Inlining has an additional important benefit: compiling the inlined procedure in a specific calling context exposes more information to an optimizing compiler and thereby allows the optimizer to generate more-efficient machine code.
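The effect of inlining can be illustrated at the source level. The following is only a sketch: the class and method names (InliningSketch, area, totalWithCall, totalInlined) are hypothetical and do not come from the drawings, and a real compiler performs the substitution on its internal representation or on object code rather than on source text.

```java
// Hypothetical illustration of inlining; none of these names appear in the figures.
class InliningSketch {
    // Callee: a short procedure that a compiler might choose to inline.
    static float area(float h, float w) {
        return h * w;
    }

    // Caller as the programmer wrote it: each iteration transfers control to area().
    static float totalWithCall(float[] hs, float[] ws) {
        float total = 0f;
        for (int i = 0; i < hs.length; i++) {
            total += area(hs[i], ws[i]);
        }
        return total;
    }

    // The same caller after inlining, written out by hand: the callee's body
    // replaces the call, so this site has no call or return overhead.
    static float totalInlined(float[] hs, float[] ws) {
        float total = 0f;
        for (int i = 0; i < hs.length; i++) {
            total += hs[i] * ws[i];
        }
        return total;
    }

    public static void main(String[] args) {
        float[] hs = {1f, 2f, 3f};
        float[] ws = {4f, 5f, 6f};
        System.out.println(totalWithCall(hs, ws)); // prints 32.0
        System.out.println(totalInlined(hs, ws));  // prints 32.0
    }
}
```

Both versions compute the same result. The inlined version also lets an optimizer see the multiplication in the context of the loop, which is the additional benefit mentioned above.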
Certain of the more-modern programming languages complicate the inlining process. To appreciate this, recall the basic features of object-oriented languages. In such languages, of which the Java programming language and C++ are examples, the code is written in terms of "objects," which are instances of "classes." A class's definition lists the "members" of any object that is an instance of the class. A member can be a variable. Or it can be a procedure, which in this context is typically called a "method."
FIG. 3 illustrates a way in which one may employ the Java programming language to define classes. Its first code segment defines objects of a class A as including, among other things, respective floating-point-variable members h and w and a method member m1 that (in the example) operates on the object's member variables to return a floating-point value representing their product. Every instance of class A will have its respective member variables h and w, and there will also be a method whose name is m1 that can be called on that instance (although, as will be discussed below, that method may not perform the same operation for all instances).
FIG. 4 illustrates m1's use. Being a class member, method m1 can be invoked only by being "called on" an instance of that class. So the first statement of FIG. 4's left code fragment declares variable a to contain a reference to an object of class A. It also allocates memory to an object of that class and initializes variable a with a reference to the newly allocated class A object. The two statements after that place values in that object's two member variables. The last statement passes the object reference in variable a to a procedure foo.
This object reference can be passed to that procedure because, as the right code fragment indicates, foo was defined as having a parameter of type A, and variable a belongs to that class. As that code indicates, foo's definition calls method m1 on its parameter o. This is legal because a method of that name is a member of class A. So when the object reference in variable a is passed to procedure foo, method m1 is performed on that object's values of variables h and w.
A central feature of such object-oriented languages is "inheritance." One can declare in a new class definition that the new class is a "child" of another, "parent" class. The extends keyword in FIG. 3's definition of class B declares that class's child relationship to class A. This means that all objects of class B are considered also to be objects of the parent class, although the reverse is not necessarily true. Since they also belong to class A, all objects of class B will include respective values of member variables h and w and can have method m1 called on them. This is true even though class B's definition does not explicitly list those members: class B inherits them from class A. So the compiler will permit an object b of class B to be passed to foo even though foo's signature requires that its parameter be a reference to an object of class A.
As described so far, the inheritance mechanism does not particularly complicate the inlining process. The compiler simply copies into foo's object code the object code that results from method m1's definition in class A, and that inlined code can be used even though foo is sometimes passed references to objects of class B.
But now consider FIG. 3's definition of class C. That definition "overrides" inherited method m1. Because it is a child of class A, class C necessarily includes a member method m1, but class C gives that method a definition different from its definition for other objects of class A. So when foo is passed an object c of class C--as is legal since class C is class A's child--the call of method m1 on object o requires code different from the code required when foo is passed some other objects of class A. Methods permitted to be overridden are called "virtual," and virtual methods that have been overridden are called "polymorphic." Calls to such methods are called "virtual calls," which are distinguished by the fact that the location of the called method must be computed at run time. Consequently, directly inlining one form of a polymorphic method can yield incorrect results.
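The class relationships just described can be sketched in Java as follows. This is not a reproduction of FIG. 3 or FIG. 4. The wrapper class InheritanceSketch and the helper fill are hypothetical, foo is assumed here to return the method's result, and the body of class C's override (a sum instead of a product) is an arbitrary choice made only so that the two definitions of m1 give visibly different results.

```java
// Hypothetical sketch of classes A, B, and C and of procedure foo.
class InheritanceSketch {
    static class A {
        float h, w;
        // m1 returns the product of the member variables, as described for class A.
        float m1() { return h * w; }
    }

    // B declares no members of its own here; it inherits h, w, and m1 from A.
    static class B extends A { }

    // C overrides m1. The body is an assumption made for illustration.
    static class C extends A {
        @Override
        float m1() { return h + w; }
    }

    // foo accepts any object of class A, including instances of its children.
    // The call o.m1() is a virtual call: the target depends on o's run-time class.
    static float foo(A o) {
        return o.m1();
    }

    // Hypothetical helper that sets both member variables.
    static <T extends A> T fill(T o, float h, float w) {
        o.h = h;
        o.w = w;
        return o;
    }

    public static void main(String[] args) {
        System.out.println(foo(fill(new A(), 2f, 3f))); // prints 6.0
        System.out.println(foo(fill(new B(), 2f, 3f))); // prints 6.0 (inherited m1)
        System.out.println(foo(fill(new C(), 2f, 3f))); // prints 5.0 (overriding m1)
    }
}
```

If the compiler had simply copied class A's version of m1 into foo, the third call would have produced 6.0 rather than 5.0, which is the incorrect result that unguarded inlining of a polymorphic method risks.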
Aggressive optimizing compilers nonetheless inline polymorphic methods in some instances. They avoid incorrect results by "guarding" the inlined code with a test to determine whether the inlined form of the method is consistent with the specific class of the "receiver" object on which the method is called. FIG. 5 is a code fragment consisting of SPARC microprocessor assembly code that exemplifies such inlining.
Let us assume that a microprocessor register g0 always contains a zero value and that another microprocessor register i1 contains the address of the receiver object. Then execution of FIG. 5's first instruction places the receiver object's address into microprocessor register o0. Object-allocating code generated by a typical object-oriented compiler places a pointer to the object's class in a location at a predetermined offset from the beginning of the memory space allocated to that object. Here we assume for the sake of example that the class pointer occupies an object's first location--i.e., that the offset is zero--so execution of FIG. 5's second instruction places the pointer to the receiver object's class into microprocessor register g2. If we further assume that the address of the inlined method's class is, say, 480002C8_16 (= 120000_16 × 2^10 + 712_10), then the third and fourth instructions represent loading that class's address into microprocessor register g1.
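The decomposition of the example address into a high part multiplied by 2^10 plus a low part can be checked with a few lines of Java. The sketch below assumes that the address is built in two steps, upper bits first and then the low 10 bits, which is the conventional SPARC idiom (a sethi instruction followed by an or instruction); FIG. 5 is not reproduced here, and the class and method names are hypothetical.

```java
// Hypothetical check of the address arithmetic; names are not from the figures.
class AddressSplitSketch {
    static final int LOW_BITS = 10; // width of the low part of the address

    // Upper part of the address: everything above the low 10 bits.
    static int hi(int address) {
        return address >>> LOW_BITS;
    }

    // Low part of the address: the bottom 10 bits.
    static int lo(int address) {
        return address & ((1 << LOW_BITS) - 1);
    }

    // Reassemble the address as hi * 2^10 + lo.
    static int combine(int hi, int lo) {
        return (hi << LOW_BITS) | lo;
    }

    public static void main(String[] args) {
        int address = 0x480002C8;
        System.out.println(Integer.toHexString(hi(address))); // prints 120000
        System.out.println(lo(address));                       // prints 712
        System.out.println(Integer.toHexString(combine(hi(address), lo(address)))); // prints 480002c8
    }
}
```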
If execution of the fifth instruction reveals that the resultant contents of microprocessor registers g1 and g2 are equal, i.e., that the object's class is the same as the class in which the inlined method is defined, execution continues through the inlining, which the drawing represents with an ellipsis. The instruction after the ellipsis causes the microprocessor to jump over that method's virtual call, which begins with the second instruction after the ellipsis.
On the other hand, if execution of the fifth instruction reveals that the object's class differs from the one in which the inlined method is defined, execution of the sixth instruction results in the microprocessor's jumping over the inlining to the virtual call. Suppose that a pointer to the polymorphic method is located at a given offset--say, 92 locations--from the beginning of any class of which it is a member. Then the second instruction after the ellipsis begins the virtual call by causing the microprocessor to load the address of the receiver object's version of that method into microprocessor register g3. The instruction after that causes the microprocessor to jump to that address while leaving in register o7 the address of the jump instruction itself, i.e., the address to which the callee can add eight to obtain the address to which it should return control after its execution.
We assume that the inlined method would have left its result in register l0, so FIG. 5's last instruction directs the microprocessor to take the method's return value from register o0 and place it into register l0. The result's location is thereby independent of whether the inline path was taken. Execution then continues at the same point as it does at the end of the inlined method.
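The control flow of the guarded sequence just described can be mirrored at the source level. The following Java sketch is an illustration under stated assumptions, not a translation of FIG. 5: the wrapper class GuardedInlineSketch, the method names fooVirtual and fooGuarded, and the body of class C's override are all hypothetical, and the guard is written as a comparison of the receiver's run-time class with class A.

```java
// Hypothetical source-level analogue of guarded inlining.
class GuardedInlineSketch {
    static class A {
        float h, w;
        float m1() { return h * w; }
    }

    // The override's body is an assumption made for illustration.
    static class C extends A {
        @Override
        float m1() { return h + w; }
    }

    // foo as the programmer wrote it: a plain virtual call.
    static float fooVirtual(A o) {
        return o.m1();
    }

    // foo as guarded inlining might effectively transform it.
    static float fooGuarded(A o) {
        float result;
        if (o.getClass() == A.class) {
            // Guard passed: the receiver's class is the class whose m1 was inlined,
            // so the inlined body is valid for this receiver.
            result = o.h * o.w;
        } else {
            // Guard failed: fall back to the virtual call, then place the return
            // value where the inlined path would have left its result.
            result = o.m1();
        }
        return result;
    }

    public static void main(String[] args) {
        A a = new A();
        a.h = 2f;
        a.w = 3f;
        C c = new C();
        c.h = 2f;
        c.w = 3f;
        System.out.println(fooGuarded(a)); // prints 6.0 (inlined path)
        System.out.println(fooGuarded(c)); // prints 5.0 (virtual-call path)
    }
}
```

With an exact class comparison such as this one, a receiver whose class merely inherits A's m1 without overriding it fails the guard and takes the virtual-call path. The result is still correct; only the speed benefit is lost for that receiver.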
So using a guard enables a compiler to give its generated code the benefits of method inlining even if the method involved is polymorphic.