1. Technical Field
The present invention relates generally to object-oriented programming (OOP) and, in particular, to methods for implementing virtual bases with fixed offsets in object oriented applications.
2. Background Description
Object oriented programming (OOP) is the preferred environment for building user-friendly, intelligent computer software. The object oriented paradigm is a programming paradigm in which the world is modeled as a collection of self-contained objects that interact by sending messages. Objects are modules that contain data and all functions (code) that are allowed to be performed on the encapsulated data. Objects are defined by a class (type), which determines everything about an object. Moreover, objects are considered as individual instances of a class.
Examples of OOP languages include C++, SMALLTALK, and JAVA, among others. C++ is an object oriented version of C. It is compatible with C, so that existing C code can be incorporated into C++ programs.
SMALLTALK is a pure object oriented language. In SMALLTALK, a message is sent to an object to evaluate the object itself. Messages perform a task similar to that of function calls in conventional programming languages. The programmer does not need to be concerned with the type of data. Rather, the programmer need only be concerned with creating the right order of a message and using the message.
JAVA is designed as a portable object oriented language that can run on any web-enabled computer via that computer's Web browser. As such, it offers great promise as the standard Internet and Intranet programming language. JAVA is an interpreted language that uses an intermediate language. The source code of a JAVA program is compiled into “byte code”, which cannot be run by itself. The byte code must be converted into machine code at runtime. Upon finding a JAVA applet, the Web browser switches to its JAVA interpreter (JAVA Virtual Machine) which translates the byte code into machine code and runs it. This means JAVA programs are not dependent on any specific hardware and will run in any computer with the JAVA virtual machine. For a detailed reference describing JAVA, see “The JAVA Programming Language”, K. Arnold and J. Gosling, The JAVA Series, Addison-Wesley, 1996.
There are several key elements that characterize OOP. They include virtual functions, polymorphism, and inheritance. These elements are used to generate a graphical user interface (GUI), typically characterized by a windows environment having icons, mouse cursors, and menus. While these three key elements are common to OOP languages, most OOP languages implement the three elements differently.
A virtual function is a function that has a default operation for a parent (base) class, but which can be overridden to perform a different operation by a child (derived) class. Thus, implicit in virtual function invocation is the idea that the execution of a virtual function can vary with different objects, i.e., the behavior and response that the invocation elicits will depend on the object through which the function is invoked.
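By way of illustration only, the following minimal C++ sketch shows a default operation in a parent class being overridden by a child class. The class names Shape and Circle and the function describe are hypothetical and do not appear in the referenced works.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only.
class Shape {
public:
    virtual ~Shape() = default;
    // Default operation supplied by the parent (base) class.
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape {
public:
    // The child (derived) class overrides the default operation.
    std::string name() const override { return "circle"; }
};

// The call below is resolved at run time from the class of the object
// that `s` actually refers to, not from the declared type Shape.
std::string describe(const Shape& s) { return s.name(); }
```

Calling describe with a Shape object yields "shape", whereas calling it with a Circle object yields "circle", even though both calls pass through the same reference type.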
Polymorphism refers to the substitutability of related objects. Objects are “related” if they have a similar “type”, and in most object-oriented languages that means that they are instances of the same class, or they have a common parent class through inheritance. Polymorphism allows code that is shared among related objects to be tailored to fit the specific circumstances of each individual data type.
Inheritance lets classes be defined in terms of other classes. Thus, inheritance allows different classes to share the same code, leading to a reduction in code size and an increase in functionality. A class that inherits from another class is called a “subclass” or “child” of the other class (which is called the “superclass” or “parent class”). The subclass responds to the same messages as its parent, and it may respond to additional messages as well. The subclass “inherits” its implementation from its parent, though it may choose to reimplement some methods and/or add more data. Inheritance lets programmers define new classes easily as incremental refinements of existing ones.
There are various types of inheritance in OOP. Single inheritance corresponds to a class that has no more than one parent (base) class. Multiple inheritance corresponds to a class that can have more than one parent. Virtual inheritance is when a base class inherited along distinct paths occurs only once in the derived class. That is, the inherited (base) subobject is not replicated. Non-virtual inheritance is when the base class has multiple distinct occurrences in the derived class. That is, the inherited (base) subobject is replicated.
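The difference between the two kinds of inheritance can be observed directly in C++. The following is a minimal sketch; the names Base, LeftR, RightR, JoinR, LeftV, RightV and JoinV are hypothetical and were chosen only for this illustration.

```cpp
#include <cassert>

struct Base { int value = 0; };

// Non-virtual (repeated) inheritance: each path carries its own Base.
struct LeftR  : Base {};
struct RightR : Base {};
struct JoinR  : LeftR, RightR {};

// Virtual (shared) inheritance: both paths lead to a single Base.
struct LeftV  : virtual Base {};
struct RightV : virtual Base {};
struct JoinV  : LeftV, RightV {};

// Returns true when the Base reached through the left path is the same
// subobject as the Base reached through the right path.
bool repeatedPathsShareBase() {
    JoinR j;
    const Base* viaLeft  = static_cast<const LeftR*>(&j);
    const Base* viaRight = static_cast<const RightR*>(&j);
    return viaLeft == viaRight;
}

bool virtualPathsShareBase() {
    JoinV j;
    const Base* viaLeft  = static_cast<const LeftV*>(&j);
    const Base* viaRight = static_cast<const RightV*>(&j);
    return viaLeft == viaRight;
}
```

Under these assumptions repeatedPathsShareBase returns false, since a JoinR object contains two Base subobjects, and virtualPathsShareBase returns true, since a JoinV object contains only one.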
Virtual and non-virtual inheritance are phrases employed with respect to the C++ programming language. However, such inheritances exist in other object-oriented programming languages, although they may be referred to by different phrases. For example, virtual and non-virtual inheritance correspond to shared and repeated inheritance, respectively, in the Eiffel programming language.
A brief description of multiple inheritance with respect to the C++ programming language will now be given. As noted by B. Stroustrup, in The C++ Programming Language, Addison-Wesley, 3rd Ed. (1997), the C++ syntax forces a programmer to select the kind (or semantics) of inheritance, virtual and non-virtual, when the inheritance occurs. That is, the derived class must specify whether the base class is inherited nonvirtually or virtually.
This selection forces the programmer to anticipate all possible contexts in which the classes may be further derived and allows only one choice for all of them. In the case of extendible libraries or any classes that have the potential to be further derived, the programmer is therefore inclined to conservatively specify all occurrences of inheritance as virtual, since no assumption about how the classes may be derived in the future is possible.
This predicament is made worse by the non-negligible toll, both in terms of space and time resources, taken by the standard implementation of virtual inheritance in C++. This toll is further described by Ellis and B. Stroustrup, in The Annotated C++ Reference Manual, Addison-Wesley, January 1994. The representation of each object of any class must include the set of offsets to all of its virtual base classes. Although these offsets can be shared among objects of the same class by storing the offsets in class tables, time-efficient implementations will repeatedly store these offsets, usually as pointers, in each instance of the class. Furthermore, these pointers are not usually shared across virtual inheritance. The time penalty is incurred when these pointers are dereferenced, e.g., in an upcast, in a call to an inherited (even nonvirtual) member function, or in a reference to data members of the virtual base. These operations require at least one indirection, and two indirections in the implementation where the offsets are stored per class and not per object.
A brief description of some of the terminology and notations used herein will now be given. Moreover, some of the various graphical notations used herein with respect to inheritance hierarchies, object layout diagrams and subobject graphs are illustrated in FIG. 1. The nouns “instance” and “object” are used interchangeably, as are the verbs “inherit” and “derive”. Since the implementation of virtual inheritance in the traditional layout scheme is the same, regardless of whether it is single or multiple, we will sometimes use the term multiple inheritance in a loose sense, to also include single virtual inheritance.
Lower case letters from the beginning and the end of the Latin alphabet, e.g., a, b, . . . and u, v, w, x, y, z, denote classes. In addition, u, v, w, x, y, z are also used for denoting variables ranging over the domain of all classes, principally in procedures and theorems. By writing x≦y we mean that either x=y or x inherits, directly or indirectly, from y. We say that x is a descendant of y and that y is an ancestor or a base of x. The strict inequality x<y is used to say that x≦y and x≠y or, in words, x is a proper descendant of y and y is a proper ancestor of x.
Immediate (or direct) inheritance is denoted by <. Thus, x<y means that y is an immediate base of x, without specifying the kind of inheritance between x and y. To state that y is an immediate virtual (shared) base of x we write x <v y, whereas x <r y means that y is an immediate nonvirtual (repeated) base of x.
We assume that a class cannot be an immediate base of another class more than once. This assumption makes it possible to model the inheritance hierarchy of an object oriented program as a graph, rather than a multi-graph. In such a graph, which is directed and acyclic, classes are represented as nodes and immediate inheritance is represented as edges. The relationship x<y is represented by the edge (x<y) leading from the node x to the node y.
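The graph model can be sketched in C++ as follows. The sketch is illustrative only: the types Kind, Edge and Hierarchy and the function names are hypothetical, and the sample hierarchy is the one that the discussion of FIG. 2 describes in words (b and c derive from a, d derives from b, and e derives from both c and d, all nonvirtually).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: class names are nodes, and each edge
// records one immediate base together with the kind of inheritance.
enum class Kind { Repeated, Virtual };

struct Edge {
    std::string base;
    Kind kind;
};

// Maps each class to the list of its immediate bases.
using Hierarchy = std::map<std::string, std::vector<Edge>>;

// True when x equals y or x inherits, directly or indirectly, from y.
bool isDescendantOrSelf(const Hierarchy& h, const std::string& x,
                        const std::string& y) {
    if (x == y) return true;
    auto it = h.find(x);
    if (it == h.end()) return false;  // x has no bases
    for (const Edge& edge : it->second) {
        if (isDescendantOrSelf(h, edge.base, y)) return true;
    }
    return false;
}

// True when x is a proper descendant of y.
bool isProperDescendant(const Hierarchy& h, const std::string& x,
                        const std::string& y) {
    return x != y && isDescendantOrSelf(h, x, y);
}

// Sample hierarchy following the textual description of FIG. 2.
Hierarchy fig2Hierarchy() {
    Hierarchy h;
    h["b"] = {Edge{"a", Kind::Repeated}};
    h["c"] = {Edge{"a", Kind::Repeated}};
    h["d"] = {Edge{"b", Kind::Repeated}};
    h["e"] = {Edge{"c", Kind::Repeated}, Edge{"d", Kind::Repeated}};
    return h;
}
```

Because the graph is acyclic, the recursion terminates. A production implementation would likely memoize visited nodes, which this sketch omits for brevity.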
Although there are many variations to it, there is basically one common scheme for laying out C++ objects in memory. The scheme, which is hereinafter referred to as the traditional layout, is used by the vast majority of C++ compilers. Other languages that want to efficiently support multiple inheritance need a similar layout scheme.
A brief review of the traditional layout will now be given for the purpose of setting out the context in which the optimization techniques of the present invention take place. A more detailed description of the traditional layout can be found in standard textbooks such as: The Annotated C++ Reference Manual, Ellis and B. Stroustrup, Addison-Wesley, January 1994; Inside The C++ Object Model, S. B. Lippman, Addison-Wesley, second edition, 1996; and The Design and Evolution of C++, B. Stroustrup, Addison-Wesley, March 1994. The relative merits of the variants of this layout in terms of the space overhead they impose is described by P. Sweeney and M. Burke, in the above referenced article entitled “A Methodology for Quantifying and Evaluating the Space Overhead in C++ Object Models”.
With respect to implementing multiple inheritance there are two language features that incur a space (and time) overhead: virtual functions; and virtual inheritance. Virtual functions are implemented using pointers to virtual function tables, which are described hereinbelow. Virtual inheritance is implemented using pointers to virtual bases, which are also described hereinbelow.
It will be shown herein that even though the traditional approach allows some reduction in the overhead of language feature information by sharing between subobjects with repeated inheritance, the overhead can still be quite high.
A description of the pointers to virtual function tables will now be given. In essence, the traditional layout prescribes that data members are laid out “unidirectionally” in an ascending order in memory, so that the data members of each class are laid out consecutively. Also, each object or subobject belonging to a class with virtual functions has a pointer, referred to as a VPTR, which points to the virtual function table (VTBL) of this class. Let us first discuss nonvirtual inheritance. The layout of a base class precedes that of a class derived from it. The VPTR is commonly laid out at offset zero, which makes it possible for the VPTR of an object to be shared with one of its directly inherited subobjects, so there is in total only one VPTR in the case of single inheritance.
However, several VPTRs occur in the case of multiple inheritance, since an object can share a VPTR with only one of its subobjects. Consider, for example, the inheritance hierarchy depicted in FIG. 2, which is a diagram of a class hierarchy illustrating repeated inheritance (i.e., multiple subobjects of the same type may occur in an object).
In this hierarchy, class e inherits from both c and d. Accordingly, the traditional layout of objects of class e has two VPTRs, as illustrated by the object layout chart in FIG. 3.
Examining FIG. 3 we see that the subobject of class d physically encompasses that of class b, which in turn encompasses one subobject of class a. All these three subobjects share one VPTR. Similar sharing occurs between the subobject of class c and the other subobject of class a. There are two subobjects of class a since the inheritance links in FIG. 2 are nonvirtual. Finally, an object of class e does not require its own VPTR, but shares its VPTR with that of subobjects d, b, and a.
Taking a slightly wider perspective than that of C++, and adopting Eiffel terminology let us call this repeated inheritance. The Eiffel programming language is further discussed by B. Meyer, in Object-Oriented Software Construction, Prentice-Hall, second edition, 1997. In the current example, we may say that class a is repeatedly inherited by class e. A better visual illustration of this fact is given in FIG. 4, which is the subobject graph of class e of FIG. 2. The subobject graph was first introduced by J. Rossie Jr. and D. Friedman, in “An Algebraic Semantics of Subobjects”, Proceedings of the 10th Annual Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA′95), pp. 187-199, Austin, Tex., USA, Oct. 15-19 1995 (also published in ACM SIGPLAN Notices 30(10) October 1995). This graph captures the containment relationships between subobjects. Evidently, the class a is drawn twice in this figure.
A description of the pointers to virtual bases will now be given. The traditional layout ensures that in repeated inheritance the offset of a subobject x is fixed with respect to any other encompassing subobject y irrespective of the context of y, i.e., the class of the object in which y itself occurs as a subobject. This is no longer true in the case of non-repeated inheritance, also known as shared inheritance, which is realized in C++ as virtual inheritance. The offset of a subobject of a virtual base class is context dependent. In order to locate such a subobject, be it for the purpose of data members access or an upcast, there is a virtual base pointer (or offset), referred to as a “VBPTR”, stored in each object pointing to the subobject of the virtual base class. Consider for example the inheritance hierarchy of FIG. 5, which is a diagram of the class hierarchy of FIG. 2, with shared inheritance (i.e., a class inherited along distinct paths occurs only once in an object). In FIG. 5, classes b and c are virtually derived from class a. In this case, class e has only one subobject of class a.
FIG. 6 is a subobject graph of class e of FIG. 5. This graph makes it clear that there is only one subobject of class a, which is shared between the subobjects of classes b and c.
Even though virtual inheritance is a language mechanism designed to support a shared variant of multiple inheritance, the C++ semantics also allow single virtual inheritance. Thus, the fact that the in-degree of a class is greater than one in a subobject graph is a sufficient but not a necessary condition for the class to be a virtual base. This is the reason behind the notational convention of drawing a circle around names of virtual bases, as was the case with class a in FIG. 6.
FIG. 7 is a diagram of the memory layout of objects of class e of FIG. 5, which shows how VBPTRs are used to realize the sharing of the subobject of class a between the subobjects of classes b and c. Examining FIG. 7, we can also see that since objects of class d occupy a contiguous memory space, it must be the case that the offset of the subobject of class a with respect to the data members of d is different in objects of class d than in objects of class e. Resuming our counting of VPTRs, we see that objects of class e have in total three VPTRs: two for the immediate parents of e, c and d; and one for the subobject of the virtual base a. The VPTR of d is also shared with e and b. In contrast, the VPTR of a cannot be shared with any of its descendants, since its relative offset with respect to these is not fixed.
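The context dependence of the offset can be observed with the following C++ sketch of the hierarchy of FIG. 5 as the text describes it. The data members, the virtual destructor and the helper function names are assumptions added for illustration. The exact offsets are implementation-defined; the C++ language does not guarantee that the two measured values differ, although they are expected to differ under common layouts in which the virtual base is placed after the data of the most derived class.

```cpp
#include <cassert>
#include <cstddef>

// Hierarchy following the textual description of FIG. 5. The data
// members and the virtual destructor are assumptions, added so that
// every class has state and a VPTR.
struct a { virtual ~a() = default; int a_data = 0; };
struct b : virtual a { int b_data = 0; };
struct c : virtual a { int c_data = 0; };
struct d : b { int d_data = 0; };
struct e : c, d { int e_data = 0; };

// Byte distance from the start of a d subobject to its a subobject.
// The upcast through virtual inheritance is the operation that a
// compiler implements by consulting a VBPTR or a stored offset.
std::ptrdiff_t offsetOfAWithinD(const d& dref) {
    const a* abase = &dref;
    return reinterpret_cast<const char*>(abase) -
           reinterpret_cast<const char*>(&dref);
}

// The d subobject is the complete object.
std::ptrdiff_t offsetInCompleteD() {
    d obj;
    return offsetOfAWithinD(obj);
}

// The d subobject is embedded in an object of class e.
std::ptrdiff_t offsetInCompleteE() {
    e obj;
    return offsetOfAWithinD(obj);
}
```

Because the same function offsetOfAWithinD serves both contexts, the offset cannot be a compile-time constant of class d whenever the two results differ; it has to be looked up at run time.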
As explained above, the offsets to virtual base classes must be stored in memory. In the variant described above these offsets are stored as VBPTRs in each instance of the class. A time penalty is incurred when these pointers are dereferenced, e.g., for an upcast, for a call to an inherited (even nonvirtual) member function, or for access to a data member of the virtual base.
Alternatively, to reduce the space overhead, virtual base offsets may be stored in class tables, frequently as special entries in the VTBL. This variant, although more space efficient in the case of many objects instantiated from the same class, doubles the time penalty since each access to members of the virtual base must pass through two levels of indirection instead of one.
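The difference between the two variants can be modeled by hand as follows. This is a simplified sketch and not the output of any compiler: all type, field and function names are hypothetical, and each modeled object has exactly one virtual base.

```cpp
#include <cassert>
#include <cstddef>

// Hand-written model for illustration only.
struct VirtualBase { int payload; };

// Variant 1: a VBPTR is stored in every object.
struct ObjectWithVbptr {
    VirtualBase* vbptr;   // per-object pointer to the virtual base
    int own_data = 0;
    VirtualBase base{42};
    ObjectWithVbptr() : vbptr(&base) {}
};

// Variant 2: the offset is stored once per class, in a class table
// that every object reaches through a table pointer.
struct ClassTable { std::ptrdiff_t vbase_offset; };

struct ObjectWithTable {
    const ClassTable* table;  // per-object pointer to the class table
    int own_data = 0;
    VirtualBase base{42};
    explicit ObjectWithTable(const ClassTable* t) : table(t) {}
};

// One table shared by all ObjectWithTable instances.
const ClassTable kObjectWithTableInfo = {
    static_cast<std::ptrdiff_t>(offsetof(ObjectWithTable, base))
};

int readViaVbptr(const ObjectWithVbptr& obj) {
    return obj.vbptr->payload;  // single indirection through the VBPTR
}

int readViaTable(const ObjectWithTable& obj) {
    std::ptrdiff_t off = obj.table->vbase_offset;  // first indirection
    const char* start = reinterpret_cast<const char*>(&obj);
    const VirtualBase* vb =
        reinterpret_cast<const VirtualBase*>(start + off);
    return vb->payload;                            // second indirection
}
```

In this model every ObjectWithVbptr instance pays for one pointer per virtual base, whereas ObjectWithTable instances share a single table at the cost of the additional load in readViaTable.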
It turns out that for any given class, the number of VBPTRs stored in each object in one variant is exactly the same as the number of offsets stored in the class information in the other variant. Thus, to facilitate a clear understanding of the present invention as described hereinbelow, the following description will concentrate on the “time-efficient” variant in which pointers to virtual bases are stored in objects.
The number of VBPTRs is greater than it might appear at first, since these pointers cannot be shared across virtual inheritance. To illustrate why this is so, the reader is directed to FIG. 8, which is a diagram of a class hierarchy illustrating single virtual inheritance. Each instance of class u1 has a virtual base pointer to the v1 subobject. This is also the case for instances of class v2. Now, since the inheritance link between v2 and u1 is nonvirtual, the VBPTR to v1 can be shared by u1 and v2. Also, each instance of class u2 must store two pointers, one to the v1 subobject and one to the v2 subobject, both of which correspond to virtual bases. However, as depicted in FIG. 9, which is a diagram illustrating the memory layout of objects of class u2 of FIG. 8, the pointer to the v1 base is duplicated in a u2 instance. That is, there is one such pointer in the memory area allocated for u2's own data, but also another such pointer stored in the v2 subobject of u2.
Let us make the distinction between “essential” and “inessential” VBPTRs. The essential VBPTRs are precisely the minimal set of VBPTRs which allows direct or indirect access to every virtual subobject from any of its containing subobjects. Inessential VBPTRs are those which can be computed from the essential ones, but are stored to ensure that an upcast to an indirect virtual base takes no more time than an upcast to a direct virtual base, thus guaranteeing constant access to all data members and all virtual functions. More generally, in the traditional object layout scheme, there is no sharing across virtual inheritance links of any compiler-generated field, including VPTRs and other fields used for realizing run-time type information. Therefore, inessential VBPTRs are introduced because essential VBPTRs are not shared across virtual inheritance links.
Alternatively, to reduce space overhead in objects, inessential VBPTRs could be eliminated. This translates, in our example, to having only one VBPTR to v1, which would be stored in the v2 subobject of u2. This more space efficient variant increases the time to access a virtual base subobject when a chain of VBPTRs has to be followed. In our example, if inessential VBPTRs are eliminated, accessing the v1 subobject from the u2 object requires two levels of indirection instead of one.
FIG. 10 is a diagram of an n-chain virtual inheritance class hierarchy. As shown therein, each instance of the bottommost class in a virtual inheritance chain of n classes must include n(n−1)/2 VBPTRs in total. The situation is no different if virtual base offsets are stored with class information, except that the overhead is not repeated per object. The number of offsets that must be stored in total for all classes is (n³−n)/6, i.e., cubic in the number of classes in the hierarchy.
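The two counts can be checked with a short C++ sketch. The counting model is an assumption drawn from the preceding discussion: in an n-chain, the subobject of the k-th class from the top keeps one VBPTR, essential or inessential, for each of its k−1 virtual bases. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// VBPTRs in one instance of the bottommost class of an n-chain,
// counted subobject by subobject: the k-th class from the top is
// assumed to store k-1 pointers, one per virtual base above it.
std::uint64_t vbptrsPerObject(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t k = 1; k <= n; ++k) total += k - 1;
    return total;
}

// Offsets stored for all n classes together when they are kept with
// the class information: the m-th class needs as many entries as an
// object of that class would hold VBPTRs.
std::uint64_t offsetsForAllClasses(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t m = 1; m <= n; ++m) total += vbptrsPerObject(m);
    return total;
}
```

For example, with n=4 the sketch yields 6 VBPTRs per object, matching n(n−1)/2, and 10 offsets over all classes, matching (n³−n)/6.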
Thus, in sum, the feature of multiple inheritance in object-oriented programming languages causes a significant space and time overhead for its implementation. Accordingly, it would be desirable and highly advantageous to have methods for reducing the space and time overhead associated with implementing multiple inheritance.