1. Technical Field
This invention pertains to hierarchical structures in object-oriented languages. More particularly, it pertains to the growth of hierarchical structures with new virtual base classes while preserving release-to-release binary compatibility (RRBC).
2. Prior Art
Release-to-release binary compatibility (RRBC) is the ability for client code to continue to operate without recompilation, even when shared libraries upon which such code depends are updated with newer versions.
RRBC problems in the C++ programming language are introduced when information about a class is compiled into client code, such as the offset and location of class members, instance size, and the offset to parent class data. As a result, a simple change to a class, such as adding a new member or adding a new base class, may require recompilation of any derived class or client code.
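As a rough illustration of the problem just described, the following sketch models two hypothetical releases of a small hierarchy and measures where a derived-class member lands in each. The namespaces `v1` and `v2`, the class names, and the helper `offset_of_d` are assumptions made for this example only. The exact offsets depend on the compiler and platform, but on common implementations inserting a base class moves the member and enlarges the object, which is why client code compiled against the old layout would need recompilation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical release 1 of a library hierarchy.
namespace v1 {
struct Base    { int a; };
struct Derived : Base { int d; };
}

// Hypothetical release 2: a new non-virtual base is inserted in the middle.
namespace v2 {
struct Base    { int a; };
struct NewBase : Base { int n; };
struct Derived : NewBase { int d; };
}

// Byte offset of member `d` from the start of an object of type D,
// measured on a live object rather than assumed.
template <class D>
std::ptrdiff_t offset_of_d() {
    D obj{};
    return reinterpret_cast<const char*>(&obj.d) -
           reinterpret_cast<const char*>(&obj);
}

// True when the two releases disagree about where `d` lives.
bool layout_changed() {
    return offset_of_d<v1::Derived>() != offset_of_d<v2::Derived>();
}
```

Client code that had the release 1 offset compiled into it would read or write the wrong bytes of a release 2 object.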
There is a requirement for class libraries to allow adding new base classes in a class hierarchy without impacting RRBC. There is also a requirement to allow adding a new virtual base only on the non-leftmost path of a class hierarchy without impacting RRBC and the performance of single or leftmost path inheritance. There is also a requirement to minimize the cost to performance ratio in any solution to these requirements.
In the experience of class library designers, a class hierarchy is never right in its first couple of releases, before extensive use by its customers. Changes to class hierarchies are likely in subsequent releases, especially the addition of new bases. In the current design of most class libraries, non-virtual and single inheritance are far more common than virtual and multiple inheritance, because the performance cost of virtual and multiple inheritance discourages class designers from using them. For these designers, the requirement is to add non-virtual base classes in single inheritance without impacting RRBC, with the new classes added either at the end or in the middle of the hierarchy. Satisfying this requirement is possible, but it comes with a performance penalty.
Accessing Data in a Non-virtual Base Class
Current inheritance implementations, such as IBM® VisualAge® C++ and Taligent C++, forgo all indirection except when accessing a virtual base; that is, the data members of a base class subobject are stored directly within the derived class object. Accessing a data member requires adding the offset of the data member to the beginning address of the class object.
The offset is known at compile time even if the member belongs to a base class subobject derived through a single or multiple inheritance chain. This offers the most compact and most efficient access of non-virtual base class members.
In Table 1, the move statement moves a “value” to the address of a data member in a base class. The address of the data member is obtained by adding the “this” pointer, which points to the beginning of the object, to the offset of the member within the object. “obj” and “this” are used interchangeably here, and refer to the beginning address of an object. [this+member_offset] refers to the address of the data member.
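The address computation [this+member_offset] can be sketched in C++ as follows. This is a minimal illustration, not the compiler's generated code; the types `Base` and `Derived` and the helpers `member_offset_of` and `store_direct` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical single-inheritance hierarchy with a non-virtual base.
struct Base    { int pad; int member; };
struct Derived : Base { int own; };

// Offset of Base::member from the start of a Derived object. A compiler
// knows this constant at compile time; here it is measured on a live object.
std::ptrdiff_t member_offset_of(const Derived& obj) {
    return reinterpret_cast<const char*>(&obj.member) -
           reinterpret_cast<const char*>(&obj);
}

// Store `value` at [this + member_offset]: one address addition, one store,
// and no extra memory load.
void store_direct(Derived* self, std::ptrdiff_t member_offset, int value) {
    char* addr = reinterpret_cast<char*>(self) + member_offset;
    std::memcpy(addr, &value, sizeof value);
}
```

Calling `store_direct(&d, member_offset_of(d), 42)` on a `Derived d` leaves `d.member` equal to 42, the same effect as the ordinary assignment `d.member = 42`.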
Accessing Data in a Virtual Base Class:
Currently, IBM VisualAge C++ accesses a virtual base class subobject through a virtual base pointer located inside the class object. The location of the virtual base pointer is fixed and known at compile time, but an extra level of indirection is required to reach the data in the virtual base.
Instead of using a virtual base pointer, Taligent C++ currently accesses a virtual base class subobject through a virtual function table (VFT) pointer. If a given class directly or indirectly inherits from a virtual base class, the VFT (also referred to as the vtable) of that class contains offsets used to find the virtual base subobjects. Using virtual base offsets costs more instructions per virtual base access, but in exchange gives a smaller object size or less initialization time at program startup.
In Table 2, the first move statement moves the virtual base pointer (vbp) to register eax. The location of the vbp is obtained by adding the “this” pointer to the offset of the virtual base pointer within the object. The next move statement moves a “value” to the member of the virtual base. The address of the data member is obtained by adding the vbp to the offset of the member within the virtual base class. The first move statement denotes the extra memory access, which is not required in the non-virtual case.
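The virtual base pointer scheme can be modeled by hand in C++ as below. This is only a sketch under stated assumptions: `Object`, `VBase`, and `store_via_vbp` are hypothetical names, and the layout is written out explicitly rather than produced by a compiler's virtual inheritance support.

```cpp
#include <cassert>

// Hypothetical virtual base with one data member.
struct VBase { int member; };

// Hand-written model of an object that embeds a virtual base pointer (vbp).
struct Object {
    VBase* vbp;       // sits at a fixed offset known at compile time
    int    own_data;
    VBase  vbase;     // the virtual base subobject itself

    Object() : vbp(&vbase), own_data(0), vbase{0} {}
    Object(const Object&) = delete;             // a copy would leave vbp dangling
    Object& operator=(const Object&) = delete;
};

void store_via_vbp(Object* self, int value) {
    VBase* base = self->vbp;   // the one extra memory load
    base->member = value;      // store to [vbp + member_offset]
}
```

Compared with direct access, the only added cost in this model is the load of `vbp`.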
In Table 3, the first move statement moves the virtual function table (VFT) address to register eax. This is the first extra memory access. The VFT address is stored in the memory pointed to by the “this” pointer. The second move statement moves the offset of the virtual base to register eax from an index to the VFT. This is the second extra memory access. The add statement gets the address of the virtual base class in register eax by adding the “this” pointer to the offset of the virtual base within the object. The last move statement moves the value to the data member address. The data member address is obtained by adding the member offset to the address of the virtual base.
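The offset-in-VFT scheme can be modeled in the same hand-written style. The sketch below assumes a VFT that holds a single virtual base offset and an object whose first word is the VFT pointer; all names (`VFT`, `Object`, `object_vft`, `store_via_vft`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct VBase { int member; };

// Hypothetical VFT holding only virtual base offsets.
struct VFT { std::ptrdiff_t vbase_offset[1]; };

// The object carries no virtual base pointer, only the VFT pointer.
struct Object {
    const VFT* vft;   // first word of the object
    int        own_data;
    VBase      vbase;
};

// One VFT per class, shared by all objects of that class.
static const VFT object_vft = {
    { static_cast<std::ptrdiff_t>(offsetof(Object, vbase)) }
};

void store_via_vft(Object* self, int value) {
    const VFT* vft = self->vft;                    // extra load 1: VFT address
    std::ptrdiff_t off = vft->vbase_offset[0];     // extra load 2: base offset
    char* base = reinterpret_cast<char*>(self) + off;   // this + offset
    reinterpret_cast<VBase*>(base)->member = value;     // store to the member
}
```

The object is one pointer smaller than in the virtual base pointer model, at the price of a second dependent load on every access.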
Supporting Addition of New Bases:
To support adding new base classes, both virtual and non-virtual, the offset locations of the base within the object are no longer fixed and known during compile time. One way to solve the problem is to introduce a base class table to keep track of the offset or address of an associated base class and use an extra level of indirection to access base class members. This is similar to accessing a virtual base but the table has to be completed at run-time to achieve RRBC.
In Table 4, the first move statement is the first extra memory load and the second move statement is the second extra memory load. In the normal case, neither IBM VisualAge C++ nor Taligent C++ requires any extra memory load to access a data member in a non-virtual base.
Thus, in accessing data in a virtual or non-virtual base, two extra memory loads are required compared to direct data access. Since the majority of time spent in executing most applications is spent on accessing data, extra memory loads slow down the program significantly.
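A base class table of the kind described can be sketched as follows. The example assumes, purely for illustration, that the table pointer is the first word of the object and that the table is filled in when the library is loaded. `BaseTable`, `ObjectR1`, `ObjectR2`, and the helper functions are hypothetical. The point is that one compiled access routine keeps working when a later release moves the base, at the cost of two extra loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical base table: one offset per base class.
struct BaseTable { std::ptrdiff_t base_offset[1]; };

// Two hypothetical releases of the same class; release 2 places new data
// ahead of the original base data, so the base offset changes.
struct ObjectR1 { const BaseTable* table; int base_member; };
struct ObjectR2 { const BaseTable* table; int new_base_member; int base_member; };

// The tables would be completed at run time by whichever release is loaded.
BaseTable table_for_r1() {
    return BaseTable{{ static_cast<std::ptrdiff_t>(offsetof(ObjectR1, base_member)) }};
}
BaseTable table_for_r2() {
    return BaseTable{{ static_cast<std::ptrdiff_t>(offsetof(ObjectR2, base_member)) }};
}

// Client-side access, compiled once. It knows only the base index and the
// member's offset inside the base, never the base's position in the object.
void store_via_base_table(void* self, std::size_t base_index,
                          std::ptrdiff_t member_offset, int value) {
    const BaseTable* table;
    std::memcpy(&table, self, sizeof table);               // extra load 1
    std::ptrdiff_t off = table->base_offset[base_index];   // extra load 2
    char* addr = static_cast<char*>(self) + off + member_offset;
    std::memcpy(addr, &value, sizeof value);
}
```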
Calling a Virtual Function in a Base Class
The IBM VisualAge C++ compiler currently uses a general virtual function implementation model. That is, the virtual function is invoked through the virtual function table where the address of the function is stored. This is illustrated in Table 5.
An “adjustor thunk” is a small piece of code that adjusts the “this” pointer as needed and then transfers control to the virtual function being called.
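The general model, in which a call goes through a table of function addresses, can be sketched with explicit function pointers. This is a hand-built model with hypothetical names (`VFT`, `Object`, `call_virtual`), not the output of any particular compiler, and it does not model adjustor thunks.

```cpp
#include <cassert>
#include <cstddef>

struct Object;
using VirtualFn = int (*)(Object* self);

// Hypothetical VFT: a flat array of function addresses.
struct VFT    { VirtualFn slot[2]; };
struct Object { const VFT* vft; int value; };

int get_value(Object* self)  { return self->value; }
int get_double(Object* self) { return 2 * self->value; }

static const VFT object_vft = {{ &get_value, &get_double }};

// A virtual call: load the VFT address, load the slot, call through it.
int call_virtual(Object* self, std::size_t slot_index) {
    const VFT* vft = self->vft;
    VirtualFn fn = vft->slot[slot_index];
    return fn(self);
}
```

Because the slot index is compiled into the caller, inserting a new function ahead of an existing slot would shift the indexes that clients depend on.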
Currently, the Taligent C++ implementation for calling virtual functions uses a class segment table in the VFT to introduce an extra level of indirection. Table 6 illustrates this procedure, which allows virtual functions to be added without recompiling client code.
In Table 6, the second move statement is the extra memory access.
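A class segment table can be modeled as an array of pointers, one per class, each pointing to the list of virtual functions that class introduces. The sketch below uses hypothetical names throughout and two hypothetical releases of the tables. It shows why the extra load buys compatibility: a function appended to one class's list does not disturb the indexes used for another class.

```cpp
#include <cassert>
#include <cstddef>

struct Object;
using VirtualFn = int (*)(Object* self);

// Hypothetical VFT whose class segment table has one entry per class.
struct VFT    { const VirtualFn* class_segment[2]; };
struct Object { const VFT* vft; int value; };

int base_get(Object* s)    { return s->value; }
int base_new(Object* s)    { return -s->value; }      // added in release 2
int derived_get(Object* s) { return s->value + 100; }

static const VirtualFn base_fns_r1[] = { &base_get };
static const VirtualFn base_fns_r2[] = { &base_get, &base_new };
static const VirtualFn derived_fns[] = { &derived_get };

static const VFT vft_r1 = {{ base_fns_r1, derived_fns }};
static const VFT vft_r2 = {{ base_fns_r2, derived_fns }};

int call_segmented(Object* self, std::size_t class_index, std::size_t fn_index) {
    const VFT* vft = self->vft;                               // load the VFT address
    const VirtualFn* list = vft->class_segment[class_index];  // the extra load
    VirtualFn fn = list[fn_index];                            // load the function address
    return fn(self);
}
```

A client compiled against release 1 that calls `(class_index 1, fn_index 0)` reaches `derived_get` under either release.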
The example of Table 7 illustrates how the address of a virtual function of a virtual base class is obtained from a derived class. The first move statement moves the content of the “this” pointer, which is the address of the derived class's VFT, to register eax. This move statement is the first extra memory access. Inside the VFT of the derived class, an array of virtual base offsets is found. The second move statement moves a virtual base offset from an index to the VFT to register eax. The add statement adds the “this” pointer to the virtual base offset in eax and stores the result back to eax, which now contains a pointer to the virtual base. The next move statement moves the content of the virtual base pointer, which is the address of the VFT of the virtual base class, to register ecx. Inside the VFT of the virtual base class, a class segment table is found. Each entry of the class segment table contains a pointer to a virtual functions table that the class introduces. The last move statement moves the address of the virtual functions table into register ecx by indexing into the class segment table. This last move statement is the second extra memory access. The index to the virtual functions table is the address of the virtual function which the jump statement uses to transfer to the virtual function.
When extending the Taligent C++ model to support adding new base classes, the index of a class inside the class segment table is no longer known during compile time so another level of indirection is needed. Similar to supporting data access, a base table may be employed, which is completed during the runtime, to keep track of the class index inside the class segment table and the base offset.
In Table 8, the first move statement moves the address of the base table to ecx. The second move statement moves the base offset from an index to the base table to ecx. The base pointer is obtained by adding the base offset to the “this” pointer. The third move statement moves the virtual functions list address from an index to the class segment table into register eax, and this is the second extra memory access. The index to the virtual functions list is the address of the virtual function which the jump statement uses to transfer to the virtual function.
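The combined indirection, a base table feeding a class segment table, can be modeled as follows. This is a simplified sketch. It assumes that both tables are reachable from a single VFT and that the VFT pointer is the first word of the object, neither of which is stated in the text. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

using VirtualFn = int (*)(void* base_self);

// Hypothetical base table entry, assumed to be completed at run time.
struct BaseEntry {
    std::size_t    class_index;   // index into the class segment table
    std::ptrdiff_t base_offset;   // offset of the base within the object
};

struct VFT {
    const BaseEntry*        base_table;
    const VirtualFn* const* class_segment;
};

struct Object { const VFT* vft; int pad; int base_value; };

// A "virtual function" of the base; it receives the adjusted base pointer.
int read_base_value(void* base_self) { return *static_cast<int*>(base_self); }

int call_via_base_table(void* self, std::size_t base_id, std::size_t fn_index) {
    const VFT* vft = *static_cast<const VFT* const*>(self);
    const BaseEntry& entry = vft->base_table[base_id];            // base table lookup
    char* base_ptr = static_cast<char*>(self) + entry.base_offset; // this + base offset
    const VirtualFn* list = vft->class_segment[entry.class_index]; // class segment lookup
    return list[fn_index](base_ptr);                               // transfer to the function
}
```

Neither the class index nor the base offset is compiled into the caller, so both can change in a later release of the library.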
Compiler support for multiple and virtual inheritance is expensive. Multiple inheritance is neither as well behaved nor as easily modeled as single inheritance. The complexity lies in the “unnatural” relationship of a derived class with its second and subsequent base class subobjects. The problem and cost of multiple inheritance come primarily from conversions between the derived object and its second or subsequent base class subobjects, and from “this” pointer adjustments when a member function is called.
For virtual inheritance, current C++ implementations insert a pointer to each virtual base class within each derived class object. Access to the inherited virtual base class members is achieved indirectly through an associated pointer. With this implementation, space and access-time overhead is added when accessing data in a virtual base.
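The space overhead can be observed directly with a small experiment. The class names below are hypothetical, and the exact sizes are implementation-defined, but on common compilers an object with a virtual base is larger than the equivalent object with a non-virtual base because of the bookkeeping pointer.

```cpp
#include <cassert>
#include <cstddef>

struct A { int x; };

struct PlainDerived   : A         { };   // non-virtual base: A stored inline
struct VirtualDerived : virtual A { };   // virtual base: extra bookkeeping pointer

// True when the virtual-inheritance object is larger. The comparison
// typically holds on mainstream implementations; the C++ standard does
// not guarantee specific sizes.
bool virtual_base_costs_space() {
    return sizeof(VirtualDerived) > sizeof(PlainDerived);
}
```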
For these reasons, class library designers avoid virtual and multiple inheritance when they can. The price is paid only by those who use them.
In Taligent C++, the VFT of the most derived class contains the offsets of all direct and indirect virtual bases, so no extra indirection is needed for virtual base access as the virtual inheritance chain lengthens. The same holds for IBM VisualAge C++, which embeds virtual base pointers for direct and indirect virtual bases in the object, so that no extra indirection is needed. However, neither the IBM VisualAge C++ model nor the Taligent C++ model can add new virtual bases without impacting RRBC.
The sizes of the VFTs in different current implementations are as follows:
IBM VisualAge C++ VFT:
RTTI entries + virtual function slots
Taligent C++ RRBC VFT:
(number of direct and indirect virtual bases + number of classes that have virtual functions on leftmost path) + RTTI entries + virtual function slots
where RTTI refers to runtime type identification.
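The two size formulas can be written as small functions for comparison. The function and parameter names are hypothetical, and the counts are in table entries rather than bytes.

```cpp
#include <cassert>
#include <cstddef>

// IBM VisualAge C++ VFT: RTTI entries + virtual function slots.
std::size_t ibm_vft_entries(std::size_t rtti_entries,
                            std::size_t virtual_function_slots) {
    return rtti_entries + virtual_function_slots;
}

// Taligent C++ RRBC VFT: (virtual bases + classes with virtual functions
// on the leftmost path) + RTTI entries + virtual function slots.
std::size_t taligent_rrbc_vft_entries(std::size_t virtual_bases,
                                      std::size_t leftmost_classes_with_virtuals,
                                      std::size_t rtti_entries,
                                      std::size_t virtual_function_slots) {
    return (virtual_bases + leftmost_classes_with_virtuals) +
           rtti_entries + virtual_function_slots;
}
```

For made-up counts of 2 RTTI entries, 5 virtual function slots, 1 virtual base, and 3 leftmost classes with virtual functions, the formulas give 7 entries for the IBM table and 11 for the Taligent table.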
Unlike Taligent C++, IBM's VisualAge C++ VFT does not contain any information for virtual base access, so IBM's VFT is smaller. Virtual base access is done through virtual base pointers embedded in the object, so an IBM object is larger than the corresponding object in the Taligent model.
The overhead currently required, as noted above, in supporting the addition of new base classes is not within acceptable levels, and there is a need in the art for a solution.
In accordance with the method of the invention, a class hierarchy is derived which maintains release-to-release binary compatibility. Leftmost classes of the class hierarchy are ordered in top-down order with the most derived class at the bottom. Direct virtual classes are ordered from left to right with the leftmost class in declaration order at the top. Leftmost classes are independently grown downward and direct virtual classes are independently grown upward.
In accordance with the system of the invention, a virtual function table is provided for independently growing leftmost classes and direct virtual classes in a class hierarchy while maintaining release-to-release binary compatibility.
Other features and advantages of this invention will become apparent from the following detailed description of the presently preferred embodiment of the invention, taken in conjunction with the accompanying drawings.