Many computer programs model problems as sets of interrelated objects. During execution, such programs perform operations on objects that are stored as data structures in the memory of the computer system. Objects may have numerous attributes, including attributes that represent relationships to other objects. When a first object has an attribute that represents a relationship to a second object, the first object is referred to as the source object and the second object is referred to as the target object.
The information used to represent a relationship between a source object and a target object is referred to as a "reference". A reference to a target object is stored in the data structure that represents the source object. For many applications, the most common operation in processing objects is locating the target object based on the reference contained in the source object. This operation is known as reference traversal.
Many programs use a data type called a "pointer" as a reference to a target object. A pointer indicates the memory location of the data structure that corresponds to the target object. To allow a program to access more objects than can fit in the available dynamic memory, pointers typically indicate the virtual memory address of the target object rather than the actual physical address of the dynamic memory location at which the target object resides. When a program uses the virtual memory address as the representation of the reference, the program relies on the underlying virtual memory mechanism of the computer operating system and hardware for looking up the physical memory addresses of target objects.
When an object is in dynamic memory, the virtual memory address of the object is the optimal representation for a reference to the object because most computers have a built-in hardware lookup mechanism and high-speed memory for an address translation table that maps virtual memory addresses to physical memory addresses. In addition, operating systems typically provide very efficient memory caching support based on virtual memory access. By employing the built-in address translation and memory caching mechanisms, the use of virtual memory addresses as references to target objects results in a highly efficient reference traversal.
Unfortunately, virtual memory addresses are dynamically allocated and are therefore valid only within a particular program execution. The virtual memory address used for a target object during one execution of a program may differ from the address used for the same target object during a subsequent execution of the same program. Similarly, two concurrently executing programs may use different virtual memory addresses for the same target object. Consequently, virtual memory addresses cannot be used as the sole representation of references to target objects in environments where objects are shared across different executions of the same program, or across different concurrently executing programs.
In a database management system (DBMS), the information used for references must correctly identify objects across all program executions. Therefore, a unique object identifier (OID) is assigned to each object. Unlike virtual memory addresses, the OID of an object uniquely identifies the object across all program executions. However, if the OID is used as a reference to an object stored in dynamic memory, each traversal operation requires mapping the OID to the virtual memory address of the target object, and then mapping the virtual memory address to the physical memory address of the target object. The process of mapping an OID to a virtual memory address consumes significantly more processing time than mapping a virtual memory address to a physical address.
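The difference in traversal cost can be illustrated with a minimal Python sketch. It assumes a hypothetical resident-object table (`object_table`) that maps OIDs to in-memory objects. Ordinary Python object references stand in for virtual memory addresses. The hardware translation from virtual to physical addresses is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Obj:
    oid: int
    name: str
    target_oid: Optional[int] = None     # reference in OID form
    target_ptr: Optional["Obj"] = None   # reference in address (pointer) form


# Hypothetical table mapping each in-memory object's OID to the object itself.
object_table: Dict[int, Obj] = {}


def traverse_by_oid(source: Obj) -> Obj:
    # Each traversal performs a software lookup from OID to in-memory object.
    return object_table[source.target_oid]


def traverse_by_pointer(source: Obj) -> Obj:
    # The reference already designates the in-memory object; no table lookup.
    return source.target_ptr


b = Obj(oid=2, name="b")
a = Obj(oid=1, name="a", target_oid=2, target_ptr=b)
object_table[a.oid] = a
object_table[b.oid] = b
```

Both traversal functions reach the same target object. The OID form adds a table lookup on every traversal, and that extra lookup is the cost that reference swizzling tries to avoid.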
Various attempts have been made to achieve the efficiency of reference traversals with virtual memory addresses while still being able to share objects between multiple programs and multiple executions of the same program. Typically, these approaches use OIDs as references to objects that are not currently loaded into dynamic memory, and virtual memory addresses as references to objects that have been loaded into dynamic memory. Therefore, references to an object must be converted from one form to another when the object is transferred between static memory and dynamic memory. The process of converting references between an external form and an internal form is referred to as reference swizzling.
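A minimal Python sketch of the two conversions follows. For illustration only, it assumes that each reference is held in one of two forms. The external form is an integer OID, and the internal form is a direct Python reference to a resident object. It also assumes a hypothetical `resident` dictionary that maps the OIDs of loaded objects to those objects. The class and function names are hypothetical.

```python
from typing import Dict, List, Union


class PObject:
    def __init__(self, oid: int, refs: List[Union[int, "PObject"]]):
        self.oid = oid
        # Each entry is an OID (external form) or a PObject (internal form).
        self.refs = refs


def swizzle(obj: PObject, resident: Dict[int, PObject]) -> None:
    # External -> internal: replace OIDs of resident targets with direct
    # references; OIDs of targets that are not resident are left unchanged.
    obj.refs = [resident.get(r, r) if isinstance(r, int) else r for r in obj.refs]


def unswizzle(obj: PObject) -> None:
    # Internal -> external: replace direct references with the targets' OIDs.
    obj.refs = [r.oid if isinstance(r, PObject) else r for r in obj.refs]
```

A reference to a resident target becomes a direct reference when swizzled. A reference to a non-resident target stays an OID. Unswizzling restores the pure OID form that would be written back to storage.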
According to one prior art reference swizzling technique, when an object is loaded from disk into main memory, all of the references contained within the object are converted into virtual memory addresses. Since the target objects of those references may not be in main memory, virtual memory addresses must be pre-allocated for the target objects as if they were already in main memory.
When a reference to a target object that is not in main memory is traversed, the DBMS loads the target object into main memory. To detect such reference traversal operations, the DBMS relies on the computer operating system: all pre-allocated virtual memory addresses are placed in access-protected mode. When an access-protected virtual memory address is accessed, the computer operating system detects a memory access protection violation and raises an exception. The DBMS handles the exception by loading the desired target object into main memory and changing the virtual memory address to a mode that allows access.
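The control flow of this technique can be simulated in Python. The sketch below is an illustration under stated assumptions and does not implement the prior art system:

- A hypothetical `Slot` class stands in for a pre-allocated, access-protected virtual memory address.
- A Python exception (`ProtectionFault`) stands in for the operating system's protection-violation exception.
- `disk` is a hypothetical dictionary that maps each OID to an object name and to the OIDs of that object's targets.

Real implementations rely on operating system page protection, which is not modeled here.

```python
class ProtectionFault(Exception):
    """Simulated memory access protection violation."""


class Slot:
    """Stands in for a pre-allocated, initially access-protected address."""

    def __init__(self, oid):
        self.oid = oid
        self.protected = True
        self.obj = None


class FaultingStore:
    def __init__(self, disk):
        self.disk = disk   # hypothetical: oid -> (name, [target oids])
        self.slots = {}    # oid -> Slot ("pre-allocated addresses")
        self.loads = 0

    def slot_for(self, oid):
        # Pre-allocate an "address" for a target even if it is not loaded yet.
        if oid not in self.slots:
            self.slots[oid] = Slot(oid)
        return self.slots[oid]

    def _load(self, slot):
        name, target_oids = self.disk[slot.oid]
        # Every reference in the loaded object is converted eagerly into a slot.
        slot.obj = {"name": name, "refs": [self.slot_for(t) for t in target_oids]}
        slot.protected = False   # change the "address" to a mode that allows access
        self.loads += 1

    @staticmethod
    def _read(slot):
        # Simulates the hardware check: touching a protected address faults.
        if slot.protected:
            raise ProtectionFault(slot.oid)
        return slot.obj

    def deref(self, slot):
        try:
            return self._read(slot)
        except ProtectionFault:
            # Plays the role of the DBMS's exception handler.
            self._load(slot)
            return self._read(slot)
```

Loading one object pre-allocates slots for all of its targets. This pre-allocation is why, as the following paragraph notes, memory reserved for objects that are never traversed cannot be reused.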
While the use of protected mode allows for fast reference swizzling, it relies on special operating system support, such as the operating system's memory access control, protection-violation detection, and exception handling functions. Unfortunately, this support may vary from platform to platform and may even be unavailable on some platforms. Therefore, this approach is not practical for DBMSs that are intended for use on multiple platforms. Further, because virtual memory has been pre-allocated for all of the referenced objects, that memory cannot be reused for other purposes. Therefore, applications that use a large number of objects may run out of memory.
According to an alternative approach, each reference is a data structure that contains a discriminant field and a variant field. The value in the discriminant field indicates whether the variant is an object identifier or the virtual memory address of the target object. Each object in main memory has a "surrogate" that is a data structure containing a reference count, the object identifier of the target object, and the virtual memory address of the target object. When the DBMS loads an object from disk into main memory, the value of the discriminant of each reference contained in the object is initially set to indicate that the corresponding variant is the object identifier of the target object.
When an application traverses the reference, the DBMS determines whether the discriminant of the reference indicates that the variant is an object identifier or a virtual memory address. If the variant is a virtual memory address, then the virtual memory address is used to locate the surrogate. The virtual memory address stored in the surrogate is then used to locate the target object.
If the variant is an object identifier, then the DBMS uses the object identifier to look up the virtual memory address of the target object's surrogate. If the surrogate exists, then the variant of the reference is set to the virtual memory address of the surrogate. The discriminant of the reference is set to indicate that the variant is the virtual memory address of the surrogate. The reference count of the surrogate is then incremented by one.
If the surrogate does not exist, then the target object is loaded from disk into main memory and a surrogate is allocated for the target object. The object identifier and the virtual memory address in the surrogate are set to those of the target object, and the reference count of the surrogate is set to zero. The DBMS then performs the steps of setting the variant, setting the discriminant, and incrementing the reference count, as described above.
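The traversal steps of the surrogate technique can be sketched in Python as follows. This is a minimal illustration under these assumptions:

- A Python object reference stands in for a virtual memory address.
- The string constants `OID` and `SURROGATE` stand in for discriminant values.
- `disk` is a hypothetical dictionary from OIDs to objects.
- A dictionary keyed by OID (`surrogates`) is the lookup structure used to find an existing surrogate.

All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict

OID, SURROGATE = "oid", "surrogate"   # hypothetical discriminant values


@dataclass
class Surrogate:
    oid: int
    obj: Any              # stands in for the target's virtual memory address
    ref_count: int = 0


@dataclass
class Reference:
    discriminant: str     # OID or SURROGATE
    variant: Any          # an int OID, or a Surrogate


class SurrogateDBMS:
    def __init__(self, disk: Dict[int, Any]):
        self.disk = disk                           # hypothetical: oid -> object
        self.surrogates: Dict[int, Surrogate] = {}

    def traverse(self, ref: Reference) -> Any:
        if ref.discriminant == SURROGATE:
            # Already swizzled: reference -> surrogate -> target object.
            return ref.variant.obj
        oid = ref.variant
        sur = self.surrogates.get(oid)
        if sur is None:
            # Target not in memory: "load" it and allocate its surrogate.
            sur = Surrogate(oid=oid, obj=self.disk[oid], ref_count=0)
            self.surrogates[oid] = sur
        # Swizzle the reference in place and count it against the surrogate.
        ref.variant = sur
        ref.discriminant = SURROGATE
        sur.ref_count += 1
        return sur.obj
```

Only the first traversal of each reference changes the reference count. Later traversals of an already swizzled reference go through the surrogate without further bookkeeping.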
When an object is saved to disk, the DBMS decrements the reference count of all of the surrogates pointed to by the references in the object. Therefore, at any given time, the reference count of a surrogate indicates how many references are currently pointing to the surrogate. Only when the reference count of a surrogate is zero may the object pointed to by the surrogate be swapped to disk and the surrogate deallocated.
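The bookkeeping on save, together with the rule that governs swapping an object out, can be sketched in Python. As an assumption not stated above, `save_object` also returns each reference in OID form as it would be written to disk. The names `save_object` and `try_evict` are hypothetical, and the discriminant constants and data classes are simplified stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

OID, SURROGATE = "oid", "surrogate"   # hypothetical discriminant values


@dataclass
class Surrogate:
    oid: int
    obj: Any
    ref_count: int = 0


@dataclass
class Reference:
    discriminant: str
    variant: Any


def save_object(refs: List[Reference]) -> List[int]:
    """Release the surrogates used by a saved object's references.

    Assumption: returns the references in OID form, as they would be stored.
    """
    stored = []
    for ref in refs:
        if ref.discriminant == SURROGATE:
            sur = ref.variant
            sur.ref_count -= 1          # one fewer in-memory reference to it
            stored.append(sur.oid)
        else:
            stored.append(ref.variant)  # never traversed; already an OID
    return stored


def try_evict(oid: int, surrogates: Dict[int, Surrogate]) -> bool:
    # An object may be swapped out only when no reference points at its surrogate.
    sur = surrogates.get(oid)
    if sur is None or sur.ref_count > 0:
        return False
    del surrogates[oid]                 # deallocate the surrogate
    return True
```

A target whose surrogate is still referenced by any resident object cannot be evicted. This is the limitation described in the following paragraph.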
One disadvantage of the surrogate technique is that the DBMS cannot swap an object to disk to free up memory as long as any object that has a traversed reference to it remains in memory. Consequently, the DBMS may be significantly limited in how it can free up memory to load newly referenced objects. In addition, decrementing surrogate reference counts adds overhead to the process of storing objects to disk.
Based on the foregoing, it is clearly desirable to provide a mechanism and method for swizzling references to objects that provides faster reference traversal than using OIDs as the representation for references. It is further desirable to provide a mechanism for swizzling references that does not rely on support from specific platforms.