Many computer programs model problems as sets of inter-related objects. During execution, such programs perform operations on objects that are stored as data structures in the memory of the computer system. Objects may have numerous attributes, including attributes that represent relationships to other objects. When a first object has an attribute that represents a relationship to a second object, the first object is referred to as the source object and the second object is referred to as the target object.
The information used to represent an attribute that represents a relationship with a target object is referred to as a “reference”. A reference to a target object is stored in the data structure that represents the source object. For many applications, the most common operation in processing objects is to locate the target object based on the reference to the target object contained in the source object. This operation is known as reference traversal.
Many programs use a data type called a “pointer” to reference a target object. A pointer indicates the memory location of the data structure that corresponds to the target object. To allow a program to access more objects than can fit in the available dynamic memory, pointers typically contain a virtual memory address (VMA) of the target object rather than the actual physical address of the dynamic memory location at which the target object resides. When a program uses the VMA as the representation of the reference, the program relies on the underlying virtual memory mechanism of the computer operating system and hardware for looking up the physical memory addresses of target objects.
When an object is in dynamic memory, the VMA of the object is generally used to represent a reference to the object because most computers have a built-in hardware lookup mechanism and high-speed memory for an address translation table that maps VMAs to physical memory addresses. In addition, operating systems typically provide efficient memory caching support based on virtual memory access. Because it employs these built-in address translation and memory caching mechanisms, the use of VMAs as references to target objects results in highly efficient reference traversal.
Unfortunately, VMAs and physical memory addresses are dynamically allocated, and are thus valid only within and during a particular program execution. Therefore, the VMA used for a target object during one execution of a program may not be the same as the VMA used for the same target object during a subsequent execution of the same program. Similarly, the VMA of any given target object may not be the same for two concurrently executing programs. Therefore, VMAs cannot be used as the sole representation of references to target objects in environments where objects are shared across different executions of the same program, or across different concurrently executing programs.
In a database management system (DBMS), the information used for references must correctly identify objects across all program executions. Some unique, permanent identifier for the object must therefore be provided, which shall be referred to herein as an “object identifier” (OID). OIDs must be based on the lowest-level physical storage address. Thus, an OID typically communicates where an object is located on disk, as opposed to the object's VMA or physical memory address. Consequently, OIDs are typically based on some form of data block ID. OIDs may, for example, include a multi-part key that indicates a file number (identifying the file where the information is stored on disk) or a proxy thereof, and a relative block number (which counts the number of fixed-size blocks into that file where the data is stored). (Other usages of OID in the literature sometimes describe a logical persistent unique ID, but such an ID must eventually be mapped to a physical storage address via an index or similar structure.)
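For illustration only, a two-part OID of the kind described above might be sketched as follows. The class name `ObjectId`, the field names, and the 8192-byte block size are assumptions made for this sketch and do not appear in the description above.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8192  # assumed fixed block size in bytes; not given in the text


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical two-part OID: which file, and which block within it."""
    file_no: int
    block_no: int

    def byte_offset(self, block_size: int = BLOCK_SIZE) -> int:
        # The relative block number counts fixed-size blocks from the
        # start of the file, so the byte offset is a simple product.
        return self.block_no * block_size


oid = ObjectId(file_no=3, block_no=5)
offset = oid.byte_offset()  # position of the block within file 3
```

Because the identifier is built from storage coordinates rather than from a memory address, two separate program executions that construct the same `ObjectId` refer to the same stored object.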
Unlike VMAs, the OID of an object uniquely identifies the object across all program executions. However, if the OID is used as a reference to an object stored in dynamic memory, each traversal operation requires mapping the OID to the VMA of the target object, and then mapping the VMA to the physical memory address of the target object. The process of mapping an OID to a VMA consumes significantly more processing time than mapping a VMA to a physical address.
Various attempts have been made to achieve the efficiency of reference traversals with VMAs while still being able to share objects between multiple programs and multiple executions of the same program. One common approach uses OIDs as references to objects that are not currently loaded into dynamic memory, and VMAs as references to objects that have been loaded into dynamic memory. Under this approach, references to an object must be converted from one form to the other when the object is transferred between static memory (such as disk) and dynamic memory. The process of converting references between an external form and an internal form is referred to as reference swizzling.
According to one reference swizzling technique, when an object is loaded from disk into main memory, all of the references contained within the object are converted into VMAs. Since the target objects of those references may not be in main memory, VMAs must be pre-allocated for the target objects as if they were already in main memory.
When a reference to a target object that is not in main memory is traversed, the DBMS loads the target object into main memory. To detect such reference traversal operations, the DBMS may rely on the computer operating system by setting all pre-allocated VMAs in access-protected mode. When an access-protected VMA is accessed, the computer operating system detects a memory access protection violation and raises an exception. The DBMS handles the exception by loading the desired target object into main memory and changing the VMA to a mode that allows access.
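The fault-driven loading described above can be simulated in user code. The following is a minimal sketch, not an actual use of operating system page protection: the `AccessViolation` exception stands in for the protection fault, and the class `SimulatedMemory`, its methods, and the address values are all hypothetical.

```python
class AccessViolation(Exception):
    """Stands in for the protection fault the operating system would raise."""

    def __init__(self, vma):
        super().__init__(f"protected address {vma:#x}")
        self.vma = vma


class SimulatedMemory:
    def __init__(self, disk):
        self.disk = disk        # OID -> stored record (simulated disk)
        self.vma_of = {}        # OID -> pre-allocated VMA
        self.oid_at = {}        # VMA -> OID
        self.resident = {}      # VMA -> loaded record; absent means protected
        self.next_vma = 0x1000
        self.loads = 0

    def reserve(self, oid):
        # Pre-allocate a protected VMA for a target that is not yet loaded.
        if oid not in self.vma_of:
            self.vma_of[oid] = self.next_vma
            self.oid_at[self.next_vma] = oid
            self.next_vma += 0x1000
        return self.vma_of[oid]

    def raw_read(self, vma):
        # A protected address has a reservation but no resident record.
        if vma not in self.resident:
            raise AccessViolation(vma)
        return self.resident[vma]

    def load(self, oid):
        # Load an object and convert every reference it contains to a VMA.
        vma = self.reserve(oid)
        record = self.disk[oid]
        self.resident[vma] = {
            "value": record["value"],
            "refs": [self.reserve(target) for target in record["refs"]],
        }
        self.loads += 1
        return vma

    def read(self, vma):
        try:
            return self.raw_read(vma)
        except AccessViolation as fault:
            # Fault handler: loading the target also makes it accessible.
            self.load(self.oid_at[fault.vma])
            return self.raw_read(vma)


disk = {
    "A": {"value": "root", "refs": ["B"]},
    "B": {"value": "leaf", "refs": []},
}
mem = SimulatedMemory(disk)
root = mem.read(mem.load("A"))
leaf = mem.read(root["refs"][0])  # first touch faults and loads B
```

In this sketch the reference to B is already a VMA as soon as A is loaded, and B itself is brought into memory only when that address is first accessed.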
While the use of protected mode allows for fast reference swizzling, it relies on special operating system support, such as the memory access control, detection, and exception handling functions of the operating system. Unfortunately, this support may vary from platform to platform, and may even be unavailable on some platforms. Therefore, this approach is not practical for a DBMS that is intended for use on multiple platforms. Further, because memory has been pre-allocated for all of the referenced objects, that memory cannot be reused for other purposes. Therefore, applications that use a large number of objects may run out of memory.
According to an alternative approach, each reference is a data structure that contains a discriminant field and a variant field. The value in the discriminant field indicates whether the variant is an object identifier or the VMA of the target object. Each object in main memory has a “surrogate” that is a data structure containing a reference count, the object identifier of the target object, and the VMA of the target object. When the DBMS loads an object from disk into main memory, the value of the discriminant of each reference contained in the object is initially set to indicate that the corresponding variant is the object identifier of the target object.
When an application traverses the reference, the DBMS determines whether the discriminant of the reference indicates that the variant is an object identifier or a VMA. If the variant is a VMA, then the VMA is used to locate the surrogate. The VMA stored in the surrogate is then used to locate the target object.
If the variant is an object identifier, then the DBMS looks up the VMA of the surrogate. If the surrogate exists, then the variant of the reference is set to the VMA of the surrogate. The discriminant of the reference is set to indicate that the variant is the VMA of the surrogate. The reference count of the surrogate is then incremented by one.
If the surrogate does not exist, then the target object is loaded from disk into main memory, a surrogate is allocated for the target object, the object identifier and the VMA in the surrogate are set to those of the target object and the reference count of the surrogate is set to zero. Then, the DBMS performs the steps of setting the variant, setting the discriminant, and incrementing the reference count, as described above.
When an object is saved to disk, the DBMS decrements the reference count of all of the surrogates pointed to by the references in the object. Therefore, at any given time, the reference count of a surrogate indicates how many references are currently pointing to the surrogate. Only when the reference count of a surrogate is zero may the object pointed to by the surrogate be swapped to disk and the surrogate deallocated.
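The surrogate technique described above can be sketched as follows. This is an illustrative simulation under several assumptions: Python object references stand in for VMAs, the simulated disk is a dictionary, the initially loaded object is given no surrogate of its own, and all class, field, and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Surrogate:
    oid: str
    target: object       # stands in for the VMA of the target object
    ref_count: int = 0


@dataclass
class Reference:
    is_vma: bool         # discriminant
    variant: object      # an OID, or a Surrogate once the reference is swizzled


@dataclass
class LoadedObject:
    oid: str
    value: str
    refs: list = field(default_factory=list)


class SurrogateStore:
    def __init__(self, disk):
        self.disk = disk         # OID -> (value, [target OIDs])
        self.surrogates = {}     # OID -> Surrogate for resident targets

    def load(self, oid):
        value, targets = self.disk[oid]
        # Freshly loaded references start out holding object identifiers.
        return LoadedObject(oid, value, [Reference(False, t) for t in targets])

    def traverse(self, ref):
        if not ref.is_vma:
            surrogate = self.surrogates.get(ref.variant)
            if surrogate is None:
                # No surrogate yet: load the target and allocate one.
                surrogate = Surrogate(ref.variant, self.load(ref.variant))
                self.surrogates[ref.variant] = surrogate
            ref.variant, ref.is_vma = surrogate, True
            surrogate.ref_count += 1
        return ref.variant.target

    def save(self, obj):
        # Writing the object out releases every surrogate it points to.
        for ref in obj.refs:
            if ref.is_vma:
                surrogate = ref.variant
                surrogate.ref_count -= 1
                ref.variant, ref.is_vma = surrogate.oid, False

    def can_evict(self, oid):
        surrogate = self.surrogates.get(oid)
        return surrogate is not None and surrogate.ref_count == 0


store = SurrogateStore({"A": ("root", ["B"]), "B": ("leaf", [])})
a = store.load("A")
b = store.traverse(a.refs[0])
pinned = not store.can_evict("B")   # A still holds a traversed reference to B
store.save(a)
released = store.can_evict("B")     # count is back to zero after A is saved
```

The `pinned` and `released` values show the behavior the reference count is meant to enforce: B cannot be swapped out while A holds a traversed reference to it.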
One disadvantage of the surrogate technique is that the DBMS cannot swap an object to disk to free up memory as long as any object that has a traversed reference to that object remains in memory. Consequently, the DBMS may become significantly limited with respect to how it may free up memory to load newly referenced objects. In addition, decrementing the reference counts of surrogates adds overhead to the process of storing objects to disk.
U.S. Pat. No. 5,887,275 describes a technique for swizzling references that attempts to address these disadvantages. According to that technique, if a reference to an object has previously been used to locate a first object, then a data structure referred to as a “tombstone” that has been associated with the first object is located based on a first VMA that is stored in the reference.
Once the tombstone has been located, a first pseudo-timestamp that is stored in the reference is compared to a second pseudo-timestamp that is stored in the tombstone. If the first pseudo-timestamp matches the second pseudo-timestamp, then the first object is located based on a second VMA that is stored in the tombstone.
If the first pseudo-timestamp does not match the second pseudo-timestamp, then the first object is located based on an identifier stored in the reference. Similarly, if the reference has not been previously used to locate the first object, then the first object is located based on the identifier stored in the reference.
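The comparison of pseudo-timestamps described above can be sketched as follows. The description does not say how pseudo-timestamps are generated or when they change, so this sketch assumes a simple counter and assumes that swapping an object out changes the pseudo-timestamp in its tombstone. Those assumptions, and all names used here, are illustrative and are not taken from the cited patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tombstone:
    stamp: int                  # second pseudo-timestamp
    target: Optional[dict]      # stands in for the second VMA


@dataclass
class Reference:
    oid: str                                # identifier stored in the reference
    tombstone: Optional[Tombstone] = None   # stands in for the first VMA
    stamp: int = -1                         # first pseudo-timestamp


class TombstoneStore:
    def __init__(self, disk):
        self.disk = disk
        self.tombstones = {}    # OID -> Tombstone
        self.clock = 0          # assumed source of pseudo-timestamps
        self.slow_lookups = 0

    def _locate_by_oid(self, ref):
        # Slow path: find or create the tombstone, loading the object if needed.
        self.slow_lookups += 1
        tomb = self.tombstones.get(ref.oid)
        if tomb is None:
            tomb = Tombstone(stamp=-1, target=None)
            self.tombstones[ref.oid] = tomb
        if tomb.target is None:
            self.clock += 1
            tomb.stamp = self.clock
            tomb.target = dict(self.disk[ref.oid])
        # Remember the tombstone and its current stamp in the reference.
        ref.tombstone, ref.stamp = tomb, tomb.stamp
        return tomb.target

    def traverse(self, ref):
        tomb = ref.tombstone
        if tomb is not None and tomb.stamp == ref.stamp:
            return tomb.target  # stamps match: use the address in the tombstone
        return self._locate_by_oid(ref)

    def evict(self, oid):
        # Assumption: swapping out changes the stamp, so old references go stale.
        tomb = self.tombstones[oid]
        self.clock += 1
        tomb.stamp = self.clock
        tomb.target = None


store = TombstoneStore({"B": {"value": "leaf"}})
ref = Reference("B")
first = store.traverse(ref)    # never used before: located by identifier
second = store.traverse(ref)   # stamps match: located through the tombstone
store.evict("B")               # allowed even though ref still exists
third = store.traverse(ref)    # stale stamp: located by identifier again
```

Under these assumptions, an object can be swapped out without visiting the references that point to it, because a stale reference is detected at traversal time by the mismatch in pseudo-timestamps.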
Unfortunately, even when such techniques are used to defer or otherwise reduce the overhead associated with converting OIDs to VMAs, such conversion operations often take a significant amount of CPU time, consume extra storage space, and/or add latency, because the pointer data must be separated from the real data when swizzling.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.