The present invention relates generally to reorganization in object oriented databases. More specifically, the present invention relates to a method for updating physical references to objects without interfering with executing applications.
In an object oriented database, data records, referred to as objects, include reference pointers to other objects. In some systems these pointers are physical references, while other systems use logical references. A physical reference is an actual location, or address, on a storage medium where the referred-to object can be found. Logical references are object identifiers, unique to each object and independent of the physical location of the record.
By analogy, the address of a residence is a physical reference to an individual residing there. As long as the individual remains at that address, anyone visiting that address will find him. However, the same individual may move and no longer be associated with that address, in which case the address will exist, but the individual will not be found. Indeed, another individual may reside there.
In contrast, a social security number is a logical reference to an individual. No one else will receive the same number, and the individual's number, under ordinary circumstances, will not change. However, the social security number alone does not indicate where to physically find the individual.
Since objects in an object oriented database are interrelated through references, it is necessary to traverse one or more, and often many, intermediate objects before reaching a particular object. If any of the intermediate objects cannot be reached, then all subsequent objects will be unreachable. Thus it is imperative to keep track of all object migration, i.e., all physical relocations of an object.
The process of relocating, or migrating objects and updating the references to them, whether the references are physical or logical, is referred to as reorganization. Reorganization of objects in an object-oriented database is well known in the art and is an important component of several utilities like compaction, clustering, partitioning and schema evolution.
Compaction reduces fragmentation of objects of various lengths, resulting from continuous allocation and de-allocation of space for these objects, by migrating objects to different locations and packing them closely. Clustering involves locating related objects within the same disk block or adjacent blocks. As a result, the performance of transactions which access those sets of objects within a small time frame is improved. In contrast to clustering, partitioning separates objects across several disks to enable concurrently accessed objects to be retrieved in parallel. Clustering and partitioning of objects are determined based on changes in workload and updates to objects. Schema evolution often requires objects to be moved, since an object may not fit in its current location due to a size change caused by schema evolution.
When physical references are used, every reference to an object, of which there may be many, must be updated whenever the object is physically relocated. For example, one prior art method of updating an object-oriented database with physical references maintains back pointers from every object in the database. In this manner, when an object is relocated, the back pointers enable quick and easy updating of the references to the migrating object. However, maintaining back pointers greatly increases storage overhead and causes lock contention on the back pointer lists of "popular" objects, which are pointed to from many parent objects. Thus, maintaining back pointers is unacceptable in many applications.
Logical references, on the other hand, do not require updating, since the logical reference is independent of the physical location of the object. However, the data record itself is not found merely from the logical reference. Rather, a mapping procedure is executed to return the physical location of an object. It is only the one cross reference in the map that requires updating when an object is moved from one physical location to another.
Clearly, updating one cross reference in a map is more desirable than determining every physical reference to a relocated object. Indeed, where the cost of reorganization is of concern, one solution found in the prior art is to use logical references. However, while logical references require only one update for each relocated object, they entail one extra level of indirection for every access of the object, i.e. mapping. In a memory resident database, this increases the access path length to an object by a factor of two, and may also considerably increase main-memory requirements. These overheads are unacceptable in a number of scenarios such as call setup in telecommunications, which require response times to be in the order of hundreds of microseconds.
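The trade-off described above can be illustrated with a minimal Python sketch. It is not taken from any particular database system: a dictionary keyed by integer addresses stands in for the storage medium, and the names `move_physical`, `move_logical`, `deref_logical` and `oid_map` are hypothetical, chosen only for illustration.

```python
# Illustrative only: a dict keyed by integer "addresses" stands in for storage.
physical_storage = {
    100: {"name": "parent1", "ref": 300},  # physical reference: an address
    200: {"name": "parent2", "ref": 300},
    300: {"name": "child", "ref": None},
}

def move_physical(storage, old_addr, new_addr):
    """Relocate an object and patch every physical reference to it.
    Returns how many references had to be updated."""
    storage[new_addr] = storage.pop(old_addr)
    updated = 0
    for obj in storage.values():
        if obj["ref"] == old_addr:
            obj["ref"] = new_addr
            updated += 1
    return updated

# With logical references, parents store an object identifier instead,
# and a separate map (hypothetical name: oid_map) yields the address.
logical_storage = {
    100: {"name": "parent1", "ref": "oid-child"},
    200: {"name": "parent2", "ref": "oid-child"},
    300: {"name": "child", "ref": None},
}
oid_map = {"oid-child": 300}

def move_logical(storage, oid_map, oid, new_addr):
    """Relocate an object; only the single map entry changes."""
    storage[new_addr] = storage.pop(oid_map[oid])
    oid_map[oid] = new_addr

def deref_logical(storage, oid_map, oid):
    """Every access pays one extra lookup through the map."""
    return storage[oid_map[oid]]

physical_updates = move_physical(physical_storage, 300, 400)
move_logical(logical_storage, oid_map, "oid-child", 400)
```

In this sketch the physical move must visit and patch both parents, whereas the logical move changes one map entry but every later access goes through `deref_logical`, which is the extra level of indirection discussed above.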
Even where the cost of physical references is acceptable over logical references, another issue must be addressed: that of concurrent transactions. The concern is that between the actual migration of an object and updating all physical references, an application that has been running may have retrieved the old reference. In other words, assume an object O points to location X for object O′ and O′ is relocated to location Y. While the pointer from O will be updated to point to Y, the concern is that immediately prior to updating the pointer at O, an application will have already retrieved the old pointer, stored it in its local memory and will follow the pointer to X, looking for O′.
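The scenario just described can be replayed step by step in a few lines of Python. This is a single-threaded illustration of the interleaving only, under the assumption that a dictionary named `locations` (hypothetical) maps an address to whatever is stored there.

```python
# Object O holds a physical reference to address "X", where O' is stored.
database = {"O": {"child_addr": "X"}}
locations = {"X": "contents of O'"}

# Step 1: an application copies the pointer from O into its local memory.
local_ref = database["O"]["child_addr"]

# Step 2: the reorganizer moves O' from X to Y and updates the pointer in O.
locations["Y"] = locations.pop("X")
database["O"]["child_addr"] = "Y"

# Step 3: the application follows its stale local copy and finds nothing at X,
# while a fresh read through O would have reached O' at Y.
stale_lookup = locations.get(local_ref)
fresh_lookup = locations.get(database["O"]["child_addr"])
```

Here `stale_lookup` is `None` while `fresh_lookup` returns the relocated object, which is exactly the inconsistency a reorganization method must prevent.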
One approach to this problem is to access the local memory of the application and actually change the physical reference whenever the referred-to object has moved to another location. This requires an action-quiescent state during which all objects in the memory of active transactions and persistent roots are copied into a new space. This method, however, works only when the database manager has low level support from the hardware and operating system, including access to the local memory of the application. For example, the system must be able to change references in the registers and stacks of active transactions and trap certain pointer references using memory protection. Oftentimes this is not available. For example, Windows-based systems do not allow access to this low level memory. In addition, those techniques use forwarding addresses, which may require an extra I/O, and require use of a complicated failure recovery technique to ensure consistency of the disk version of the database. See E. K. Kolodner and W. E. Weihl, "Atomic Incremental Garbage Collection and Recovery of a Large Stable Heap," in Proceedings of ACM SIGMOD International Conference on Management of Data, pages 177-186 (Washington, D.C., May 1993), hereby incorporated by reference as if fully set forth herein.
Another approach is to find the set of parent objects with a reference to a migrating object O, lock them so they are unavailable to the application and update the reference in each parent object. The transaction locks the object, either with a read lock, thereby preventing any other application from reading the object, or with a write lock, allowing reading, but preventing any deletion or insertion of a reference contained within the object. Once a transaction has locked an object (in the appropriate mode), it can i) copy into its local memory any reference from that object to another; ii) delete a reference from that object to another; and iii) copy a reference stored in its local memory into that object, so that the object now points to another object. In all of the above, the transaction is not required to hold a lock on the object to which the reference points.
Traditionally, the application effecting the migration of an object will lock out all transactions from the entire database. This is referred to as off-line reorganization. Finding the parents of O with off-line reorganization is accomplished by traversing an object graph of the database. An object graph is a well known system model of an object-oriented database and has been used to analyze issues different from those addressed herein. See, e.g., L. Amsaleg, M. Franklin, and O. Gruber, "Efficient Incremental Garbage Collection for Client-Server Object Database Systems," in Proceedings of the 21st VLDB Conference (September 1995); S. Ashwin, et al., "Garbage Collection in Object Oriented Databases Using Transactional Cyclic Reference Counting," in Proceedings of the 23rd VLDB Database Conference (Athens, Greece, August 1997), hereby incorporated by reference as if fully set forth herein.
Referring to FIG. 8, a model of an object graph is shown where the objects in the database form a directed graph. The nodes of the graph are the objects in the database, and an edge, for example from A to B as shown (generically from R to O), exists in the graph if and only if object R contains a physical reference to object O. The term "reference" is used herein to mean both the object identifier of another object, as well as the edges in the object graph, i.e., a reference from some object R to an object O. It will be clear from the context which usage is intended. Objects R which contain a reference to an object O are known as the parents of O, and O is said to be a child of R.
Traversing the object graph of the database begins with a persistent root, which is a special object found in each database and may be node A in FIG. 8. All objects in the object graph that are reachable either from a persistent root, or from an object whose reference is in the local memory of an active transaction, are live objects. All other objects in the database are said to be "garbage" since without a reference to them they cannot be reached. In FIG. 8 there are no garbage objects. B is reachable from A. C is reachable from B. D is reachable from B and C.
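The object graph model lends itself to a short sketch in Python. FIG. 8 is not reproduced here, so the edge list below is an assumption inferred from the reachability statements in the text (A to B, B to C, B to D, C to D); the function names `live_objects` and `parents_of` are likewise hypothetical.

```python
from collections import deque

# Assumed edges, inferred from the description of FIG. 8:
# an entry R -> [O, ...] means object R holds a physical reference to O.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["D"], "D": []}

def live_objects(graph, roots):
    """Return every object reachable from the given roots.
    'roots' may hold the persistent root plus any objects whose
    references sit in the local memory of active transactions."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in graph[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def parents_of(graph, target):
    """Return the set of objects R that hold a reference to target."""
    return {r for r, children in graph.items() if target in children}

live = live_objects(graph, ["A"])
garbage = set(graph) - live
```

With A as the persistent root, all four objects are live and `garbage` is empty, matching the statement that FIG. 8 contains no garbage objects; `parents_of(graph, "D")` yields B and C.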
Thus with off-line reorganization there is no concern that an application retrieved the old reference and will no longer find the correct object, since no application is allowed to run during migration of an object and updating of the references to it. In today's world, however, off-line reorganization is becoming less of an option since information systems are requiring twenty-four hours a day, seven days a week operation. Going off-line may present an intolerable situation. This is especially true of global corporations spanning multiple time zones, where there is no appropriate low activity time during which reorganization can be performed. Since off-line reorganization has been the only alternative available, conventional wisdom states that object migration can be very disruptive to normal processing if physical references are used.
Consequently, there is a need in the field of object oriented databases for a method of online reorganization, allowing as many concurrent transactions as is possible. Moreover, it is desirable that such a system be less expensive than heretofore known and advantageously allow for the use of physical references, thus eliminating the look up costs associated with logical references. In addition, it would be desirable to have such a reorganization method where interference with low level applications is minimized or eliminated altogether. Such a method would advantageously allow applications to continue their operations during any physical relocation of objects.
Using a new Incremental Reorganization Algorithm ("IRA") in an object-oriented database with physical references, on-line reorganization with minimal interference to concurrently executing transactions is performed. Initially, the database is traversed with a single fuzzy traversal beginning with the persistent root of the database, to determine all approximate parents of each object being migrated. During the fuzzy traversal no objects are locked. Rather, a short latch is temporarily obtained as the references in an object are read. Thereafter, individually for each object, a set of exact parents is found and locked, their references to the migrating object are updated and the migrating object is moved. Accordingly, relatively few locks are held at any time, thus minimizing interference with concurrent transactions.
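The two phases described above may be sketched in Python as follows. This is a simplified, single-process illustration and not the claimed algorithm itself. In particular, this passage does not say how the exact parents are derived, so the sketch assumes they are the approximate parents that still hold the reference once locked. The class `Obj` and the functions `fuzzy_traversal` and `migrate` are hypothetical names, and `threading.Lock` merely stands in for database locks and latches.

```python
import threading

class Obj:
    def __init__(self, addr, refs=()):
        self.addr = addr                # physical address of this object
        self.refs = list(refs)          # physical references to children
        self.lock = threading.Lock()    # stands in for a transactional lock
        self.latch = threading.Lock()   # short latch held while reading refs

def fuzzy_traversal(db, root_addr, migrating):
    """Phase 1: walk the graph from the root without locking any object,
    recording the approximate parents of each migrating address."""
    approx = {addr: set() for addr in migrating}
    seen, stack = {root_addr}, [root_addr]
    while stack:
        obj = db[stack.pop()]
        with obj.latch:                 # held only while references are read
            refs = list(obj.refs)
        for child in refs:
            if child in approx:
                approx[child].add(obj)
            if child not in seen and child in db:
                seen.add(child)
                stack.append(child)
    return approx

def migrate(db, old_addr, new_addr, approx_parents):
    """Phase 2 for one object: lock parents, update references, move.
    ASSUMPTION: exact parents are the approximate parents that still
    reference old_addr at the time they are locked."""
    locked = []
    try:
        for parent in sorted(approx_parents, key=lambda p: p.addr):
            parent.lock.acquire()
            locked.append(parent)
        exact = [p for p in locked if old_addr in p.refs]
        for parent in exact:
            parent.refs = [new_addr if r == old_addr else r
                           for r in parent.refs]
        moved = db.pop(old_addr)
        moved.addr = new_addr
        db[new_addr] = moved
        return len(exact)
    finally:
        for parent in locked:
            parent.lock.release()

db = {1: Obj(1, [2, 3]), 2: Obj(2, [3]), 3: Obj(3)}
approx = fuzzy_traversal(db, 1, {3})
db[2].refs.remove(3)   # simulate a transaction deleting a reference meanwhile
updated = migrate(db, 3, 30, approx[3])
```

The traversal reports two approximate parents of the object at address 3, but only one still holds the reference when locked, so only that parent is updated. A reference added after the traversal would be missed by this sketch, which is the gap the Temporary Reference Table described in the text is meant to close.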
In a second embodiment of the present invention, with particular application for large databases, such as those on the order of gigabytes, where it is expensive to traverse the entire database in order to carry out reorganization, the database may be partitioned and the IRA of the present invention may be executed on one partition at a time. In this embodiment, an External Reference Table ("ERT") is maintained for each partition, listing the objects with at least one parent from another partition. The fuzzy traversal of a partition begins from each of the objects listed in that partition's ERT.
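A minimal Python sketch of the partitioned variant follows. The graph, the partition assignment and the function names `build_ert` and `traverse_partition` are hypothetical. For brevity the table is computed here by scanning all edges, whereas the text describes an ERT that is maintained for each partition; the sketch also assumes the traversal stays within the partition being reorganized.

```python
# Hypothetical five-object database split into two partitions.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
partition_of = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1}

def build_ert(graph, partition_of):
    """Map each partition id to the objects in it that have at least
    one parent residing in a different partition."""
    ert = {}
    for parent, children in graph.items():
        for child in children:
            if partition_of[parent] != partition_of[child]:
                ert.setdefault(partition_of[child], set()).add(child)
    return ert

def traverse_partition(graph, partition_of, ert, pid):
    """Traverse one partition, starting from the objects in its ERT and
    following only references that stay inside the partition."""
    start = ert.get(pid, set())
    seen, stack = set(start), list(start)
    while stack:
        node = stack.pop()
        for child in graph[node]:
            if partition_of[child] == pid and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

ert = build_ert(graph, partition_of)
reached = traverse_partition(graph, partition_of, ert, 1)
```

In this example C and D are referenced from partition 0 and therefore appear in the ERT of partition 1; starting from them, the traversal reaches E without visiting any object of partition 0.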
In a further embodiment of the present invention, with particular application in high performance situations, concurrency may be further improved by locking one migrating object at a time and then individually locking, updating and releasing each of its exact parents until all exact parents have been updated. Thereafter, the migrating object is relocated and released. Thus, at most two locks are held at any point in time by this extension.
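The two-lock bound can be checked with a small Python sketch. It is single-threaded and purely illustrative: `CountingLock` is a hypothetical wrapper that records how many locks are held at once, and `migrate_two_lock` assumes the set of exact parents has already been determined.

```python
import threading

class CountingLock:
    """Hypothetical lock wrapper that tracks how many locks are held."""
    held = 0
    max_held = 0

    def __init__(self):
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        CountingLock.held += 1
        CountingLock.max_held = max(CountingLock.max_held, CountingLock.held)
        return self

    def __exit__(self, *exc):
        CountingLock.held -= 1
        self._lock.release()
        return False

def migrate_two_lock(refs, locks, obj, new_addr, exact_parents):
    """Lock the migrating object, then lock, update and release each
    exact parent in turn; finally relocate the object and release it."""
    with locks[obj]:
        for parent in exact_parents:
            with locks[parent]:
                refs[parent] = [new_addr if r == obj else r
                                for r in refs[parent]]
        refs[new_addr] = refs.pop(obj)
        locks[new_addr] = locks.pop(obj)

# refs maps an address to the list of addresses it references.
refs = {1: [3], 2: [3, 1], 3: []}
locks = {addr: CountingLock() for addr in refs}
migrate_two_lock(refs, locks, 3, 30, exact_parents=[1, 2])
```

After the call, `CountingLock.max_held` is 2: the lock on the migrating object plus the lock on whichever parent is currently being updated.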
In all of the aforementioned embodiments a Temporary Reference Table ("TRT") may be employed for listing, separately for each migrating object, each new reference or deleted reference to that migrating object. This is important since no locks are obtained during the fuzzy traversal. All new references are updated as with the approximate parents. As for deleted references, approximate parents for which the TRT indicates the reference was deleted need not be updated.
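How the TRT might be combined with the approximate parents can be sketched in Python as follows. The entry format (a parent paired with "insert" or "delete") and the function name `parents_to_update` are assumptions made for illustration. The sketch further assumes that entries are replayed in the order they were recorded and that a parent holds at most one reference to a given migrating object.

```python
def parents_to_update(approx_parents, trt_entries):
    """Combine the approximate parents found by the fuzzy traversal with
    the TRT entries recorded for one migrating object.
    ASSUMPTION: each entry is (parent, action), with action being
    "insert" for a new reference or "delete" for a removed one."""
    candidates = set(approx_parents)
    for parent, action in trt_entries:
        if action == "insert":
            candidates.add(parent)        # new reference must be updated too
        elif action == "delete":
            candidates.discard(parent)    # reference gone, nothing to update
    return candidates

# Hypothetical data for one migrating object "O".
approx = {"P1", "P2"}
trt = {"O": [("P3", "insert"), ("P2", "delete")]}
to_update = parents_to_update(approx, trt["O"])
```

In this example P3 gained a reference to O after the traversal and P2 dropped its reference, so the parents to be updated are P1 and P3.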