The present invention relates to information databases. More particularly, the present invention relates to a method and apparatus for asynchronous version advancement in a three version database.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Databases often maintain multiple versions of data to avoid interference between read-only queries and update transactions. Many multi-versioning concurrency control protocols have been proposed, both for centralized and distributed databases. Examples of such proposals include: R. E. Steams and D. J. Rosenkrantz, xe2x80x9cDistributed Database Concurrency Controls Using Before-Values,xe2x80x9d ACM SIGMOD Conf. on the Management of Data, pp. 216-223 (1981) (referred to herein as xe2x80x9cStems and Rosenkrantzxe2x80x9d); R. Bayer, H. Heller and A. Reiser, xe2x80x9cParallelism and Recovery in Database Systems, ACM Trans. on Database Systems 5(2), pp. 139-156 (1980) (referred to herein as xe2x80x9cBayer, Heller and Reiserxe2x80x9d); A. Chan and R. Gray, xe2x80x9cImplementing Distributed Read-Only Transactions,xe2x80x9d IEEE Transactions on Software Engineering, 11(2), pp. 205-212 (1985) (referred to herein as xe2x80x9cChan and Grayxe2x80x9d); William E. Weihl, xe2x80x9cDistributed Version Management for Read-Only Actions, IEEE Transactions of Software Engineering Vol. SE-13, pp. 55-64 (1987) (referred to herein as xe2x80x9cWeihlxe2x80x9d); A. Chan, Fox, W-T. K. Lin, A. Nori and D. R. Ries, xe2x80x9cThe Implementation of an Integrated Concurrency Control and Recovery Scheme,xe2x80x9d ACM SIGMOD Conf. n the Management of Data, pp. 184-191(1982) (referred to herein as xe2x80x9cChan et al.xe2x80x9d); C. Mohan, H. Pirahesh and R. Lorte, xe2x80x9cEfficient and Flexible Methods for Transient Versioning of Records to Avoid Locking by Read-Only Transactions, ACM SIGMOD Conf. on the Management of Data, pp. 124-133 (1992) (refered to herein as xe2x80x9cMohan, Pirahesh and Lortexe2x80x9d); D. Agrawal and S. Sengupta, xe2x80x9cModular Synchronization in Multiversion Databases: Version Control and Concurrency Control,xe2x80x9d ACM SIGMOD Conf. on the Management of Data, pp. 408-417 (1989); and P. Bober and M. Carey, xe2x80x9cOn mixing Queries and Transactions via Multiversioning Locking, xe2x80x9cProc. 8th IEEE Intl. Conf. on Data Engineering, pp. 535-545 (1992) (refered to herein as xe2x80x9cBober and Careyxe2x80x9d).
Of special interest are protocols that satisfy the following three properties (collectively referred to herein as the xe2x80x9cnon-interference requirementxe2x80x9d): (1) read-only queries are completely decoupled from updates; (2) a transaction is never delayed by version management; and (3) a read-only query does not need to acquire any lock, or write any control information, to an accessed data item (such writes basically turn the read-only queries into much heavier-weight updates). At the same time, it may not be critical that queries read the latest committed data, although read-only queries should not fall too far behind the updates.
This non-interference requirement eliminates from consideration many protocols, like the two-version protocols of Bayer, Heller and Reiser and Stems and Rosenkrantz, where interference between read and update transactions is still possible, and the protocol of Weihl, which requires read transactions to write read time-stamps to accessed data items.
Existing techniques that satisfy the non-interference requirement fall into two categories. Techniques belonging to the first categoryxe2x80x94such as Chan et al. and Bober and Carey (for centralized schemes) and Chan and Gray (for a distributed scheme)xe2x80x94may create a potentially unlimited number of data versions in the case of a long-running read transaction. Thus, read transactions incur the overhead of following a potentially unlimited chain of data version pointers to determine which version should be used. Moreover, in the distributed case of Chan and Gray, the set of sites that will be visited must be known in advance.
Techniques in the second categoryxe2x80x94such as Mohan, Pirahesh and Lorte; K. L. Wu, Philip S. Yu and M. S. Chen, xe2x80x9cDynamic, Finite Versioning for Concurrent Transaction and Query Processing,xe2x80x9d Technical Report RC 16633, IBM T. J. Watson Research Center (1991) (referred to herein as xe2x80x9cWu, Yu and Chenxe2x80x9d); Merchant, Performance Analysis of Dynamic Finite Version Scheme: Storage Cost vs. Obsolescence,xe2x80x9d IEEE Transactions on Knowledge and Data, vol. 8 no. 6 (1996); and K. L. Wu, Philip S. Yu and M. S. Chen, xe2x80x9cDynamic Finite Versioning: An Effective Versioning Approach to Concurrent Transaction and Query Processing,xe2x80x9d Int. Conf. on Data Engineering (1993)xe2x80x94introduce a separate version advancement procedure to move from one version to another, and require four versions to satisfy the non-interference requirement in a centralized database. However, the extension to a distributed database discussed in Mohan, Pirahesh and Lorte requires version advancement to be coordinated with user operations: otherwise, autonomous version advancement may cause user operations to be aborted. Thus, the non-interference requirement is violated in the distributed case.
In view of the foregoing, it can be appreciated that a substantial need exists for an asynchronous version advancement protocol in a distributed three version database that satisfies the non-interference requirement and solves the problems discussed above.
The disadvantages of the art are alleviated to a great extent by a method and apparatus for asynchronous version advancement in a three version database. For a distributed database, read transactions are executed using a first version of a database. Update transactions are executed such that information is written into a second version of the database. The second version may include less than all of the information contained in the first version. A database version begins to be advanced at each node such that the information in the second version becomes available for read transactions. For an update transaction that starts on a node after the database version has been advanced, the update transaction is executed such that the update transaction writes information into a third version of the database. The advancement of the database version is completed such that the second version becomes the first version and the third version becomes the second version. For a centralized database, the protocol reduces of the number of versions from four to three.
With these and other advantages and features of the invention that will become hereinafter apparent, the nature of the invention may be more clearly understood by reference to the following detailed description of the invention, the appended claims and to the several drawings attached herein.