1. Field of the Invention
The present invention generally relates to a concurrent transaction and query processing system and, more particularly, to a dynamic, finite versioning scheme in which transactions and queries do not interfere with each other, and in which neither transactions nor queries need to be quiesced to allow queries to access a more up-to-date database. The new mechanism uses time-invariant and time-varying data structures to define query snapshots, to allow a new query snapshot to be taken without interrupting either transaction or query processing, to dynamically identify the appropriate versions for transaction and query accesses, and to allow efficient, on-the-fly garbage collection when it is recognized that a single page copy is sufficient to represent the required logical versions.
2. Description of the Prior Art
In concurrent transaction and query processing environments, only transactions may update the database; queries are read-only. Because they largely support decision making, queries do not necessarily have to access the most up-to-date database, as long as they access a transaction-consistent database. By maintaining multiple versions of data objects, the interference between transactions and queries can be eliminated: transactions create new versions and queries access old versions. See, for example, David P. Reed, "Implementing Atomic Actions on Decentralized Data", ACM Trans. on Computer Systems, vol. 1, no. 1, pp. 3-23, February 1983; A. Chan, S. Fox, W.-T. K. Lin, A. Nori, and D. R. Ries, "The Implementation of an Integrated Concurrency Control and Recovery Scheme", ACM SIGMOD Proc. Int. Conf. on Management of Data, pp. 184-191, 1982; A. Chan and R. Gray, "Implementing Distributed Read-Only Transactions", IEEE Trans. on Software Engineering, vol. SE-11, no. 2, pp. 205-212, February 1985; W. E. Weihl, "Distributed Version Management for Read-Only Actions", IEEE Trans. on Software Engineering, vol. SE-13, no. 1, pp. 55-64, January 1987; and P. A. Bernstein, V. Hadzilacos and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987.
In existing multiversioning approaches, such as those described in the Bernstein et al. text cited above, every transaction creates a new version of a data object, and the old versions of the data object are kept for potential query accesses. At any given instant, a large and unbounded number of versions may therefore be maintained for a single data object. As a result, although the interference is eliminated, the storage overhead (for maintaining a potentially unlimited number of old versions) and the version-management complexity (for version retrieval and garbage collection) can be severe.
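To make the storage problem concrete, the following is a minimal sketch of unbounded multiversioning, assuming a single-threaded store in which each committed transaction receives an increasing commit timestamp and a query's snapshot is simply the latest timestamp at the moment it starts. The class `UnboundedVersionStore` and its methods are hypothetical names for illustration only; this is not the implementation of any of the cited schemes. A query reads the newest version committed at or before its snapshot, and because nothing is ever discarded, the version count grows with every update.

```python
from bisect import bisect_right
from collections import defaultdict


class UnboundedVersionStore:
    """Hypothetical illustration: every committed write adds a version; none is ever discarded."""

    def __init__(self):
        # object id -> parallel lists of commit timestamps and values, in commit order
        self._stamps = defaultdict(list)
        self._values = defaultdict(list)
        self._clock = 0

    def commit(self, writes):
        """Apply one transaction's writes under a single new commit timestamp."""
        self._clock += 1
        for obj, value in writes.items():
            self._stamps[obj].append(self._clock)
            self._values[obj].append(value)
        return self._clock

    def snapshot(self):
        """Assumed snapshot rule: a query sees everything committed so far."""
        return self._clock

    def read(self, obj, snap):
        """Return the newest version of obj committed at or before snap."""
        i = bisect_right(self._stamps[obj], snap)
        if i == 0:
            raise KeyError(f"{obj!r} did not exist at snapshot {snap}")
        return self._values[obj][i - 1]

    def version_count(self, obj):
        return len(self._stamps[obj])


store = UnboundedVersionStore()
store.commit({"x": 1})
q = store.snapshot()              # a long-running query starts here
for v in range(2, 1001):          # 999 further transactions update x
    store.commit({"x": v})
print(store.read("x", q))                 # the query still sees its consistent value
print(store.read("x", store.snapshot()))  # the newest value
print(store.version_count("x"))           # every version is retained
```

The query never blocks and never aborts, but a single frequently updated object accumulates one retained version per update, which is the storage and garbage-collection burden described above.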
Various multiversioning schemes have been proposed to achieve a higher level of concurrency. In addition to the articles by Reed, Chan et al. and Weihl cited above, see also R. Bayer, H. Heller and A. Reiser, "Parallelism and Recovery in Database Systems", ACM Trans. on Database Systems, vol. 5, no. 2, pp. 139-156, June 1980, and R. E. Stearns and D. J. Rosenkrantz, "Distributed Database Concurrency Control Using Before-Values", Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 74-83, 1981. Bayer et al. and Stearns and Rosenkrantz have proposed multiversion concurrency protocols using two versions. Interference between read-only queries and update transactions is reduced, but not eliminated. The increase in the level of concurrency is limited because only a single old version is maintained, and read-only queries may still compete with update transactions through special locking protocols.
Interference between update transactions and read-only queries can be eliminated by maintaining a possibly unlimited number of versions for a data object. Reed, cited above, has proposed a scheme that works conceptually by keeping every version ever created. Garbage collection was not well addressed, and read-only queries may have to be aborted if certain old versions that they need are no longer available. Chan et al., cited above, have developed a version management technique that uses a ring buffer as the version pool for storing old versions. When the ring buffer overflows, some old versions have to be discarded to make room for the new versions created by transactions, causing queries that still need the discarded versions to be aborted. The possibility of aborting a read-only query due to early garbage collection can be eliminated by a scheme developed by Weihl, also cited above. However, this is achieved at the cost of a complex, expensive initiation phase for query execution: before it can start accessing any database object, a query has to ensure that all the versions it needs are available and registered, so that they are protected from early garbage collection.
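The abort problem of a bounded version pool can be illustrated with a simplified sketch. It assumes current values are kept in place, each superseded version is pushed into a fixed-capacity ring together with the timestamp interval during which it was current, and a query reading at snapshot `snap` needs the version whose interval contains `snap`. The names `RingBufferVersionPool` and `QueryAborted` are hypothetical, and the sketch is not Chan et al.'s actual design; it only uses `collections.deque(maxlen=...)`, which silently drops the oldest entry when a new one is appended to a full deque.

```python
from collections import deque


class QueryAborted(Exception):
    """Raised when a version a query needs has been overwritten in the pool."""


class RingBufferVersionPool:
    """Hypothetical sketch: current values in place, superseded versions in a bounded ring."""

    def __init__(self, capacity):
        self._current = {}                   # obj -> (valid_from, value)
        self._pool = deque(maxlen=capacity)  # (obj, valid_from, valid_until, value); oldest dropped on overflow
        self._clock = 0

    def commit(self, writes):
        self._clock += 1
        for obj, value in writes.items():
            if obj in self._current:
                old_from, old_value = self._current[obj]
                # The superseded version remains valid for snapshots in [old_from, now).
                self._pool.append((obj, old_from, self._clock, old_value))
            self._current[obj] = (self._clock, value)
        return self._clock

    def snapshot(self):
        return self._clock

    def read(self, obj, snap):
        valid_from, value = self._current[obj]
        if valid_from <= snap:
            return value
        for o, lo, hi, old_value in self._pool:
            if o == obj and lo <= snap < hi:
                return old_value
        # The required old version was pushed out of the ring buffer.
        raise QueryAborted(f"version of {obj!r} for snapshot {snap} was discarded")


pool = RingBufferVersionPool(capacity=3)
pool.commit({"x": 1})
q = pool.snapshot()          # a query takes its snapshot
pool.commit({"x": 2})
early = pool.read("x", q)    # still served from the pool
for v in range(3, 6):        # three more updates overflow the 3-slot ring
    pool.commit({"x": v})
aborted = False
try:
    pool.read("x", q)
except QueryAborted:
    aborted = True           # the query must be aborted and restarted
print(early, aborted)
```

With a capacity of three, the fourth superseded version evicts the one the long-running query depends on, so the query is aborted even though it did nothing wrong; a larger pool only postpones the overflow.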