1. Field of the Invention
The present invention relates generally to database systems, and, more particularly, to improved methodology for versioning of databases.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art. See, e.g., Date, C., “An Introduction to Database Systems, Eighth Edition”, Addison Wesley, 2003. For a description of database transaction processing, see, e.g., Gray, J. et al., “Transaction Processing: Concepts and Techniques (Morgan Kaufmann Series in Data Management Systems)”, Morgan Kaufmann, 1993.
In a database system, it is desirable to provide support for read-only transactions. If one only requires read access to a database for particular purposes, then one can conceivably create a system that achieves a transactionally consistent view of the database at a particular point in time, mainly when the transaction starts. This capability can be particularly valuable in a situation involving a lengthy duration read-only transaction, which is very common in the case of reporting, online back up, and the like. It can also be useful in a fail-over type situation as an efficient way to get a database back on line. Another example, for more sophisticated systems, is the case in which a replicate or minor copy of a database is being maintained. When a database is being mirrored, for scalability purposes it is desirable to be able to perform queries against one of these read-only minor copies. By doing so, this work can be off-loaded from the primary database to the mirror while also enabling data to be written. At the same time a query is being performed against this read-only mirror database, the read-only (mirror) database can also be updated with new records from the primary database without affecting the read-only transactions that are running against the minor database.
There are many types of database transactions that only require read access to the database. However, existing systems do not really take advantage of the fact that most transactions only involve reading from the database and consequently are unable to provide efficient concurrent access to the database. In order to provide greater scalability, it is desirable to be able to perform reads and writes concurrently. Write transactions should not be blocked from proceeding by read-only transactions that are being performed. At the same time, the write transactions should not block the read-only transactions.
In order to provide support for read-only transactions, one of the things that is needed is that the database system should not block for things like database locks, such as a table lock or row lock. However, in current database systems locks are almost always used for write transactions to achieve the goal of serializing the transaction, thereby serving as an obstacle to supporting read-only transactions in these systems. There are some current systems that have limited support for read-only transactions of this nature. Unfortunately, these current systems have a number of drawbacks and limitations. For example, at least one database system, the InterBase database system, uses a multi-generational architecture that avoids use of locks. However, the multi-generational approach has the disadvantage of the extra overhead required to maintain multiple database versions.
Generally, current systems supporting read-only transactions using one of two approaches. One of these approaches is the above multi-generation approach used in the InterBase database system. With this multi-generation approach, back versions of every change made to the database are actually recorded inside the database itself. This approach is not a log-based system as it does not use log files for versioning purposes. As a result, the database includes multiple versions of a given record and significant overhead is required to maintain a lot of duplicative information. Generally, the multi-generation approach causes the database to be populated with a lot of extra stores, back versions, and the like. This has an adverse impact on system performance.
Another issue with the multi-generation approach is that there is also a need for garbage collection in a manner similar to that needed in memory management. In the case of a multi-generation database system, there is the same need for garbage collection, however as it entails garbage collecting disk space, it can be quite expensive (computationally) and quite slow (e.g., compared to memory garbage collection). As a result, the approach can cause performance problems. For instance, a long duration read-only transaction may involve lots of write operations, thus generating lots of back versions, and so forth; these must be maintained and cannot be garbage collected until the read-only transaction commits. So in addition to the cost of garbage collection, there are also limitations on when it can be performed.
Another current approach is that illustrated by a current Oracle solution (available from Oracle of Redwood Shores, Calif.). The Oracle solution stores back versions in the database file itself. A disadvantage of this approach is that certain read-only transactions will sometimes fail because not all back versions are guaranteed to be maintained forever. Basically, the problem is that in some cases these back versions will overflow. When there are too many back versions, a certain number of back versions will be dropped. If a read-only transaction needs a particular back version and it has been dropped, then the read-only transaction will fail. Typically, the transaction will actually abort with an error message in the event of this type of failure. Given these shortcomings in current product offerings, a better solution is sought.
What is needed is a solution providing support for read-only transactions that does not have the foregoing disadvantages and limitations of prior systems. In particular, solution should maintain a read-only view of a database from a certain point in time even though other transactions are making changes to the same database. The solution should not require significant overhead and should not generate errors resulting from the dropping of back versions stored in the database. The present invention provides a solution for these and other needs.