1. Field of the Invention
This invention generally relates to database systems and more particularly to recovery of a database after a system failure.
2. Description of the Prior Art
Databases have played a critical role in business areas such as banking and airline reservations for quite some time. These two lines of business illustrate how critical availability of a database can be. If a bank cannot process customer transactions (deposits, payments, transfers, etc.) because of a computer system failure, a customer may be unable to accomplish its business. Likewise, if an airline is unable to book reservations, customers may be lost and its planes may fly empty. Therefore, database availability is critical to some businesses.
While great strides have been made toward providing fault tolerant computer systems, some components are still subject to failure, and therefore an efficient database recovery strategy must be in place to minimize the time that transactions cannot be processed against the database. As illustrated with the airline reservation system, if the database recovery process takes too long, the delay may result in passengers taking their business to another airline. Thus, the recovery strategy must be efficient to minimize the time that the database is unavailable.
To recover correctly after a computer system failure, a database must be brought to a consistent state. That is, any updates to the database made by transactions that were in process but had not completed at the time of the system failure must be undone, and any updates made by transactions that had completed prior to the failure must remain in the database. Transaction processing and recovery theory is more fully discussed in Chapter 18 of "An Introduction to Data Base Systems", Vol. 1, 4th Edition, by C. J. Date, published by Addison-Wesley Publishing Co.
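The consistency requirement described above may be illustrated by the following minimal sketch. It is not taken from any cited reference. The record layout (LogRecord), the function name (recover), and the representation of the database as a key/value mapping are hypothetical and chosen only for illustration. The sketch further assumes that no two concurrent transactions update the same key, as would be the case under strict locking.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    """Hypothetical log record: an update or a commit marker."""
    txn: int
    kind: str                     # "update" or "commit"
    key: Optional[str] = None
    before: Optional[int] = None  # value before the update (None = key absent)
    after: Optional[int] = None   # value after the update


def recover(db: Dict[str, int], log: List[LogRecord]) -> Dict[str, int]:
    """Return a consistent copy of db given the log written before the failure."""
    committed = {r.txn for r in log if r.kind == "commit"}
    state = dict(db)

    # Redo pass: updates of completed transactions must remain in the database.
    for r in log:
        if r.kind == "update" and r.txn in committed:
            state[r.key] = r.after

    # Undo pass: updates of incomplete transactions are rolled back, newest first.
    for r in reversed(log):
        if r.kind == "update" and r.txn not in committed:
            if r.before is None:
                state.pop(r.key, None)
            else:
                state[r.key] = r.before
    return state


# Transaction 1 completed; transaction 2 was still in process at the failure.
log = [
    LogRecord(1, "update", "a", before=100, after=90),
    LogRecord(1, "update", "b", before=50, after=60),
    LogRecord(1, "commit"),
    LogRecord(2, "update", "a", before=90, after=70),
]
# State found on disk: the uncommitted update to "a" was written,
# but the committed update to "b" was not.
on_disk = {"a": 70, "b": 50}
recovered = recover(on_disk, log)
```

In this example the recovered state holds the values written by transaction 1 only, regardless of which cached updates happened to reach disk before the failure.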
As described in U.S. Pat. No. 4,819,156 entitled "Database Index Journaling for Enhanced Recovery," to DeLorme, et al. (hereinafter referenced as "DeLorme"), a database may be comprised of database records and a database index. The database records contain the data the user inputs to the database and the index contains the data structures used to reference selected data in the database. Well known database index methods include hashing and use of B+-trees.
Both the database records and the database index must be made consistent to correctly recover a database. DeLorme further points out that at the time of computer system failure, a transaction may have caused the index to be updated but the record may not have been updated with the most recent data. In addition, if the database system includes cache processing for caching in main memory the database index and database records, the affected portions of the database index or the updated database record may not have been written to non-volatile storage (such as a magnetic disk) at the time of the computer system failure.
As described in great detail in U.S. Pat. No. 5,043,866, entitled "Soft Checkpointing System Using Log Sequence Numbers Derived from Stored Data Pages and Log Records for Database Recovery" to Myre et al. (hereinafter referenced as "Myre"), the efficiency of database recovery largely depends on the technique employed to log to a file the status of transactions processed and the corresponding database updates. Throughout the remainder of this specification, the process of logging status information and database updates will be referred to as "auditing" and the file to which information is logged is the "Audit File." The process of auditing in turn affects the rate of transaction processing. Auditing adds to transaction processing overhead and therefore reduces transaction processing throughput, but is necessary for recovering from system malfunctions.
Myre teaches a method of "soft" checkpointing to accomplish auditing with minimal overhead while minimizing the portion of the audit file that must be processed in order to recover a database. Periodically, the MINBUFLSN and LOWTRANSLSN values are stored in the log (herein "audit") file. MINBUFLSN is the record number, or "Log Sequence Number," of the record in the audit file that contains the update to the page in cache that was updated prior to any other "dirty" page in cache. Thus, MINBUFLSN identifies the point in the audit file after which operations in the audit file may need to be redone; any updates to cache pages audited in records preceding MINBUFLSN have already been destaged to disk. LOWTRANSLSN is the record number of the first audit record relating to the oldest uncommitted transaction. Both MINBUFLSN and LOWTRANSLSN are stored in the header of the audit file. At recovery time, the minimum of MINBUFLSN and LOWTRANSLSN is used as the point in the audit file at which recovery should begin.
Because MINBUFLSN and LOWTRANSLSN are stored only periodically, the starting point identified by their minimum is only an approximation of the optimal point in the audit file at which recovery should begin. Between the time that LOWTRANSLSN and MINBUFLSN are checkpointed and the time that the computer system fails, additional transactions may have committed, so that the value of LOWTRANSLSN in the header of the audit file is less than the optimal value. As a result, a greater portion of the audit file is processed during recovery and recovery time increases. The method of Myre may also add unnecessary overhead to transaction processing because the checkpoint operation is performed without regard to whether LOWTRANSLSN or MINBUFLSN has actually changed.
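The checkpoint arithmetic described above may be sketched as follows. This is an illustrative sketch only and is not the implementation taught by Myre. The function names, the dirty-page and active-transaction tables, and the sample record numbers are hypothetical. The use of the next unused record number when a table is empty is likewise an assumption made for the sketch.

```python
from typing import Dict


def min_buf_lsn(dirty_pages: Dict[str, int], next_lsn: int) -> int:
    """Record number of the earliest update still held only in cache.

    dirty_pages maps a page id to the record number of the update that
    first made the page dirty. With no dirty pages, nothing needs redo,
    so the next unused record number is returned (an assumption).
    """
    return min(dirty_pages.values(), default=next_lsn)


def low_trans_lsn(active_txns: Dict[int, int], next_lsn: int) -> int:
    """First record number of the oldest uncommitted transaction.

    active_txns maps a transaction id to its first audit record number.
    """
    return min(active_txns.values(), default=next_lsn)


def recovery_start(minbuflsn: int, lowtranslsn: int) -> int:
    """Recovery begins at the smaller of the two checkpointed values."""
    return min(minbuflsn, lowtranslsn)


# State at the time of the periodic checkpoint (hypothetical numbers).
dirty_pages = {"P1": 40, "P2": 55}
active_txns = {7: 30, 9: 52}
next_lsn = 60
checkpointed_start = recovery_start(
    min_buf_lsn(dirty_pages, next_lsn),
    low_trans_lsn(active_txns, next_lsn),
)

# After the checkpoint, transaction 7 commits and page P1 is destaged,
# but the values in the audit file header are not refreshed before the failure.
del active_txns[7]
del dirty_pages["P1"]
next_lsn = 61
optimal_start = recovery_start(
    min_buf_lsn(dirty_pages, next_lsn),
    low_trans_lsn(active_txns, next_lsn),
)

# Audit records that recovery would process unnecessarily.
extra_records = optimal_start - checkpointed_start
```

With these sample values, recovery driven by the stale header would begin at record 30 although beginning at record 52 would suffice, so 22 audit records would be processed without need.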
It would therefore be desirable to have a method that minimizes the processing overhead involved in auditing database updates and that accomplishes database recovery in a timely manner by only processing the portion of the audit file that is necessary.