1. Field of the Invention
This invention relates to a system for filtering recovery log archives to reduce the archive size while retaining all log records essential to transaction-consistent forward recovery solely from offline image dump and recovery log archive media.
2. Description of the Related Art
Recovery from secondary stable storage media failures is an important problem for computer systems. In some cases, media recovery must be done entirely from offline storage media (such as a tape archive). This is particularly necessary in small systems with a single disk stable store but may also be required to recover from larger system disasters such as machine room fires or natural disasters. Transaction-based systems such as database management systems require recovery of a data resource stored on stable media to an atomic or transaction-consistent state. Conventional data resource recovery algorithms are intended for crash recovery and assume that the entire recovery log is available, including that portion of the log conventionally stored on-line in stable storage media. Loss of the on-line recovery log through stable media failure will prevent transaction-consistent crash recovery. Forward recovery using conventional recovery algorithms requires the recovery log archive tape to include all recovery log records. This requirement for storing all log records is troublesome because of the substantial storage volume occupied by such a recovery log archive.
A conventional recovery system known in the art is described by C. Mohan, et al, "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging," IBM Research Report RJ 6649, revised Nov. 2, 1990, IBM Research Division, Yorktown Heights, N.Y., which document is incorporated herein in its entirety by this reference.
As pointed out by C. J. Date, "An Introduction To DataBase Systems", Volume 1, 4th Edition, Addison-Wesley Publishing Co., Copyright 1986, Ch. 18, a "transaction" is a logical unit of work referencing a sequence of operations that transforms a consistent state of a recoverable resource into another consistent state without necessarily preserving consistency at all intermediate points in the sequence. For purposes of this discussion, a database is referenced as a typical instance of a recoverable resource.
Database management systems that maintain data in stable storage are subject to failures that leave the data in a corrupted or inconsistent state. Inconsistent data can violate integrity guarantees assumed by users and, in extreme cases, can cause a database management system to operate improperly or crash. The database can be recovered from crashes by scanning the recovery log to determine which transactions have committed and which have aborted, using UNDO recovery log records to back out the actions of aborted transactions and using REDO recovery log records to repeat the actions of committed transactions that may not have been written to permanent storage before the crash. To recover from stable storage media failure, the archival dumps of the database and the recovery log must be similarly processed.
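The log scan described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each update record carries a before-image (UNDO information) and an after-image (REDO information) of a single keyed value, and that no transaction overwrites another transaction's uncommitted data. The names LogRecord and recover are hypothetical and are not drawn from any cited system.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str                  # "UPDATE", "COMMIT", or "ABORT"
    key: Optional[str] = None
    before: Any = None         # UNDO information (assumed before-image)
    after: Any = None          # REDO information (assumed after-image)


def recover(db: Dict[str, Any], log: List[LogRecord]) -> Dict[str, Any]:
    """Return a transaction-consistent copy of db using the recovery log."""
    # Scan the log to determine which transactions committed.
    committed = {r.txn for r in log if r.kind == "COMMIT"}
    state = dict(db)
    # REDO: repeat the updates of committed transactions in log order.
    for r in log:
        if r.kind == "UPDATE" and r.txn in committed:
            state[r.key] = r.after
    # UNDO: back out the updates of all other transactions in reverse order.
    for r in reversed(log):
        if r.kind == "UPDATE" and r.txn not in committed:
            state[r.key] = r.before
    return state


# Stable database at crash time: T1's committed update to x never reached
# disk, while T2's uncommitted update to y did.
crashed_db = {"x": 0, "y": 99}
crash_log = [
    LogRecord(1, "T1", "UPDATE", "x", before=0, after=1),
    LogRecord(2, "T2", "UPDATE", "y", before=5, after=99),
    LogRecord(3, "T1", "COMMIT"),
]
recovered = recover(crashed_db, crash_log)
```

In this sketch a transaction with no COMMIT record is treated the same as an aborted one, which is one common convention and not necessarily that of any particular recovery manager.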
A system supporting transaction processing guarantees that if a transaction executes some updates against the database and a system failure occurs before the transaction reaches its normal termination, then those updates will be undone as part of a recovery procedure. Consequently, the transaction either executes in its entirety or is totally cancelled. Guaranteeing the atomicity and durability of transactions in the face of concurrent execution of multiple transactions and unpredictable failures is a very important problem in transaction processing. Many methods have been developed in the past to deal with this problem, but the related assumptions, performance characteristics, and complexity associated with such methods have not always been acceptable.
To meet transaction and data recovery guarantees, the transaction recovery system records in a recovery log the progress of a transaction and its actions that cause changes to recoverable data objects. The recovery log becomes the source for ensuring that either the transaction's committed actions are reflected in the database or its uncommitted (aborted) actions are undone despite various types of failures. When the logged actions reflect data object content, then those recovery log records also become the source for reconstruction of damaged or lost data. Conceptually, the recovery log can be considered as an ever-growing sequential file. Every log record is assigned a unique log sequence number (LSN) when that record is appended to the log. The LSNs are assigned in ascending sequence and are typically the logical addresses of the corresponding log records, or timestamps which measure elapsed time from some beginning point.
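The notion of an ever-growing sequential file with ascending LSNs can be sketched as follows. This sketch assumes the logical-address variant, in which the LSN of a record is its byte offset within the log; the class name RecoveryLog and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    lsn: int          # logical address: byte offset of the record in the log
    payload: bytes


class RecoveryLog:
    """Append-only log that assigns unique, ascending LSNs."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.next_lsn = 0

    def append(self, payload: bytes) -> int:
        if not payload:
            # An empty record would share its offset with the next record.
            raise ValueError("log records must be non-empty")
        lsn = self.next_lsn
        self.entries.append(Entry(lsn, payload))
        self.next_lsn += len(payload)
        return lsn


log = RecoveryLog()
first = log.append(b"abc")
second = log.append(b"de")
```

Because records are never empty, each append advances the offset, so LSNs are unique and strictly ascending.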
The non-volatile or stable version of the recovery log is stored on stable storage such as rotating magnetic media ("disk"). Such stable storage can be improved by maintaining two identical copies of the recovery log on different disks. These on-line stable storage log records are then occasionally copied to a cheaper and slower archive medium such as tape. The recovery log archive records may be discarded once the appropriate image copy (archive dumps) of the database is produced, making the earlier recovery log archive records moot.
When a transaction or process failure occurs, the transaction is typically in such a state that its updates must be undone. It is possible that the transaction has corrupted some pages in volatile storage if it was involved in performing updates when the process disappeared. When a system failure occurs, the volatile storage contents are usually lost and the transaction system must be restarted and recovery performed using the stable storage versions of the database and recovery log. When a stable storage media or device failure occurs, the contents of the stable storage media are usually lost and the database must be recovered using the most recent image copy (archive dump) of the data object and the recovery log archive.
The UNDO records of a recovery log provide information on how to undo changes performed by the transaction. The REDO records of a recovery log provide information on how to redo changes performed by the transaction. In Write-Ahead Logging (WAL) based systems such as ARIES, an updated data page is written back to the same stable storage location from which it was read. The WAL protocol asserts that the recovery log records representing changes to some data must already be in stable storage before the changed data are allowed to replace the previous version of those data in stable storage. That is, the system is not permitted to write an updated data page to the stable storage version of the database until at least the UNDO records of the recovery log describing the page update actions have been first written to stable storage.
Since a transaction includes execution of an application-specified sequence of operations, it is initiated with a special BEGIN transaction operation and terminates with either a COMMIT operation or an ABORT operation. The COMMIT and ABORT operations are the key to providing atomicity, as is known in the art. Transaction status is also stored in the recovery log and no transaction can be considered complete until its COMMIT status and all of its recovery log records are safely recorded on stable storage by forcing to disk all recovery log records up through the LSN of the most recent transaction COMMIT record. This permits a restart recovery procedure to recover any transactions that completed successfully but whose updated pages were not physically written to stable storage before system failure. This means that a transaction is not permitted to complete its COMMIT processing until all REDO records for that transaction have been written to stable storage.
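The two ordering rules discussed above (log before data page, and log force at COMMIT) can be sketched as simple checks against the highest LSN known to be on stable storage. The sketch is an illustration under stated assumptions, not the mechanism of ARIES or of any cited system: LSNs are simple counters, and the names LogManager, write_page, commit, and WalViolation are hypothetical.

```python
class WalViolation(Exception):
    """Raised when a page write would precede its log records."""


class LogManager:
    def __init__(self) -> None:
        self.next_lsn = 1
        self.flushed_lsn = 0   # highest LSN known to be on stable storage

    def append(self) -> int:
        """Append a record to the volatile log tail and return its LSN."""
        lsn = self.next_lsn
        self.next_lsn += 1
        return lsn

    def force(self, up_to_lsn: int) -> None:
        """Force all log records up through up_to_lsn to stable storage."""
        last_assigned = self.next_lsn - 1
        self.flushed_lsn = max(self.flushed_lsn, min(up_to_lsn, last_assigned))


def write_page(log: LogManager, page_lsn: int) -> bool:
    """WAL rule: a page may be written only after the log record of its
    most recent update (page_lsn) is on stable storage."""
    if page_lsn > log.flushed_lsn:
        raise WalViolation(f"log not forced through LSN {page_lsn}")
    return True


def commit(log: LogManager) -> int:
    """Commit rule: the transaction is complete only after the log has been
    forced through its COMMIT record."""
    commit_lsn = log.append()
    log.force(commit_lsn)
    return commit_lsn
```

In this sketch an update logged at LSN 1 cannot be written to its page until the log is forced; a subsequent commit forces the log through the COMMIT record, after which the page write is permitted.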
For systems with large amounts of data, image dumps to archival media may only be taken infrequently. The number of recovery log records that must be applied to recover forward of an image dump grows with time and eventually becomes quite large. The recovery log archive itself may grow so large as to require inconvenient amounts of offline log archive storage. The size of the log archive, and consequently the media recovery time, can be reduced by compressing (filtering) extraneous data from the log records during the log archiving process. In particular, as is well-known, the UNDO data can be discarded from the recovery log archive if it is reliably assumed that the on-line stable store portion of the recovery log will be available for recovery.
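The well-known filtering step can be sketched as a pass that copies each log record to the archive with its UNDO data removed. The record layout and the name filter_for_archive are hypothetical; the sketch assumes that UNDO and REDO data are stored as separate fields of each record.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Optional


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    txn: str
    kind: str                  # "UPDATE" or "COMMIT"
    key: Optional[str] = None
    undo: Any = None           # UNDO data; None means "not present"
    redo: Any = None           # REDO data


def filter_for_archive(log: List[LogRecord]) -> List[LogRecord]:
    """Copy the log for archiving, discarding the UNDO data of every record.

    This is safe only under the assumption stated above, namely that the
    on-line portion of the recovery log remains available for recovery.
    """
    return [replace(r, undo=None) for r in log]


online_log = [
    LogRecord(1, "T1", "UPDATE", "row7", undo="old row", redo="new row"),
    LogRecord(2, "T1", "COMMIT"),
]
archive = filter_for_archive(online_log)
```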
For the case of a catastrophic media failure in which the system must be completely restored from offline storage, this on-line recovery assumption is unacceptable. Clearly, even if all UNDO records are present in the log archive, the database can only be restored up to the most recently written log archive record. However, it is better to recover to a transaction-consistent state while losing some recent transactions than to recover to an inconsistent and possibly corrupt state that may itself cause the database system to crash. The fundamental problem in the art is that if the UNDO records are discarded during log archiving, it is not generally possible to recover the data object in a transaction-consistent state from archives alone. This is because the data object image dump is usually not archived in a transaction-consistent state, and such a corrupt dump cannot be brought into a consistent state without performing UNDO operations for the transactions that were active at the time of the image dump.
The recovery of such an archived image dump is referred to as forward recovery. Forward recovery is similar, but not identical, to crash recovery, which is the process of recovering a data resource to a transaction-consistent state after a system crash by applying the recovery log to the data resource itself as it existed at the time of the crash. The "crashed resource" is also generally in a corrupt state at the time of crash and must be purged of incomplete transactions through the use of the on-line recovery log records. One difference is that the amount of recovery log that must be processed during crash recovery can be minimized by making frequent checkpoints during normal forward processing, whereas the amount of recovery log that must be processed during forward recovery from archives can be very large, motivating the log filtering schemes mentioned above.
Transaction-consistent forward recovery from a damaged recovery log is accomplished by applying to an image dump resource copy the recovery log records starting with the record corresponding to the time of the image dump and processing forward to some record before but in the vicinity of the earliest damaged log portion. If the recovery log archive has been filtered of all UNDO records, the incomplete transactions cannot be backed out, preventing transaction-consistent recovery from archived records alone.
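The difficulty can be made concrete with a small sketch. It assumes, for illustration only, that forward recovery repeats every archived update against the image dump and then backs out the transactions that have no COMMIT record in the usable portion of the archive, and that the archive begins early enough to cover every transaction active at the time of the dump. All names (Rec, forward_recover, MissingUndo) are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Optional


class MissingUndo(Exception):
    """Raised when an incomplete transaction cannot be backed out."""


@dataclass(frozen=True)
class Rec:
    lsn: int
    txn: str
    kind: str                  # "UPDATE" or "COMMIT"
    key: Optional[str] = None
    before: Any = None         # UNDO information
    after: Any = None          # REDO information
    has_undo: bool = True      # False once UNDO data has been filtered out


def forward_recover(dump: Dict[str, Any], archive: List[Rec]) -> Dict[str, Any]:
    state = dict(dump)
    committed = {r.txn for r in archive if r.kind == "COMMIT"}
    # Repeat all archived updates in log order.
    for r in archive:
        if r.kind == "UPDATE":
            state[r.key] = r.after
    # Back out incomplete transactions in reverse order.
    for r in reversed(archive):
        if r.kind == "UPDATE" and r.txn not in committed:
            if not r.has_undo:
                raise MissingUndo(f"no UNDO data for LSN {r.lsn} of {r.txn}")
            state[r.key] = r.before
    return state


# The image dump was taken while T2 was active, so it holds T2's dirty y.
dump = {"x": 0, "y": 99}
full_archive = [
    Rec(1, "T2", "UPDATE", "y", before=5, after=99),
    Rec(2, "T1", "UPDATE", "x", before=0, after=1),
    Rec(3, "T1", "COMMIT"),
    # The archive ends here; T2 never reaches COMMIT in the usable log.
]
filtered_archive = [
    replace(r, before=None, has_undo=False) if r.kind == "UPDATE" else r
    for r in full_archive
]

consistent = forward_recover(dump, full_archive)
try:
    forward_recover(dump, filtered_archive)
    filtered_ok = True
except MissingUndo:
    filtered_ok = False
```

With the unfiltered archive, T2's update to y is backed out and only T1's committed change survives. With the filtered archive, the same procedure has no before-image for y and cannot reach a transaction-consistent state.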
For this reason, the extensive related data object recovery art uses unfiltered recovery logs. Such art is ineffective for resolving the forward recovery archive filtering problem. Refer to U.S. Pat. No. 4,648,031 issued to Jenner, U.S. Pat. No. 4,507,751 issued to Gawlick, et al, and U.S. Pat. No. 4,945,474 issued to Elliott, et al. These practitioners disclose various techniques for compressing the recovery log and for efficient application of a recovery log to the corrupt data object but none address the problem of forward recovery from filtered log archives.
In U.S. Pat. No. 4,878,167, Kapulka, et al describe a method for constructing a filtered "resource recovery" log for accomplishing forward recovery. Their method supports forward recovery from an undamaged log, but the filtered resource recovery log does not contain sufficient information to permit transaction-consistent recovery at any point from a damaged log.
Thus, a problem felt in the art is the need for an efficient recovery log archiving protocol that will guarantee transaction-consistent recovery from data resource image dump and recovery log archives alone without archiving every UNDO log record for the entire database system. This unresolved problem is clearly felt in the art and is solved by the present invention in the manner described below.