This invention relates to the field of database management systems, and in particular to a method and system that assures that all committed transactions survive permanently, regardless of crashes or other interruptions.
A fundamental requirement for a reliable database management system is “durability”, the ability of the database system to recover from crashes or other interruptions in a consistent state. For example, if a user executes a transaction that changes one or more objects (records) in the database and a crash or other interruption occurs during the execution of the transaction, the user should be able to determine whether the transaction was performed (‘committed’), and, if it was not, be assured that any changes that were made before the interruption are not recorded in the database. In like manner, if a user interrupts the execution of a transaction by issuing an ‘abort’ command, the user should be assured that any changes that may have already been made before the abort command was received are not reflected in the database.
In many database systems, including IBM DB2 and Microsoft SQL Server, the ‘ARIES’ (Algorithms for Recovery and Isolation Exploiting Semantics) system is used to provide such durability. ARIES uses ‘write-ahead’ logging to record each change to an object in the database before the change is actually implemented. If the system is interrupted, the system restores a prior version of the database and processes the write-ahead log to recreate the database with all of the changes that had been submitted. The system also identifies incomplete transactions and undoes each of the changes associated with these incomplete transactions.
The ARIES write-ahead log records the start of the transaction and each of the changes caused by the transaction. Each change is executed only after it has been recorded in the write-ahead log. Because multiple transactions may be processed concurrently, each change is assigned a unique sequence number, and the record of each change in the write-ahead log includes its sequence number, an identifier of the transaction, the page of the database that is affected, and its prior sequence number. The prior sequence number facilitates undoing changes in the event the transaction is aborted or a crash occurs before the transaction is committed. After the last change of the transaction is submitted, an end-of-transaction record is recorded in the write-ahead log, and the transaction is deemed ‘committed’.
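By way of illustration only, the write-ahead record layout described above can be sketched as follows. This is a simplified model and not the ARIES implementation: the names LogRecord, WriteAheadLog, update and undo are hypothetical, pages are modeled as entries in a dictionary, and the 'before' field (the value a page held prior to the change) is an assumption introduced so that the sketch can reverse a change.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    lsn: int                        # unique sequence number of this record
    txn_id: int                     # identifier of the transaction
    kind: str                       # "begin", "update", or "end"
    page: Optional[int] = None      # affected page (updates only)
    before: Optional[str] = None    # assumed field: value prior to the change
    after: Optional[str] = None     # value written by the change
    prev_lsn: Optional[int] = None  # prior sequence number of the same transaction


class WriteAheadLog:
    def __init__(self) -> None:
        self.records: List[LogRecord] = []
        self.last_lsn: Dict[int, int] = {}  # latest record of each transaction

    def append(self, txn_id, kind, page=None, before=None, after=None):
        lsn = len(self.records)
        rec = LogRecord(lsn, txn_id, kind, page, before, after,
                        self.last_lsn.get(txn_id))
        self.records.append(rec)
        self.last_lsn[txn_id] = lsn
        return rec


def update(log, pages, txn_id, page, value):
    # Write-ahead: record the change first, then execute it.
    log.append(txn_id, "update", page, pages.get(page), value)
    pages[page] = value


def undo(log, pages, txn_id):
    # Follow the chain of prior sequence numbers backwards,
    # reversing each update made by this transaction.
    lsn = log.last_lsn.get(txn_id)
    while lsn is not None:
        rec = log.records[lsn]
        if rec.kind == "update":
            if rec.before is None:
                pages.pop(rec.page, None)
            else:
                pages[rec.page] = rec.before
        lsn = rec.prev_lsn


# Two interleaved transactions; transaction 1 never reaches its end record.
log, pages = WriteAheadLog(), {1: "a"}
log.append(1, "begin")
log.append(2, "begin")
update(log, pages, 1, 1, "b")
update(log, pages, 2, 2, "x")
log.append(2, "end")       # transaction 2 is committed
undo(log, pages, 1)        # transaction 1 is rolled back
```

In this sketch the changes of the two transactions are interleaved in a single log, and the prior sequence number is what allows the rollback of transaction 1 to skip over the records of transaction 2.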
Although the ARIES technique provides a reliable means for providing consistent database recovery, it is structured around the paradigm of a disk-based database system, and may not be optimal for in-memory database systems. For example, ARIES uses the concept of disk caching, wherein pages of the database on the disk are loaded as required into local memory, modified, then written back to the disk periodically. This concept requires maintaining a “Dirty Page Table” that identifies each of the pages in local memory to which changes have been applied but not yet written back to the disk, and the recovery process must account for these changes as well.
Additionally, the use of a write-ahead log that records each change before it is performed, then commits the transaction after all changes are performed, may be inefficient for use in an in-memory database system, particularly with respect to having to undo the changes written to the log during an incomplete or aborted transaction when recovering the database.
It would be advantageous to provide a durability scheme that is optimized for an in-memory database system. It would also be advantageous to provide a durability scheme that does not incur the overhead associated with a page-based recovery technique. It would also be advantageous to provide a durability scheme that does not incur the overhead associated with a write-ahead log and/or the overhead associated with undoing the changes that incomplete transactions have recorded in the write-ahead log.
These advantages, and others, can be realized by a durability implementation that records only committed transactions in a log file. A pair of log files and a pair of snapshot files are maintained. Committed transactions are stored in a ‘current’ log. When a snapshot of the database is completed, the ‘current’ log becomes the ‘prior’ log and the other log becomes the ‘current’ log. After the next snapshot is completed, the prior log is deleted, the current log becomes the prior log, and the prior snapshot can be replaced by the next snapshot. Transactions that are not committed are not recorded in the current log, thereby avoiding the need to undo aborted transactions. If a given change is reflected in a completed snapshot, it does not appear in either of the logs; if the change is not yet reflected in a completed snapshot, it is guaranteed to be stored in one of the logs. During recovery, the system assesses both snapshots. The most recent of the completed snapshots is used, and the corresponding log is applied.
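By way of illustration only, the interplay of the two logs and the two snapshots can be sketched as follows. This is a minimal model under stated assumptions and is not the claimed implementation: the name DurableStore and its methods are hypothetical, the log and snapshot files are modeled as in-memory Python objects, and the sketch assumes that the logs are switched at the moment the contents of a snapshot are fixed and that the prior log is discarded once that snapshot is marked complete. The exact timing of these steps is an assumption of the sketch.

```python
import copy
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, str]  # (key, new value)


class DurableStore:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}                      # in-memory database
        self.current_log: List[List[Change]] = []           # committed transactions
        self.prior_log: Optional[List[List[Change]]] = None
        self.snapshots: List[Optional[dict]] = [None, None]
        self.seq = 0                                        # snapshot counter
        self.pending: Optional[int] = None                  # slot being written

    def commit(self, changes: List[Change]) -> None:
        # Only committed transactions reach the log; an aborted transaction
        # simply never calls commit, so there is nothing to undo.
        self.current_log.append(list(changes))
        for key, value in changes:
            self.data[key] = value

    def _oldest_slot(self) -> int:
        def age(i: int) -> int:
            s = self.snapshots[i]
            return s["seq"] if s and s["complete"] else -1
        return min((0, 1), key=age)

    def begin_snapshot(self) -> None:
        # Assumption: one snapshot at a time; its contents are fixed here.
        assert self.pending is None
        self.seq += 1
        slot = self._oldest_slot()
        self.snapshots[slot] = {"seq": self.seq, "complete": False,
                                "data": copy.deepcopy(self.data)}
        self.prior_log, self.current_log = self.current_log, []
        self.pending = slot

    def complete_snapshot(self) -> None:
        self.snapshots[self.pending]["complete"] = True
        self.prior_log = None   # its changes are now held by the snapshot
        self.pending = None

    def recover(self) -> Dict[str, str]:
        # Start from the most recent completed snapshot, then replay
        # whichever logs still exist, oldest first.
        done = [s for s in self.snapshots if s and s["complete"]]
        data = dict(max(done, key=lambda s: s["seq"])["data"]) if done else {}
        for log in (self.prior_log or [], self.current_log):
            for txn in log:
                for key, value in txn:
                    data[key] = value
        return data


store = DurableStore()
store.commit([("a", "1")])
store.begin_snapshot()
store.commit([("b", "2")])                  # goes to the new current log
during_snapshot = store.recover()           # snapshot not yet complete
store.complete_snapshot()
store.commit([("a", "3")])
after_snapshot = store.recover()
```

In this sketch a change is always recoverable from either a completed snapshot or one of the logs, and because the second snapshot is written to the slot not holding the most recent completed snapshot, an interruption while it is being written leaves the earlier snapshot and its logs intact.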
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.