Transactions are assumed to have ACID properties. More specifically, transactions should be atomic, consistent, isolated, and durable. Database mechanisms enforce atomicity, isolation and durability, the “AID” properties. User-written transaction code is responsible for consistency (the “C”), which means that a transaction should take a previously consistent state and update it, creating a new consistent state. Unfortunately, user transactions can be flawed and lead to inconsistent, invalid or corrupt states.
One strong reason for utilizing database systems is their promise to guard the integrity of the data they store. Database systems implement transactions to provide atomicity with its promise of all or nothing execution, hence preventing partial transaction executions in which, for example, money intended to be transferred between accounts is only withdrawn or only credited, but not both. Database systems implement redo recovery and forced logging so that once the database responds accepting responsibility for a transaction's updates, those updates are guaranteed to be included in the database state. In other words, the updates are durable. Such systems also implement isolation so that the effects of one transaction do not interfere with the effects of another, thereby providing the illusion that transactions are executed serially in order. However, these commonly known techniques do not directly deal with the problem of data corruption.
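By way of illustration only, the all or nothing behavior described above can be sketched with Python's standard sqlite3 module. The table name accounts, its columns, and the helper functions transfer, make_demo_db and balances are assumptions made for this sketch and are not taken from any particular system.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The withdrawal has already been applied; raising here
                # causes it to be rolled back rather than left half done.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def make_demo_db():
    """Build a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

In this sketch a failed transfer leaves both balances exactly as they were, so the state in which money is withdrawn but never credited cannot be observed.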
Data can be corrupted in many ways. One way is for a disk to fail, either catastrophically or with a soft failure in which some bits are lost. This is called media failure, and dealing with it is referred to as media recovery. There are a number of approaches for database systems to provide media recovery. For example, replication, using either mirrors or some form of RAID (Redundant Array of Independent Disks), may provide enough redundancy for the corrupt or bad data to be reconstructed. More classically, database systems generate regular backups, a special form of replica optimized for high-speed writing and reading (e.g., tape backup). Since transactions can execute and commit between backups, media recovery involves loading the backup (called restore) and applying a redo recovery using a media recovery log. The media recovery log is a special log that records all transaction updates since the backup was taken. The media recovery log can be used to roll forward the restored backup. However, this process is arduous and usually results in a rather long outage.
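The restore and roll forward steps described above can be sketched as follows. This is a minimal in-memory model offered under stated assumptions, not a description of any actual recovery manager: the database is modeled as a dictionary, and the LogRecord type, its lsn (log sequence number) field, and the backup_lsn parameter are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int    # log sequence number (hypothetical field name)
    key: str    # data item that was updated
    value: int  # value written by the update


def media_recover(backup_state, backup_lsn, media_log):
    """Restore a backup and roll it forward using the media recovery log."""
    # Restore: start from a copy of the state captured by the backup.
    state = dict(backup_state)
    # Redo: reapply, in log order, every update made after the backup.
    for rec in sorted(media_log, key=lambda r: r.lsn):
        if rec.lsn > backup_lsn:
            state[rec.key] = rec.value
    return state
```

For example, rolling a backup taken at log position 5 forward over records at positions 6 and 7 yields the state as of position 7. Records at or before position 5 are skipped because the backup already reflects them.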
Media recovery, however, does not directly deal with the problem of erroneous transactions. The damage done by erroneous transactions is particularly pernicious because not only is the data written by these transactions corrupted, but the data written by all transactions that have subsequently read this data is likewise corrupted.
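The way corruption spreads from an erroneous transaction to later readers can be sketched as a simple propagation over a transaction history. The sketch assumes that the read set and write set of each transaction are known and that transactions executed serially in the order listed. The function name tainted_transactions and the history format are hypothetical, and the sketch is deliberately conservative in that a data item, once marked corrupt, is never marked clean again.

```python
def tainted_transactions(history, bad_txn):
    """Find the transactions and data items affected by an erroneous transaction.

    `history` is a list of (txn_id, read_set, write_set) tuples in execution
    order (an assumed format). Returns (tainted_txn_ids, tainted_items).
    """
    tainted_txns = set()
    tainted_items = set()
    for txn_id, reads, writes in history:
        # A transaction is affected if it is the erroneous one, or if it
        # read any item that is already known to be corrupt.
        if txn_id == bad_txn or tainted_items & set(reads):
            tainted_txns.add(txn_id)
            # Everything an affected transaction wrote becomes suspect.
            tainted_items |= set(writes)
    return tainted_txns, tainted_items
```

Applied to a history in which T1 is erroneous, T2 reads an item written by T1, and T4 reads an item written by T2, the sketch reports T1, T2 and T4 as affected even though T4 never read anything written by T1 directly.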
One technique used to eliminate data corruption induced by erroneous transactions is based on the media recovery technique described supra, where a backup is restored and a media recovery log is utilized to roll the database state forward. Blindly applied, however, media recovery would simply reconstruct corrupted data. To prevent such an occurrence, the media recovery process is conventionally halted before the current state of the database is recovered. More specifically, media recovery is permitted to continue until just before the corrupting transaction executed. This is called point in time recovery, and it does indeed remove the effects of the corrupting transaction.
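Point in time recovery as described above can be sketched by halting the roll forward at the corrupting transaction. The sketch makes a simplifying assumption that each entry of the media recovery log holds the complete set of updates of one committed transaction, in commit order. The function name point_in_time_recover and the log format are hypothetical.

```python
def point_in_time_recover(backup_state, media_log, corrupting_txn):
    """Roll a restored backup forward, halting just before `corrupting_txn`.

    `media_log` is a list of (txn_id, updates) pairs in commit order, where
    `updates` maps data items to the values written (an assumed format).
    Returns (recovered_state, ids_of_transactions_not_replayed).
    """
    state = dict(backup_state)  # restore the backup
    not_replayed = []
    halted = False
    for txn_id, updates in media_log:
        if txn_id == corrupting_txn:
            halted = True  # stop applying updates from this point onward
        if halted:
            not_replayed.append(txn_id)
        else:
            state.update(updates)  # redo this transaction's updates
    return state, not_replayed
```

The second return value makes explicit that halting the roll forward discards not only the corrupting transaction but also every transaction logged after it.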
Unfortunately, point in time recovery is a heroic measure, both in terms of the cost of using it and the impact that it has on a database, its users, and those responsible for managing the database. Point in time recovery is costly to perform in that it introduces a long outage while the backup is used to restore the database and the media recovery log is used to roll the state of the database forward to a desired time just prior to the corrupting transaction. Accordingly, this can seriously impair database availability. Moreover, all transactions that have committed later than the corrupting transaction are de-committed. In a high-performance transaction application, this can result in hundreds, even thousands, of transactions being de-committed. These transactions are subsequently re-submitted for execution in some manner to limit the damage caused by the corruption. This can be a very laborious process, at least in part because re-executing any of these transactions might produce results different from those of their original execution.