Enterprises use computer databases to store, organize, and analyze some of their most important information. For example, a business may employ a database to warehouse its sales and ordering information so that analysts can predict trends in product sales or perform other kinds of data mining for long-range planning. Because database systems are responsible for managing information vital to the organization's operation, it is crucial for mission-critical database systems to implement mechanisms for recovery following a database system failure.
Recovery from all but the most serious kinds of failures generally relies on periodic backups, which are done to save the state of the database to a longer term storage medium, such as magnetic tape or optical disk. Because users continue to modify the database since the time of the last backup, the users' committed transactions are recorded in one or more “redo logs” on disk. Thus, to recover from a system crash, the periodic backup is used to restore the database system, and the committed transactions in the redo logs are reapplied to bring the database up to date to the state at the time of the system crash.
Some failures are more serious, however. For example, a hard disk that stores the contents of a database can be unreadable after a head crash. Earthquakes, fires, floods, tornadoes, and other acts of God can physically destroy the disk upon which the redo logs are saved. In these cases, the modifications and updates to the database after the last backup are permanently lost. Thus, for mission-critical database systems, a more robust approach for disaster recovery is needed. Moreover, restoration of a database from backups and redo logs is a time consuming process, and some enterprises cannot afford the necessary downtime. For example, one study has estimated that the average impact of one hour of downtime can run as high as $6.5 million in the retail brokerage industry.
Accordingly, there has been much interest in implementing disaster recovery by deploying a “standby” database system that is a replica of the business's primary database system. The standby database is typically created from a backup of the primary database, and the primary database and the standby database coordinate with each other such that the standby database keeps up with changes made on the primary database in the standby's own logs. In the event of an irrecoverable crash or other disaster, the standby database can quickly be activated (“failover”) to become the business's new primary database by applying all the remaining unapplied transactions without having to wait for restoring the primary database from the last backup and redo logs. Even though the downtime for failover of a standby database is much less than that of restoration from backups, there is still an urgent need for reducing the amount of unplanned downtime.