It is possible for an application program, a user, a system operator, or some other entity to apply erroneous operations to a database, thus corrupting it. Once corrupted, the database is useless to applications since the data stored in the database that is used by the applications to make decisions and to provide services cannot be trusted. Means must be provided to return the database to a prior or current correct state so that application processing can continue.
A. Computer Applications
Data-processing applications form the basis for much of our daily activity, from business to entertainment. Most applications are implemented as programs running in a computer. In many cases, an application depends upon a database of information that the application maintains to record the current state of the application. Typically, the information in the database is fundamental to the operation of the application, to the decisions it makes, and to its delivery of services to the end users.
The application's end users may include people, other applications, devices, and other systems. In this specification, the term “end users” means any entities that can influence an application and/or can use the services that it provides.
The fundamental components of an application are shown in FIG. 1. The application comprises a database and a program that is running in a computer. The database may be stored in persistent storage such as a disk for durability, it may be stored in high-speed memory for performance, or it may use a combination of these storage techniques. The database may be resident in the same computer as the application program, it may be resident in another computer, it may be implemented as an independent system, or it may be distributed among many systems.
A database generally includes several files or tables, though it may be just a random collection of unorganized data. Each file or table typically represents an entity set such as “employees” or “credit cards.”
With reference to FIG. 1, the application receives inputs from certain end users (1). It processes these inputs and may make certain modifications to its database as a result (2). Database modifications are made via DML and DDL commands. DML commands modify the contents of the database. Examples of DML commands are insert a row, update a row (modify its contents), and delete a row. DDL commands typically modify the structure of the database. Examples of DDL commands include insert or delete a table and insert or delete a column in an existing table.
The application can read the contents of rows in its database (3). As part of its processing of an incoming event, it may read certain information from its database to make decisions. Based on the inputs it receives from its end users and the data in its database, the application delivers certain services to its end users (4). A service may be delivered as the result of a specific input from an end user, such as providing an account balance in response to an online banking query.
Alternatively, a service may be delivered spontaneously by the application program, such as on a timed basis or when certain conditions occur. For instance, an alarm may be generated to operations staff if the load being carried by an electric-power transmission line exceeds a specified threshold.
The end users providing the input to the application may or may not be the same end users as those that receive its services.
B. Database Corruption
Erroneous information in the database can wreak havoc on the recipients of the application's services. Credit cards cancelled in error prevent cardholders from making purchases. Incorrect banking account balances can cause unintended overdrafts or credit denials. Erroneous medical records can cause improper treatment of patients and, in extreme cases, even death. Degradation of services range from severe disruptions such as these to the mundane, such as providing a theatergoer with the wrong times for a movie.
Errors can be introduced into a database from several sources, including a faulty application, a user error, the faulty action of a system administrator, a hardware fault, or even employee malfeasance. Some real-life examples of database corruption follow:
1. A large payments processor of credit cards and debit cards receives lists of cards (card account numbers) to be deactivated from banks that issue the payment cards (the issuing banks). These are cards that have been lost, canceled, stolen, or that have suffered fraudulent transactions. On at least one occasion, an issuing bank sent the wrong card file to the payments processor. The file contained all of the bank's payment cards, including both the good cards as well as those to be deactivated. The payments processor processed the file and deactivated every active card that had been issued by the bank. The payments processor had to roll back all of the card deactivations.
2. An application programming error by PayPal prevented eBay merchants from fulfilling orders (“PayPal Services Downgrade with Upgrade,” Availability Digest; June 2008). Orders were accepted from eBay customers and were entered into the database as legitimate orders, but they were not applied to the database tables for the merchants. Therefore, the merchants did not know about the orders. Customers paid for their purchases through PayPal but never received their goods. It took almost two weeks for PayPal to resolve the problem and to bring its database into a correct state so that it could properly process new orders. It had to roll back the customer orders, and it then had to reprocess and roll forward correct orders that updated both the Customer and the Merchant databases.
3. A major newspaper in the U.S. Midwest maintains a database of all of its reporters and the stories that they had written. One of the reporters was a Pulitzer Prize reporter whose major study was to be published in the next issue of the paper. Either by accident or by employee malfeasance, the reporter and all of his stories were deleted from the database. The newspaper had to recover his entries from magnetic tape backups and roll them forward into the database.
4. The $38 billion Alaska Permanent Fund database set up to distribute oil revenues to the citizens of Alaska was inadvertently deleted by a maintenance technician in July, 2006 (“The Alaska Permanent Fund and the $38 Billion Keystroke,” Availability Digest; April 2007). Unfortunately, the magnetic-tape backup copies of the database intended to correct such an error were unreadable. It took 70 people working days, nights, and weekends for two months to recreate the database.
5. In December, 2008, the IT manager of JournalSpace, a company providing a popular blogging web site, was discharged for embezzlement. Before leaving the premises, the manager deleted the web site's entire database, which included the blogs of thousands of users (“Help! My Data Center is Down: Part 6—The Human Factor,” Availability Digest; March 2012). Unknown to company management, the IT manager had elected not to maintain a backup copy of the bloggers' database. The company was unable to reconstruct the database and went out of business.
These are examples of major disruptions of application services caused by database corruption. However, less catastrophic database corruptions occur every day; and the databases must be restored to a correct state as quickly and as economically as possible.
C. Early Prior Art Recovery Methods
Several prior art database recovery methods have been and are being used.
i. Magnetic-Tape Backups
The classic backup method for databases since the early days of computing has been and still is the use of magnetic tapes to maintain a backup database copy that can be used to restore a damaged or lost database. Database backup copies are made daily or according to some other preset schedule.
If a database has to be restored, the magnetic tapes first have to be retrieved from storage, which is typically offsite so as to be protected from a data-center disaster. The magnetic-tape backup copies are then used to reconstruct the database. This process can take hours or days. Sometimes, it cannot be completed if one of the many backup tapes is unreadable. Even if the database restoration is successful, all of the data updates since the last backup are lost and have to be recreated through some other means, such as reentering them manually from hard-copy reports.
ii. Virtual-Tape Backups
Virtual tape has improved this process. Rather than writing the database backup to magnetic tape, tape images are instead written to disk. The disk storage may be located offsite to protect it from a data-center disaster. Because writing to disk is much faster and less manually intensive than writing to magnetic tape, backups can be made more frequently. It is typical to make virtual-tape backups every few hours rather than daily.
If a database must be restored from virtual tape, there is no delay such as that experienced when magnetic tapes are retrieved. Furthermore, the reliability problems with reading magnetic tapes are eliminated. Reading tape images from disk is much faster than it is from physical tape; so database reconstruction is much faster, often measured in hours rather than in days. However, all data updates since the last backup are still lost. This can be hours of data.
iii. Change Logs
Modern-day systems often maintain an independent log of all changes made to a database. The change log is usually resident on persistent storage such as disk so that it will survive a system failure. An important class of change logs is the transaction log.
The transaction log came with the advent of transaction processing. A transaction is a group of database operations that are either all made or that none are made. Modern-day transaction-processing systems use transaction managers that record every transaction in a transaction log to ensure the durability of transactions.
If there is no transaction manager, a change log may be created, for example, by the application, it may be built by an intercept library bound into an application, or it might be fed via triggers in the database that send database changes to the change log.
The change log can be used to recover transactions that are lost following the recovery of a database from a magnetic-tape or virtual-tape backup. Provided that the change log is still available (that is, it has not been lost due a system failure or deleted because it has passed its retention time), the transactions that have occurred since the last backup can be replayed from the change log against the restored database.
However, if the recovery of the database is due to the database being corrupted, the replay of transactions can only be made up to the point of corruption. Further replay will simply re-corrupt the database. Prior art methods of replay will bring the database to its correct state just before the corruption began. All changes made after the database corruption began will be lost.
There are other forms of the change log, for example, a log of all database changes made to a nontransactional database. In this case, the change log does not typically have transactional events such as begin work and commit work in it; in essence, each individual change event can be considered to be a committed event. The change log may be maintained by the database management system (DBMS), directly by the application, by an intercept library bound into the application, by database triggers, or by some other DBMS entity. Regardless, the changes contained in the change log can be used to restore a corrupted database up to the point of corruption using the methods described above.
iv. Mirrored Backups
For applications that are critical to a company, the integrity of the database is paramount. In many cases, these applications run on fault-tolerant systems. HP NonStop servers are an example of such systems. They are designed so that they recover instantly from any single hardware or software fault and often from multiple independent faults.
In systems such as HP NonStop servers, the databases are often mirrored. Two independent disks are configured for every database disk in the system. The application database is resident on both disks, and any change made by the application to the database is made simultaneously to both disks. Therefore, if one disk should fail, its mirrored copy is used to continue processing. When the failed disk is repaired, it is revived. That is, the contents of the good disk are copied to the recovered mirror; and the mirrored pair is then returned to service. The application typically can continue to run during the revive process.
Advantage can be taken of the mirrored pair if an application change that could potentially damage the database is attempted. To do this, the disk subsystem is unmirrored, leaving one copy of the mirrored pair offline and the other copy online supporting the application. The new application then can be run. If it should prove to be faulty and corrupts the database, the application's use can be terminated. The contents of the offline mirror are mounted to replace the corrupted disk or copied to the corrupted disk, and the system is returned to operation with the database recovered to the earlier unmirroring point, yet as a fully mirrored database. Since the previous good version of the application can be restarted and be running during the remirroring operation, this technique provides very fast recovery of the corrupted database.
Unfortunately, all changes made to the database during the trial operation of the new application are lost. The change log is generally unusable with an earlier version of the database and cannot be used to roll forward the changes since the unmirroring.
v. Data Replication
The introduction of data replication has minimized data-loss problems when certain types of failures occur. With data replication, changes to a production database are immediately transmitted to a target database as they occur on the production system. The target database is typically on a remote data-processing system that can run the application should the production system fail. Implemented properly, the remote system with its replicated database can take over the failed application functions in times measured anywhere from seconds to hours.
If a failure occurs at the source system, and if asynchronous data replication is used, in which a data change is sent to the target system after it has been applied to the production database, the amount of data lost is the data still in the replication pipeline. That data has not made it to the target system (W. H. Highleyman, P. J. Holenstein, B. D. Holenstein, Chapter 3, Asynchronous Replication, Breaking the Availability Barrier: Survivable Systems for Enterprise Computing, AuthorHouse; 2004). Data loss can be reduced to seconds or less with asynchronous data replication.
By using synchronous data replication, in which a data change is not made permanently to the production system until the target system has confirmed that it has safe-stored (or applied) the data change, a failure at the source system will result in no data being lost (W. H. Highleyman, P. J. Holenstein, B. D. Holenstein, Chapter 4, Synchronous Replication, Breaking the Availability Barrier: Survivable Systems for Enterprise Computing, AuthorHouse; 2004).
If both the production system and the target system are configured as an active/active configuration such that each is running the application, the two databases are kept in synchronization via bidirectional data replication. If a change is made to the database by either system, that change is replicated to the other system. Thus, a transaction can be sent to either system and can be properly processed (B. D. Holenstein, W. H. Highleyman, P. J. Holenstein, Chapter 3, An Active/Active Primer, Breaking the Availability Barrier II: Achieving Century Uptimes with Active/Active Systems, AuthorHouse; 2007). If one system fails, all that is required to continue full processing is to reroute transactions destined for the failed system to the surviving system. Thus, recovery from a system failure can be accomplished in seconds or less.
However, data replication does not solve the database-corruption problem. Usually, if a problem such as an application error, user error, or operations-staff error has corrupted the source database, the errors will be replicated to the target database so that it also becomes corrupted.
D. Recent Prior Art Recovery Techniques
Historically, prior art recovery of a corrupted database required a previous backup copy be taken with the application offline (or in read-only mode); and the recovery operation consisted of stopping all application processing and restoring the database from the backup copy. Over time, enhancements were made to this general approach, allowing the application to remain active for updating while the backup was taken or the recovery was performed. However, any database changes made since the backup was taken were typically lost. Additional improvements were made to address this data-loss issue, such that changes made to the database since the backup was taken were reapplied against the restored copy to “roll forward” the database to a particular point-in-time. This was usually accomplished by reading the change log for changes made since the backup was taken and reapplying the changes contained in it in ascending order. For example, this approach is common in the HP NonStop IMF environment to recover a lost or corrupted file.
The recovery of corrupted databases in many subsequent prior art methods has depended upon the availability of a log of all changes made to a database. By using this log, updates in error can be backed out and replaced with correct values.
In many application implementations, a current log of all updates made to a database is maintained in a change log of some sort, as shown in FIG. 2. When a modification to the database is made, such as inserting, updating, or deleting a row, that change is made not only to the database (1) but is also recorded in a change log (2). In some instances, the change log is created by the application. In others, it is created by database triggers that are activated whenever a change is made to the database. Often, the change log is created by a transaction manager, as described next.
Many modern-day data-processing applications are typically transaction-oriented. A transaction is a set of database changes that are treated as an atomic entity—either all changes within a transaction are applied to the database, or none are (Jim Gray, Andreas Reuter, page 6, Transaction Processing: Concepts and Techniques, Morgan Kaufmann; 1993). A transaction sequence is typically bounded or delimited by an initiating begin-transaction command (which may be implicit in the first update of a transaction) and a terminating commit or abort command. A commit command indicates to the database that the transaction should be applied to the database. An abort command indicates that the transaction should be ignored.
A transaction manager that controls the applying of transactions to the database is responsible for the durability of those transactions should a system failure occur. To fulfill this responsibility, the transaction manager typically maintains a durable change log in persistent storage such as disk. This is the change log referenced above. Examples of such change logs are the Audit Trails used by HP Nonstop systems for their Enscribe, SQL/MP, and SQL/MX databases and the Redo Logs for Oracle databases.
E. Recent Prior art Recovery Techniques for Rollback Database Recovery
Prior art implementations for roll-back database recovery use the change log as an important component of database recovery. These implementations include, as examples, Oracle Flashback, Oracle GoldenGate Reverse, and the Gravic Shadowbase data replication engine.
i. Oracle Flashback
Oracle Flashback is a facility provided by the Oracle database manager to restore a database to a known point in time [Section 5.1, “Restore Points and Flashback Database: Concepts,” Oracle Database Backup and Recovery Basics, 10g Release 2 (10.2), Part Number B14192-03]. The Flashback architecture is shown in FIG. 3.
Oracle normally maintains a Redo Log (1) of all changes to the database. For each change, the Redo Log includes an entry describing the effects of the change on the database. Oracle enters into its Redo Log the before image of any data block that has been deleted, the after image of any data block that has been inserted, and both the before and after images of a data block that has been updated.
If Flashback is enabled, Oracle also maintains a Flashback Log (2). The system administrator enters aflashback retention target, which is the earliest time to which Flashback must be able to restore the system. From that time on, Oracle will enter into its Flashback Log the before image of any data block that has been changed.
To return the database to a specific point in time, the administrator will enter a restore time. Flashback will follow the Flashback Log from the restore time forward. When it finds the first entry for a data block, it will restore the data block's image in the database to the before image contained in that Flashback entry (3). It will then ignore any further before images for that block. When Flashback has finished passing through its Flashback Log, it has returned all of the modified data blocks in the database to their states at or before the restore time.
Flashback also provides a facility for restoring the database to a known point in time prior to a risky event. This is useful, for instance, to protect the database from maintenance functions that may corrupt the database or from the trial of a new version of an application. To use this feature, the system administrator establishes a normal restore point by specifying either a restore time or an Oracle transaction-identification number (the SCN, or system change number) to which the database is to be restored if there is a problem. In this case, Oracle also will create a Flashback Log; but the Log will contain only the first change to a data block. Changes after the first change to a data block are not entered into the Flashback Log.
If a problem requires the database to be returned to its state at the normal restore point, the Flashback facility reads and applies all of the before images in its Flashback Log. When Flashback has completed this, the database has been restored to its beginning state.
If desired, once a database has been returned to a valid and correct state, Oracle can use the contents of the Redo Log to replay (or roll forward) the transactions contained in it against the database to bring it to the desired state (4).
ii. Oracle GoldenGate Reverse
The GoldenGate Reverse utility uses before images to undo database changes for specified tables, rows, and time periods. [Chapter 19, “Undoing data changes with the Reverse utility,” Oracle GoldenGate Windows and Linux Administrator's Guide, 11g Release 2 Patch Set 1 (11.2.1.0.1), E29397-1]. It enables the system administrator to perform a backout for selected files or tables, unlike other methods that require restoring the entire database.
The system administrator can use the Reverse utility for the following purposes:
1. To restore a test database to its original state after a test is run. Because the Reverse utility only backs out changes, a test database can be restored in a matter of minutes and much more efficiently than a complete database restore, which can take hours.
2. To reverse errors caused by corrupt data or accidental deletions. For example, if an UPDATE or DELETE command is issued without a WHERE clause when one should have been supplied, the Reverse utility reverses the operation.
To use the Reverse utility, the following operations are performed (FIG. 4):
1. Run the Extract utility to extract the corrupted data (1). The input to the Extract utility is a “data source,” which is either the source database itself or a trail of changes made to the source database. The user adjusts the starting and ending time of the Extract utility parameters to position into the data source to extract the changes to be reversed. The output of the Extract utility is a disk file (called the “Input File”) of the changes that are to be reversed, listed in the original order in which they occurred.
2. Run the Reverse utility to perform the reversal of transactions (2). The input to the Reverse utility is the output of the Extract utility. The Reverse utility reads the Input File and physically reverses the events in it, converting transaction and database change events as described below. The output of the Reverse utility is a file (called the “Output File”) of the reversing events in time-reversed order. The Output File is organized such that the Replicat utility can read the Output File in ascending order to apply the events in it against the database to reverse the original database changes.
3. Run the Replicat utility to apply the restored data to the source database (3).
Note that this approach requires several processing passes to produce the database changes to be reversed, requires two additional file-based copies of the changes to be reversed (beyond the original set of database changes), and requires several writes and reads from interim database files to perform the ultimate reversing operations.
More specifically, it has to start a process (the Extract Process), position into the data source, perform the extract of selected events, read the extracted events, and write them into the Input File. It then has to start another process (the Reverse Process) to position into the Input File, perform an extract, read the selected events, compute the reversal events, and write the reversal events into the Output File in reverse time order. It then has to start another process (the Replicat Process) to position into the Output File, perform an extract and read, and write (apply) the reversal events against the data source (or target database).
The Reverse approach produces several interim copies of the events to be reversed. More specifically, with the Reverse approach, the database changes to be reversed exist initially in the data source; Extract creates an interim copy and stores it in the Input File (in the original event order); Reverse creates another copy and stores it in the Output File (modified for the reversal operation and stored in reverse order to the original database change order). Hence, three copies of the database changes to be reversed are needed—the original set of changes to be reversed contained in the data source along with two additional copies that Reverse creates. For active data sources with many changes, these copies also will be large files.
The Reverse utility reverses the forward operations by:
1. Reversing the ordering of database operations in an extract file, a series of extract files, or a trail so that they can be processed in reverse order. This guarantees that records with the same key are properly applied.
2. Changing delete operations to inserts.
3. Changing inserts to deletes.
4. Interchanging before images and after images in updates.
5. Reversing the begin and end transaction indicators.
iii. Gravic Shadowbase Data Replication Engine
The Shadowbase data replication engine from Gravic, Inc., of Malvern, Pa., also can be configured to provide database recovery. The Shadowbase data replication engine is shown in FIG. 5. In HP NonStop systems, the Shadowbase replication engine uses the NonStop TMF Audit Trail as the change log (1). A Shadowbase Collector on the source system (2) follows the Audit Trail and sends database changes to a Shadowbase Consumer on the target system (3). The Consumer applies the database changes to the target database (4).
The application of random changes to a database is a lengthy process. Therefore, it is typically the Consumer that limits the replication capacity of the Shadowbase replication engine. To alleviate this problem and improve the overall replication-processing throughput, the Shadowbase replication engine can be configured with one or more Consumers driven by one or more Collectors communicating over one or more communication channels (W. H. Highleyman, P. J. Holenstein, B. D. Holenstein, Chapter 10, Referential Integrity, Breaking the Availability Barrier: Survivable Systems for Enterprise Computing, AuthorHouse; 2004).
When used to recover a corrupted database on a single system that is not in a replication environment, the Shadowbase replication engine is configured on the system hosting the database, as shown in FIG. 6. A special user exit (a “user exit” in a Shadowbase replication engine is a code fragment that is embedded into the replication engine to enhance its normal processing for the events being replicated) is written for the Shadowbase Collector (1), for the Shadowbase Consumer (2) or for both to filter the Audit Trail records (3). Only those records that are necessary and desired for database restoration are retrieved.
The Shadowbase Collector follows the Audit Trail from the point of desired recovery and sends the selected change events to the Consumer. Using the logic of the user exit, either the Collector or the Consumer or both cooperatively look for the first change to a data row. The before image of that change is applied to the application database (4), thus backing out the erroneous contents. No other changes to that specific row will be selected or applied.
More specifically, if the first change event was an insert, the row is deleted; if the first change event was a delete, the row is reinserted; if the first change event was an update, the before image of the update is reapplied. Using this processing, if an insert is being applied, and the row already exists, the insert will be mapped to an update; and the update will be applied. If a delete is being applied, and the row does not exist, the delete event is discarded; and processing continues. If an update is being applied, and the row does not exist, it is mapped into an insert; and the insert is applied. If an update is being applied, and the row already exists, the before image of the update is reapplied, as stated above. Changes to a data row following the first change after the restore time are ignored.
When the Shadowbase replication engine has completed its processing of the Audit Trail, the application database has been restored to its correct state at the designated restore time.
With Shadowbase database recovery, it is unnecessary to reconstruct the entire database. Often, only a portion of the database has been corrupted by an application or by an end user. The system administrator can specify that only changes made by a specific entity or matching certain criteria be rolled forward. For instance, recovery can be limited to those changes made by a particular application or by a particular person or only those changes made to a designated table or range of rows in a designated table. These criteria are then built into the recovery user exits implemented for this purpose.
In the following figures and descriptions, the Shadowbase Collectors and Consumers are depicted without the user exits for graphical simplicity. However, it is understood that the Collectors and Consumers may have user exits embedded into them even if not shown or described.
iv. Gravic Shadowbase Enhanced Data Replication Engine
An enhanced version of the Shadowbase data replication engine can dramatically reduce the amount of data lost following a source-system failure due to data in the replication pipeline failing to make it to the target database. It can also speed up the restoration of a corrupted database.
In the basic Shadowbase data replication engine shown in FIG. 5, data replication is incomplete until the Consumer has applied the database update to the target database. Applying random updates to a database is a lengthy process, whereas reading a sequential file as the Collector does with the Audit Trail is a relatively rapid process.
By adding an additional queuing component, one also can dramatically improve the delivery of the data to the target environment. This is useful as it divorces the delivery of the data from the replaying of the data and allows each to run at its own inherent speed by using intermediary queues to hold the data until it can be replayed. This is particularly useful in business-continuity environments, as it lessens the amount of data loss should a disaster strike the source environment. This architecture is shown in FIG. 7.
It is much faster to write rows sequentially to an intermediary queue (shown as a disk-based “Queue File,” although it could be memory-resident) than it is to apply them randomly across the database as the Consumer has to do. This is because the rows can be blocked (or batched), and many rows can be written as a single block (or batch) to the Queue File. Furthermore, if the disk is not used for other purposes, after the write completes there is no need to move the disk head, as the next write will be an append. Therefore, each write is faster because it does not have to wait for the disk head to be repositioned—an operation that is much lengthier than simply having to wait for the disk to rotate to the proper sector for the write.
The enhanced Shadowbase queuing mechanism works as follows. With reference to FIG. 7, the application makes changes to the application database via inserts, updates, and deletes of rows (1). These changes are also recorded in the application's change log [for instance, the NonStop TMF Audit Trail (2)].
A Queue File on the target system (3) temporarily holds changes that have been replicated to the target system. The Queue File comprises a memory buffer into which replicated changes are stored while they await processing by the Consumer. If the memory buffer should fill, it can overflow to a disk file.
The Shadowbase Collector on the source system follows the Audit Trail (4) and sends changes (5) to a Queue Manager (QMGR) (6) on the target system. The QMGR writes the changes into the Queue File on the target system (7) and passes them to the Shadowbase Consumer(s) (8) for applying to the target database (9).
If the Consumer(s) should fall behind the replication stream, rather than holding up replication and increasing the possibility of lost data should the source system fail, changes are held in the Queue File until the Consumers have time to read them from the Queue File (10) and to apply them to the target database. Thus, replication is delayed only by the time to write changes sequentially into a memory buffer rather than the time that is required to apply changes randomly to the target database. Consequently, data loss due to a replication channel failure is greatly reduced.
Recovery of a corrupted database to a prior point-in-time by the enhanced Shadowbase replication engine is similar to that of the basic Shadowbase replication engine shown in FIG. 6 and described in the earlier section entitled “Gravic Shadowbase Data Replication Engine”, except that the Queue file is used instead of the Audit Trail for recovery purposes. The recovery of a corrupted target database in a replication environment is enhanced by this configuration. If the target database is on a separate system from the source database, the enhanced Shadowbase replication engine has access to a local copy of the changes on the target system via the Queue File. It does not have to access changes from an Audit Trail on the remote system. Thus, communication time is saved and file restoration proceeds at a faster pace.
Furthermore, a corrupted target database can be restored even if the source system is not available due to a source-system failure or a replication network failure. This is important in the event that the source system fails and the target system must take over application functions with a corrupted database.
F. Recent Prior Art Recovery Techniques for Roll-Forward Database Recovery
The prior art has several ways that are used to restore a corrupted database using a roll-forward approach. One is to make a copy of the database on magnetic tape or disk prior to initiating processing with the upgraded system. If problems occur, the system infrastructure is returned to its configuration before the upgrade. The database is loaded with the copy to restore it to its state prior to the upgrade, and the change log is used to roll forward the subsequent changes that were made during processing by the upgraded system. An example of this implementation is HP's NonStop TMF Transaction Management Facility.
This technique presents some significant challenges. One challenge is the amount of data that needs to be processed to perform the restore operation. Typically, the entire file or table has to be restored to the prior backup (or online dump) point even if only a small amount of the file or table was subsequently changed. This can involve the recovery of massive amounts of data from the backup medium and can require access to hundreds of tapes or disk packs.
Yet another challenge is the time it takes to copy and to restore a large database from magnetic tape or disk. Many application databases today are massive, often measuring several terabytes in size. Even backing up and restoring from high-speed disk rather than from magnetic tape can take hours if not days. During this recovery time, the application is typically down.
Attempting to restore the database using a prior art approach such as the Oracle Flashback capability that recovers to a previous point-in-time from just the change blocks can take a very long time to complete if there has been a lot of update activity against the database since the restore point was taken (see the section entitled “Oracle Flashback”). Worse, the application is often unavailable during the process since the database typically must be taken offline during the operation.
Another problem is that the roll-forward capability provided by many transaction managers does not allow selective roll forward. All of the changes that are in the change log are applied to the database to bring it to a current state. This can result in the transaction manager rolling thru the corrupting changes and reapplying them, causing the newly restored database to also become corrupted. It is also possible that roll forward will be unsuccessful, as many of the database changes made during the aborted processing may have depended upon a new infrastructure and hence cannot be applied to the database copy. If this is the case, the result is that the roll-forward operation is aborted by the transaction manager. It cannot be completed.
The time challenge can be improved in the prior art via another recovery method. For example, in fault-tolerant systems such as HP NonStop servers, the disks are mirrored. Every logical disk actually comprises two disks, and all updates are applied to the two disks (the mirrored pair) simultaneously.
In this case, prior to running the upgraded system, the mirrors are split. One mirror is used by the upgraded application, and the other mirror is saved as a ‘fast-recovery’ pre-upgrade version of the database. Splitting a mirror and attaching the application to one of the mirrors is a relatively fast operation. A third mirror can be attached to the database and revived as the application runs so that the application is once again running against a mirrored database after the upgrade. This helps protect the upgraded system from subsequent media failure causing a system outage.
Mirrored disks may either be physical disks or logical disks. In many systems, a storage-area network (SAN), such as HP's XP storage disk array, creates logical disks that appear to an application to be an independent physical disk. Using SANs, applications may have any number of mirrored copies of a database without having to purchase additional physical disks.
The technique for using a split mirror for database recovery is illustrated in FIG. 8. If it is anticipated that a critical update may lead to database corruption, the mirrored database disks are unmirrored, and one mirror is saved as a recovery point (1). At this point, the database can be remirrored for protection against a disk media failure. The critical updates are then made (2). If the updates appear to not have caused database corruption, an optional updated mirror may be created as a new recovery point (3). This may either be done by splitting the current mirrored database (and optionally remirroring it), by reviving the current database split mirror (that is, synchronizing it with the updated database), or by creating another mirror, reviving it, and splitting it as a second recovery point.
At this point, the application is started (4). After running for a suitable time (5), if no database corruption is found, the split-mirror recovery point(s) are no longer needed and the disks holding the recovery-point mirrors can be released. However, if corruption is found (6), the application is stopped (7). The source of the corruption is determined, and a decision is made (8) as to whether to revert to the first recovery point or to the second recovery point (if one was taken). If the corruption was caused by the critical updates, the first recovery-point mirrored disk is restored (9). If it instead the corruption was caused by database updates subsequent to the critical updates, the second recovery-point mirrored disk is restored. At this point, the entire process of FIG. 8 is repeated. (10).
Thus, the time to save the initial version of the database and then to restore it to the application in the event of a problem is very small. Databases can be remirrored (revived) while the application is running so that the database copy and restoration activity does not significantly impact application availability.
However, the saved mirror typically cannot be rolled forward because the change log is on the mirror that has been abandoned and is not generally accessible by the transaction manager. Even if the transaction manager could access the change log, the replay of the changes may not be possible because the state of the database may not be what the transaction manager or change log expects. Even if the change log is accessible, the transaction manager typically cannot selectively roll forward the change log changes, and the resulting database will match the corrupted database. Thus, the application database cannot be rolled forward to its current, correct state using this method. The application must reprocess all transactions that it had processed in its previous run.
G. What is Needed
As can be seen from the various discussions of prior art recovery techniques, they have several limitations. What is needed are methods to achieve the following database recovery goals:
1. Roll-forward should be selective. Rather than having to roll forward an entire table or file, only data within that table or file that is correct should be rolled forward. In the prior art, tables and files must typically be rolled forward in their entirety. Additionally, the roll-forward operation should allow the corrupted data to be (optionally) corrected and replayed.
2. Roll-back should be selective. Rather than having to roll back an entire table or file, only data within that table or file that has been corrupted should be rolled back. In the prior art, tables and files must typically be rolled back in their entirety. Additionally, the roll-back operation should allow the corrupted data to be (optionally) corrected and replayed.
3. Roll-forward should be as fast as possible to minimize application downtime. In the prior art, rolling a large database forward could take hours or days; and the application is typically out of service during this time.
4. Roll-back should be as fast as possible to minimize application downtime. In the prior art, rolling back a large database could take hours, and the application is typically out of service during this time.
5. Roll-forward should be able to be performed while the application continues to be active to minimize application downtime. In the prior art, the application typically must be shut down during the roll-forward process.
6. Roll-back should be able to be performed while the application continues to be active to minimize application downtime. In the prior art, the application typically must be shut down during the roll-back process.
7. The operation sequences required for roll-forward and the reversing operation sequences required for roll-back should be able to be created without making multiple disk passes through the change-log data.
8. Transactions should be able to be selectively rolled forward up through the most current transaction, not just up to the point that the database corruption began. Corrupted transactions should either be repaired and replayed or be rejected and not rolled forward.
9. The detection of database corruption should be automated so that corruption can be detected as quickly and as accurately as possible. In the prior art, the fact that the database has become corrupted may not even be detected until an end user notices the problem and calls the IT department's help desk.
10. The decision as to whether to correct a database by rolling it back or by rolling it forward should be automated so that this function can be completed as quickly and as accurately as possible. This is a complex function, and in the prior art the decision is made manually after exhaustive analysis. This analysis could take hours or days.
11. During their respective processing, the roll-back and roll-forward operation should at all times maintain the referential integrity of the database. There should never be a point in the roll-back or roll-forward sequence that leaves the database in an incorrect or inconsistent state. The reversing change that caused the referential integrity violation might be rejected by the database manager, leaving corruption in the database. In the prior art, many techniques for roll-forward and roll-back may violate referential integrity for short periods of time.
12. If a split-mirror recovery point has been taken before a critical update to a database, and the update results in database corruption, it should be possible to roll forward the recovery-point mirror to a correct current state by rolling forward only the correct changes that were made during the update and rejecting corrupting changes. Roll-forward typically requires the existence of a change log from which to obtain changes that have been made since the recovery point. In the prior art, the change log is not available to the recovery point mirror. It is contained on the corrupted mirror which is not accessible to the roll-forward utility. Therefore, the application must rerun all transactions that had been originally processed since the recovery point.
13. In a replication environment, means should be provided to remove corruption from the target database by rolling forward recent changes to a replicated copy of the database, even if the change log on the source system is no longer available due to a source-system failure or a replication network failure. This is particularly important if the target system is a backup system to a source production system, and the target system is called upon to take over processing from a failed source system. In the prior art, the change log is often stored on the source system and is unavailable to the target-system roll-forward utility if the source system is unavailable or if the replication network is down.