1. Field of the Invention
The present invention relates to database processing and, more particularly, to providing data integrity during a database failover.
2. Brief Description of Related Developments
Database systems often include a primary database and backup, or standby, databases. The standby databases can be located locally or remotely relative to the primary database. Log files are associated with each of the primary and standby databases to preserve the integrity of the data in the database system. Generally, log files are used to record data transactions applied to the primary database, so that if the primary database is damaged in some way, the log files can be used to restore the damaged or missing data to the standby database.
In a database system configuration containing the primary database and the standby database, and the standby database is recovered by applying archived redo log files from the primary database, the standby is always at least one log file behind the primary. The primary's redo data is written to the primary database's redo log file and the standby database's redo log file, and then the modifications are written to the primary database's data files. A new primary log file is begun when the primary redo log file becomes full, or as a result of a command issued by a database administrator (DBA). After such a primary log file switch takes place, the primary redo log file is archived both locally for the primary database and to the standby site for the standby database. The archiving can be done in parallel with the primary log file switch, if desired. However, the archived primary redo log file can't be applied until the primary log file switch occurs.
An apparent failure of the primary database may be caused by hardware problems, software problems, or loss of communications with the primary database site. In the event of failure of the primary database, the standby database will not be able to recover any redo transaction data contained in the primary database's redo log files as the primary database and associated log files are most likely unavailable due to the failure. The standby database is guaranteed to have a data loss, which could potentially be very large if the primary database's online logs were large and nearly full.
When a database suffers a catastrophic failure, and the data on the primary database is no longer considered accessible, a failover procedure can be used to replace the primary database with a standby database associated with the primary database. When executing a failover procedure, it is often essential that a transfer from one database to another database take place quickly. Essential business could be put on hold or lost while the database system is down. The switchover to a standby database should take place quickly. If the failover procedure fails, a manual recovery of one of the standby databases may be required. A manual recovery of a standby database is very manual-intensive and highly prone to incorporate human caused errors.
To provide some additional data security, some database systems use write-ahead logging. In write-ahead logging, transactions which are to be written on the primary database are first written to the primary database's online redo log files. The transactions are also transmitted either synchronously or asynchronously to the standby database's online redo log files. The contents of the redo log files are applied to the primary database only after the data has been successfully logged in both the primary database's current redo log files and standby database's current redo log files. Redo logs are archived once all the transactions in the redo logs have been written to the associated database.
During a failure of the primary database, the last transaction transmitted to standby redo log files may or may not have been written to disk at either the primary database or the standby database. Therefore, the standby redo log file may have more or less redo transactions than the primary redo log file. The use of synchronous transmission minimizes the possible difference of what was written to the primary and standby redo log files at the time the primary failed. However, it is impossible to guarantee that transmission occurred for all elements of a transaction, i.e. atomically, for both the primary and standby redo log files. The following table shows 3 possible cases of 2 transactions, A and B, simultaneously being written to the primary and standby redo log files when a failure occurs.
PrimaryStandbyCase1A,BACase2AA,BCase3A,BA,B
In case 1, the primary redo log file includes both transactions A and B, while the standby redo log includes only transaction A. If a redo is done with the primary redo log file, both transaction A and B will be applied, or possibly reapplied, to the primary database. A redo implemented with the standby redo log file will not apply transaction B. Therefore, the primary and standby databases will not have the same data.
In case 2, transaction B was successfully written to the standby redo log but not to the primary redo log. A redo implemented with the primary redo log file will not apply transaction B to the primary database, while a redo with the standby redo log will apply transaction B to the standby database. In case 3, both the primary and standby redo logs have the same transactions, so both the primary and standby databases will be consistent with each other after a redo is done.
As shown, depending upon the timing of the failure of the primary database, the transactions logged in the various log files may be different for each database in a database system. In order to begin the failover process, the DBA can select which redo log(s) are to be used for the redo procedure, and can also select which standby database, if there are more than one, is to be used for the redo procedure.
Once a log file has begun to be used to update a database, the update process should continue to a conclusion. A recovery should not restart using different redo log files because the current redo log files could be different between the primary and standby depending on the state of the transmission when the primary failed, as shown in the three cases. The last part of a redo could be in the primary's current redo log file but not written to the standby's redo log file before the primary failed. Moreover, the redo could have been written to the standby where it will be applied, but the primary failed before the acknowledgment was received at the primary.
In addition, the DBA may attempt to update the database with log files from different sources, as different log files can contain different transactions. However, a multiple log file update can cause serious database data integrity problems, as transactions may be applied multiple times. A transaction can often affect a number of database records and fields. Once the data in a database can no longer be relied upon, manual intervention is required to examine the data and transactions in the database to determine what data fields must be adjusted, which transactions must be backed out and which transactions are to be added. This manual recovery of a database is requires a large number of man hours, is very time consuming, and is highly prone to the creation of additional errors in the database and the log files, and therefore is to be avoided.
Existing systems are able to determine that a problem with a database exists after a database has been corrupted by an update. However, once the problem, such as with data integrity, has been determined, a great deal of time and manual effort is required to fix the database. Existing systems cannot guarantee that data integrity will be preserved in all fail-over and recovery scenarios. It would be advantageous to have a method for preventing such database updates before they occur, and prevent the need for a manual standby database recovery.