1. Field of the Invention
The present invention relates to database recovery using back up copies and change accumulation data sets. More specifically, the invention relates to database recovery by using complete or incomplete change accumulation data sets.
2. Relevant Technology
Management of extensive databases is of paramount importance for modern day society which depends on reliable storage of data reflecting critical information. Typically, databases must be constantly operational and available to users. Modern day database systems are substantially robust that they infrequently experience a failure. Nevertheless, when a failure does occur the database recovery must be performed efficiently and accurately to minimize loss to the users. Thus, database recovery is an operation which must be performed expeditiously in order to minimize down time for users. A database experiencing an extensive period of downtime may quickly create an economic disaster.
A database is managed by a complex database management system. An example of a database management system is the Information Management System (IMS) available from IBM Corp., Armonk, N.Y. The IMS system is used to serve a vast number of databases in operation today. The IMS system[s] allows access to one or more databases in order for users to interact with the data maintained on the database. The majority of user access involves transactional operations.
As users update the database data sets in the database, the database management system records the updates into a log data set. The log data set is an amount of data, such as a file, which reflects a series of updates to the database. Log data sets are recorded in sequential records which have defined open and close points.
Users may make backup copies or series of backup copies of the database periodically to assist in the recovery of a database. These backup copies may be recorded on tape archives by tape management systems. The backup copy is used as a base to restore the database to its state prior to a database failure. In recovery, subsequent updates to the database are applied from records on the log data sets. Recovery further requires storage of attributes of the database and the backup. Database management systems often include a data set for control of recovery which comprises several attributes of the database and the backup copy. Database management systems use some form of recovery control information recorded in this data set relating to the database and the backup copy to assist in recovery.
Database management systems include a recovery facility to respond to a database failure. Upon database failure, the recovery facility creates a new database and writes the backup copy to the new database. The recovery utility further applies all the updates to the database from when the backup copy was created. Information used to restore the new database from the last state of the backup copy may be taken from the log data sets and recovery control information.
To assist in database recovery a utility, referenced herein as a change accumulation utility, accumulates updates and places them in a change accumulation data set (CADS). The CADS is an accumulation of changes in the log records that apply to the new database and are used as input during database recovery. The CADS may reflect updates for more than one database. A typical database record is updated a portion at a time and there may be overlapping updates which makes the order of recovery important. The CADS receives the overlapping updates but, after all the changes, the CADS reflects only the final changes.
In order to create the CADS, the change accumulation utility reads log data sets sequentially, that is, one after another. Typically, users organize their multiple databases into change accumulation groups so that the change accumulation utility operates as efficiently as possible. A user can run the change accumulation process against one change accumulation group and use an optional secondary output—the set of log records that were not written to the change accumulation data set—as input to the change accumulation utility for the next change accumulation group to be processed. This can be done for each change accumulation group in which the current change accumulation run uses the secondary output of the previous change accumulation run. This serial process is managed directly by the user. Users usually run change accumulation periodically so that when a database data set in a change accumulation group requires recovery, the time required to run a final change accumulation job and subsequent database recovery job is minimized. As can be expected, this sequential recovery process is quite complex.
The recovery utility reads the entire CADS into memory and applies that portion of the CADS that is relevant to the database being restored. Each record has an identification that's sequential and the database data sets are restored in a sequential order. The recovery utility addresses each record in the CADS to see if there is a change in data for that record. If so, the CADS is accessed and the relevant record merged into the new database.
During routine operation, the database management system periodically creates updates in the database and in the log data set. Over time, several updates are created. However, the updates are not permanently stored in the database until the updates are physically written on the database. In general, database activity is based on being able to “commit” updates to a database. A commit point is a point in time where updates become permanent parts of the database. The span of time between commit points is referred to as a “commit scope” or “unit of recovery” (UOR). If something goes wrong, such as a write error to the database, and the updates cannot be made, all the updates produced since the last commit point are “aborted.” It is as if the updates never happened.
One method for implementing database updates and commit point processing is for the database manager to maintain the database changes in storage and not apply the changes to the databases until the commit point is reached. A copy of the database data that is changed is written to the log as the update is created. When the commit point is reached, and everything went as expected, the updates are written to the databases. If something went wrong, the storage containing the database updates is freed.
A common update to the database is a transaction which is a unitary logical piece of work that may include performing a variety of activities. At its simplest level a transaction may involve decreasing one account and increasing another account. The activities performed in the transaction may extend beyond a first commit point and will not be permanent until a subsequent commit point.
The change accumulation utility creates the CADS by taking log data sets that have been committed up to a certain commit point and combines them together. The committed log data sets are readily applied to the new database during recovery because they are permanent. Updates that occur after the last recorded commit point are not readily applied to the new database because there is no guarantee that the updates will be committed at a later commit point. Failure of a commit point results in an abort of the update and any related transactions. If the updates need to be aborted, the log record is retrieved and the copies of the unchanged database data are applied, in effect backing out the changes. Thus, updates that occur after the commit point are not necessarily committed to the database.
Each CADS comprises a detail record which is a record of committed updates from one or more logs. Each detail record is a series of contiguous bytes which can be overlaid into the backup copy of one database physical record. Applying all of the detail records in the CADS is equivalent to rerunning all of the transactions against the data base which were entered since a backup copy was made up to a “merge-end point.” The merge-end point is a point in time wherein updates may no longer be merged with the new database because all change records are not available for these updates. Thus, there is no guarantee as to whether these updates have been committed. Updates which cannot be merged with the new database are written to records which are termed “spill records.”
A complete CADS comprises only detail records whereas an incomplete CADS comprises detail and spill records. Creation of an incomplete CADS occurs when multiple database management systems are sharing a database. The majority of database management systems run in a shared session to maximize use of a database. During a shared session incomplete log data sets exist which have updates for periods of time in which all the log records are not available. In a sharing session with multiple database management systems it is not possible to have a complete CADS without taking the database offline and reviewing the log data sets.
Update records of incomplete log data sets cannot be resolved by the change accumulation utility because of the unavailable log records. The change accumulation utility is unable to resolve these update records and does not know if the updates may be applied or not. These update records are written to the spill records. If the relevant log records become available, the update records in the spill records may be read in a subsequent change accumulation process and may be merged with other updates. The change records are incomplete during a shared session because when the change accumulation utility runs the updates are ongoing and some of the change records will be unavailable.
At data base failure, all updates and transactions that are still pending are terminated. If updates are not committed at the time of the data base failure, the related transactions are aborted. Updates are not permanently applied to the database until the updates are committed. During recovery, the recovery utility will determine if an update ends with a commit or an abort. If the update ends with a commit, then the update is applied to the new database. If an abort, the recovery utility rescinds the update.
Recovery of a shared database is a two step process. First, the recovery utility must run a change accumulation process to read the relevant log records and read the incomplete CADS to create a complete CADS. This step is required because the recovery utility is unable to merge the data contained in an incomplete CADS with the new database. Thus, in the art, recovery utilities are not able to directly recover from an incomplete CADS. The incomplete CADS must first be completed. In the second step, the recovery utility applies the backup copy, the complete CADS, and the log data sets and merges these components to create the new data base.
In the recovery process, completing the incomplete CADS may take a long time because it requires reading of all the log data sets that have updates. The recovery process further requires reading the completed CADS, merging their data with the log updates, and restoring from a backup copy and potentially any additional log data sets not contained in the completed CADS. The recovery process may be a very lengthy process and present devastating consequences to users who are in desperate need of a restored database. Furthermore, if a user has a series of data bases and if several of these data bases require recovery, then there may be multiple incomplete CADS which must be completed. Completion of multiple incomplete CADS requires readings of multiple log data sets. Typically each log data set is read sequentially for each incomplete CADS. Thus, a vast amount of data must be read in the recovery process which may be a relatively lengthy process.
Database recovery requires reading each backup copy and each CADS sequentially. Thus, failure of a single database will require time to read each backup copy plus the time to read each CADS and then the time to write the backup copies and merge the CADS with the restored database. This read time is in addition to the time that it takes to complete each incomplete CADS. Furthermore, if multiple databases need to be recovered and the databases have data in a single CADS, the recovery utility reads the CADS once for each database recovery. This potentially could require several reads of the same CADS.
Thus, it would be an advancement in the art to provide a simplified database recovery apparatus and method that substantially reduces recovery time after database failure. The method and apparatus should recover multiple database data sets simultaneously. It would be yet another advancement in the art to provide a database recovery process which eliminates the need to execute a change accumulation process to complete an incomplete CADS to thereby reduce recovery time. It would be a further advancement in the art to eliminate the need to sequentially read each backup copy and CADS for each CADS associated with a database requiring recovery.
Such an advancement is disclosed and claimed herein.