1. Field of the Invention
The present invention relates generally to consolidating updates of database changes and, more specifically, to reducing the number of unmergeable records in a change accumulation data set.
2. Relevant Technology
Reliable management of databases is of paramount importance for modern-day society, which depends heavily on such databases for storage of critical information. Typically, users require that the database be constantly operational and accessible. Modern database systems are substantially robust in that they infrequently experience a failure. Nevertheless, when a failure does occur, the database recovery must be efficient and accurate to minimize loss to the users. Thus, database recovery is an operation which must be performed expeditiously in order to minimize downtime for users. A database experiencing an extensive period of downtime may create an economic disaster.
A database contains database data sets and is managed by a complex database management system. One example of a database management system is the Information Management System (IMS) available from IBM Corp., Armonk, N.Y. The IMS is used extensively to serve a substantial number of databases in operation today. The IMS allows access to one or more databases in order for users to interact with the data maintained on the database. The majority of user access to a database involves transactional operations.
As users update database data sets in the database, the database management system records the updates in a log data set. The log data set is an amount of data, such as a file, which reflects a series of updates to the database. Log data sets are typically recorded in sequential records which have defined start and end points.
Users may periodically make a backup copy, or a series of backup copies, of the database to assist in the recovery of a database. The backup copies may be recorded on tape archives by tape management systems. The backup copy is used as a base to restore the database to its state prior to a database failure. In recovery, subsequent updates to the database are applied from records on the log data sets. Recovery further requires storage of attributes of the database and the backup. Database management systems therefore often include some form of repository which holds attributes of the database and the backup copy to assist in recovery.
Database management systems include a recovery utility to respond to a database failure. Upon database failure, the recovery utility creates a new database and writes the backup copy to the new database. The recovery utility further applies all updates made to the database since the backup copy was last created. Information used to restore the new database from the last state of the backup copy may be taken from the log data sets and recovery control information.
To assist in database recovery, a utility, referred to herein as a change accumulation utility, accumulates updates and places them in a change accumulation data set (CADS). The CADS is an accumulation of changes in the log records that apply to the new database and are used as input during database recovery. The CADS may reflect updates for more than one database. A typical database record is updated a portion at a time, and there may be overlapping updates, which require a sequential order of recovery. The change accumulation utility receives all the updates, incorporates the changes, and merges the overlapping updates.
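The merging of overlapping updates can be illustrated with a minimal Python sketch. It assumes each update carries a log sequence number, a byte offset, and the bytes written. The `Update` type and the `merge_updates` function are hypothetical names used only for illustration and are not part of IMS or of the change accumulation utility itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    seq: int      # position of the update in the log (assumed field)
    offset: int   # byte offset within the database record
    data: bytes   # bytes written starting at that offset


def merge_updates(base: bytes, updates: list) -> bytes:
    """Overlay updates on a record image in log order.

    Where two updates overlap, the one with the higher sequence
    number is applied last and therefore wins.
    """
    image = bytearray(base)
    for upd in sorted(updates, key=lambda u: u.seq):
        end = upd.offset + len(upd.data)
        if upd.offset < 0 or end > len(image):
            raise ValueError("update falls outside the record")
        image[upd.offset:end] = upd.data
    return bytes(image)


if __name__ == "__main__":
    record = b"AAAAAAAA"
    log = [Update(seq=2, offset=2, data=b"YY"),
           Update(seq=1, offset=0, data=b"XXXX")]
    print(merge_updates(record, log))  # b'XXYYAAAA'
```

Sorting by sequence number before overlaying is what enforces the sequential order that overlapping updates require; applying the same two updates in the opposite order would leave `XXXX` in the first four bytes.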
In order to create the CADS, the change accumulation utility reads log data sets. Typically, users organize their multiple databases into change accumulation groups so that the change accumulation utility operates as efficiently as possible. A user can run the change accumulation process against one change accumulation group and use an optional secondary output, the set of log records that were not written to the change accumulation data set, as input to the change accumulation utility for the next change accumulation group to be processed. This can be done for each change accumulation group in which the current change accumulation run uses the secondary output of the previous change accumulation run. This serial process is managed directly by the user. Users usually run accumulation periodically so that when a database data set in a change accumulation group requires recovery, the time required to run a final change accumulation job and subsequent database recovery job is minimized. This sequential recovery process is quite complex.
The recovery utility reads the entire CADS into memory and applies that portion of the CADS that is relevant to the database data set being restored. Each record has an identification that is sequential, and the database data sets are restored in a sequential order. The recovery utility examines each record to see if there is a change in data for that record. If so, the CADS is accessed and the relevant record is merged into the new database.
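As a rough illustration of this recovery pass, the following Python sketch restores one data set from a backup and overlays the relevant CADS records in sequential order. The `DetailRecord` fields, the dictionary layout of the backup, and the `restore_data_set` function are assumptions made for the example; they do not describe the actual format used by any recovery utility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetailRecord:
    data_set: str   # database data set the change applies to (assumed field)
    record_id: int  # sequential identifier of the physical record
    offset: int     # byte offset of the change within that record
    data: bytes     # changed bytes


def restore_data_set(backup: dict, cads: list, data_set: str) -> dict:
    """Return a restored copy of `backup` with relevant CADS changes overlaid."""
    # Keep only the portion of the CADS that applies to this data set.
    changes = {}
    for detail in cads:
        if detail.data_set == data_set:
            changes.setdefault(detail.record_id, []).append(detail)

    restored = {}
    for record_id in sorted(backup):  # visit records in sequential order
        image = bytearray(backup[record_id])
        for detail in changes.get(record_id, []):
            end = detail.offset + len(detail.data)
            if end > len(image):
                raise ValueError("change falls outside the record")
            image[detail.offset:end] = detail.data
        restored[record_id] = bytes(image)
    return restored
```

The backup itself is left unmodified, so the same backup copy can serve as the base for a later recovery attempt.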
During routine operation, the database management system periodically creates updates in the database and in the log data set. Over time, several updates are created but are not permanently stored in the database until they are physically written on the database. In general, database activity is based on being able to "commit" updates to a database. A commit point is a point in time where updates become permanent parts of the database. The span of time between commit points is referred to as a "commit scope" or "unit of recovery" (UOR). If something goes wrong, such as a write error to the database, and the updates cannot be made, all the updates produced since the last commit point are "aborted." It is as if the updates never happened.
One method for implementing database updates and commit point processing is for the database manager to maintain the database changes in storage and not apply the changes to the databases until the commit point is reached. A copy of the database data that is changed is written to the log as the update is created. When the commit point is reached, and all operations are as expected, the updates are written to the databases. If an error occurs, the storage containing the database updates is freed.
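This deferred-update approach can be sketched in a few lines of Python. The sketch assumes the database is a simple in-memory mapping and the log is a list; the `UnitOfRecovery` class and its method names are hypothetical and chosen only to mirror the terms used above.

```python
class UnitOfRecovery:
    """Holds updates in storage until the commit point is reached."""

    def __init__(self, database: dict, log: list):
        self.database = database
        self.log = log
        self.pending = {}  # changes held in storage, not yet in the database

    def update(self, key, value):
        # The changed data is written to the log as the update is created.
        self.log.append(("update", key, value))
        self.pending[key] = value

    def commit(self):
        # Commit point reached: write the held updates to the database.
        self.database.update(self.pending)
        self.log.append(("commit",))
        self.pending.clear()

    def abort(self):
        # Error: free the storage holding the updates; the database is untouched.
        self.log.append(("abort",))
        self.pending.clear()


if __name__ == "__main__":
    db, log = {"balance": 100}, []
    uor = UnitOfRecovery(db, log)
    uor.update("balance", 80)
    uor.abort()
    print(db)  # {'balance': 100}
```

Because nothing reaches the database before `commit`, an abort needs no undo work in this scheme; it is as if the updates never happened.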
A common update to the database is termed a transaction, which is a unitary logical piece of work that may include performing a variety of activities. At its simplest level, a transaction may involve decreasing one account balance and increasing another. The activities performed in the transaction may extend beyond a first commit point and will not be permanent until a subsequent commit point.
The change accumulation utility creates the CADS by taking log data sets that have been committed up to a certain commit point and combining them. The committed log data sets are readily applied to the new database during recovery because they are permanent. Updates that occur after the last recorded commit point are not readily applied to the new database because there is no guarantee that the updates will be committed at a later commit point. Failure of a commit point results in an abort of the update and any related transactions. If the updates need to be aborted, the log record is retrieved and the copies of the unchanged database data are applied, in effect backing out the changes. Thus, updates that occur after the commit point are not necessarily committed to the database.
Each CADS comprises a detail record which is a record of committed updates from one or more logs. Each detail record is a series of contiguous bytes which can be overlaid into the backup copy of one database physical record. Applying all of the detail records in the CADS is equivalent to rerunning all of the transactions against the database which were entered since a backup copy was made up to a "merge-end point." The merge-end point is a point in the log separating mergeable updates from updates which may not be merged into detail records because not all change records are available for these updates. In shared sessions, merge-end points are established at the location of sharing session boundaries, such as at the end of a completed sharing session.
Updates which cannot be merged are written to records which are termed "spill records." Spill records can occur only in a sharing session, when multiple database management systems are sharing a database. The majority of database management systems run in a shared session to maximize use of a database. Spill records contain update data stored in the CADS in their entirety as individual identities and are not as compact as merged detail records. When the relevant log records become available, the spill records may be read in a subsequent change accumulation process and may be merged with other updates. Because updates contained in spill records are not merged, they increase the size of a CADS, which in turn increases the amount of time needed to read and process a CADS. Reducing the number of spill records reduces the size of the CADS and improves the processing time of database recovery and subsequent change accumulation processes.
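The division between mergeable updates and spill records can be pictured with a short Python sketch. It assumes every update carries a time value and treats an update at exactly the merge-end point as mergeable; both the `LogUpdate` type and that boundary convention are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogUpdate:
    time: int       # time value of the update in the log (assumed field)
    record_id: int  # database physical record the update applies to
    data: bytes     # update contents


def split_at_merge_end(updates: list, merge_end_time: int) -> tuple:
    """Separate updates that can be merged from those that must spill.

    Updates at or before the merge-end point go toward detail records;
    later updates are kept whole as spill records.
    """
    mergeable, spill = [], []
    for upd in sorted(updates, key=lambda u: u.time):
        if upd.time <= merge_end_time:
            mergeable.append(upd)
        else:
            spill.append(upd)
    return mergeable, spill
```

With the same set of updates, a later `merge_end_time` moves entries from the spill list to the mergeable list, which is why the position of the merge-end point determines the number of spill records.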
Thus, it would be an advancement in the art to provide a system and method to reduce the number of spill records in a CADS. It would be a further advancement in the art to reduce the number of spill records in a CADS by establishing a merge end point at a later position in commonly shared logs. It would be yet another advancement in the art to reduce the number of spill records by incorporating known features in database systems. Such an invention is disclosed and claimed herein.
The invention establishes a merge end point in the logs of a plurality of database management systems which share a common database. The merge end point is established at a later point to thereby reduce the number of unmergeable records. The system of the present invention comprises a log archive module which determines the location of each log volume end-start point in the logs. A log volume end-start point is the approximate position at which the medium storing the log records is filled and is switched. Thus, the current medium, such as a tape, ends and a new medium starts at the end-start point. The log archive module assigns a time value to each log volume end-start point to indicate its position.
The invention further comprises a database recovery control module which receives the positions of the log volume end-start points. The database recovery control module determines the most recent log volume end-start point for each log. The database recovery control module next determines which of the most recent log volume end-start points has the minimum time value. This log volume end-start point is the latest identifiable point wherein all log records for all logs may be merged. This log volume end-start point is selected as the merge end point. Thus, the merge end point need not be selected at the end of a completed sharing session. A change accumulation utility is able to incorporate the merge end point in a CADS to separate updates between those that are merged into the detail records and those that are stored in the spill records.
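The selection rule described above, taking the most recent end-start point of each log and then the minimum of those, can be sketched in Python as follows. The input layout (a mapping from a log name to the time values of its end-start points), the function name, and the error raised for a log with no end-start point are all assumptions made for this example.

```python
def select_merge_end_point(end_start_points: dict) -> int:
    """Pick the merge end point from per-log volume end-start times.

    Step 1: find the most recent end-start point in each log.
    Step 2: return the minimum of those most recent values.
    """
    if not end_start_points:
        raise ValueError("no logs supplied")
    latest_per_log = {}
    for log_name, times in end_start_points.items():
        if not times:
            # Assumed handling: without an end-start point, no merge
            # end point can be derived for this log by this rule.
            raise ValueError(f"log {log_name!r} has no end-start point")
        latest_per_log[log_name] = max(times)
    return min(latest_per_log.values())


if __name__ == "__main__":
    points = {
        "log_a": [100, 250],
        "log_b": [120, 180, 300],
        "log_c": [90, 210],
    }
    print(select_merge_end_point(points))  # 210
```

In the example, the most recent end-start points are 250, 300, and 210, so 210 is selected: it is the latest time up to which every log has reached a volume boundary.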
These and other objects, features, and advantages of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.