1. Field Of The Invention
The present invention is directed generally to data storage in a computerized data processing system, and more particularly, to the recovery of critical data following an unexpected disaster event resulting in data unavailability or loss.
2. Description Of The Related Art
When an unexpected disaster resulting in data unavailability or loss strikes a data processing system, recovery procedures must be implemented to restore the system to a pre-disaster usable state. This is typically accomplished by recovering data that was backed up prior to the disaster. The backed-up data is generally stored in one or more secondary or tertiary storage systems that are physically separate from the primary data storage systems and immune from the disaster event affecting the primary systems. It is the job of the system administrator and/or the applications user to periodically back up all data required by the data processing enterprise, including operating system programs and data, catalogs, directories, inventories, user programs and user data. These data may be backed up to one or more magnetic tapes or tape cartridges, magnetic disks, or optical disks which may have ablative, phase-change, magneto-optic or any other optical recording layers thereon. These backup data storage media may be housed in one or more automated data storage libraries having a plurality of storage cells containing such media, one or more drives to transfer data to and from the media, and automated picker/gripper mechanisms to physically transport the media between their individual storage cells and the one or more drives.
Disaster recovery in a data processing system is often time-critical: enough data must be recovered to initiate the critical applications needed to run a data processing enterprise as soon as possible after a disaster has been declared. This may be needed after the loss of an entire data center, or of just a significant portion of the data. The conventional method involves recovering all data to a base level, followed by, if necessary, forward recovery processing to prepare the data for use. For data sets maintained as a series of integral files, preparation may involve re-submitting transactions or data updates that were recorded after the base level backup, up to and including the synchronization point, i.e., the time at which the disaster occurred. Another method for maintaining data sets is known as Changed Data Only (CDO) Recovery. This method was developed by the assignee of the present application. According to the CDO recovery method, an entire data set is initially backed up. Subsequent backup versions of the data set consist of only the changed portions of the data set, representing subset units of data (data subsets) actually written by applications subsequent to the initial backup. For data sets maintained by CDO sessions, data set preparation following a complete data set loss (i.e., the current version of the data set is no longer valid) consists of restoring the base level backup, applying the data subset changes that occurred between the base level backup and the desired recovery version, and then using this reconstructed version to complete the recovery. This is known as a full backup mode. For disasters where the current data set remains structurally valid, CDO recovery to a given version can be performed by merely restoring the data subsets that have changed since the desired version. This is known as the change-data-only recovery mode.
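The two CDO recovery modes described above may be illustrated by the following minimal sketch. It is not the assignee's implementation. The function names, the modeling of a data set as a mapping from data subset identifier to contents, and the numbering of backup versions from 1 are all assumptions made for illustration. The sketch further assumes that every write to the current data set has been captured in some backup version and that data subsets are created but never deleted.

```python
from typing import Dict, List

# Hypothetical model: a data set maps a data subset id to its contents.
DataSet = Dict[str, bytes]


def restore_full(base: DataSet, deltas: List[DataSet], version: int) -> DataSet:
    """Full backup mode: restore the base level backup, then apply the
    changed-data-only backups 1..version in order."""
    data = dict(base)
    for delta in deltas[:version]:
        data.update(delta)
    return data


def restore_changed_only(current: DataSet, base: DataSet,
                         deltas: List[DataSet], version: int) -> DataSet:
    """Change-data-only mode: the current data set is structurally valid,
    so only subsets changed after `version` are restored."""
    # Subsets written in any backup taken after the desired version.
    changed_since = set()
    for delta in deltas[version:]:
        changed_since.update(delta)

    data = dict(current)
    for key in changed_since:
        # Look for the most recent backed-up copy at or before `version`.
        for delta in reversed(deltas[:version]):
            if key in delta:
                data[key] = delta[key]
                break
        else:
            if key in base:
                data[key] = base[key]   # unchanged between base and version
            else:
                data.pop(key, None)     # subset did not exist at `version`
    return data


if __name__ == "__main__":
    base = {"a": b"A0", "b": b"B0", "c": b"C0"}
    deltas = [{"a": b"A1"}, {"b": b"B2"}, {"a": b"A3", "d": b"D3"}]
    current = restore_full(base, deltas, 3)
    # Both modes should reconstruct the same version-1 data set.
    print(restore_full(base, deltas, 1))
    print(restore_changed_only(current, base, deltas, 1))
```

In this sketch the change-data-only mode leaves subset "c" untouched, because it was never rewritten after the desired version, whereas the full backup mode copies every subset from the base level backup.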
Studies have shown that for a significant portion of data, the base level backup is an exact match for the synchronization point, and that only a small portion of the data requires the application of changes. A problem occurs, however, because the systems manager is unable to predict which data will be needed by which application following the disaster. This compels the systems manager to recover all data that existed at the time of the disaster. For example, for a small mainframe data processing center having 500 GB of data, this could take as much as 72 hours. During that time, all or a portion of the applications running on the data processing system will be inoperable and/or unable to access their data. For larger centers the recovery process will take even longer, and the risk of financial losses will become more substantial.
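The scale of the figures given above may be appreciated from the following short calculation, offered only as a rough sketch. The function names are hypothetical. Decimal units (1 GB = 1000 MB) are assumed, and the assumption that recovery time scales linearly with data volume at the same effective rate is made for illustration only and is not stated in the studies referred to.

```python
def implied_rate_mb_per_s(total_gb: float, hours: float) -> float:
    """Effective recovery rate implied by restoring total_gb in `hours`.
    Assumes decimal units: 1 GB = 1000 MB."""
    return total_gb * 1000.0 / (hours * 3600.0)


def recovery_hours(total_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to restore total_gb at a fixed effective rate.
    Assumes recovery time scales linearly with data volume."""
    return total_gb * 1000.0 / rate_mb_per_s / 3600.0


if __name__ == "__main__":
    rate = implied_rate_mb_per_s(500, 72)   # figures from the example above
    print(f"implied rate: {rate:.2f} MB/s")
    # Hypothetical larger center with four times the data, same rate.
    print(f"2000 GB at that rate: {recovery_hours(2000, rate):.0f} hours")
```

Under these assumptions, 500 GB in 72 hours corresponds to an effective rate of slightly under 2 MB per second, and a center holding four times as much data would need roughly four times as long.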
Accordingly, one cannot rely on conventional recovery processes when immediate access to data is necessary following a disaster. An improved method is needed to ensure that applications requiring data do not have to wait unnecessarily to gain access to their data. What is required is an efficient method for allowing a critical application to obtain its data on a timely basis without otherwise disrupting the recovery process.