1. Technical Field
The present invention relates in general to methods and systems for improved maintenance of backup copies of datasets in a data processing system and in particular to improved methods and systems for transferring backup copies of datasets in a data processing system. Still more particularly, the present invention relates to methods and systems for asynchronous pre-staging of backup copies in a data processing storage subsystem.
2. Description of the Related Art
A modern data processing system must be prepared to recover not only from corruptions of stored data which occur as a result of noise bursts, software bugs, media defects, and write path errors, but also from global events, such as a data processing system power failure. The most common technique of ensuring the continued availability of data within a data processing system is to create one or more copies of selected datasets within the data processing system and store those copies in a nonvolatile environment. This so-called "backup" process occurs within state-of-the-art external storage systems in modern data processing systems.
Backup policies are implemented as a matter of scheduling. Backup policies have a space and time dimension which is exemplified by a range of datasets and by the frequency of backup occurrence. A FULL backup requires the backup of an entire range of a dataset, whether individual portions of that dataset have been updated or not. An INCREMENTAL backup copies only that portion of the dataset which has been updated since a previous backup, either full or incremental. The backup copy thus created represents a consistent view of the data within the dataset as of the time the copy was created.
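The distinction between the FULL and INCREMENTAL policies described above can be illustrated with a minimal sketch. The record names, the `updated_keys` change tracker, and both function names are hypothetical and chosen only for illustration; a real storage system would track updates at the track or block level.

```python
def full_backup(dataset):
    """Copy every record, whether or not it has been updated."""
    return dict(dataset)


def incremental_backup(dataset, updated_keys):
    """Copy only the records updated since the previous backup."""
    return {key: dataset[key] for key in updated_keys}


# Hypothetical dataset with three records.
dataset = {"rec1": "a", "rec2": "b", "rec3": "c"}
full = full_backup(dataset)

# An application updates one record after the full backup is taken.
dataset["rec2"] = "B"
updated_keys = {"rec2"}
incremental = incremental_backup(dataset, updated_keys)

# Restoring applies the full copy first, then the incremental copy.
restored = {**full, **incremental}
```

In this sketch the incremental copy holds a single record, yet the full copy combined with the incremental copy reproduces the current state of the dataset.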
Of course, those skilled in the art will appreciate that as a result of the process described above, the higher the backup frequency, the more accurately the backup copy will mirror the current state of data within a dataset. In view of the large volumes of data maintained within a typical state-of-the-art data processing system, backing up that data is not a trivial operation. Thus, the opportunity cost of backing up data within a dataset may be quite high on a large multi-processing, multi-programming facility, relative to other types of processing.
Applications within a central processing system are executed in either a batch (streamed) or an interactive (transactional) mode. In batch mode, usually one application is executed at a time without interruption. Interactive mode is characterized by an interrupt-driven multiplicity of applications or transactions.
When a data processing system is in the process of backing up data in a batch mode system, each process, task or application within the data processing system is affected. That is, the processes supporting the batch mode operations are suspended for the duration of the copying. Those skilled in the art will recognize that this event is typically referred to as the "backup window." In contrast to batch mode operations, log based or transaction management applications are processed in the interactive mode. Such transaction management applications eliminate the "backup window" by concurrently updating an on-line dataset and logging the change. However, this type of backup copying results in a consistency described as "fuzzy." That is, the backup copy is not a precise "snapshot" of the state of a dataset/database at a single point in time. Rather, a log comprises an event file requiring further processing against the database.
A co-pending U.S. patent application Ser. No. 07/385,647, filed Jul. 25, 1989, entitled A Computer Based Method for Dataset Copying Using An Incremental Backup Policy, illustrates backup in a batch mode system utilizing a modified incremental policy. A modified incremental policy copies only new data or data updates since the last backup. It should be noted that execution of applications within the data processing system is suspended during copying in this system.
As described above, to establish a prior point of consistency in a log-based system, it is necessary to "repeat history" by replaying the log from the last checkpoint over the datasets or database of interest. The distinction between batch mode and log-based backup is that the batch mode backup copy is consistent and speaks as of the time of its last recordation, whereas the log and database mode requires further processing in the event of a fault in order to exhibit point-in-time consistency.
U.S. Pat. No. 4,507,751, Gawlick et al., entitled Method and Apparatus For Logging Journal Data Using A Write Ahead Dataset, issued Mar. 25, 1985, exemplifies a transaction management system wherein all transactions are recorded on a log on a write-ahead dataset basis. As described within this patent, a unit of work is first recorded on the backup medium (log) and then written to its external storage address.
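The write-ahead ordering and the replay of a log from a checkpoint, as described in the two preceding paragraphs, can be sketched as follows. This is an illustrative simplification and not the method of the cited patent; the class `WriteAheadStore`, the function `replay`, and the address names are all hypothetical.

```python
class WriteAheadStore:
    """Toy store that logs each unit of work before writing it."""

    def __init__(self):
        self.log = []      # stands in for the nonvolatile log
        self.storage = {}  # stands in for external storage addresses

    def write(self, address, value):
        # Record the unit of work on the log first ...
        self.log.append((address, value))
        # ... and only then write it to its storage address.
        self.storage[address] = value


def replay(checkpoint, log_entries):
    """Rebuild state by replaying log entries over a checkpoint copy."""
    state = dict(checkpoint)
    for address, value in log_entries:
        state[address] = value
    return state


store = WriteAheadStore()
store.write("addr1", "x")

# Take a checkpoint and remember how much of the log it already covers.
checkpoint = dict(store.storage)
checkpoint_position = len(store.log)

store.write("addr2", "y")
store.write("addr1", "z")

# After a fault, "repeat history" from the checkpoint forward.
recovered = replay(checkpoint, store.log[checkpoint_position:])
```

The checkpoint alone is stale; only after the log entries recorded since the checkpoint are replayed does the recovered state match the store.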
Co-pending U.S. patent application Ser. No. 07/524,206, filed May 16, 1990, entitled Method and Apparatus for Executing Critical Disk Access Commands, teaches the performance of media maintenance on selected portions of a tracked cyclic operable magnetic media concurrent with active access to other portions of the storage media. The method described therein requires the phased movement of customer data from a target track to an alternate track, diversion of all concurrent access requests to the alternate track or tracks, and the completion of maintenance and copy back from the alternate track to the target track.
Requests and interrupts which occur prior to executing track-to-track customer data retirement result in the restarting of the process. Otherwise, requests and interrupts occurring during execution of the data movement view a DEVICE BUSY state. This typically causes a re-queuing of the request.
The cross-referenced patent applications set forth herein describe a so-called "time zero" backup copy system wherein execution of an application is suspended for a minimum period of time for purposes of creating a backup copy. In such a system, backup copies are created by first creating a dataset logical-to-physical storage system address concordance for designated datasets and thereafter resuming execution of the application. Formation of the backup copy is then accomplished on a scheduled or opportunistic basis by copying the designated datasets from the storage subsystems to the host and then updating the address concordance in response to such copying. Application driven updates to uncopied designated datasets are processed by first buffering those updates, copying the affected uncopied designated datasets to a storage subsystem memory, updating the address concordance in response to that copying and then processing the updates. In this manner, execution of an application is suspended for a minimal period of time necessary to create the logical-to-physical storage system address concordance and copies of portions of the designated dataset are only created for those portions which are updated prior to copying.
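The time zero sequence described above can be sketched in simplified form. The sketch below is an assumption-laden illustration rather than the implementation of the cross-referenced applications: the class `TimeZeroSession`, the per-track concordance states ("device", "side_file", "copied"), and the `side_file` buffer standing in for storage subsystem memory are all hypothetical.

```python
class TimeZeroSession:
    """Toy model of a time zero backup over a set of designated tracks."""

    def __init__(self, device, designated_tracks):
        self.device = device  # track number -> current data
        # Concordance: where the time zero image of each track resides.
        self.concordance = {track: "device" for track in designated_tracks}
        self.side_file = {}    # stands in for storage subsystem memory
        self.host_backup = {}  # backup copy assembled at the host

    def update(self, track, new_data):
        """Process an application update to a track."""
        if self.concordance.get(track) == "device":
            # Track not yet copied: preserve its time zero image first,
            # then record the new location in the concordance.
            self.side_file[track] = self.device[track]
            self.concordance[track] = "side_file"
        self.device[track] = new_data

    def copy_to_host(self, track):
        """Copy the time zero image of one track to the host."""
        location = self.concordance[track]
        if location == "device":
            self.host_backup[track] = self.device[track]
        elif location == "side_file":
            self.host_backup[track] = self.side_file.pop(track)
        self.concordance[track] = "copied"


device = {0: "A0", 1: "B0", 2: "C0"}
session = TimeZeroSession(device, designated_tracks=[0, 1, 2])

session.copy_to_host(0)   # track 0 is copied before any update
session.update(0, "A1")   # already copied, so no image is preserved
session.update(1, "B1")   # uncopied, so the time zero image is preserved
for track in (1, 2):
    session.copy_to_host(track)
```

In this sketch the host backup ends up holding the time zero images of all three tracks even though two of them were updated while the backup was in progress, and only the one track updated before it was copied ever passes through the side file.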
While this time zero backup copy system represents a substantial improvement over the prior art, the transfer of large amounts of data from a storage subsystem to a host system for backup copying purposes still requires a substantial commitment of system assets. This is especially true during a transfer of designated portions of the dataset from a storage subsystem to a host system which is limited by the speed of the storage device. Typically, the transfer rate of data from a storage device is 4.2 megabytes per second. The data channels utilized to couple storage subsystems to host systems now commonly utilize serial optical data technology which is capable of transferring data at up to 17 megabytes per second. Thus, transfer of data between a storage subsystem device and a host system for backup copy purposes would be greatly enhanced if that data could be transferred from memory within a storage system controller to the host system directly. Certain state-of-the-art storage system control units permit sequential processing of data; however, no provision is made in such systems for discerning the order in which the host system desires to transfer the data, and such systems provide no technique whereby the pre-staging of data within storage system memory may be coordinated with attempts by the host system to read that data.
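The potential benefit implied by the two transfer rates stated above can be estimated with simple arithmetic. The sketch below assumes a hypothetical 1,000 megabyte backup and assumes that a transfer from controller memory proceeds at the full channel rate; the function and variable names are illustrative only.

```python
DEVICE_RATE_MB_S = 4.2    # transfer rate from the storage device (from the text)
CHANNEL_RATE_MB_S = 17.0  # peak serial optical channel rate (from the text)


def transfer_seconds(size_mb, rate_mb_s):
    """Time to move size_mb megabytes at a sustained rate_mb_s."""
    return size_mb / rate_mb_s


size_mb = 1000.0  # hypothetical amount of backup data

from_device = transfer_seconds(size_mb, DEVICE_RATE_MB_S)
from_memory = transfer_seconds(size_mb, CHANNEL_RATE_MB_S)
speedup = from_device / from_memory

print(f"{from_device:.0f} s from the device, "
      f"{from_memory:.0f} s from controller memory, "
      f"about {speedup:.1f}x faster")
```

Under these assumptions the speedup is simply the ratio of the two rates, roughly a factor of four, and is independent of the amount of data transferred.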
Thus, it should be apparent that a need exists for a method and system which permits the asynchronous pre-staging of backup copy data in a data processing storage subsystem in a manner which greatly enhances the transfer of backup copy data to a host system.