1. Field of the Invention
The present invention concerns the recovery of multi-volume data sets that are stored in multiple direct access storage devices ("DASDs"). More specifically, the invention provides a method and apparatus useful in recovering multi-volume data sets from one or more DASDs that have failed, while providing a substantial level of automation, data presentation, and improved performance.
2. Description of Related Art
The primary non-volatile storage device used in today's computers is the DASD. DASDs include a number of different memory storage devices, such as magnetic disk storage devices ("hard drives"), optical data storage disks, and other devices that permit the computer to "directly access" the storage media. Typically, each DASD associated with a computer contains a "volume" of data. Because mainframe computers typically need more storage space than a single volume can provide, most mainframes have access to multiple volumes (via multiple DASDs).
A "data set" is a collection of information, such as a "file". Typically, the information of a data set is contiguous on a single volume, in that the information constitutes a continuous stream of bits from the beginning of the data set to its end. Despite these characteristics, computers with access to multiple volumes are able to advantageously store data sets by "striping" them across two or more volumes. In particular, these computers break a single data set into segments, and store each segment on a different volume. This is somewhat like cutting a continuous ticker tape into various sections, and filing each section in a different filing drawer. Data sets stored in this manner are called "multi-volume" data sets because they reside on multiple volumes. FIG. 1 illustrates an example, where there are four volumes 100-103 of data. Data set "A" resides on volume 100. Data set "B" is a multi-volume data set that includes the components "B1" and "B2", residing on volumes 101-102. Data set "C", a multi-volume data set comprising components C1-C4, resides on volumes 100-103.
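By way of illustration only, the segmentation described above can be sketched in Python. The function names `stripe` and `reassemble`, the equal-length segmentation rule, and the sample data are assumptions made for this sketch; they are not taken from any particular storage system.

```python
def stripe(data: bytes, volume_ids: list[int]) -> dict[int, bytes]:
    """Cut one contiguous data set into segments, one segment per volume.

    Assumption: segments are of (nearly) equal length; real systems may
    size segments differently.
    """
    if not volume_ids:
        raise ValueError("at least one volume is required")
    n = len(volume_ids)
    seg_len = -(-len(data) // n)  # ceiling division
    return {
        vol: data[i * seg_len:(i + 1) * seg_len]
        for i, vol in enumerate(volume_ids)
    }


def reassemble(segments: dict[int, bytes], volume_ids: list[int]) -> bytes:
    """Rejoin the segments in volume order to recover the original stream."""
    return b"".join(segments[vol] for vol in volume_ids)


# Hypothetical data set striped across the four volumes of FIG. 1.
volume_ids = [100, 101, 102, 103]
data_set = b"ABCDEFGHIJ"
segments = stripe(data_set, volume_ids)
```

In this sketch, reading the segments back in volume order yields the original continuous stream, much like refiling the ticker tape sections in sequence.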
As shown in FIG. 1, the data sets stored on the volumes 100-103 may be "backed up" on another medium, such as magnetic tape. In the present example, data set "C" (comprised of segments C1-C4) exists on a tape backup 106, data set "B" (comprised of segments B1-B2) exists on a tape backup 107, and data set "A" exists on a tape backup 108.
A problem arises if, for example, volume 101 fails. Such a failure may result from a number of known causes, such as a "head crash." Prior to its failure, volume 101 included segments of the "B" and "C" data sets. To restore "B1" to volume 101, most data recovery systems require the restoration of the entire "B" data set; likewise, to restore "C3" to volume 101, the entire data set "C" must be restored. Typically, recovery in this manner is required because the tape backup stores each data set as a continuous whole, without providing separate access to the individual segments. In some circumstances, restoration of an entire data set may be undesirable. On the one hand, restoring "B1" and "C3" to volume 101 is an improvement, since volume 101 has failed and would otherwise be blank. On the other hand, the copies of "C1", "C2", "C4", and "B2" on volumes 100 and 102-103 may have been changed after the last backup was made. If this is the case, then restoring the multi-volume data sets "B" and "C" would destroy several recently changed data segments, which were stored after the last backups of "B" and "C".
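The loss described above can be sketched, for illustration only, in Python. The placement of "C2" and "C4" on particular volumes, the version strings, and the function name `restore_whole_data_sets` are assumptions made for this sketch; the text fixes only that "B1" and "C3" were on volume 101.

```python
# Hypothetical segment placement (segment name -> volume).
placement = {"B1": 101, "B2": 102, "C1": 100, "C2": 102, "C3": 101, "C4": 103}

# Contents at the time of the last tape backup, one whole data set per tape.
backup = {
    "B": {"B1": "b1-v1", "B2": "b2-v1"},
    "C": {"C1": "c1-v1", "C2": "c2-v1", "C3": "c3-v1", "C4": "c4-v1"},
}

# Surviving segments after volume 101 fails. "B2" and "C1" are assumed to
# have been updated (v2) since the last backup.
surviving = {"B2": "b2-v2", "C1": "c1-v2", "C2": "c2-v1", "C4": "c4-v1"}


def restore_whole_data_sets(surviving, backup, placement, failed_volume):
    """Restore every data set that had a segment on the failed volume.

    Each affected data set is restored as a whole, so surviving segments
    are overwritten too. Returns the resulting contents and the names of
    surviving segments whose newer contents were lost.
    """
    restored = dict(surviving)
    clobbered = []
    for segments in backup.values():
        if not any(placement[s] == failed_volume for s in segments):
            continue  # data set untouched by the failure
        for name, old in segments.items():
            if name in surviving and surviving[name] != old:
                clobbered.append(name)
            restored[name] = old
    return restored, sorted(clobbered)


restored, clobbered = restore_whole_data_sets(surviving, backup, placement, 101)
```

Under these assumptions, the lost segments "B1" and "C3" are recovered, but the updated copies of "B2" and "C1" are replaced by their older backup versions.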
The scheme described above differs from Redundant Array of Independent Disks ("RAID") implementations. RAID systems are capable of recovering data from their DASDs by recording redundant parity within a fixed cell of shared disks. RAID systems, however, can effectively recover only a limited number of failed volumes, such as one or two. Furthermore, RAID configurations reconstruct data using parity, rather than recovering data from separate backup media, such as magnetic tape.
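For contrast, parity-based reconstruction can be sketched in Python as follows. This is a minimal single-parity (XOR) example of one common approach, assuming equal-length blocks; the function name `xor_blocks` and the sample bytes are hypothetical, and the sketch does not describe any particular RAID product.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, each on its own disk, plus one parity block.
data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data_blocks)

# Suppose the disk holding block 1 fails. XOR of the surviving blocks and
# the parity block reproduces the lost block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

With a single parity block, only one lost block per stripe can be rebuilt this way, which reflects the limit on recoverable volumes noted above.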