Data processing systems typically require a large amount of data storage. Customer data, or data generated by users within the data processing system, occupies a great portion of this data storage. Effective data storage systems often store data on removable media, such as magnetic tape or optical disks, since removable media provides a lower cost per unit of storage.
Removable media is useful for data that must be stored for long periods of time, data that is infrequently accessed, or data that is backed-up or migrated from more accessible storage media, such as electronic memory or direct access storage devices (DASDs).
Data storage systems often utilize separate databases, or catalogs, to maintain reference information, directory information, and storage location information about the customer data files stored within the system. The reference, or directory, information typically includes a customer identifier, a file directory, and/or a file name. The storage location information typically contains a storage volume identifier and a corresponding offset, thereby defining the location of the customer data file within the storage system. In addition, the database, or catalog, can assign a unique identifier to each customer data file, allowing a storage system controller to track the individual files through the data storage system.
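The catalog record described above can be pictured with a minimal sketch. The class names, field names, and sample values below are hypothetical and chosen only for illustration; they assume one record per customer data file, holding the reference information, the storage location (volume identifier and offset), and a catalog-assigned unique identifier.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog record for a customer data file (hypothetical layout)."""
    file_id: int       # unique identifier assigned by the catalog
    customer_id: str   # reference information
    directory: str
    file_name: str
    volume_id: str     # storage location: which volume holds the file
    offset: int        # storage location: where the file starts on that volume


class Catalog:
    """In-memory stand-in for the database, keyed by unique file identifier."""

    def __init__(self):
        self._next_id = count(1)
        self._entries = {}

    def add_file(self, customer_id, directory, file_name, volume_id, offset):
        # Assign the next unique identifier and record where the file lives.
        file_id = next(self._next_id)
        self._entries[file_id] = CatalogEntry(
            file_id, customer_id, directory, file_name, volume_id, offset
        )
        return file_id

    def locate(self, file_id):
        # Return the pointer (volume identifier, offset) for a tracked file.
        entry = self._entries[file_id]
        return entry.volume_id, entry.offset


catalog = Catalog()
fid = catalog.add_file("cust-01", "/reports", "q1.dat", "VOL_A", 4096)
location = catalog.locate(fid)
```

Here `location` is the pair of volume identifier and offset that a storage controller would follow to reach the file.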
The database, or catalog, is critical to accessing the customer data files within the data storage system, as it provides pointers to the corresponding storage location for each file. The database is typically stored separately within the data storage system, in a faster-access storage medium, such as a non-volatile, electronic memory or a DASD. The database can only provide access to the customer data files if the database is consistent with the set of storage volumes within the data storage system, that is, if the database accurately reflects the file contents of the storage volumes. During normal processing, the database is updated as each new data file is stored within the storage system. This ensures that the database remains consistent with the attached storage volumes. The database storage, however, is also susceptible to failures or errors, potentially resulting in data loss. Thus, data storage systems require a back-up of the database to minimize the effects of a failure within the database storage. Normally, an operator initiates a database back-up. Some advanced data storage systems periodically create a back-up copy of the database. A storage manager, or storage controller, within the system can then recover the database from the database back-up should a failure occur in the database storage media.
A back-up copy of the database can potentially introduce new problems to the data storage system. Backing-up the database to a single storage volume, or even a set of storage volumes, may create storage problems within the data storage system. The database back-up storage volume must be identified within the storage system. Placing a record within the database pointing to the database back-up volume does not eliminate this problem, since the database storage media could still suffer a failure resulting in the loss of the reference to the database back-up. Additionally, even if the database back-up reference is not lost, the storage volume containing the database back-up could be damaged or destroyed. Damage resulting from water, fire, or some other natural disaster often destroys a subset of the storage volumes within the data storage system. If the database back-up happens to be on one of the destroyed volumes, database recovery, as well as the recovery of customer data, may be impossible.
A periodic back-up of the database also creates problems within the data storage system. When storage locations of customer data files change after a database back-up was completed, the database back-up is no longer consistent with the storage volumes. The database reflects the correct storage location information for the data files within the storage system, but the database back-up reflects outdated storage location information for the files that moved within the storage system since the database back-up was created. Data files added to the storage system create one type of problem for the database back-up. The database back-up lacks a record, or identifier, for the new file within the storage system and a pointer to the storage volume location of the new file.
On the other hand, moved or deleted data files create a more troublesome problem for the database back-up. The database back-up points to the previous location of the moved file within the storage system. If a new file is subsequently written over the previous storage location of the moved file, two problems are created for the database back-up. As stated earlier, the database back-up lacks a record of the new file. If used to recover the database, the database back-up cannot provide the storage manager with a storage location pertaining to the new file. In addition, the database back-up also loses its pointer to the storage location of the moved file. The database back-up includes a reference to the moved file, but that reference points to the previous storage location of the moved file, the same location now overwritten by the new file. If used to recover the database, the database back-up would incorrectly provide the storage manager access to the new file instead of the moved file. Since the database back-up contains no information about the destination of the moved file, references to both the new file and the moved file are lost in the database back-up.
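The kinds of inconsistency described above can be separated with a short comparison routine. This is an illustrative sketch only: it assumes both the current database and its back-up are available as simple mappings from a file name to a (volume identifier, offset) pair, and the function name and sample values are hypothetical. In an actual recovery the current database has been lost, so such a comparison serves only to show what a stale back-up gets wrong.

```python
def compare_backup(current, backup):
    """Classify differences between the current database and a back-up.

    Both arguments map a file name to a (volume_id, offset) location.
    """
    # Files stored after the back-up was taken: the back-up has no record.
    missing = sorted(name for name in current if name not in backup)
    # Files moved after the back-up: the back-up points to the old location.
    stale = sorted(
        name for name in backup
        if name in current and backup[name] != current[name]
    )
    # Files deleted after the back-up: the back-up still references them.
    deleted = sorted(name for name in backup if name not in current)
    return missing, stale, deleted


backup = {"report": ("VOL1", 0), "old": ("VOL1", 100)}
current = {"report": ("VOL2", 0), "invoice": ("VOL1", 0)}

missing, stale, deleted = compare_backup(current, backup)
```

In this sample, `invoice` is missing from the back-up, `report` has a stale pointer, and `old` is referenced by the back-up although it has been deleted. Note that `invoice` now occupies the location the back-up still records for `report`, which is the overwrite case described above.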
For example, files a and b exist on storage volume A in the storage system when a back-up is created for the database. The storage manager then reclaims volume A by copying files a and b to storage volume B. The database is updated to point to volume B as the storage location for files a and b. The data pertaining to files a and b, however, is not erased from volume A when it is reclaimed. Thus, a database recovered from the database back-up at this point could still access files a and b. The storage manager then copies new files c and d to volume A, overwriting files a and b. Again, the database is updated to point to volume A as the storage location for files c and d. Now, the database back-up incorrectly points to the location occupied by file c as the storage location for file a. If the database were to be recovered from the database back-up at this point, files a, b, c, and d would all be lost: the back-up holds no records for files c and d, and its pointers for files a and b lead to overwritten data.
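The sequence in this example can be traced with a small simulation. The sketch below is a simplification under stated assumptions: each volume is modeled as a list of slots, the database and its back-up are plain mappings from file name to (volume, slot), and all names and data values are hypothetical.

```python
# Volumes: volume id -> list of slots, each slot holding one file's contents.
volumes = {"A": ["data-a", "data-b"], "B": [None, None]}

# Database: file name -> (volume id, slot index).
database = {"a": ("A", 0), "b": ("A", 1)}

# The back-up is a snapshot of the database taken at this point.
backup = dict(database)


def read(catalog, name):
    """Follow a catalog pointer and return whatever is stored there."""
    volume_id, slot = catalog[name]
    return volumes[volume_id][slot]


# Reclaim volume A: copy files a and b to volume B and update the database.
# The old data on volume A is not erased.
volumes["B"] = list(volumes["A"])
database["a"] = ("B", 0)
database["b"] = ("B", 1)
after_reclaim = read(backup, "a")      # back-up still reaches file a

# New files c and d are written to volume A, overwriting files a and b.
volumes["A"] = ["data-c", "data-d"]
database["c"] = ("A", 0)
database["d"] = ("A", 1)
after_overwrite = read(backup, "a")    # back-up now reaches file c's data
current_read = read(database, "a")     # the live database is still correct
backup_knows_c = "c" in backup         # back-up has no record of file c
```

After the reclaim step the back-up pointer for file a still returns the data of file a. After the overwrite it returns the data of file c, and the back-up contains no entry for file c or file d at all.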
Accordingly, an improved method and apparatus are needed in a data storage system to maintain consistency between a database back-up and a set of storage volumes. In particular, the improved method and apparatus should ensure that the database back-up is not lost when any of the storage volumes are damaged or destroyed. In addition, the improved method and apparatus should ensure that the database back-up remains consistent with a set of storage volumes within the data storage system. That is, the improved method and apparatus should maintain correct storage location references in the database back-up when customer data files are newly written to or moved within a set of storage volumes.