Data processing systems typically require a large amount of data storage. Customer data, or data generated by users within the data processing system, occupies a great portion of this data storage. Effective data processing systems also provide back-up copies of this user data to prevent a loss of such data. Many businesses view any loss of data in their data processing systems as catastrophic, severely impacting the success of the business.
A storage management server provides an effective means for protecting customer data. Generally, a client-server configuration includes several clients connected to a single server. The clients create client files and transfer these files to the server. The server receives the client files and stores them on several attached storage devices. When used as a storage management system, the server manages the back-up, archival, and migration of these client files. By storing the client file on an attached storage device, the server creates a first, or primary, copy of the client file. The server may, in turn, create additional back-up copies of the client file to improve the data availability and data recovery functions of the storage management system. Clients may vary from small personal computer systems to large data processing systems having a host processor connected to several data storage devices. The server can also range from a small personal computer to a large host processor.
An advanced storage management server, such as an IBM ADSTAR Distributed Storage Manager (ADSM), maintains reference information about the client files copied within the attached storage volumes. The server uses a database to keep directory information about the original client files and storage volume location information about the copies of the client files stored within the server. The directory information typically includes a client system identifier, a client system directory, and a client file name. The location information typically consists of a storage volume identifier and an offset within the storage volume, among other file attributes. In addition, the server database allows the server to assign a unique identifier to each client file stored within the attached storage volumes. Thus, the server can track individual files throughout the server storage subsystem.
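The records described above can be pictured with a minimal sketch. The class and field names (`DirectoryInfo`, `CopyLocation`, `ServerDatabase`, `register_file`, `add_copy`) are illustrative assumptions and do not reflect the actual ADSM database layout; the sketch only shows directory information and location information tied together by a unique file identifier.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class DirectoryInfo:
    """Directory information about the original client file."""
    client_id: str
    client_directory: str
    file_name: str


@dataclass(frozen=True)
class CopyLocation:
    """Where one copy of a file sits within the server storage."""
    volume_id: str
    offset: int


class ServerDatabase:
    """Hypothetical in-memory stand-in for the server database."""

    def __init__(self):
        self._next_id = count(1)   # source of unique file identifiers
        self.directory = {}        # file_id -> DirectoryInfo
        self.locations = {}        # file_id -> list of CopyLocation

    def register_file(self, info, primary_location):
        """Record a new client file and its primary copy; return its id."""
        file_id = next(self._next_id)
        self.directory[file_id] = info
        self.locations[file_id] = [primary_location]
        return file_id

    def add_copy(self, file_id, location):
        """Record an additional copy of an already tracked file."""
        self.locations[file_id].append(location)
```

Keeping every copy's location under one identifier is what lets the server relate a primary copy to its back-up copies on other volumes.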
Accordingly, the server database introduces several advantages to the storage management server. The server can cross-reference multiple copies of an individual client file written to different storage volumes. By cross-referencing several copies of the client file, the server improves the data availability to the client systems. For example, if a primary copy of a particular client file is inaccessible because of a destroyed volume or damaged media, the server can access an additional copy residing on a different storage volume and transfer the additional copy to the requesting client system. Further, the server can subsequently recover the unavailable primary copy of the client file from the additional copy. The server needs the storage volume location information provided by the server database to accomplish the above-described data recovery.
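As a rough illustration of this fallback and recovery, the sketch below models each storage volume as a dictionary from offset to data and treats a destroyed volume as simply missing. The function names `read_file` and `recover_primary`, and the convention that the first location in the list is the primary copy, are assumptions made for the example.

```python
def read_file(locations, volumes):
    """Return (data, location) from the first accessible copy.

    locations: list of (volume_id, offset), primary copy first.
    volumes:   dict volume_id -> dict offset -> data; a destroyed
               volume is absent from the dict.
    """
    for volume_id, offset in locations:
        volume = volumes.get(volume_id)
        if volume is not None and offset in volume:
            return volume[offset], (volume_id, offset)
    raise LookupError("no accessible copy of the file")


def recover_primary(locations, volumes, replacement_volume_id):
    """Rebuild the primary copy from an additional copy.

    The data is read from the non-primary locations, written to the
    replacement volume, and the primary location entry is updated.
    """
    data, _ = read_file(locations[1:], volumes)
    volume = volumes.setdefault(replacement_volume_id, {})
    offset = max(volume, default=-1) + 1   # next unused slot
    volume[offset] = data
    locations[0] = (replacement_volume_id, offset)
    return locations
```

Both functions depend entirely on the location list, which mirrors the point that the recovery cannot proceed without the location information held in the server database.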
By tracking individual client files, the server database also allows the storage management server to perform incremental back-up operations within the server storage. Incremental back-up techniques improve the performance and efficiency of a storage management system. In contrast to full volume copying, which replicates an entire first, or primary, storage volume to a second, or copy, storage volume, incremental back-up copies only the newly added or updated user files from the primary storage volume to the copy storage volume. Because incremental back-up is performed periodically, the server classifies as newly added or updated those files that were added or changed within a primary storage volume since the previous incremental back-up operation completed. Incremental back-up thus eliminates the unnecessary copying of files that have remained unchanged since the previous back-up operation. Compared to full volume copying, incremental back-up also reduces the number of partially filled copy storage volumes and the number of duplicate files stored on a copy storage volume, thereby reducing the number of copy storage volumes needed within the server.
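The selection rule for incremental back-up can be sketched as follows. The representation of a volume as a dictionary, the use of a numeric modification time, and the function name `incremental_backup` are assumptions for illustration only.

```python
def incremental_backup(primary, copy_volume, last_backup_time):
    """Copy only files added or changed since the previous back-up.

    primary:          dict file_name -> (modified_at, data)
    copy_volume:      dict file_name -> data, updated in place
    last_backup_time: time at which the previous incremental
                      back-up completed
    Returns the sorted names of the files that were copied.
    """
    copied = []
    for name, (modified_at, data) in primary.items():
        if modified_at > last_backup_time:   # new or updated file
            copy_volume[name] = data
            copied.append(name)
    return sorted(copied)
```

A full volume copy would instead write every entry of `primary` on every run, including the files that have not changed.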
Additionally, the storage management server can extend these advantages to provide disaster recovery. In disaster recovery systems, a back-up copy of the customer data is kept at a site remote from the primary storage location. If a disaster strikes the primary storage location, the customer data can be recovered from the back-up copies located at the remote site. As a disaster recovery system, the storage management server generates an additional back-up copy of the client file and oversees the transport of this back-up copy to a remote site. The server partitions its storage volumes into resident storage volumes located at the primary storage site and off-site storage volumes located at the remote storage site. The off-site storage volumes typically contain removable media, so that they can be transported to the remote site. The server also determines which client files need to be backed up within the storage subsystem, how frequently these back-up copies should be made, and which set of storage volumes should be marked as off-site volumes and transported to the remote site. The server, in turn, manages the off-site storage volumes, determining which volumes are needed for disaster recovery. Off-site storage volumes no longer needed for disaster recovery can be reclaimed, returned to the primary site, and reused as resident storage volumes.
The server database improves the storage management server as a disaster recovery system. The server uses the server database to track the individual files either at the primary site or the remote site. The reference location information stored within the server database denotes whether a file copy is stored within a resident volume or an off-site volume. As described earlier, the server uses the server database to perform incremental back-up operations. The server copies newly added files from a first set of resident storage volumes to a second set of volumes and then classifies these volumes as off-site storage volumes to be transported to the remote site. As compared to full volume copying, incremental back-up reduces the number of partially filled off-site storage volumes, the number of duplicate files on the off-site storage volumes, and the number of off-site storage volumes needed.
The server database can provide the storage management server with the aforementioned benefits only if the database accurately reflects the file contents of the server storage volumes. During normal file processing, the server updates the database as each new file copy is generated within the server storage volumes. This ensures that the server database remains consistent with the attached storage volumes. However, like any of the storage volumes within the server, the server database is susceptible to data loss. Thus, the server periodically creates a back-up of the server database to minimize the effects of a failure within the database storage. The server can then recover the database from the database back-up should a failure occur within the server database. In a disaster recovery system, the server maintains a database back-up along with recovery copies of client files at the remote site. By storing a database back-up on a set of off-site volumes, the server can recover the database should a disaster destroy the database storage at the primary site.
A database back-up, however, introduces new problems to the storage management server. When file locations change after a back-up was completed for the server database, the database back-up is no longer consistent with the storage volumes. The server database reflects the correct reference location information for the files within the server storage, but the database back-up reflects outdated reference location information for the files that moved within the server storage since the database back-up was created. Files added to the server create one type of problem for the database back-up. The database back-up lacks a record of the new file within the server and a pointer to the storage volume location of the new file.
On the other hand, moved or deleted files create a more troublesome problem for the database back-up. The database back-up points to the previous location of the moved file within the server storage. If the server subsequently overwrites the previous location of the moved file with a new file, two problems are created for the database back-up. First, as stated earlier, the database back-up lacks a record of this new file. If used to recover the server database, the database back-up cannot provide the server with reference location information pertaining to the new file. Second, the database back-up loses its reference to the moved file. The database back-up still includes a record of the moved file, but that record now points to the location occupied by the new file. If used to recover the server database, the database back-up would incorrectly give the server access to the new file in place of the moved file. Since the database back-up contains no information about the destination of the moved file, references to both the new file and the moved file are lost in the database back-up.
For example, suppose files a and b exist on storage volume A in the server when a back-up is created for the server database. The server then reclaims volume A by copying files a and b to storage volume B. The server updates the server database to point to volume B for the reference location information about files a and b. However, the data pertaining to files a and b is not erased from volume A when the server reclaims the volume. Thus, the database back-up could still be used to access files a and b should the server database be recovered at this point. The server then copies new files c and d from a client system to volume A, overwriting files a and b. Again, the server updates the server database to point to volume A for the reference location information for files c and d. Now the database back-up incorrectly points to the data of file c as the reference location information for file a, and likewise to the data of file d for file b. If the server database were recovered from the database back-up at this point, the server would lose its references to all four files a, b, c, and d.
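The sequence in this example can be replayed with a small simulation. The data model (each volume as a list of slots, the database as a dictionary from file name to a volume and slot) and the names `lookup` and `run_scenario` are assumptions chosen for illustration.

```python
def lookup(db, volumes, name):
    """Return the data a given database finds for a file, or None."""
    location = db.get(name)
    if location is None:
        return None          # the database has no record of this file
    volume_id, slot = location
    return volumes[volume_id][slot]


def run_scenario():
    # Files a and b sit on volume A when the database back-up is taken.
    volumes = {"A": ["a", "b"], "B": [None, None]}
    database = {"a": ("A", 0), "b": ("A", 1)}
    backup = dict(database)

    # Reclaim volume A: copy a and b to volume B and update the live
    # database. The old data on volume A is not erased.
    for name in ("a", "b"):
        volume_id, slot = database[name]
        volumes["B"][slot] = volumes[volume_id][slot]
        database[name] = ("B", slot)
    seen_after_reclaim = lookup(backup, volumes, "a")

    # New client files c and d overwrite the old data on volume A.
    for slot, name in enumerate(("c", "d")):
        volumes["A"][slot] = name
        database[name] = ("A", slot)

    # Recover the database from the stale back-up and look up each file.
    restored = dict(backup)
    return seen_after_reclaim, {n: lookup(restored, volumes, n) for n in "abcd"}
```

Right after the reclaim, the back-up still resolves file a correctly. After the overwrite, the restored database returns the data of c and d when asked for a and b, and has no record of c and d at all.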
Accordingly, an improved method and apparatus are needed in a storage management system to maintain consistency between a database back-up and a set of storage volumes. In particular, the improved method and apparatus should prevent the storage management server from overwriting previously moved or deleted files which are still referenced by the database back-up. In addition, such improved method and apparatus are needed in a disaster recovery system using a storage management server. The particular method and apparatus should maintain consistency between a database back-up at the remote site and a set of storage volumes at either the primary or remote site by preventing the server from overwriting previously moved or deleted files which are still referenced by the database back-up.
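The stated requirement can be phrased as a simple check, shown here only to make the condition concrete. This is a hypothetical sketch under an assumed data model (the database back-up as a dictionary from file name to a volume and offset), not the method or apparatus itself, and `can_overwrite` is an invented name.

```python
def can_overwrite(volume_id, offset, backup_locations):
    """Return True only if the database back-up no longer references
    the given location, so writing there cannot invalidate it.

    backup_locations: dict file_name -> (volume_id, offset) as recorded
                      in the most recent database back-up (assumed form).
    """
    return (volume_id, offset) not in set(backup_locations.values())
```

Under this rule, the location of a moved or deleted file stays protected until a newer database back-up that no longer references it replaces the old one.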