Data processing systems typically require a large amount of data storage. Customer data, or data generated by users within the data processing system, occupies a great portion of this data storage. Effective data processing systems also provide back-up copies of this user data to prevent a loss of such data. Many businesses view any loss of data in their data processing systems as catastrophic, severely impacting the success of the business.
Initially, back-up systems simply copied data from direct access storage devices (DASD) to magnetic tape at periodic time intervals, such as daily or weekly. The magnetic tape was then manually moved to a secure storage area, usually located kilometers from the location of the data processing system. This initial back-up method had several shortcomings: retrieval of the back-up copy was time consuming, on the order of hours or even days; all primary files in the data processing system were copied to the back-up media regardless of whether they had been updated since the last back-up; and system performance suffered since files could not be updated during the back-up process. A slight improvement to this initial back-up method periodically transmitted the primary data files to a back-up location. Although this method reduced the time to retrieve the back-up data, it still required the primary files to be rewritten each time a back-up occurred and it still prevented further updates from being made to the primary files during the back-up process.
An alternative to these initial back-up methods involves data shadowing, or data mirroring. Dual copy and remote dual copy typically provide this back-up technique. In dual copy, additional storage devices are provided in the data processing system such that an additional copy of the primary data file is written to an additional storage device. Storage devices are coupled together to form duplex pairs, each duplex pair consisting of a primary and secondary storage device. When data is written to the primary storage device, the data processing system automatically copies the data to the secondary storage device. Thus, the secondary storage device is an exact physical image, or mirror, of the primary storage device. With dual copy, a movement of a data file from a first primary storage device to a second primary storage device requires that the back-up copy of the data file also be moved from a first to a second secondary storage device. Because dual copy relies on physical data mirroring, the secondary storage device must be the same physical geometry as the primary storage device, configured and formatted to be an exact replica.
Remote dual copy extends the dual copy methodology to disaster recovery systems. In this configuration, the primary and secondary storage devices are located at different storage subsystem sites remote from each other. The primary and secondary storage devices again form duplex pairs with the secondary storage device mirroring the primary storage device. Again, the primary and secondary storage devices must be the same physical geometry with the secondary storage device configured and formatted to be an exact replica of the primary storage device. Remote dual copy falls into two general categories, synchronous and asynchronous. Synchronous remote dual copy involves sending primary data to the secondary location and confirming the reception of such data before completing the current input/output (I/O) operation. That is, a subsequent I/O operation at the primary site cannot start until the primary data has been successfully copied to the secondary storage device. On the other hand, asynchronous remote dual copy completes the I/O operation to the primary storage device before the data is copied to the secondary storage device. That is, a subsequent I/O operation at the primary site can begin before the primary data from the previous I/O operation has been copied to the secondary site.
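The difference between the two categories can be illustrated with a minimal sketch. The `Primary` and `Secondary` classes, their method names, and the in-memory dictionaries standing in for storage devices are all hypothetical and are introduced only for illustration; they do not describe any particular storage subsystem.

```python
from collections import deque


class Secondary:
    """Hypothetical secondary storage device that holds the mirrored blocks."""

    def __init__(self):
        self.blocks = {}

    def receive(self, address, data):
        self.blocks[address] = data
        return True  # acknowledgement returned to the primary site


class Primary:
    """Hypothetical primary storage device paired with one secondary."""

    def __init__(self, secondary, synchronous):
        self.blocks = {}
        self.secondary = secondary
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet copied (asynchronous mode only)

    def write(self, address, data):
        self.blocks[address] = data
        if self.synchronous:
            # The I/O operation completes only after the secondary confirms receipt.
            if not self.secondary.receive(address, data):
                raise IOError("secondary did not confirm the write")
        else:
            # The I/O operation completes now; the copy is made later.
            self.pending.append((address, data))

    def drain(self):
        """Copy queued writes to the secondary (asynchronous mode)."""
        while self.pending:
            address, data = self.pending.popleft()
            self.secondary.receive(address, data)


# Synchronous: the secondary matches the primary as soon as write() returns.
sync_secondary = Secondary()
sync_primary = Primary(sync_secondary, synchronous=True)
sync_primary.write(0, b"payroll")
sync_in_step = sync_secondary.blocks == sync_primary.blocks

# Asynchronous: the secondary lags until the queued writes are copied.
async_secondary = Secondary()
async_primary = Primary(async_secondary, synchronous=False)
async_primary.write(0, b"payroll")
async_lagging = async_secondary.blocks != async_primary.blocks
async_primary.drain()
async_caught_up = async_secondary.blocks == async_primary.blocks
```

In this sketch the synchronous pair never diverges, at the cost of making every write wait for the remote acknowledgement, while the asynchronous pair completes writes immediately but leaves a window in which the secondary is out of date.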
Client-server environments have also been developed within data processing systems to serve many purposes. Generally, a client-server configuration includes several clients connected to a single server. The clients create client files and transfer these files to the server. The server receives the client files and stores them on several attached storage devices. When used as a storage management system, the server manages the back-up, archival, and migration of these client files. By storing the client file on an attached storage device, the server creates a back-up copy of the client file. Clients may vary from small personal computer systems to large data processing systems having a host processor connected to several data storage devices. The server can also range from a small personal computer to a large host processor.
Data availability, however, has become such a critical measure of data processing systems that providing only a single copy of the client file within the server system may no longer be sufficient. Several types of errors can occur within the server system to prevent access to the client files. Media failures can damage the client file making it unavailable to the client system. In addition, an entire volume containing several client files may be destroyed within the server system. Finally, a disaster may wipe out the entire server system. All of these failures prevent the client system from accessing the back-up copy provided by the server system. Without an additional back-up copy, the integrity of the server system as protection against the loss of client files becomes inadequate should the server system suffer a catastrophic failure.
To substantially reduce the risk of data loss, server systems typically create a second back-up copy within the server storage subsystem. Some current server systems create an additional back-up by requesting the client system to resend a set of client files. The server then catalogs the second set of client files. While this method simplifies the server's duties in maintaining the additional back-up copies, it has several disadvantages. The server consumes more of the network resources between the client systems and the server, because the client system transfers twice as much data to the server to maintain back-up protection for a single file. In addition, the client system cannot access the original client file while it transfers the file to the server. Thus, requiring the client system to resend the client file to the server doubles the amount of time the file is unavailable to the user in the client system.
Alternatively, some current server systems create an additional back-up by copying the contents of a first storage volume to a second storage volume. The first storage volume contains the initial back-up copies of the client file provided to the server system from the client system. By copying the entire first volume to a second volume, the second volume becomes a duplicate of the first volume and the server system now contains a second back-up copy of the client files. Essentially, the server system is making back-up copies of its primary data files, wherein these primary data files are the primary copies of the client files sent to the server from the client systems.
Making back-up copies on a storage volume boundary creates disadvantages in these current server systems. Device geometries become important when the server system makes its subsequent back-up copies by replicating storage volumes. Creating back-up copies in this manner typically requires identical storage devices for storing the back-up volume. That is, the storage device used for storing the subsequent back-up copies in the server system must be the same type and formatted in the same manner as the storage device which stores the primary copies of the client files. Furthermore, creating subsequent back-up copies in the server system by replicating storage volumes can propagate inefficient use of storage volumes. That is, a first storage volume containing initial client files that is only partially full will be duplicated in a second storage volume. This doubles the amount of unused, and unavailable, space within the server system.
Full volume copying as a means for back-up also creates additional performance inefficiencies in the server. This method often requires the server to duplicate unmodified files from one storage volume to another. For example, a first storage volume containing X files receives an additional file. Full volume copying now requires that all files residing on the first volume, X+1 files, be copied to a second storage volume. Thus, the X unchanged files must be duplicated to make the second storage volume an exact replica of the first storage volume. In addition, files are sometimes moved from one storage volume to another to compress data within the second volume. Full volume copying as a back-up means requires that the second volume be duplicated once the data compression is complete.
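The arithmetic of the example above can be checked with a short sketch. The function name `full_volume_copy`, the value chosen for X, and the dictionary that stands in for a storage volume are assumptions made only for illustration.

```python
def full_volume_copy(source):
    """Replicate a whole volume; return the replica and the number of files written."""
    replica = dict(source)
    return replica, len(replica)


x = 1000  # hypothetical number of files already on the first volume
first_volume = {f"file{i}": b"data" for i in range(x)}
second_volume, _ = full_volume_copy(first_volume)  # initial replica

first_volume["new_file"] = b"data"  # one additional file arrives
second_volume, files_written = full_volume_copy(first_volume)

# Every file is rewritten, although only one of them is new.
unchanged_recopied = files_written - 1
```

Under these assumptions, adding a single file causes X+1 files to be written to the second volume, X of which were already present and unchanged.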
As stated previously, the server system contains a storage subsystem for storing one or more back-up copies of client files. The storage subsystem within the server system need not be limited to a single type of storage device. In fact, server storage subsystems often comprise several different types of storage devices, such as DASD, optical disk, and magnetic tape. When a server is coupled to different types of storage devices, the storage subsystem is often organized as a storage hierarchy. The hierarchy may be based on several factors, including the access speed of the storage device, the density of storage on the device, and the cost of the device per storage unit (e.g., cost per megabyte). A server system containing different storage devices grouped in a storage hierarchy uses a computer application program called a hierarchical storage manager to maintain data files within the storage hierarchy. The storage manager migrates data files between the various levels of the data storage hierarchy using storage management techniques. The storage manager also maintains a reference list, or index, of the data files stored within the server system to assist it in managing the repository of client files within the storage hierarchy.
This storage hierarchy within the server system poses additional problems for servers that generate additional back-up copies by replicating storage volumes. As stated previously, this back-up technique depends on device geometries. Since the additional back-up storage volume is a mirror image of the first storage volume in the server, migrating the primary copy of the client file from one storage device type to a different storage device type within the storage hierarchy necessitates that the back-up copy also be recopied. That is, full volume copying requires that two storage volumes be replicated for every file that is migrated within the storage hierarchy. The first storage volume must be duplicated since it now contains one fewer file. Additionally, the second storage volume must also be duplicated since it contains an additional file. This scenario makes for an inefficient back-up storage management scheme within the storage hierarchy of the server system.
Accordingly, an improved method and apparatus are needed within a data processing system using a client-server configuration for generating and managing additional back-up copies of client files sent from the client systems to the server. Such method and apparatus should incrementally back up storage volumes within the server storage subsystem instead of replicating the entire storage volume. In an incremental back-up scheme, only the client files from the first storage volume that are new or updated since the previous additional back-up was completed should be copied to the second storage volume. In addition, such method and apparatus should provide the client system with direct access to the additional back-up copy of a requested client file when the primary copy of the file is unavailable within the server storage subsystem. In turn, such method and apparatus should also recover the primary copies of the client files from the additional copies of the files when the server system determines that the storage volume containing the primary copies has been destroyed.
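The three desired behaviors (incremental back-up, access to the additional copy when the primary copy is unavailable, and recovery of the primary copies) can be sketched as follows. This is only an illustration under stated assumptions: the `StorageServer` class, its method names, and the per-file version counter used to detect new or updated files are hypothetical and are not taken from any described implementation.

```python
class StorageServer:
    """Hypothetical server holding primary copies and additional back-up copies."""

    def __init__(self):
        self.primary = {}  # file name -> (version, data): primary copies
        self.copy = {}     # file name -> (version, data): additional back-up copies

    def store(self, name, data):
        """Receive a new or updated client file and bump its version."""
        version = self.primary.get(name, (0, None))[0] + 1
        self.primary[name] = (version, data)

    def incremental_backup(self):
        """Copy only files that are new or updated since the last back-up."""
        copied = []
        for name, (version, data) in self.primary.items():
            backed_up = self.copy.get(name)
            if backed_up is None or backed_up[0] < version:
                self.copy[name] = (version, data)
                copied.append(name)
        return copied

    def retrieve(self, name):
        """Return the primary copy, or the additional copy if the primary is unavailable."""
        if name in self.primary:
            return self.primary[name][1]
        if name in self.copy:
            return self.copy[name][1]
        raise KeyError(name)

    def recover_primary(self):
        """Rebuild the primary copies from the additional back-up copies."""
        self.primary = dict(self.copy)


server = StorageServer()
server.store("a.txt", b"v1")
server.store("b.txt", b"v1")
first_pass = server.incremental_backup()   # both files are new

server.store("a.txt", b"v2")
second_pass = server.incremental_backup()  # only the updated file is copied

server.primary.clear()                     # simulate loss of the primary volume
fallback = server.retrieve("b.txt")        # served from the additional copy
server.recover_primary()
```

In this sketch the second back-up pass copies a single file rather than the whole volume, and a request made after the simulated loss is satisfied from the additional copy before the primary copies are rebuilt.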