1. Field of the Invention
The present invention relates generally to disaster recovery in data processing systems. More particularly, the present invention relates to maintaining and employing shadow copies of a database for remote site disaster recovery.
2. Description of the Related Art
Data processing systems typically require a large amount of data storage. Customer data, or data generated by users within the data processing system, usually occupy a great portion of this data storage. Effective data processing systems also provide back-up copies of this user data to insure against a loss of such data. For most businesses, any loss of data in their data processing systems is catastrophic, severely impacting the success of the business. To further protect customer data, some data processing systems extend the practice of making back-up recovery copies to provide disaster recovery. In disaster recovery systems, a recovery copy of the customer data is kept at a site remote from the primary storage location. If a disaster strikes the primary storage location, the customer data can be retrieved or "recovered" from the recovery copies located at the remote site.
Several methods are known for providing disaster protection using mirror copies of the primary storage data at a remote storage site. Remote dual copy, or remote data duplexing, is one form of this data mirroring solution. In remote dual copy, additional storage devices are provided in the data processing system such that an additional copy of the primary data is written to a recovery storage device. Storage devices are coupled together to form duplex pairs, each duplex pair consisting of a primary and recovery storage device. The primary storage device is located at the primary storage location, while the recovery storage device is located at the remote site. When data is written to the primary storage device, the data processing system automatically copies the data to the recovery site.
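The duplex-pair arrangement described above can be illustrated with a minimal sketch. The `Volume` and `DuplexPair` classes are hypothetical in-memory stand-ins introduced only for illustration. The synchronous copy on every write is an assumption, not a description of any particular product.

```python
class Volume:
    """Hypothetical in-memory stand-in for a storage device."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> data

    def write(self, block_id, data):
        self.blocks[block_id] = data


class DuplexPair:
    """Couples a primary volume with a recovery volume at the remote site."""

    def __init__(self, primary, recovery):
        self.primary = primary
        self.recovery = recovery

    def write(self, block_id, data):
        # Assumption: each write to the primary is copied to the
        # recovery volume immediately (synchronously).
        self.primary.write(block_id, data)
        self.recovery.write(block_id, data)


pair = DuplexPair(Volume("primary"), Volume("recovery"))
pair.write(7, b"customer record")
```

After the write, both volumes of the pair hold the same block, so the recovery volume can stand in for the primary if the primary site is lost.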
Full volume copying is an alternate method for providing disaster recovery of a database. Full volume copying may use a storage management server to generate recovery storage volumes from the primary storage volumes. Commonly, a client-server configuration includes several clients connected to a single server. The clients create client files and transfer these files to the server. The server receives the client files and stores them on several attached storage devices. When used as a storage management system, the server manages the back-up, archival, and migration of these client files. By storing the client file on an attached storage device, the server creates a first back-up, or primary, copy of the client file. The server may, in turn, create additional back-up copies of the client file to improve the data availability and data recovery functions of the storage management system. Clients may vary from small personal computer systems to large data processing systems having a host processor connected to several data storage devices. The server can also range from a small personal computer to a large host processor.
To provide disaster recovery, the storage management server must generate a recovery copy of the client file and oversee the transmission of this recovery copy to a remote site. As a disaster recovery system, the server partitions the storage subsystem into a set of primary storage volumes and a set of remote, or off-site, recovery storage volumes. The off-site recovery volumes may contain removable media, so that they can be transported to the remote site. These volumes may be formatted using the same format or a different format from that used by the primary storage volumes for storing data and commands.
The server determines which client files need to be backed up within the storage subsystem, how frequently these back-up copies should be made, and which set of the volumes should be transported to the remote site. The server or a separate controller may manage the off-site recovery storage volumes and determine which volumes are needed for disaster recovery. Off-site storage volumes no longer needed for disaster recovery may be reclaimed and reused. The server typically coordinates the reclamation and reuse of the recovery storage volumes. Successful reclamation and reuse of recovery volumes no longer needed for disaster recovery substantially improves the efficiency and performance of a disaster recovery system.
Incremental back-up techniques have evolved to improve the efficiency of disaster recovery systems. Using these techniques, only the user files added to the primary storage volume since the last periodic back-up operation completed are copied to the recovery volumes. Thus, incremental back-up eliminates the unnecessary copying of files that remain unchanged since the previous back-up operation. As compared to full volume copying, incremental back-up reduces the number of partially filled storage volumes at the remote site. It also reduces the number of duplicate files and duplicate volumes stored at the remote site, thereby simplifying the management of off-site recovery storage volumes.
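The incremental back-up idea can be sketched as follows. The function name `incremental_backup`, the dictionary layout, and the use of a modification time to decide which files are new are all assumptions made for illustration. A real storage management server would keep this information in its own catalog.

```python
def incremental_backup(primary, recovery, last_backup_time):
    """Copy only files newer than the last back-up.

    primary, recovery: dict mapping file name -> (modified_time, contents)
    Returns the names of the files that were copied.
    """
    copied = []
    for name, (mtime, contents) in primary.items():
        # Assumption: a file counts as new when its modification time
        # is later than the completion time of the last back-up.
        if mtime > last_backup_time:
            recovery[name] = (mtime, contents)
            copied.append(name)
    return copied


primary = {"a.txt": (10, "old"), "b.txt": (25, "new")}
recovery = {"a.txt": (10, "old")}
copied = incremental_backup(primary, recovery, last_backup_time=20)
```

Here only `b.txt` is copied, because `a.txt` has not changed since the previous back-up operation.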
Problems still exist in the management of the off-site storage volumes even when incremental back-up techniques are used. As outdated primary copies of client files are expired from the server, the corresponding recovery copies are no longer needed at the remote site. In turn, the amount of relevant space (space occupied by recovery copies needed for disaster recovery) decreases on the off-site storage volumes. When the relevant space on a particular off-site storage volume falls to a reclamation threshold, the server reclaims the recovery volume by copying the remaining files to an alternate storage volume. However, the server may not be able to mount the volume to be reclaimed since it is located off-site. Moreover, the server may not be able to return the volume to the primary storage site for mounting since a disaster may have destroyed the primary site. Further, disaster protection may be lost if the off-site volume is moved to the primary storage site before the volume has been reclaimed and a disaster then destroys the primary site.
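The reclamation step described above can be sketched as follows. The names `needs_reclamation` and `reclaim`, the 25 percent threshold, and the file sizes are hypothetical. Note that the sketch presumes the volume being reclaimed can be read, which is exactly what cannot be taken for granted when the volume is stored off-site.

```python
def needs_reclamation(volume_files, expired, capacity, threshold=0.25):
    """Return True when the still-needed share of the volume is at or below threshold.

    volume_files: dict mapping file name -> size
    expired: set of file names no longer needed for disaster recovery
    """
    relevant = sum(size for name, size in volume_files.items()
                   if name not in expired)
    return relevant / capacity <= threshold


def reclaim(volume_files, expired, target):
    """Copy the files still needed to an alternate volume, then empty the old one."""
    for name, size in list(volume_files.items()):
        if name not in expired:
            target[name] = size
    volume_files.clear()  # the old volume can now be reused


offsite = {"f1": 40, "f2": 40, "f3": 20}
expired = {"f1", "f2"}
alternate = {}
if needs_reclamation(offsite, expired, capacity=100):
    reclaim(offsite, expired, alternate)
```

In this example only `f3` is still needed, so it is moved to the alternate volume and the off-site volume becomes available for reuse.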
What is needed is a tracker database management system (DBMS) that maintains shadow copies of primary database data at a remote recovery site. The tracker DBMS should allow fast remote site takeover when a disaster occurs at the primary database site. Further, updates to the remote site shadow copies should be synchronized by a restart procedure at the remote site. When the primary site fails, this restart procedure should apply the updates in multiple phases.