The present invention relates generally to a system and method for synchronizing a remote data copy and, more particularly, to a system and method for efficient snapshot synchronization of a data copy using at least one accumulation remote copy trio consistency group including a source volume, accumulated write commands, an established peer-to-peer remote copy (PPRC) volume pair, and data consistency models.
With increasingly large amounts of data being handled in data processing systems, storage systems, such as disk or tape storage systems, are being used to store data. Some organizations rely heavily on data and quick access to the data. Disasters caused by environmental conditions, user errors, or application errors may occur in which access to the data is lost for some period of time. Mirroring or copying data to a secondary storage system from a primary storage system is currently employed for recovery purposes to minimize the time in which access to data is lost due to such a disaster.
In that regard, peer-to-peer remote copy (PPRC) is a synchronous copy mechanism that creates a copy of data at a remote or secondary storage system. The copy at the secondary storage system is kept current with the data located at the primary storage system. In other words, a copy of the data located at the secondary storage system is kept in sync with the data at the primary storage system as observed by the user of the data. Volume pairs are designated in which a volume in the primary storage system is paired with a volume in the secondary storage system. Data is transferred from a volume in the primary storage system to its corresponding volume in the secondary storage system, and the two volumes together may be referred to as an established PPRC pair.
With a PPRC system, a data copy made to the secondary, or “recovery,” storage system occurs synchronously from a host point of view with write operations to volumes in the primary storage system. When data is written to the primary storage system, the data written to a particular volume is also written to a corresponding volume in the secondary storage system using a path to the secondary storage system.
Effecting a data copy from a primary volume to a secondary volume in a PPRC system may include an internal snapshot copying mechanism for copying all of the data of a source storage volume to a primary storage volume in a PPRC pair, which data is then migrated to the corresponding secondary storage volume. Rather than duplicating the data itself, the internal snapshot copying mechanism copies at least one pointer to the data in the source volume and stores the pointers in a map, and the primary volume in the PPRC pair uses the pointers to access the data. By using pointers, the internal snapshot mechanism can quickly copy the data from the source volume without affecting the access of a host to the source volume. The primary volume then transfers the data to the corresponding secondary volume without any host access interruption to the source volume.
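By way of illustration only, the pointer-based snapshot described above can be sketched as follows. This is a minimal sketch and not the mechanism of any particular storage controller. The names Volume, SnapshotView, and migrate are hypothetical, tracks are modeled as immutable byte strings, and Python object references stand in for the pointers stored in the map.

```python
class Volume:
    """Hypothetical volume: maps a track number to that track's data."""

    def __init__(self, name, tracks=None):
        self.name = name
        self.tracks = dict(tracks or {})

    def host_write(self, track, data):
        # A host write rebinds the track to a new object; the old object
        # is left untouched for anyone still pointing at it.
        self.tracks[track] = data


class SnapshotView:
    """The primary volume's view of the source: a map of pointers."""

    def __init__(self, source):
        # Shallow copy: only references are copied, no track data is duplicated.
        self.pointer_map = dict(source.tracks)

    def read(self, track):
        return self.pointer_map[track]


def migrate(snapshot, secondary):
    """Send every track reachable through the snapshot to the secondary volume."""
    for track, data in snapshot.pointer_map.items():
        secondary.tracks[track] = data
```

In this sketch, a host_write to the source after the SnapshotView is created does not change what the snapshot reads, which models the point-in-time behavior of the copy.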
There is, however, a large time difference between snapshot copies and synchronizing remote copies. That is, if a user wants to migrate a point-in-time copy of data from a source volume to a secondary volume in a PPRC pair by snapshot copying the source volume to the primary volume in the PPRC pair, the entire source volume is sent to the secondary volume, an operation which can take a very long time.
Further, as the number of established PPRC volume pairs attempting to move from a duplex pending state to a duplex state increases, system resources become increasingly degraded. Cache space, processor cycles, and data paths are consumed while a pair is duplex pending. A duplex pending pair is a pair of corresponding volumes in which the system is attempting to copy the primary storage volume to the secondary storage volume. A duplex state pair is a pair of corresponding volumes in which the data from the primary storage volume has been copied to the secondary storage volume. Moreover, individual primary volume performance may be additionally affected because the host has to compete with the synchronizing task for access to the source volume. The synchronizing task is the process of migrating the source storage volume to the secondary storage volume.
Thus, there is needed an improved system and method for synchronizing a data copy. In such a system and method, when a user wants to snapshot copy from a simplex source volume to a PPRC volume pair in order to migrate backup data to a secondary system, rather than sending the entire source volume to the secondary volume, only data indicated by accumulated write commands would be sent. In order to make the snapshot copying and the migration of the data efficient, a bitmap would be used to signify the accumulated write commands. Advantageously, only the data indicated by those write commands would be snapshot copied and migrated to the secondary.
Such a system and method would preferably employ a group of three storage volumes in this operation. The first would be the source volume of the snapshot copy, which would accumulate the write commands in a bitmap. The next would be the target volume of the snapshot copy, which would be a primary volume of a PPRC pair and would receive the results of the write commands and a copy of the bitmap having the accumulated write commands. The final volume would be on the secondary system and would be the secondary volume of the PPRC pair. The three volumes would together comprise an accumulation remote copy trio. The source volume could be specified along with the establishment of the primary target-secondary PPRC pair, or configured through an operations panel. Upon establishment of the PPRC pair, an internal snapshot copy would synchronize the source and primary target volumes. The primary target volume would begin synchronization with the secondary volume by sending over the entire volume. The source volume would establish a bitmap and begin accumulating write commands received from a host. Subsequent snapshot copies from the source volume to the primary target volume would only snapshot copy data indicated by the accumulated write commands. Only that data indicated by the accumulated write commands would then be migrated to the secondary volume.
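By way of illustration only, the operation of such an accumulation remote copy trio can be sketched as follows. This is a minimal sketch under stated assumptions: the class and method names (SourceVolume, AccumulationTrio, establish, incremental_sync) are hypothetical, the bitmap is kept at track granularity as a list of booleans, and the transfer to the secondary volume is modeled as an in-memory assignment rather than a real data path.

```python
class SourceVolume:
    """Hypothetical source volume that accumulates write commands in a bitmap."""

    def __init__(self, tracks):
        self.tracks = list(tracks)
        self.bitmap = [False] * len(self.tracks)  # one bit per track

    def host_write(self, track, data):
        self.tracks[track] = data
        self.bitmap[track] = True  # record the accumulated write

    def take_bitmap(self):
        """Hand over the accumulated bitmap and start a fresh one."""
        taken, self.bitmap = self.bitmap, [False] * len(self.tracks)
        return taken


class AccumulationTrio:
    """Source volume, primary target volume, and secondary volume."""

    def __init__(self, source):
        self.source = source
        self.primary = [None] * len(source.tracks)
        self.secondary = [None] * len(source.tracks)

    def establish(self):
        """Initial synchronization: the entire volume is sent once."""
        self.primary = list(self.source.tracks)
        self.source.take_bitmap()  # begin accumulating from a clean bitmap
        self.secondary = list(self.primary)
        return len(self.primary)  # number of tracks sent

    def incremental_sync(self):
        """Snapshot copy and migrate only the tracks flagged in the bitmap."""
        sent = 0
        for track, changed in enumerate(self.source.take_bitmap()):
            if changed:
                self.primary[track] = self.source.tracks[track]
                self.secondary[track] = self.primary[track]
                sent += 1
        return sent
```

In this sketch, establish sends every track, while a later incremental_sync following host writes to two tracks sends only those two tracks.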
Such a system and method would thereby allow a user to make a point-in-time copy of data and very efficiently migrate that copy to a secondary system without impacting the source volume. The target of the snapshot copy would be the primary of a PPRC pair that would transfer only the tracks specified in the bitmap to the secondary volume. In such a fashion, the PPRC pair would become duplex much more efficiently because only the specified tracks in the bitmap would be sent to the secondary volume. The bitmap could represent granularity at a record, track or cylinder level. Such a system and method would thereby remove host impact to the source volume while data is being migrated to the secondary volume. As a result, very little response time degradation would be seen by the host. Still further, the more efficient migration of data to the secondary volume would reduce the time it takes to synchronize the volumes, consume less system resources, and reduce the time interval between potential snapshot copies for migration purposes.
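By way of illustration only, the choice of bitmap granularity can be sketched as a mapping from a record address to a bit position. The geometry constants and the function name bit_index are assumptions made for the example and are not taken from any particular device.

```python
# Assumed device geometry, for illustration only.
TRACKS_PER_CYLINDER = 15
RECORDS_PER_TRACK = 12


def bit_index(cylinder, head, record, granularity):
    """Return the bitmap position that covers the addressed record."""
    if granularity == "cylinder":
        return cylinder
    track = cylinder * TRACKS_PER_CYLINDER + head
    if granularity == "track":
        return track
    if granularity == "record":
        return track * RECORDS_PER_TRACK + record
    raise ValueError("unknown granularity: " + granularity)
```

A coarser granularity gives a smaller bitmap but causes more data to be sent for each accumulated write, since a single changed record flags its whole track or cylinder.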
Such a system and method could also be employed in storage systems that comprise multiple source volumes, each associated with an established PPRC pair. In that regard, users increasingly have databases that span multiple source volumes, and would like to migrate data to a secondary storage system also having multiple volumes for disaster recovery purposes as described above. Such migration should be as quick as possible to facilitate smaller incremental backups. In such systems, however, data consistency becomes more important. That is, for databases spanning multiple source volumes, users need multiple consistent secondary volumes. Thus, users are becoming more interested in creating consistency groups of several source volumes, and then making point-in-time copies of such groups for disaster recovery purposes.
While the above described system and method for synchronizing a data copy comprising at least one accumulation remote copy trio provide a mechanism for facilitating smaller incremental backups, there remains a need for consistency management using accumulation remote copy trios. Such a consistency group system and method would preferably support either “weak” or “strong” consistency models, as desired by a user. Such a system and method would also preferably allow a user to set up a particular consistency group by specifying a list of source volumes, as well as to select the type of consistency desired. Thereafter, such a system and method would automatically provide for and control the type of consistency selected by the user, without the need for user intervention or control of backup operations.
Accordingly, it is an object of the present invention to provide an improved system and method for synchronizing a data copy.
According to the present invention, then, a system is provided for synchronizing a data copy that comprises first and second remote copy trios. The first and second remote copy trios each comprise a source storage volume, a target storage volume associated with the source storage volume, and a secondary storage volume associated with the target storage volume. The source volume is provided for storing an initial data file, executing a plurality of write commands from a host to generate an updated data file, and generating a record of the write commands. The target volume is provided for receiving a copy of the initial data file and a copy of the write command record from the associated source volume, and transmitting the copy of the initial data file and data indicated by the write command record to the associated secondary volume. The secondary volume is provided for storing the copy of the initial data file and the data indicated by the write command record received from the associated target volume, wherein the copy of the initial data file and the data indicated by the write command record stored on the secondary volume are available for use in generating a copy of the updated data file. According to one embodiment of the system of the present invention, the target volumes transmit to the associated secondary volumes in series relative to each other so that consistency is maintained at all times across the source volumes. According to another embodiment of the system of the present invention, the target volumes transmit to the associated secondary volumes in parallel relative to each other so that consistency across the source volumes is achieved when all target volumes have completed transmitting to the associated secondary volumes.
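By way of illustration only, the two transmission orderings described above can be sketched as follows. This is a minimal sketch under stated assumptions: each trio is modeled as a hypothetical dictionary holding the data pending at its target volume and the contents of its secondary volume, the function names are invented for the example, and a thread pool from the Python standard library stands in for concurrent data paths.

```python
from concurrent.futures import ThreadPoolExecutor


def transmit(trio):
    """Send the data pending at a trio's target volume to its secondary volume."""
    trio["secondary"].update(trio["pending"])
    trio["pending"].clear()
    return trio["name"]


def sync_in_series(trios):
    """Target volumes transmit one after another, in list order."""
    return [transmit(trio) for trio in trios]


def sync_in_parallel(trios):
    """Target volumes transmit concurrently.

    The group as a whole is consistent only once every transfer has
    completed, which is the point at which this function returns.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(trios))) as pool:
        return list(pool.map(transmit, trios))
```

The sketch shows only the ordering of the transfers. How a controller guarantees consistency during a serial transfer is outside its scope.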
Still further according to the present invention, a method is also provided for synchronizing a data copy. The method comprises providing first and second remote copy trios, each comprising a source storage volume, a target storage volume associated with the source storage volume, and a secondary storage volume associated with the target storage volume. The source volume is provided for storing an initial data file, executing a plurality of write commands from a host to generate an updated data file, and generating a record of the write commands. The target volume is provided for receiving a copy of the initial data file and a copy of the write command record from the associated source volume, and transmitting the copy of the initial data file and data indicated by the write command record to the associated secondary volume. The secondary volume is provided for storing the copy of the initial data file and the data indicated by the write command record received from the associated target volume, wherein the copy of the initial data file and the data indicated by the write command record stored on the secondary volume are available for use in generating a copy of the updated data file. According to one embodiment of the method of the present invention, the target volumes transmit to the associated secondary volumes in series relative to each other so that consistency is maintained at all times across the source volumes. According to another embodiment of the method of the present invention, the target volumes transmit to the associated secondary volumes in parallel relative to each other so that consistency across the source volumes is achieved when all target volumes have completed transmitting to the associated secondary volumes.
According to the present invention, a system is also provided for synchronizing a data copy, the system comprising a source storage volume for storing an initial data file, executing a write command from a host to generate an updated data file, and generating a record of the write command. The system also comprises a target storage volume for receiving a copy of the initial data file and a copy of the write command record from the source volume, and a secondary volume for storing a copy of the initial data file. The target volume is further for transmitting the copy of the initial data file to the secondary volume, and transmitting data indicated by the write command record to the secondary volume so that consistency is maintained at all times for the source volume. The secondary volume is further for receiving and storing the data indicated by the write command record, wherein the copy of the initial data file and the data indicated by the write command record stored on the secondary volume are available for use in generating a copy of the updated data file.
Still further according to the present invention, a method is also provided comprising providing a source storage volume for storing an initial data file, executing a write command from a host to generate an updated data file, and generating a record of the write command. The method further comprises providing a target storage volume for receiving a copy of the initial data file and a copy of the write command record from the source volume, and providing a secondary volume for storing a copy of the initial data file. The target volume is further for transmitting the copy of the initial data file to the secondary volume, and transmitting data indicated by the write command record to the secondary volume so that consistency is maintained at all times for the source volume, and the secondary volume is further for receiving and storing the data indicated by the write command record, wherein the copy of the initial data file and the data indicated by the write command record stored on the secondary volume are available for use in generating a copy of the updated data file.