The present invention relates to data storage, and more particularly, to systems and methods configured to enable efficient flash copy for disaster recovery (DR) testing.
Some data storage systems are capable of creating a point-in-time copy of virtual tapes for DR testing. One such data storage system is IBM's TS7700 Grid Architecture. This capability helps businesses simulate and test their ability to resume operations in the event of a product or site failure. In such a grid configuration, up to six clusters (or sites) are interconnected and configured to replicate data created on any cluster in the configuration. As part of a total systems design, business continuity procedures are developed to instruct information technology (I/T) personnel in the actions to take in the event of a system failure. These procedures are tested (a process also known as DR testing) during initial installation of the system, at regular intervals after initial installation, or both.
During DR testing, users attempt to simulate a true disaster in which one or more clusters at a first site, such as a production cluster or site, are unavailable. A DR host system is restored, and cluster data is accessed through one or more DR clusters predefined by the user. Although these predefined DR clusters provide some features that help the user simulate a true disaster, several problems with DR testing on such clusters remain.
One such problem is supporting a complete set of point-in-time copies of all virtual tapes for DR testing use only. In a real-world disaster, the point in time at which the production cluster (or production environment) becomes unavailable is not predictable, so the consistency of the replicated data on the remaining cluster or clusters is also unpredictable. Some data may not yet have finished replicating to a DR cluster or site, and replication of other data may not even have started. With conventional DR testing, copies continue after the DR test has started, which produces misleading results: had a real disaster taken place, those copies would have stopped and the data would not be available. In addition, if copies are not available on the DR cluster or clusters, the DR host system simply accesses remote content through the grid, which is typically also not possible in a true disaster scenario. Furthermore, data on a production cluster that is modified by the production host is also modified on the DR cluster(s) or site(s) of the grid.
Instead, users prefer to reproduce the consistency of the DR cluster(s) or site(s) as of time zero (the time of the simulated disaster): only data that was consistent within the DR cluster(s) or site(s) at time zero should be accessible to a DR test host. Some users currently accomplish this by disconnecting the DR cluster(s) or site(s) from the production cluster(s) or site(s). However, most users require production data to continue replicating to the DR cluster(s) or site(s) so that, in the event of a true (not simulated) disaster, this data is properly backed up.
Flash copy capabilities exist in some storage products that offer business continuance testing, but their consistency awareness is limited to a single node or cluster. Some storage products with a grid architecture, however, allow a user to have more than one cluster or site holding DR data. A method of flashing more than one cluster or node to create a composite point-in-time consistency point is not currently available.
There are several reasons for this. First, during a DR test, it is preferable that a DR host and a production host both be able to mount a virtual tape with the same identifier (such as a volume serial number, or “volser”) at the same time. With conventional systems, however, these mounts are serialized because the virtual tape ownership concept restricts access to any virtual tape to one accessing system at a time. In other words, in current grid architectures only one host may mount a given virtual tape at any given time. To perform DR testing as users desire, this protective one-host-at-a-time mounting rule must be relaxed or modified.
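The serialization described above, and what relaxing it could mean, can be illustrated with a minimal, hypothetical Python sketch. It is not TS7700 code: the class `VolumeOwnership`, its `relax_for_dr_test` flag, the `is_dr_test_host` parameter, and the specific relaxed rule (at most one non-DR-test holder per volser, with DR test hosts allowed to share) are all assumptions chosen only to make the contrast concrete.

```python
class MountConflict(Exception):
    """Raised when a volser is already mounted by a conflicting host."""


class VolumeOwnership:
    """Hypothetical model of virtual tape mount ownership.

    relax_for_dr_test=False: strict one-host-at-a-time ownership
    (the conventional behaviour described in the text).
    relax_for_dr_test=True: an ASSUMED relaxation in which DR test
    hosts may mount a volser concurrently, while at most one
    non-DR-test (production) host may hold it at a time.
    """

    def __init__(self, relax_for_dr_test=False):
        self.relax_for_dr_test = relax_for_dr_test
        self._mounts = {}  # volser -> {host: is_dr_test_host}

    def mount(self, volser, host, is_dr_test_host=False):
        holders = self._mounts.setdefault(volser, {})
        others = {h: dr for h, dr in holders.items() if h != host}
        if others:
            if not self.relax_for_dr_test:
                conflict = True  # strict rule: any other holder blocks the mount
            else:
                # relaxed rule: only two non-DR-test holders conflict
                conflict = (not is_dr_test_host) and any(
                    not dr for dr in others.values()
                )
            if conflict:
                raise MountConflict(f"{volser} already mounted by {sorted(others)}")
        holders[host] = is_dr_test_host

    def unmount(self, volser, host):
        holders = self._mounts.get(volser, {})
        holders.pop(host, None)
        if not holders:
            self._mounts.pop(volser, None)

    def holders(self, volser):
        return sorted(self._mounts.get(volser, {}))
```

With the strict rule, a DR test host trying to mount volser `A00001` (a made-up identifier) while a production host holds it raises `MountConflict`; with the relaxed rule, both mounts succeed and `holders("A00001")` lists both hosts.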
Second, production hosts may change an attribute of data or volumes, reuse data or volumes, and/or modify data or volumes. None of these use cases should alter the time-zero view at the DR cluster(s) or site(s). Existing flash copy solutions can accommodate data changes, but tracking of volume attribute changes is not currently available in grid architectures.
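The desired time-zero view can be sketched in a few lines of Python. This is a hypothetical illustration and not the claimed method or any product API. The names `VolumeRecord`, `DrCluster`, `replicate`, `set_attribute`, `flash`, and `dr_view` are assumptions. The sketch copies only per-volume metadata (a data version, an attribute dictionary, and a replication-complete flag); a real flash copy would typically preserve tape data by other means, such as copy-on-write. It shows that, once frozen, the time-zero view exposes only volumes that were consistent at time zero and ignores later data modifications, attribute changes, and replication completions, while the live catalog keeps receiving production updates.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class VolumeRecord:
    """Hypothetical per-volume state held on a DR cluster."""
    data_version: int            # increments whenever volume content changes
    attributes: Dict[str, str]   # illustrative volume attributes
    consistent: bool             # True once replication to this cluster finished


class DrCluster:
    """Hypothetical DR cluster with a live catalog and a frozen time-zero view."""

    def __init__(self):
        self.live: Dict[str, VolumeRecord] = {}
        self.time_zero: Optional[Dict[str, VolumeRecord]] = None

    def replicate(self, volser, data_version, attributes, complete=True):
        # Production replication keeps updating the live catalog, even during a test.
        self.live[volser] = VolumeRecord(data_version, dict(attributes), complete)

    def set_attribute(self, volser, key, value):
        # An attribute change replaces the live record; it never touches the snapshot.
        rec = self.live[volser]
        self.live[volser] = replace(rec, attributes={**rec.attributes, key: value})

    def flash(self):
        # Freeze only volumes consistent on this cluster at time zero,
        # copying attribute dicts so later changes cannot leak into the view.
        self.time_zero = {
            v: replace(r, attributes=dict(r.attributes))
            for v, r in self.live.items()
            if r.consistent
        }

    def dr_view(self, volser):
        # A DR test host reads the time-zero record; None means "not available".
        if self.time_zero is None:
            raise RuntimeError("no time-zero flash has been taken")
        return self.time_zero.get(volser)
```

For example, if `A00001` is fully replicated and `A00002` is still in flight when `flash()` is called, the DR test host later sees `A00001` at its time-zero version and attributes and does not see `A00002` at all, even after production rewrites `A00001`, changes its attributes, and finishes replicating `A00002`. A composite point in time across several DR clusters, as discussed above, would additionally require taking these snapshots at a single coordinated instant, which this sketch does not address.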