The present invention relates to disaster recovery, and more specifically to replication of data for disaster recovery.
Business application administrators wish to provide continuity of business for their applications in case of a disaster. There are a variety of techniques for doing this, including building highly available applications that are simultaneously active at multiple physical locations, and building disaster recovery solutions that allow an application and its data to be replicated from one physical location to another and recovered (or failed over) quickly on the secondary location in case of a disaster.
As part of a disaster recovery solution, it is important to regularly test application failover in order to be confident of the solution, and in order to remain confident of the solution as changes are made to the application and the application infrastructure. Many businesses or industries stipulate regular business continuity tests as a self-governing requirement, and in some industries this is even a regulatory requirement. We will call this test a “practice failover”.
In the state of the art, it is possible to practice a failover without severing the replication of application data, by making a point-in-time copy of the data at the secondary location, and recovering the application using that copy of the data. This ensures that there is never a point in time during the practice failover when the primary location does not have a complete and full backup of all of its current data.
There are several important considerations to note with practice failover:
First, the business must plan for up to double the storage capacity at the secondary location, so that the replicated data can be copied while the original replica is retained (techniques such as thin-copy methods or transient re-do logs can be used to reduce this overhead).
Second, if the replicated application is reinstated using the same network naming and addressing as at the primary location, care must be taken to ensure the practice failover is performed in a segregated network environment that does not compromise the application that is still running at the primary location. (This segregated network environment also helps to verify that the secondary location is not dependent on infrastructure or services from the primary location in any way.)
Finally, it is important to carefully coordinate the copying, mapping, and attachment of all replicated storage at the secondary location, without impacting the original copy of the storage that is still being replicated during the practice event.
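The first consideration above can be made concrete with a small capacity estimate. The following is a minimal sketch only: the function name `secondary_capacity_gb` and the `changed_fraction` parameter are hypothetical, and the model assumes that a thin copy consumes space only for the fraction of data that changes during the practice event, whereas a full copy consumes as much space as the replica itself.

```python
def secondary_capacity_gb(replica_gb: float, changed_fraction: float = 1.0) -> float:
    """Estimate storage needed at the secondary location during a practice failover.

    replica_gb: size of the replicated data that must be preserved.
    changed_fraction: share of the data the practice copy is assumed to
        occupy (1.0 models a full copy; smaller values model a thin copy).
        This parameter is an illustrative assumption, not taken from the text.
    """
    if replica_gb < 0:
        raise ValueError("replica_gb must be non-negative")
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")
    # The original replica is always kept; the copy is added on top of it.
    return replica_gb + replica_gb * changed_fraction


full_copy = secondary_capacity_gb(1000)        # full copy: twice the replica
thin_copy = secondary_capacity_gb(1000, 0.25)  # thin copy: replica plus changes
```

With a full copy the estimate reaches the "up to double" bound stated above; with a thin copy it falls between the replica size and that bound.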
When using software-based storage replication such as VMware's Site Recovery Manager (SRM), it is possible in the state of the art to perform a coordinated practice failover for an entire application or set of applications while preserving the original replica. However, this is not possible as a single coordinated action when using Storage Area Network (SAN)-based storage replication.
Using SAN-based storage replication requires the application administrator to explicitly coordinate all activities, including (1) the identification and copying of multiple storage volumes, (2) the mapping and attachment of these storage volumes to the application infrastructure, and (3) the registration and initiation of the applications in the application infrastructure. This is a very complex and error-prone activity.
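The three activities listed above can be sketched as a single ordered routine. This is an illustrative in-memory model only: the `Volume` and `SecondarySite` classes, the `practice_failover` function, and the `-practice` naming suffix are all hypothetical and do not correspond to any real SAN or hypervisor interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Volume:
    name: str
    attached_to: Optional[str] = None  # host the volume is mapped to, if any


@dataclass
class SecondarySite:
    """Hypothetical stand-in for the storage and hosts at the secondary location."""
    replicas: Dict[str, Volume]
    copies: Dict[str, Volume] = field(default_factory=dict)
    running_apps: List[str] = field(default_factory=list)

    def point_in_time_copy(self, replica_name: str) -> Volume:
        # The replica is left untouched so replication can continue.
        copy = Volume(name=f"{replica_name}-practice")
        self.copies[copy.name] = copy
        return copy


def practice_failover(site: SecondarySite, app: str,
                      volume_names: List[str], host: str) -> List[str]:
    # (1) Identify every replicated volume, then copy each one.
    missing = [n for n in volume_names if n not in site.replicas]
    if missing:
        raise KeyError(f"replicas not found: {missing}")
    copies = [site.point_in_time_copy(n) for n in volume_names]

    # (2) Map and attach the copies (never the replicas) to the host.
    for copy in copies:
        copy.attached_to = host

    # (3) Register and start the application against the copies.
    site.running_apps.append(app)
    return [copy.name for copy in copies]


site = SecondarySite(replicas={"db": Volume("db"), "logs": Volume("logs")})
attached = practice_failover(site, "orders", ["db", "logs"], "test-host")
```

Validating all volume names before making any copy is a design choice in this sketch: it avoids leaving a partial set of copies behind when one volume is misidentified, which is one of the error cases that manual coordination makes likely.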
SAN-based storage replication is much preferred in the industry, as it is more efficient and also allows for better recovery point objective (RPO) characteristics than software-based replication.
U.S. Pat. No. 8,788,877 addresses what happens when a disaster occurs in an environment that is replicating storage. Whether the disaster involves the loss of the original storage or of the replicated copy, it leaves the system with only a single copy of the data.
US Patent Application Publication 2011/0231698 addresses the implementation of software-based methods for storage replication, and documents particular ways of optimizing this replication.
US Patent Application Publication 2012/0254123 addresses the implementation of point-in-time backup or snapshot copies that are made of a virtual filesystem that is backed by one or more physical filesystems or disks. It addresses the need to keep track of how the constitution of the virtual filesystem from the physical disks may change over time, so that older snapshots may relate to a different configuration than the present time.