Many storage networks implement data replication and/or other redundant data access techniques to protect against data loss and to provide non-disruptive client access. For example, a first storage cluster may comprise a first storage controller configured to provide clients with primary access to data stored within a first storage device and/or other storage devices. A second storage cluster may comprise a second storage controller configured to provide clients with primary access to data stored within a second storage device and/or other storage devices. The first storage controller and the second storage controller may be configured according to a disaster recovery relationship, such that the second storage controller may provide failover access to data replicated from the first storage device to a secondary storage device that is owned by the first storage controller but accessible to the second storage controller (e.g., a switchover operation may be performed where the second storage controller assumes ownership of the secondary storage device and/or other storage devices previously owned by the first storage controller so that the second storage controller may provide clients with failover access to replicated data within such storage devices). In an example of a logical replication scheme, the second storage controller has ownership of the replicated data, may provide read-only access to the replicated data, and may convert the replicated data to full read-write access upon failover. In an example of physical replication, the storage device comprising the replicated data is owned by the first storage controller until a failover/switchover to the second storage controller occurs.
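As a minimal sketch of the switchover operation described above, the following Python example models device ownership passing from a failed storage controller to its disaster recovery partner. The class name `StorageDevice`, the `accessible_to` field, and the `switchover` function are hypothetical names chosen for illustration; they are not taken from any particular storage product.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    name: str
    owner: str  # name of the controller that currently owns the device
    # Names of controllers that can physically reach the device (assumed model).
    accessible_to: set = field(default_factory=set)


def switchover(devices, failed, survivor):
    """Give `survivor` ownership of each device that `failed` owned and that
    `survivor` can reach. Returns the names of the devices that changed owner."""
    moved = []
    for device in devices:
        if device.owner == failed and survivor in device.accessible_to:
            device.owner = survivor
            moved.append(device.name)
    return moved


devices = [
    # Primary device at the first site; only the first controller can reach it.
    StorageDevice("first_storage_device", "first", {"first"}),
    # Holds the replicated data: owned by the first controller, but also
    # accessible to the second controller.
    StorageDevice("secondary_storage_device", "first", {"first", "second"}),
    # The second controller's own primary device.
    StorageDevice("second_storage_device", "second", {"second"}),
]

# The first controller fails, so the second controller performs a switchover.
moved = switchover(devices, failed="first", survivor="second")
owners = {d.name: d.owner for d in devices}
```

In this sketch only the secondary storage device changes owner, because it is the only device of the failed controller that the surviving controller can reach.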
In an example, the second storage cluster may be located at a site remote from the first storage cluster (e.g., storage clusters may be located in different buildings or cities, or thousands of kilometers from one another). Thus, if a disaster occurs at a site of a storage cluster, then a surviving storage cluster may remain unaffected by the disaster (e.g., a power outage of a building hosting the first storage cluster may not affect a second building hosting the second storage cluster in a different city).
In an example, two storage controllers within a storage cluster may be configured according to a high availability configuration, such as where the two storage controllers are locally connected to one another and/or to the same storage devices. In this way, when a storage controller fails, a high availability partner storage controller can quickly take over for the failed storage controller due to the local connectivity. Thus, the high availability partner storage controller may provide clients with access to data previously accessible through the failed storage controller.
In an example of a high availability configuration, high availability to data may be provided without using shared storage. In particular, high availability to data is provided using a synchronous replicated copy of a primary storage object. The high availability to data may be provided through a software defined architecture, using synchronous replication, and is not limited to merely two storage controllers.
Various replication and synchronization techniques may be used to replicate data (e.g., client data), configuration data (e.g., a size of a volume, a name of a volume, logical unit number (LUN) configuration data, etc.), and/or write caching data (e.g., cached write operations not yet flushed to a storage device, but cached within memory such as a non-volatile random access memory (NVRAM)) between storage controllers and/or storage devices. Synchronous replication may be used where an incoming write operation to the first storage controller is locally implemented upon a first storage object (e.g., a file, a LUN, a LUN spanning multiple volumes, or any other type of object) by the first storage controller and remotely implemented upon a second storage object (e.g., maintained as a fully synchronized copy of the first storage object) by the second storage controller before an acknowledgment is provided back to a client that sent the incoming write operation. In another example, asynchronous replication may be achieved by capturing snapshots of a volume, determining data differences (e.g., deltas) between a current snapshot and a last snapshot used to replicate data to the second storage object, and using incremental transfers to send the data differences to the second storage controller for implementation upon the second storage object. Semi-synchronous replication may be achieved where an acknowledgment back to a client for a write request is based upon local implementation upon the first storage object, but is not dependent upon remote implementation upon the second storage object.
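The three replication modes differ mainly in when the client is acknowledged and in how changes reach the second storage object. The following Python sketch illustrates that difference under simplifying assumptions: a storage object is modeled as an in-memory mapping from block number to data, and all names (`StorageObject`, `synchronous_write`, `semi_synchronous_write`, `drain`, `snapshot_delta`) are hypothetical and introduced only for illustration.

```python
from collections import deque


class StorageObject:
    """Toy storage object: a mapping from block number to data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data

    def snapshot(self):
        # A point-in-time copy of the current contents.
        return dict(self.blocks)


def synchronous_write(primary, secondary, block, data):
    """Acknowledge only after both the local and the remote write complete."""
    primary.write(block, data)
    secondary.write(block, data)
    return "ack"


def semi_synchronous_write(primary, pending, block, data):
    """Acknowledge after the local write; the remote write is queued."""
    primary.write(block, data)
    pending.append((block, data))
    return "ack"


def drain(pending, secondary):
    """Apply queued writes to the secondary, in arrival order."""
    while pending:
        block, data = pending.popleft()
        secondary.write(block, data)


def snapshot_delta(last, current):
    """Return the blocks that are new or changed in `current` versus `last`."""
    return {b: d for b, d in current.items() if last.get(b) != d}


primary, secondary = StorageObject(), StorageObject()

# Synchronous: both copies match as soon as the client is acknowledged.
synchronous_write(primary, secondary, 0, b"a")
in_sync_after_sync_write = primary.blocks == secondary.blocks

# Semi-synchronous: the client is acknowledged while the secondary lags.
pending = deque()
semi_synchronous_write(primary, pending, 1, b"b")
lagging_after_semi_sync = 1 not in secondary.blocks
drain(pending, secondary)

# Asynchronous: transfer only the differences between two snapshots.
last_snapshot = primary.snapshot()
primary.write(0, b"c")
primary.write(2, b"d")
delta = snapshot_delta(last_snapshot, primary.snapshot())
for block, data in delta.items():
    secondary.write(block, data)  # incremental transfer
```

The sketch ignores deleted blocks, failures, and ordering guarantees across controllers; it is meant only to show where the acknowledgment sits relative to the remote write in each mode.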
Unfortunately, various issues such as a failure of a storage controller, a transient network issue, and/or other issues may cause the first storage controller and the second storage controller to become out of sync, such as through a transition from a synchronous replication relationship to an asynchronous replication relationship that does not guarantee zero or near-zero recovery point objectives (RPOs) for clients. Substantial amounts of resource utilization and client data access disruption may occur when attempting to transition back from the asynchronous replication relationship to the synchronous replication relationship (e.g., overhead relating to the creation of snapshots, incremental transfers using snapshots, etc.).