Replication is typically employed as part of a data backup and recovery storage strategy and, as such, denotes the movement of data from a source storage space of a source domain to a target storage space of a target domain via a communications network (e.g., a computer network) in such a way that enables recovery of applications from the target storage space. As used herein, recovery denotes loading of the applications on possibly different hosts (e.g., computers) where they can access the target storage space, instead of the source storage space, such that the applications are loaded to a valid state. Also, storage space denotes any storage medium having addresses that enable data to be accessed in a stable way and, as such, may apply to file system access, block access and any other storage access means.
The source domain contains at least the source storage space, but may also contain the hosts, a switching fabric and any source replication components situated outside of those components. In this context, a component may be a physical entity (e.g., a special replication appliance) and/or a software entity (e.g., a device driver). In remote disaster recovery, for example, the source domain includes an entire geographical site, but may likewise span multiple geographical sites. The target domain includes all of the remaining components relevant for replication services, including the target storage space. In addition, a replication facility includes components that may be located in both the source and target domains.
The replication facility typically has at least one component, i.e., a write interception component, which intercepts storage requests (e.g., write operations or “writes”) issued by a host to the source storage space, prior to sending the intercepted writes to the target storage space. The write interception component is typically embedded within a computing unit configured as a source replication node. When issuing a write, an application executing on the host specifies an address on the storage space, as well as the contents (i.e., write data) with which the storage space address is to be set. The write interception component may be implemented in various locations in the source domain depending on the actual replication service; such implementations may include, e.g., a device driver in the host, logic in the switching fabric, and a component within the source domain, e.g., a source storage system. The write interception component is typically located “in-band”, e.g., between the host and the source storage system, although there are environments in which the component may be located “out-of-band”, where a separate physical component, such as an appliance server, in the source domain receives duplicate writes by utilizing, e.g., an in-band splitter.
Synchronous replication is a replication service wherein a write is not acknowledged until the write data associated with the write is processed by the source storage space, propagated to the target domain and persistently stored on the target storage space. An advantage of synchronous replication is the currency of the target domain data; that is, at any point in time, the writes stored on the target domain are identical to those stored on the source domain. However, a disadvantage of this replication service is the latency or propagation delay associated with communicating the writes to the target domain, which limits the synchronous replication service in terms of distance, performance and scalability.
An asynchronous replication service reduces such latency by requiring that the write only be processed by the source storage space without having to wait for persistent storage of the write on the target storage space. In other words, the write is acknowledged once its associated write data is processed by the source storage space; afterwards, the write (and its write data) is propagated to the target domain. Thus, this replication service is not limited by distance, performance or scalability and, therefore, is often preferred over synchronous replication services. A disadvantage of the asynchronous replication service, though, is the possibility of incurring data loss should the source storage space fail before the write data has been propagated and stored on the target storage space.
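By way of illustration only, the difference in acknowledgement timing between the synchronous and asynchronous replication services may be sketched as follows. The class and method names (Storage, SyncReplicator, AsyncReplicator, propagate) are hypothetical, and the in-memory dictionaries merely stand in for the source and target storage spaces; no particular product implementation is implied.

```python
from collections import deque


class Storage:
    """In-memory stand-in for a storage space: address -> write data."""

    def __init__(self):
        self.blocks = {}

    def write(self, address, data):
        self.blocks[address] = data


class SyncReplicator:
    """Acknowledges a write only after the target has stored it."""

    def __init__(self, source, target):
        self.source, self.target = source, target

    def write(self, address, data):
        self.source.write(address, data)
        # Stands in for the network round trip plus persistent storage
        # on the target; the host waits for this step to complete.
        self.target.write(address, data)
        return "ack"


class AsyncReplicator:
    """Acknowledges a write once the source has processed it."""

    def __init__(self, source, target):
        self.source, self.target = source, target
        self.pending = deque()  # writes not yet propagated to the target

    def write(self, address, data):
        self.source.write(address, data)
        self.pending.append((address, data))
        return "ack"  # returned before the target has seen the write

    def propagate(self):
        """Later, drain the pending writes to the target in arrival order."""
        while self.pending:
            address, data = self.pending.popleft()
            self.target.write(address, data)
```

In this sketch, any writes still held in the pending queue when the source fails correspond to the data loss exposure noted above for the asynchronous service.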
Prior asynchronous replication services may be classified into a plurality of techniques or styles, one of which is write ordering. According to this replication style, the write interception component intercepts all writes (e.g., synchronously before an acknowledgement is returned to the application), buffers the intercepted writes and associates metadata with each write that reflects its relative order. The metadata need not be an actual timestamp, i.e., a monotonically increasing number (sequence number) is sufficient. The buffered writes are then propagated to the target domain and applied in-order to the target storage space. The write interception component may alternately maintain ordering by intercepting the writes synchronously to the flow of the writes from the host to the source storage system. That is, the write interception component intercepts the writes and then transmits them to the target storage system in order.
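As a minimal sketch of the write ordering style, assuming a single write interception component and an in-memory target, each intercepted write may be tagged with a sequence number and later applied in sequence order. The names WriteInterceptor, intercept and apply_in_order are illustrative assumptions rather than part of any described service.

```python
import itertools


class WriteInterceptor:
    """Buffers intercepted writes and tags each with a sequence number."""

    def __init__(self):
        self._seq = itertools.count(1)  # monotonically increasing counter
        self.buffer = []

    def intercept(self, address, data):
        record = (next(self._seq), address, data)
        self.buffer.append(record)
        return record


def apply_in_order(records, target):
    """Apply writes to the target (a dict) in sequence-number order,
    regardless of the order in which the records arrived."""
    for _seq, address, data in sorted(records, key=lambda r: r[0]):
        target[address] = data
```

Because the target sorts on the sequence number, two writes to the same address leave the target with the later write's data even if the records arrive out of order.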
The replication services may be further adapted to planned recovery or unplanned recovery. Planned recovery is defined herein as an act of recovery where components, e.g., hardware and software, of the source domain are fully operational, whereas unplanned recovery is defined as recovery that takes place when the source components are fully or partially non-operational. As used herein, the source domain describes all of the components whose failure or unavailability should not impair the ability to do unplanned recovery.
For unplanned recovery services, the writes may be propagated to the target domain without applying them directly to the target storage space to thereby ensure consistency in light of an intervening disaster. Accordingly, the writes are propagated to an intermediate staging area on the target domain before they are applied to the target storage space to ensure that the storage space can be “rolled back” to a consistent state if a disaster occurs. The replication services may utilize various intermediate staging areas (such as a persistent log or non-volatile memory) to buffer the writes in a safe and reliable manner on the target domain.
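The role of the intermediate staging area may be illustrated with the following sketch, which assumes that each propagated write carries a sequence number and that the target is told, by some means not shown, the highest sequence number known to represent a consistent state. The StagedTarget class and its methods are hypothetical, and the list used as the log is only a stand-in for a persistent log or non-volatile memory.

```python
class StagedTarget:
    """Target domain with a staging log in front of the storage space."""

    def __init__(self):
        self.storage = {}  # target storage space: address -> data
        self.log = []      # staging area holding (seq, address, data)

    def receive(self, seq, address, data):
        """Stage a propagated write without touching the storage space."""
        self.log.append((seq, address, data))

    def apply_through(self, consistent_seq):
        """Apply staged writes with seq <= consistent_seq, in order;
        later writes remain staged."""
        self.log.sort(key=lambda r: r[0])
        remaining = []
        for seq, address, data in self.log:
            if seq <= consistent_seq:
                self.storage[address] = data
            else:
                remaining.append((seq, address, data))
        self.log = remaining

    def discard_staged(self):
        """After a disaster, drop writes beyond the last consistent point
        so the storage space stays at that consistent state."""
        dropped = len(self.log)
        self.log.clear()
        return dropped
```

In this sketch the storage space never reflects a write beyond the last consistent point, so recovery after a disaster amounts to discarding whatever is still staged.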
Often, a source domain having multiple hosts and/or multiple source storage systems may include only one source replication node (i.e., one write interception component) configured to intercept all writes associated with a consistency group. As used herein, a consistency group comprises storage space that requires consistent replication at a target domain. An advantage of such a configuration employing a write ordering replication service is the relative ease with which the writes can be ordered and consistent replication guaranteed. However, this configuration introduces a scalability issue because there is a limit to the processing bandwidth that the interception component can sustain, thereby resulting in potentially substantial adverse impact on performance of the entire configuration. Thus, such scalability concerns may preclude use of a single write interception component.
For example, assume that a large data center is configured with many source storage systems configured to serve many hosts, wherein the source storage systems cooperate to maintain a consistency group. If all write traffic is directed to the single write interception component, a substantial scalability issue arises because the interception component will not practically be able to sustain the entire traffic. Now assume that a consistency group is configured to span multiple geographical site locations such as, e.g., among several small data centers geographically dispersed throughout a country or a plurality of countries. Here, the main reason for not using a single write interception component is not necessarily the scalability issue as much as the substantial latency introduced by such a configuration. This may necessitate either use of smaller consistency groups, which facilitates reliable and consistent group recovery on the target domain, or acceptance of large latencies and performance impact, which is undesirable. Therefore, such configurations may dictate the use of multiple write interception components.
Yet, prior write ordering style asynchronous replication solutions have been generally unable to accommodate configurations employing multiple write interception components. A possible exception is the XRC Asynchronous Replication service available from IBM Corporation, which ensures write ordering among multiple write interception components through the use of a fine grained, extremely accurate, hardware-based global clock facility. The XRC service uses a dedicated hardware mechanism to realize such an accurate global clock and, as such, is generally tailored to mainframe computers. That is, the ability to set a time that is extremely accurate is guaranteed by the hardware mechanism built into mainframe technology. Such a mechanism is expensive and is generally not deployable by systems running open, general-purpose operating systems. Furthermore, such mainframe technology may not be practically deployed in distributed environments because of latency issues, thereby rendering the hardware mechanism ineffective when servicing a consistency group that spans multiple geographical sites.
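To illustrate why an accurate global clock permits ordering among multiple write interception components, the following sketch assumes that each component stamps its writes with a timestamp drawn from a shared, perfectly synchronized clock and emits them in timestamp order. It is a simplified illustration under that assumption and is not a description of the internals of the XRC service; the function name and record layout are hypothetical.

```python
import heapq


def merge_by_global_time(*streams):
    """Merge per-component write streams into one globally ordered list.

    Each stream is a list of (timestamp, node_id, address, data) tuples
    already sorted by timestamp, as assumed for a single component.
    """
    return list(heapq.merge(*streams, key=lambda w: w[0]))


# Example: two interception components writing to overlapping addresses.
node_a = [(1, "A", 0, "a1"), (4, "A", 0, "a2")]
node_b = [(2, "B", 0, "b1"), (3, "B", 1, "b2")]

target = {}
for _ts, _node, address, data in merge_by_global_time(node_a, node_b):
    target[address] = data
```

The merge is only as trustworthy as the timestamps themselves: if the clocks of the two components were to disagree, interleaved writes to the same address could be applied in the wrong order.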