Replication is typically employed as part of a data backup and recovery storage strategy and, as such, denotes the movement of data from a source storage space (e.g., one or more source volumes) of a primary site or “source domain” to a target storage space (e.g., one or more destination volumes) of a secondary site or “target domain” via a communications network (e.g., a computer network) in such a way that enables recovery of applications from the destination volume. As used herein, recovery denotes loading of the applications on possibly different host or user systems (e.g., computers) where they can access the destination volume, instead of the source volume, resulting in the applications being loaded to a valid state. Also, a volume denotes any storage medium, such as a disk, having addresses that enable data to be accessed in a stable way and, as such, may apply to file system access, block access and any other storage access means.
The source domain contains at least the source volume, but may also contain the user systems embodied as, e.g., replication clients, a switching fabric and any source replication components situated outside of those components. In this context, a component may be a physical entity (e.g., a special replication appliance) and/or a software entity (e.g., an application and/or device driver). In remote disaster recovery, for example, the source domain includes an entire geographical site, but may likewise span multiple geographical sites. The target domain includes all of the remaining components relevant for replication services, including the destination volume coupled to a target storage system embodied as, e.g., a replication server. In addition, a replication system includes components that may be located in both the source and target domains.
The replication system typically has at least one component, i.e., a write interception component, which intercepts storage requests (e.g., write operations or “writes”) issued by the replication client to the source volume, prior to sending the intercepted writes to the destination volume. When issuing a write, a user application executing on the replication client specifies an address on the source volume, as well as the contents (i.e., write data) with which the volume address is to be set. The write interception component may be implemented in various locations in the source domain depending on the actual replication service; such implementations may include, e.g., a device driver in the replication client or logic in the switching fabric.
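The interception path described above can be illustrated with a minimal sketch. It assumes an in-memory dictionary as a stand-in for each volume; the names Volume, WriteInterceptor and replicate are hypothetical and are not taken from any actual replication service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Volume:
    """Hypothetical in-memory stand-in for a volume: address -> write data."""
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write(self, address: int, data: bytes) -> None:
        self.blocks[address] = data


@dataclass
class WriteInterceptor:
    """Hypothetical write interception component (illustrative only).

    Each write issued to the source volume is recorded in issue order
    so that it can later be sent to the destination volume.
    """
    source: Volume
    destination: Volume
    pending: List[Tuple[int, bytes]] = field(default_factory=list)

    def write(self, address: int, data: bytes) -> None:
        # Intercept the write, then let it reach the source volume.
        self.pending.append((address, data))
        self.source.write(address, data)

    def replicate(self) -> None:
        # Send the intercepted writes to the destination in issue order.
        for address, data in self.pending:
            self.destination.write(address, data)
        self.pending.clear()
```

In this sketch the user application would call write() on the interceptor rather than on the volume directly, which loosely corresponds to placing the component in a device driver on the replication client.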
For example, assume the replication client is one of many independent (non-coordinated) replication clients that span various geographical locations of a source domain. Further, assume that a user application or multiple (coordinated) user applications issue writes for storage on a source volume of the replication client. These writes must be intercepted by the write interception component and replicated consistently on a destination volume of the target domain such that, if a disaster arises, storage on the destination volume can be recovered in a manner that maintains the order of writes issued to the source volume by the user application.
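The importance of write order can be shown with a small example. It assumes a volume modeled as a dictionary and three hypothetical writes, two of which target the same address; the helper apply_writes is illustrative only.

```python
from typing import Dict, List, Tuple

Write = Tuple[int, bytes]  # (volume address, write data)


def apply_writes(writes: List[Write]) -> Dict[int, bytes]:
    """Apply writes in list order to an empty, hypothetical volume."""
    volume: Dict[int, bytes] = {}
    for address, data in writes:
        volume[address] = data
    return volume


# Writes as issued by the user application to the source volume.
issued: List[Write] = [(0, b"begin"), (1, b"data"), (0, b"commit")]

in_order = apply_writes(issued)
reordered = apply_writes([issued[2], issued[1], issued[0]])
```

Here the reordered replay leaves address 0 holding the earlier write, so a destination volume built that way would not reflect the state the user application produced on the source volume.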
Often the write interception component is upgraded to a new software version having, e.g., different data structures and functionality (features/services). A common approach used to perform such an upgrade is to modify the data structures in an “old” version of the software component to comply with the data structures in the new version. Once this completes, the computer is rebooted to run with the new data structures. The problem with this approach involves the substantial resources needed to (i) identify the data structures that require modification, (ii) rewrite code to modify those identified data structures and (iii) verify the accuracy of the rewritten code. Thus, a substantial amount of resources is consumed for every upgrade.
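The conversion approach described above can be sketched as follows. The two record layouts and the migrate function are hypothetical, chosen only to show the kind of per-upgrade conversion code that must be written and verified; they do not describe any actual version of a write interception component.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OldWriteRecord:
    """Hypothetical record layout used by the old version."""
    address: int
    data: bytes


@dataclass
class NewWriteRecord:
    """Hypothetical record layout used by the new version."""
    address: int
    length: int
    data: bytes
    sequence: int


def migrate(old_records: List[OldWriteRecord]) -> List[NewWriteRecord]:
    """Convert old records to the new layout, preserving list order.

    The added fields (length, sequence) are assumptions; each real
    upgrade would need its own conversion logic of this kind.
    """
    return [
        NewWriteRecord(
            address=rec.address,
            length=len(rec.data),
            data=rec.data,
            sequence=seq,
        )
        for seq, rec in enumerate(old_records)
    ]
```

Even in this small case, every field that differs between the two layouts needs its own conversion rule and its own verification, which is where the per-upgrade cost arises.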
In addition, upgrade of a typical software component, such as an application, generally involves halting (interrupting) operation of the old version of the component and subsequent installation of a “new” version of the software. However, if operation of the write interception component is interrupted to enable installation of an upgraded version, interception of writes may be disrupted (missed), causing inconsistency between data stored on the source and destination volumes. As a result, a resynchronization procedure must be performed to re-synchronize the data on the destination volume with the data on the source volume, which is time-consuming. Furthermore, if a failure (disaster) occurs at the replication client during the resynchronization procedure, the data on the destination volume may not be consistent with the data on the source volume and, as such, may not be reliably used for disaster recovery, as intended with the replication system.
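The effect of interrupting interception can be sketched as follows. The example assumes dictionary-backed volumes, immediate replication of each intercepted write, and a resynchronization that compares every source address; the class InterceptedVolume and its methods are hypothetical.

```python
from typing import Dict


class InterceptedVolume:
    """Hypothetical source/destination pair with a switchable interceptor."""

    def __init__(self) -> None:
        self.source: Dict[int, bytes] = {}
        self.destination: Dict[int, bytes] = {}
        self.intercepting = True

    def write(self, address: int, data: bytes) -> None:
        self.source[address] = data
        if self.intercepting:
            # Intercepted write is replicated to the destination.
            self.destination[address] = data
        # Otherwise the write is missed and the volumes diverge.

    def resynchronize(self) -> int:
        """Copy every differing address; return how many were copied."""
        stale = [
            address
            for address, data in self.source.items()
            if self.destination.get(address) != data
        ]
        for address in stale:
            self.destination[address] = self.source[address]
        return len(stale)
```

Because no record of the missed writes exists in this sketch, resynchronization has to examine every address on the source volume, which suggests why such a procedure grows costly as the volume grows.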