Information drives business. For businesses that increasingly depend on such information for their day-to-day operations, improving the accessibility and usability of data and preventing or quickly recovering from unexpected downtime caused by data loss or corruption are of paramount importance. Since the terrorist attacks on the World Trade Center and Pentagon on Sep. 11, 2001, disaster recovery has received heightened emphasis in the business-critical resource planning process, and replication of business-critical information has become a top priority. As business data grows, replicating it consumes an increasing share of information technology workers' time, as well as bandwidth on production servers. Replication infrastructures can become very complex to construct and manage.
To increase data accessibility and usability and to minimize the impact of data corruption and loss, a number of techniques have been developed and implemented. One such technique involves the creation of one or more “mirror” copies of data.
Mirrored data is typically generated by duplicating data and update operations (e.g., write operations) from a primary data storage area to a mirror or “replicated” storage area in real time as updates are made to primary data. Such duplication may be performed by software (e.g., a volume manager, operating system, file system, etc.) or hardware (e.g., a storage device controller). The mirrored data may then be used to facilitate failover recovery, load balancing, and/or to generate frozen images (e.g., snapshots) used in performing various on-host and off-host processing functions (e.g., backups, data analysis, etc.). A snapshot is typically generated by “detaching” a mirror being updated in real time so that it is no longer updated. Detaching the mirror involves halting transactions being applied to the primary data storage area and to the mirror for a very brief time period to allow existing transactions to complete. A snapshot is then taken, which serves as a frozen or “point-in-time” image and provides a logically consistent copy of the primary data.
FIGS. 1A-1C illustrate the generation and use of data mirrors according to the prior art. In FIG. 1A, two mirrors of data 110, which resides on data storage area 112, are maintained within a storage environment 100, and corresponding updates are made to mirrors 120A and 120B when an update, such as update 104A, is made to data 110. For example, when update 104A is made to data 110, update 104B is made to mirror 120A residing on mirror data storage area 122, and corresponding update 104C is made to mirror 120B residing on mirror data storage area 124. In a conventional data storage system, each mirror resides on a separate physical storage device from the data for which the mirror serves as a backup, and therefore, data storage areas 112, 122, and 124 may represent three physical storage devices.
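The arrangement of FIG. 1A can be illustrated with a minimal sketch. It assumes a simple in-memory block store, and the names StorageArea, MirroredVolume, and write are hypothetical, introduced only for illustration; they are not taken from any particular volume manager or storage controller.

```python
class StorageArea:
    """Hypothetical block store mapping a block number to its contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class MirroredVolume:
    """Applies every update to the primary data and to each attached mirror."""

    def __init__(self, primary, mirrors):
        self.primary = primary
        self.mirrors = list(mirrors)

    def write(self, block, data):
        # The update to the primary data (e.g., update 104A) ...
        self.primary.blocks[block] = data
        # ... is duplicated to every mirror (e.g., updates 104B and 104C).
        for mirror in self.mirrors:
            mirror.blocks[block] = data


# Three storage areas, standing in for areas 112, 122, and 124 of FIG. 1A.
area_112 = StorageArea("112")
area_122 = StorageArea("122")
area_124 = StorageArea("124")

volume = MirroredVolume(area_112, [area_122, area_124])
volume.write(7, b"update 104A")
```

After the write, all three storage areas hold the same contents for block 7, which is what allows either mirror to stand in for the primary data later.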
A snapshot of data can then be made by “detaching,” or “splitting,” a mirror of the data so that the mirror is no longer being updated. FIG. 1B shows storage environment 100 after detaching mirror 120B. Detached mirror (snapshot) 120B serves as a snapshot of data 110 as it appeared at the point in time that mirror 120B was detached. When another update 106A is made to data 110, a corresponding update 106B is made to mirror 120A. However, no update is made to detached mirror (snapshot) 120B. Instead, a pointer to the data changed in update 106A is retained in a data change log 130, which tracks changes in primary data with respect to detached mirror (snapshot) 120B.
In a typical data storage system, resynchronization allows snapshots to be refreshed and re-used rather than discarded. A snapshot such as snapshot 120B can be quickly re-associated with the primary data that it previously mirrored in a process sometimes referred to as a “snapback.” Updates made to the primary volume while the snapshot was unavailable for update are tracked using data change log 130. When the snapshot is “re-attached” to again serve as a mirror, only the updates that were missed are applied to re-synchronize the snapshot with the primary data. For example, if the storage device storing detached mirror (snapshot) 120B is again to serve as a mirror for production data, the change made in update 106A would be applied to snapshot 120B before other updates are made.
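The detach, change-tracking, and snapback sequence described above can be sketched as follows. This is an illustrative model only: the class Volume and its methods are hypothetical names, the data change log is modeled as a set of changed block numbers, and the brief halting of transactions that precedes a detach is omitted.

```python
class Volume:
    """Primary data with one mirror that can be detached and re-attached."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.attached = True
        self.change_log = set()  # block numbers changed since the detach

    def write(self, block, data):
        self.primary[block] = data
        if self.attached:
            self.mirror[block] = data
        else:
            # The detached mirror is left untouched; only a record of
            # which block changed is kept (cf. data change log 130).
            self.change_log.add(block)

    def detach(self):
        """Freeze the mirror so that it serves as a point-in-time snapshot."""
        self.attached = False
        self.change_log.clear()

    def snapback(self):
        """Re-attach the mirror, copying only the blocks it missed."""
        missed = sorted(self.change_log)
        for block in missed:
            self.mirror[block] = self.primary[block]
        self.change_log.clear()
        self.attached = True
        return missed


vol = Volume()
vol.write(1, "a")
vol.write(2, "b")
vol.detach()
frozen = dict(vol.mirror)   # contents of the snapshot at detach time
vol.write(2, "b2")          # analogous to update 106A
unchanged = vol.mirror == frozen
resynced = vol.snapback()   # only block 2 needs to be copied
```

Because only the logged blocks are copied, the cost of re-synchronization in this sketch scales with the number of blocks changed while the mirror was detached rather than with the size of the volume.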
In FIG. 1C, mirrors (e.g., mirror 120A and detached mirror 120B) are used to facilitate failover recovery and provide consistent access to a frozen image of primary data 110. Following a failure of storage area 112 after which data 110 is no longer accessible, mirror 120A residing on mirror data storage area 122 may be used in place of primary data 110 to ensure the availability of data stored within storage environment 100. Updates (e.g., update 108) are subsequently performed on mirror 120A with no corresponding updates being done on data 110 (which is inaccessible due to failure) or detached mirror (snapshot) 120B (which is being maintained as a read-only, point-in-time image of primary data 110).
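The failover behavior of FIG. 1C can be sketched in the same style. The class FailoverVolume and its methods are hypothetical, and the failure of storage area 112 is modeled simply as a flag; an actual system would detect the failure through I/O errors or monitoring.

```python
class FailoverVolume:
    """Primary data, one live mirror, and one detached snapshot."""

    def __init__(self, initial):
        self.primary = dict(initial)      # stands in for data 110
        self.live_mirror = dict(initial)  # stands in for mirror 120A
        self.snapshot = dict(initial)     # stands in for detached mirror 120B
        self.primary_failed = False

    def fail_primary(self):
        """Mark the primary storage area as inaccessible."""
        self.primary_failed = True

    def write(self, block, data):
        # While the primary is healthy, it and the live mirror are both
        # updated. After a failure, only the live mirror is updated.
        if not self.primary_failed:
            self.primary[block] = data
        self.live_mirror[block] = data
        # The snapshot is never written; it stays a point-in-time image.

    def read(self, block):
        # Reads are redirected to the live mirror after a failure.
        source = self.live_mirror if self.primary_failed else self.primary
        return source[block]


vol = FailoverVolume({1: "a"})
vol.fail_primary()
vol.write(1, "update 108")
live_value = vol.read(1)
frozen_value = vol.snapshot[1]
```

In this sketch the live image reflects update 108 while the snapshot still holds the earlier contents, so both a current and a point-in-time view remain available after the failure.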
While the failover recovery technique may be used to provide access to both point-in-time and real-time or “live” images of primary data following the failure of an associated storage area, it requires the creation and maintenance of at least one duplicate mirror of primary data. Conventional storage environments may include additional mirrors to avoid dependency on a single data mirror in the event of such a failure, thereby increasing the amount of resources necessary to provide such functionality. Consequently, using mirrors for failover and disaster recovery is resource-intensive: each additional mirror requires its own storage and must be kept up to date.
While snapshot technology enables point-in-time images of data to be maintained on a single node, replicating the data to different nodes for disaster recovery purposes introduces another level of complexity. Storage space must be managed at multiple sites, and large amounts of data must be transferred across a network without disrupting the business activities of the organization or degrading performance of business-critical systems. Complex storage configurations on one node require correspondingly complex storage configurations on replication nodes.
What is needed is a system that efficiently replicates data shared by multiple nodes without degrading the performance of the application writing the data.