In today's technological era, businesses and government entities are increasingly reliant upon high performance computing and storage systems. A massive amount of critical data, such as financial, personal, entertainment, and corporate data, is stored in electronic form on storage systems. Loss of such data cannot be tolerated; thus, many safeguarding methods are employed in storage systems.
As part of safeguarding such critical data, the effects of a disaster must be considered. A natural disaster such as a flood or earthquake, or a technical disaster such as a major computer failure, could render a storage system useless, leaving the data it contains inaccessible or destroyed. Remote replication storage systems have been employed to guard against this unacceptable possibility. In a remote replication storage system, data is stored as usual at a local site where applications normally execute. Another storage system is deployed at a remote site, possibly many miles away. All the data stored at the local site is also copied to the remote site. Thus, if a disaster occurs at the local site, the data, at least up to a point in time, can be recovered at the remote site. The copy of the data at the remote site can also be used for backup, vaulting, and archiving for regulatory compliance, and by other applications for parallel processing of the remotely copied data.
There are various types of remote replication methods, each having advantages and disadvantages. In accordance with synchronous remote replication methods, an update to a local copy is not allowed to complete to the application until it has also been successfully stored at the remote site. The synchronous method is highly desirable, as both the local and remote copies always contain the same application data at any point in time. However, synchronous methods cannot be reasonably deployed over long distances, because each I/O operation must wait for the data to be transferred to the remote site, and application performance suffers accordingly. In accordance with write ordered asynchronous remote replication methods, an update to a local copy is queued at the local site for transfer to the storage at the remote site, and then the update is allowed to complete to the application. Because the application does not wait for the transfer, write ordered asynchronous remote replication methods offer distance independence and thus higher application performance than synchronous methods. The added risk is that, in case of a disaster at the local site, some of the queued storage operations may not have been completely transferred to the remote site, resulting in some data loss.
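The difference between the two methods described above can be illustrated with a minimal in-memory sketch. The class names, the dictionaries standing in for local and remote storage, and the `drain` method are all hypothetical and chosen for illustration only; a real system would transfer data over a network link.

```python
from collections import deque


class SyncReplicator:
    """Synchronous sketch: a write completes only after the remote copy is updated."""

    def __init__(self):
        self.local, self.remote = {}, {}

    def write(self, address, data):
        self.local[address] = data
        # Stands in for a full round trip to the remote site.
        self.remote[address] = data
        return "ack"


class AsyncOrderedReplicator:
    """Write ordered asynchronous sketch: a write completes once it is queued."""

    def __init__(self):
        self.local, self.remote = {}, {}
        self.queue = deque()  # FIFO queue preserves write order

    def write(self, address, data):
        self.local[address] = data
        self.queue.append((address, data))
        return "ack"  # acknowledged before the remote site has the data

    def drain(self, max_updates=None):
        """Transfer queued updates to the remote copy, oldest first."""
        sent = 0
        while self.queue and (max_updates is None or sent < max_updates):
            address, data = self.queue.popleft()
            self.remote[address] = data
            sent += 1
        return sent


sync = SyncReplicator()
sync.write(1, b"x")

async_rep = AsyncOrderedReplicator()
for address, data in [(1, b"a"), (2, b"b"), (1, b"c")]:
    async_rep.write(address, data)
async_rep.drain(max_updates=2)   # a disaster strikes before the queue is empty
lost = list(async_rep.queue)     # updates that never reached the remote site
```

In this sketch the synchronous copies are identical after every write, while the asynchronous remote copy lags behind by whatever remains in `lost`. Note also that both updates to address 1 are queued individually, so every write is sent to the remote site.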
Both the synchronous and the write ordered asynchronous methods share a common disadvantage: if the application makes multiple updates to the same location in local storage, every one of those updates is sent to the remote site. In accordance with the concept of locality of reference, such multiple updates occur often. A third type of remote replication method takes advantage of locality of reference to offer a higher efficiency remote replication solution. This method is referred to as delta set based asynchronous remote replication. In accordance with this method, writes to a local copy are aggregated over a period of time, known as a “delta cycle”, into a “delta set”. Upon expiration of the time period, the delta set containing the aggregated writes is transferred, all or nothing, to the remote site. Multiple updates to the same location at the local site during a delta cycle are thus transferred as one update to the remote site, decreasing bandwidth requirements between sites and increasing efficiency. Delta set based asynchronous remote replication is therefore a highly efficient long distance replication method that provides higher application performance than synchronous replication.
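The aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `DeltaSet`, `DeltaSetReplicator`, and `end_cycle` are hypothetical, storage is modeled as dictionaries keyed by block address, and the all-or-nothing transfer is modeled by staging a new remote image and swapping it in.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DeltaSet:
    """Writes aggregated during one delta cycle, keyed by block address."""
    cycle: int
    blocks: Dict[int, bytes] = field(default_factory=dict)


class DeltaSetReplicator:
    def __init__(self):
        self.local: Dict[int, bytes] = {}
        self.remote: Dict[int, bytes] = {}
        self.current = DeltaSet(cycle=0)
        self.writes_seen = 0   # writes issued by the application
        self.blocks_sent = 0   # updates actually transferred

    def write(self, address: int, data: bytes) -> None:
        self.local[address] = data
        # A later write to the same address replaces the earlier one,
        # so only the final value of each block is kept per cycle.
        self.current.blocks[address] = data
        self.writes_seen += 1

    def end_cycle(self) -> DeltaSet:
        """Close the current delta set and apply it to the remote copy as a unit."""
        closed = self.current
        self.current = DeltaSet(cycle=closed.cycle + 1)
        staged = dict(self.remote)      # build the new remote image off to the side
        staged.update(closed.blocks)
        self.remote = staged            # swap in all at once (all or nothing)
        self.blocks_sent += len(closed.blocks)
        return closed


rep = DeltaSetReplicator()
rep.write(7, b"a")
rep.write(7, b"b")   # same location, same cycle: coalesced with the write above
rep.write(9, b"c")
rep.end_cycle()
```

Here three application writes result in only two transferred blocks, because the two updates to address 7 fall within one delta cycle.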
Because delta set based asynchronous remote replication aggregates writes to the local copy over time, particular types of operations that rely on the exact point-in-time state of the local copy at the moment of an external event, for example Microsoft Volume Shadow Copy Services (VSS), may be negatively affected. The exact point-in-time state of the local copy associated with such an event may not be reproducible at the remote copy in delta set based asynchronous remote replication. As a result, in delta set based asynchronous remote replication it is desirable to provide a mechanism that can identify local events, capture the point-in-time state of the local copy at the time of each event independent of the delta cycles, and detect exactly the same point-in-time state on the remote copy.
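The problem described above can be illustrated with a small sketch. It only demonstrates the issue and does not implement any mechanism for solving it; the helper names `write` and `end_cycle` and the dictionary-based storage are assumptions made for illustration.

```python
local, remote = {}, {}
delta_set = {}


def write(address, data):
    """Apply a write locally and aggregate it into the open delta set."""
    local[address] = data
    delta_set[address] = data


def end_cycle():
    """Transfer the aggregated delta set to the remote copy and start a new one."""
    remote.update(delta_set)
    delta_set.clear()


remote_history = [dict(remote)]     # every state the remote copy passes through

write("A", 1)
state_at_event = dict(local)        # an external event fires here, mid-cycle
write("A", 2)
end_cycle()
remote_history.append(dict(remote))
```

The state captured at the event, with `A` equal to 1, never appears in `remote_history`: the remote copy moves directly from empty to `A` equal to 2, because the intermediate write was overwritten within the same delta cycle.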