1. Technical Field
The present invention relates to data storage and retrieval generally and more particularly to a method and system for performing periodic replication using a log.
2. Description of the Related Art
Data replication products replicate data associated with application write operations or “updates” over a network to remote sites, making the replicated data available for processing (e.g., backup, disaster recovery, decision support, data mining, etc.). Conventional data replication products offer different modes of replication, each providing different guarantees on the content and availability (recovery point and recovery time) of the data at the remote site. Such replication modes typically fall into one of three categories: synchronous, asynchronous and periodic.
When replicating synchronously, a replication product maintains secondary site data completely up to date with respect to primary site data. An application write operation on a synchronously replicated data volume completes as soon as the update has been logged at the primary site and has been transmitted to, and acknowledged by, all secondary sites. In this mode of replication, a remote site is always up-to-date or “current” and consistent. Synchronous replication, however, adds a network round trip time to the service time of each write operation and hence tends to decrease application performance.
In asynchronous replication, the transfer of a write to a secondary site occurs outside of the main input/output (I/O) path. A log is used to record each write, and the write operation request is indicated as complete to the requesting application once the write has been logged. The logged writes are then sent “asynchronously” to each remote site while maintaining write-order fidelity and, consequently, consistency. Synchronization between primary and remote secondary data volumes is not continuously maintained in an asynchronously replicated system, however, and consequently secondary data volumes cannot be relied on to be “up-to-date” at any given instant.
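By way of illustration only, the log-based behavior described above can be sketched as follows. This is a minimal in-memory model and not the implementation of any particular product; the class name AsyncReplicator, its methods and the max_entries parameter are hypothetical names introduced for this example.

```python
from collections import deque


class AsyncReplicator:
    """Toy model: each write is logged and acknowledged, then sent later in order."""

    def __init__(self):
        self.primary = {}    # block number -> data at the primary site
        self.secondary = {}  # remote copy, which may lag behind the primary
        self.log = deque()   # FIFO log, so write order is preserved

    def write(self, block, data):
        """Apply the write locally, log it, and report completion immediately."""
        self.primary[block] = data
        self.log.append((block, data))
        return "complete"    # no network round trip in the write path

    def drain(self, max_entries=None):
        """Send logged writes to the secondary in the order they were logged."""
        sent = 0
        while self.log and (max_entries is None or sent < max_entries):
            block, data = self.log.popleft()
            self.secondary[block] = data
            sent += 1
        return sent


rep = AsyncReplicator()
rep.write(0, b"a")
rep.write(1, b"b")
rep.write(0, b"c")                     # block 0 is overwritten

lagging = rep.secondary != rep.primary  # secondary has received nothing yet
sent_first = rep.drain(max_entries=2)   # first two logged writes are sent
partial = dict(rep.secondary)           # consistent with an earlier point in time
sent_rest = rep.drain()                 # remaining logged write is sent
```

In this sketch all three logged writes are transmitted, including the superseded first write to block 0. After the partial drain the secondary matches the primary as it stood after the second write, which reflects write-order fidelity without being current.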
In conventional periodic replication, changes or “updates” to a primary data volume stemming from application write operations are tracked using a change map. Each remote site is then incrementally synchronized using the tracked changes at periodic or scheduled intervals. In a typical periodic replication system, such change maps are implemented as bitmaps where each bit represents a region in the data volume or “set” to be replicated. Consistency is maintained by atomically synchronizing secondary data volumes with all changes which took place during a given tracking period or interval. Accordingly, each remote site in a periodically replicated system is “current” up to the last synchronization event that occurred.
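By way of illustration only, a bitmap change map of the kind described above can be sketched as follows. The region size of four blocks and the names PeriodicReplicator, write and synchronize are assumptions made for this example; the atomicity of the synchronization step is not modeled.

```python
REGION_SIZE = 4  # blocks per bitmap region (hypothetical value)


class PeriodicReplicator:
    """Toy model: a bitmap tracks dirty regions; sync copies whole dirty regions."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.secondary = [b""] * num_blocks
        num_regions = (num_blocks + REGION_SIZE - 1) // REGION_SIZE
        self.dirty = [False] * num_regions  # one flag (bit) per region

    def write(self, block, data):
        """Apply the write and mark the containing region as changed."""
        self.primary[block] = data
        self.dirty[block // REGION_SIZE] = True

    def synchronize(self):
        """Copy every dirty region to the secondary and clear the bitmap."""
        blocks_sent = 0
        for region, is_dirty in enumerate(self.dirty):
            if not is_dirty:
                continue
            start = region * REGION_SIZE
            end = min(start + REGION_SIZE, len(self.primary))
            self.secondary[start:end] = self.primary[start:end]
            blocks_sent += end - start
            self.dirty[region] = False
        return blocks_sent


rep = PeriodicReplicator(num_blocks=16)
rep.write(1, b"x")
rep.write(1, b"y")          # repeated write to the same block
rep.write(2, b"z")
sent = rep.synchronize()    # one dirty region of four blocks is copied
sent_again = rep.synchronize()
```

Here three writes that touch two blocks cause a single region of four blocks to be transferred, and a second synchronization with no intervening writes transfers nothing.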
A significant drawback associated with both synchronous and asynchronous modes of replication is that data may be inefficiently or unnecessarily transmitted over an associated network if the same blocks of the primary data volume are written to multiple times (e.g., with identical data due to the operation of an application or with different data where the final write operation is the only one of importance). While periodic replication avoids this disadvantage by transferring cumulative data changes occurring over a period of time, because each write operation can “dirty” a large data volume region, periodic replication may also result in the unnecessary transmission of unchanged data when there is little spatial locality of writes.
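The trade-off described above can be made concrete with a small worked comparison. The following sketch counts blocks transmitted under simplifying assumptions: every logged write sends exactly one block, every dirty region is sent in full, and the region size of eight blocks is a hypothetical value. The function names are introduced for this example only.

```python
def blocks_sent_by_log(writes):
    """Log-based (synchronous or asynchronous) replication sends every write."""
    return len(writes)


def blocks_sent_by_bitmap(writes, region_size):
    """Bitmap-based periodic replication sends each dirty region once, in full."""
    dirty_regions = {block // region_size for block in writes}
    return len(dirty_regions) * region_size


REGION_SIZE = 8  # blocks per region (hypothetical value)

# Case 1: the same block is written 100 times within one interval.
hot = [5] * 100
hot_log = blocks_sent_by_log(hot)                      # 100 blocks
hot_bitmap = blocks_sent_by_bitmap(hot, REGION_SIZE)   # 8 blocks

# Case 2: four writes with little spatial locality, one per region.
scattered = [0, 8, 16, 24]
scattered_log = blocks_sent_by_log(scattered)                     # 4 blocks
scattered_bitmap = blocks_sent_by_bitmap(scattered, REGION_SIZE)  # 32 blocks
```

Under these assumptions the log transmits 100 blocks where the bitmap transmits 8 in the first case, while in the second case the bitmap transmits 32 blocks to convey 4 changed blocks.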