Data lies at the heart of every enterprise and is a core component of data center infrastructure. As data applications become increasingly critical, there is a growing need to ensure complete business continuity.
Disaster recovery systems provide data protection and application recovery. Some disaster recovery systems use virtual data replication within a hypervisor architecture and are able to recover data to any point in time.
Disaster recovery systems typically maintain disk replicas of enterprise data disks. Some disaster recovery systems, referred to as continuous data protection (CDP) systems, can restore a disk replica to a previous point in time. CDP systems log, into one or more write journals, each command that writes data to a designated address of a dedicated data disk. Each journaled set of commands that together constitutes a consistent disk image is stamped with a date and time. At various times, the journaled commands are promoted to the replica disks, updating the replica disk images to a more recent time; the write journals are then purged and restarted from that more recent time. The purged journal commands are converted to undo journal entries for use in rolling back data to a time prior to the promotion time.
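The journaling and promotion cycle can be illustrated with a minimal Python sketch. The names `WriteCommand`, `CdpJournal`, `log` and `promote` are hypothetical, a disk is modeled as a dictionary from block address to data, and journaled sets are assumed to arrive in timestamp order. As a further assumption, each undo entry is modeled as the data that a promoted write overwrote, which is one common way to make a promotion reversible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model for illustration only; not any specific product's API.

@dataclass
class WriteCommand:
    address: int  # block address on the data disk
    data: bytes   # payload written to that address

@dataclass
class CdpJournal:
    replica: Dict[int, bytes] = field(default_factory=dict)
    # Write journal: consistent sets of writes, each stamped with a time.
    write_journal: List[Tuple[float, List[WriteCommand]]] = field(default_factory=list)
    # Undo journal: (timestamp, address, prior data or None if never written).
    undo_journal: List[Tuple[float, int, Optional[bytes]]] = field(default_factory=list)
    promotion_time: float = 0.0

    def log(self, timestamp: float, commands: List[WriteCommand]) -> None:
        """Journal one consistent set of writes stamped with `timestamp`."""
        self.write_journal.append((timestamp, list(commands)))

    def promote(self, up_to: float) -> None:
        """Apply sets stamped at or before `up_to` to the replica, convert each
        applied write into an undo entry, and purge the applied sets."""
        remaining = []
        for ts, commands in self.write_journal:
            if ts > up_to:
                remaining.append((ts, commands))  # still newer than the promotion
                continue
            for cmd in commands:
                # Record what this write overwrites so it can be rolled back.
                self.undo_journal.append((ts, cmd.address, self.replica.get(cmd.address)))
                self.replica[cmd.address] = cmd.data
        self.write_journal = remaining  # journal restarts from the promotion time
        self.promotion_time = up_to

j = CdpJournal()
j.log(1.0, [WriteCommand(0, b"a")])
j.log(2.0, [WriteCommand(0, b"b"), WriteCommand(1, b"x")])
j.log(3.0, [WriteCommand(1, b"y")])
j.promote(up_to=2.0)
print(j.replica)  # {0: b'b', 1: b'x'}
```

After the promotion only the set stamped 3.0 remains in the write journal, while the undo journal holds three entries that can reverse the promoted writes.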
As such, the disk images at any desired recovery point in time may be determined from the replica disk images, the write journals and the undo journals. If the desired recovery point in time is later than the most recent promotion time, the corresponding disk images are obtained by applying to the replica disk images the write commands journaled before the desired recovery point in time, rolling the replica disk data forward to that point. If the desired recovery point in time is earlier than the most recent promotion time, which is generally the case, the corresponding disk images are obtained by applying to the replica disk images, in reverse chronological order, the undo commands time-stamped after the desired recovery point in time, rolling the replica disk data back to that point.
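The roll-forward and roll-back cases can be sketched as a single Python function. The helper `image_at` and the tuple layouts are hypothetical; the sketch assumes both journals are sorted by timestamp and that each undo entry stores the data an earlier write overwrote (or `None` if the address had never been written).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types for illustration only.
Image = Dict[int, bytes]                        # address -> data
WriteEntry = Tuple[float, int, bytes]           # (timestamp, address, data)
UndoEntry = Tuple[float, int, Optional[bytes]]  # (timestamp, address, prior data)

def image_at(t: float, replica: Image, promotion_time: float,
             write_journal: List[WriteEntry],
             undo_journal: List[UndoEntry]) -> Image:
    """Reconstruct the disk image at recovery point `t` without modifying the replica."""
    image = dict(replica)
    if t >= promotion_time:
        # Roll forward: replay writes journaled at or before t, oldest first.
        for ts, address, data in write_journal:
            if ts <= t:
                image[address] = data
    else:
        # Roll back: undo writes stamped after t, newest first.
        for ts, address, prior in reversed(undo_journal):
            if ts > t:
                if prior is None:
                    image.pop(address, None)  # address did not exist at time t
                else:
                    image[address] = prior
    return image

replica = {0: b"b", 1: b"x"}  # replica promoted to time 2.0
writes = [(3.0, 1, b"y")]
undos = [(1.0, 0, None), (2.0, 0, b"a"), (2.0, 1, None)]
print(image_at(3.0, replica, 2.0, writes, undos))  # {0: b'b', 1: b'y'}
print(image_at(1.5, replica, 2.0, writes, undos))  # {0: b'a'}
```

Undo entries are applied newest first so that, when several writes touched the same address, the oldest prior value is the one left in place.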
In a multi-host enterprise environment, CDP disaster recovery systems need to perform consistent cross-host journal checkpoints. To ensure a consistent enterprise recovery, the write journals must be checkpointed when the enterprise disk images correspond to a common point in time. For such a checkpoint to be marked, all hosts must be able to quiesce writes at a common point in time. Quiescing writes for synchronization generally degrades performance, and must therefore be applied carefully.
Alternatively, some disaster recovery systems synchronize clocks across hosts and timestamp each write operation to ensure that the writes are properly sequenced in the write journals. Such systems are difficult to deploy consistently, because independent clocks are hard to synchronize to millisecond precision.
Other conventional disaster recovery systems send a quiesce command to all hosts, receive acknowledgements of successful quiescence, take a consistent snapshot image of all disks, and then send release-quiesce commands. Such systems risk degrading the performance of enterprise data applications.
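The conventional quiesce, acknowledge, snapshot and release sequence can be modeled in a few lines of Python. The `Host` class and `consistent_snapshot` function are hypothetical; writes issued while a host is quiesced are simply held in a list here, whereas in a real system the application would stall until release, which is the source of the performance cost.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model for illustration only.

@dataclass
class Host:
    name: str
    disk: Dict[int, bytes] = field(default_factory=dict)
    quiesced: bool = False
    held: List[Tuple[int, bytes]] = field(default_factory=list)  # writes delayed by quiesce

    def write(self, address: int, data: bytes) -> None:
        if self.quiesced:
            self.held.append((address, data))  # application write must wait
        else:
            self.disk[address] = data

    def quiesce(self) -> bool:
        self.quiesced = True
        return True  # acknowledgement of successful quiescence

    def release(self) -> None:
        self.quiesced = False
        for address, data in self.held:  # apply writes delayed during the window
            self.disk[address] = data
        self.held.clear()

def consistent_snapshot(hosts: List[Host]) -> Dict[str, Dict[int, bytes]]:
    """Quiesce all hosts, snapshot every disk, then release the quiesce."""
    try:
        if not all(h.quiesce() for h in hosts):
            raise RuntimeError("a host failed to quiesce")
        # No host can write now, so all disks reflect one common point in time.
        return {h.name: dict(h.disk) for h in hosts}
    finally:
        for h in hosts:
            h.release()  # always release, even if the snapshot failed

a, b = Host("a"), Host("b")
a.write(0, b"1")
b.write(0, b"2")
snap = consistent_snapshot([a, b])
print(snap)  # {'a': {0: b'1'}, 'b': {0: b'2'}}
```

The `finally` clause guarantees that hosts are released even if a snapshot step fails, since leaving a host quiesced would block its applications indefinitely.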
It would thus be advantageous to enable cross-host consistent CDP checkpointing without requiring synchronized clocks and without degrading the performance of data applications.