Storing data in a computing environment generally involves two main components: (1) a host that receives data from user applications and (2) some means for storing that data, such as a disk, a database, a file system, and/or some combination of such storage (collectively, “local store” or “physical volume”). Typically, the host receives an input or output (“I/O”) request, formulates a corresponding read or write command, and transmits that command to the local store. Once the I/O has been completed, the local store reports the status of that command (e.g., showing that the specified data has been read or that the write has been completed) back to the host, and the host then propagates that status back to the application. This allows the host to determine generally what data has been committed to the local store, i.e., physically written to the disk, and what data potentially has not been committed, i.e., the host has not yet received a confirmation for that data or the data has not been written to disk yet.
Often it is not enough to allow the host to determine generally what data has been committed to the local store. In the event of a hardware failure or geographic catastrophe, having a backup of what was written to the local store can be crucial. Advantageously, a remote backup, i.e., one that is geographically distant from the local store, helps ensure that if a destructive event such as an earthquake or flood occurs at the location of the local store, the data committed to disk up to the time of the destructive event is retrievable from the remote store.
A traditional approach to backing up and storing data is to use a synchronous data replication scheme. In such a scheme, the local store and remote store are complete replicas of each other and any write operations performed on the local store are applied to the remote store. Once the write is completed at the remote site, the local store may process the next write operation. Due to the requirements imposed by synchronization, e.g., that both stores are complete replicas of each other, a local store cannot report back to the host that it has completed its write operation until the remote store also reports back that it is done with the same operation. Waiting for write confirmations can affect performance severely. Synchronous data storage, though useful, is also hindered by latencies associated with speed-of-light issues when great distances separate the source and destination. The time required to synchronize data between stores across great distances tends to cause unacceptable performance degradation on the host side of the coupled synchronous stores.
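By way of illustration only, the distance penalty described above can be estimated with simple arithmetic. The following Python sketch assumes that signals propagate through optical fiber at roughly 200,000 km/s and that each write must be acknowledged by the remote store before the next write is processed. The function names and the 1 ms local write time are hypothetical and are not taken from any particular system.

```python
# Assumption (not from the text above): signals travel through optical fiber
# at roughly 200,000 km/s, about two thirds of the speed of light in vacuum.
FIBER_KM_PER_S = 200_000.0


def round_trip_seconds(distance_km: float) -> float:
    """Minimum propagation delay for a write plus its acknowledgement."""
    return 2.0 * distance_km / FIBER_KM_PER_S


def max_serial_sync_writes_per_second(distance_km: float,
                                      local_write_s: float) -> float:
    """Upper bound on writes per second when each write must be acknowledged
    by the remote store before the next write may be processed."""
    return 1.0 / (local_write_s + round_trip_seconds(distance_km))


if __name__ == "__main__":
    # Hypothetical 1 ms local write time, chosen only for illustration.
    for km in (10, 100, 1000):
        rate = max_serial_sync_writes_per_second(km, local_write_s=0.001)
        print(f"{km:>5} km: {round_trip_seconds(km) * 1000:.2f} ms round trip, "
              f"at most {rate:.0f} serialized writes/s")
```

Under these assumptions, propagation delay alone comes to dominate the cost of each write as the separation grows, regardless of how fast the local disk is.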
Although synchronous storage generally supplies a consistent picture of what I/Os were committed to disk, it is latency-heavy and does not make efficient use of the disk spindle hardware. For example, in some synchronous storage systems, the local disk spindles must wait for the remote store to report back that it completed the last set of I/O operations before writing new, incoming I/Os to disk. This involves a loss in performance since the time spent waiting for the I/O completion acknowledgement would be better spent committing other I/Os to the local disk.
The deficiencies are compounded when using a distributed architecture. Backing up a distributed architecture is also difficult with respect to time synchronization between volumes on the host side. One solution is to time-stamp every I/O that comes through the host before it is committed to disk. If every incoming I/O is time-stamped, however, there must typically be a single time-stamping mechanism. This creates a bottleneck since every incoming I/O, regardless of which volume it is applied to, must be ordered into a single-file line for time-stamping before being sent to the appropriate I/O handler. To alleviate the single-file line dilemma, ideally multiple time-stamping mechanisms could be used, allowing different I/Os to be time-stamped in parallel. Doing so, however, requires coordination between time-stamping mechanisms down to a fraction of a microsecond. If such synchronization is not achieved, different I/Os that span volumes may be ordered incorrectly, which would result in data corruption. Such coordination is not feasible since not only must time-stamp mechanisms be calibrated carefully, they must constantly be monitored to ensure they stay in synchronization.
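The ordering hazard described above may be illustrated with a minimal Python sketch. It assumes two hypothetical time-stamping mechanisms whose clocks differ by a fixed offset, and two dependent writes, A and B, issued 200 ns apart to different volumes. All names and values are illustrative assumptions.

```python
def stamp(true_time_ns: int, clock_offset_ns: int) -> int:
    """Time stamp produced by a mechanism whose clock is off by clock_offset_ns."""
    return true_time_ns + clock_offset_ns


def replay_order(stamped_writes):
    """Order writes by time stamp, as a backup would when replaying them."""
    return [name for _, name in sorted(stamped_writes)]


def order_with_skew(skew_ns: int):
    """Write A (volume 1) truly occurs 200 ns before dependent write B (volume 2).

    A is stamped by a mechanism running skew_ns fast; B is stamped by a
    mechanism with no error. Times are hypothetical.
    """
    write_a = (stamp(100_000, skew_ns), "A")
    write_b = (stamp(100_200, 0), "B")
    return replay_order([write_a, write_b])


if __name__ == "__main__":
    print(order_with_skew(0))    # clocks agree
    print(order_with_skew(500))  # skew larger than the gap between the writes
```

In this sketch, once the skew between the two mechanisms exceeds the true interval between the writes, B sorts ahead of A even though B depends on A.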
Rather than wait for each I/O operation to individually complete, a second way to achieve synchronization is to periodically prevent incoming I/Os from committing, wait until all write operations currently being performed are completed, copy the entire source store to the destination store, and, upon completion of the copy, allow incoming I/Os to begin committing again. This method is inefficient from a disk usage standpoint because no new I/Os can commit during the copying process. Since no new I/Os can be committed, and existing I/Os are continually processed, the I/O queue depth drops lower and lower until it reaches 0, i.e., no new I/Os are processed and all existing I/Os have been committed to disk. In the storage domain, it is desirable to keep the efficiency of the storage mechanism's disk spindles as high as possible, i.e., to maximize the number of disk reads and writes per movement of the spindle arm. Stopping new I/Os from committing and allowing all current I/Os to be committed, effectively dropping the I/O queue depth to zero, is not efficient from a spindle utilization standpoint. A more robust and efficient approach is asynchronous replication.
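The drop in queue depth described above can be sketched as follows. This Python fragment is a simplified model that assumes a fixed, hypothetical number of I/Os complete per time step while no new I/Os are admitted. It is not a model of any particular disk.

```python
def drain_queue_depths(initial_depth: int, completions_per_tick: int) -> list:
    """Queue depth at each tick while new I/Os are blocked from committing.

    Assumes a constant completion rate, which real spindles do not guarantee.
    """
    if completions_per_tick <= 0:
        raise ValueError("completions_per_tick must be positive")
    depths = [initial_depth]
    depth = initial_depth
    while depth > 0:
        # Existing I/Os keep completing; nothing new is enqueued.
        depth = max(0, depth - completions_per_tick)
        depths.append(depth)
    return depths


if __name__ == "__main__":
    print(drain_queue_depths(initial_depth=8, completions_per_tick=3))
```

As the queue empties, the spindle has fewer outstanding requests to choose from, which is the loss of utilization noted above.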
One known technique of asynchronous replication helps to solve the remote-side bottleneck of synchronous replication. Rather than sending I/Os to the remote side and waiting for their completion status, this asynchronous replication technique begins by accepting incoming I/Os and committing them to both the disk and a journal. After the I/Os have been recorded in the journal, they are typically sent to the remote store (via, for example, an Ethernet connection). Beneficially, if the communications link between the local store and the remote store is down or is busy, journals may accumulate on the local side temporarily, effectively holding onto the data that represents the changes to the local store. Once the communications link is restored, journals may be sent to the remote store and their entries applied accordingly. Meanwhile, the local store has moved to the next set of incoming I/Os and has reported the completion status back to the host. Effectively, the local store reports completions back to the host at the rate required to write the I/Os to the disk.
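The journaling behavior described above may be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not an implementation of any particular product: the class names, the `link_up` flag, and the `drain_journal` method are hypothetical, and dictionaries stand in for the disks.

```python
from collections import deque


class RemoteStore:
    """Hypothetical remote store; a dict stands in for its disk."""

    def __init__(self):
        self.blocks = {}

    def apply(self, entry):
        block, data = entry
        self.blocks[block] = data


class LocalStore:
    """Hypothetical local store that journals every committed write."""

    def __init__(self, remote):
        self.blocks = {}        # stands in for the local disk
        self.journal = deque()  # entries not yet sent to the remote store
        self.remote = remote
        self.link_up = True     # state of the communications link

    def write(self, block, data):
        # Commit to the local disk and the journal, then report completion
        # to the host without waiting for the remote store.
        self.blocks[block] = data
        self.journal.append((block, data))
        return "committed"

    def drain_journal(self):
        # Send journal entries in their original order while the link is up.
        sent = 0
        while self.link_up and self.journal:
            self.remote.apply(self.journal.popleft())
            sent += 1
        return sent


if __name__ == "__main__":
    remote = RemoteStore()
    local = LocalStore(remote)
    local.link_up = False            # link is down: journal accumulates
    local.write(1, b"first")
    local.write(1, b"second")
    local.link_up = True             # link restored: entries are applied
    local.drain_journal()
    print(remote.blocks == local.blocks)
```

In this sketch, `write` returns as soon as the local commit and journal append finish, so the completion rate seen by the host does not depend on the state of the link.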
Another asynchronous approach to improving efficiency is to use a “snapshot and bulk copy” mechanism on the local store, effectively capturing what the local store looked like at a given point in time. In the snapshot scenario, the local store is frozen so that the spindles have generally stopped handling incoming I/Os, but only for as long as it takes to create the snapshot image of the entire data store (as opposed to waiting until all existing I/Os have processed). Once the snapshot of the data store is complete, the queued I/Os are allowed through and the storage mechanism may continue processing I/Os. Then, while the system is processing new I/Os, the snapshot is transmitted to the remote data store. This data transfer, asynchronous to the processing of I/Os, can have better performance than synchronous schemes, but it is still inefficient because I/Os cannot commit during the snapshot process, which, depending on the size of the storage medium, could take seconds or even minutes. Making reliable backups using just this method is difficult because of a trade-off: typically either the snapshots are frequent (more frequent snapshots mean a smaller rollback period in the event of a failure) or the time between snapshots is long (to maintain high performance).
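The trade-off between snapshot frequency and performance can be illustrated with a short Python calculation. The sketch makes simplifying assumptions that are not drawn from the text above: snapshots are taken at a fixed interval, each freeze lasts a fixed time, and the time to transmit the snapshot is ignored. The function name and figures are hypothetical.

```python
def snapshot_tradeoff(interval_s: float, freeze_s: float) -> dict:
    """Rollback window and fraction of time I/Os are frozen.

    Simplified model: ignores the time needed to transmit each snapshot.
    """
    if interval_s <= 0 or freeze_s < 0 or freeze_s > interval_s:
        raise ValueError("require 0 <= freeze_s <= interval_s and interval_s > 0")
    return {
        # Changes made since the last snapshot may be lost on failure.
        "rollback_window_s": interval_s,
        # Share of wall-clock time during which I/Os cannot commit.
        "frozen_fraction": freeze_s / interval_s,
    }


if __name__ == "__main__":
    # Hypothetical 30-second freeze per snapshot.
    for interval in (60, 600, 3600):
        print(interval, snapshot_tradeoff(interval, freeze_s=30))
```

Shortening the interval reduces the rollback window but raises the fraction of time during which I/Os are frozen, and lengthening it does the reverse.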