Critical data is often protected against disasters by copying it to another site. One technique in use for this purpose is known as remote copy.
Remote copy is the pairing of a disk (or logical volume) with another disk for use as a backup. The original disk is known as the primary and the backup disk is known as the secondary. Whenever data is written to the primary it must also be written to the secondary to ensure the backup stays up to date. Remote copy may be implemented synchronously so that processing at the host is delayed until confirmation of the completion of the corresponding write at the secondary.
Remote copy may also be implemented asynchronously. Asynchronous remote copy means that the host that wrote the data to the primary is not delayed while data is copied to the secondary. That is, as soon as the data has been written to the primary, the host is notified of its completion. The data is then copied to the secondary asynchronously.
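The difference between the two modes can be illustrated with a minimal sketch. The class name `RemoteCopyPair`, its methods, and the use of dictionaries in place of disks are assumptions made purely for illustration and do not describe any particular product.

```python
from collections import deque


class RemoteCopyPair:
    """Toy model of a primary/secondary pair (dicts stand in for disks)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet applied to the secondary

    def host_write(self, sector, data):
        """Write to the primary and return when the host would be notified."""
        self.primary[sector] = data
        if self.synchronous:
            # Synchronous: the host is held until the secondary write is done.
            self.secondary[sector] = data
        else:
            # Asynchronous: queue the copy and acknowledge the host at once.
            self.pending.append((sector, data))
        return "complete"

    def drain(self):
        """Apply queued writes to the secondary in arrival order."""
        while self.pending:
            sector, data = self.pending.popleft()
            self.secondary[sector] = data
```

In the asynchronous case the secondary lags the primary until `drain()` runs, which is the "out of date" condition discussed next.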
One of the main challenges when implementing asynchronous remote copy is maintaining consistency of the secondary disk. Maintaining consistency means keeping the secondary data in a state that the primary data could have been in at some point during the copying process. The secondary data is allowed to be “out of date” (i.e., a certain number of updates have not yet been applied to the secondary), but it cannot be allowed to be inconsistent.
One technique for maintaining consistency, while keeping resource consumption low and performance acceptable, is to use a set of client and server nodes to control the batching and sequencing of writes to a remote copy secondary system. Host writes that are by definition independent of one another can be batched up and issued with a sequence number, and the writes at the secondary can then be executed in sequence number order to maintain consistency at the secondary.
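A rough sketch of the batching idea follows. The `Sequencer` class and the `apply_batches` function are hypothetical names, and the rule for when a batch is closed is left to the caller. This is a simplification for illustration, not a description of an actual client/server node protocol.

```python
class Sequencer:
    """Hypothetical server node that groups writes into numbered batches."""

    def __init__(self):
        self.seq = 0
        self.batches = {}  # sequence number -> list of (sector, data)

    def submit(self, sector, data):
        """Add a write to the open batch and return its sequence number."""
        self.batches.setdefault(self.seq, []).append((sector, data))
        return self.seq

    def close_batch(self):
        """Close the open batch; later writes get the next sequence number."""
        self.seq += 1


def apply_batches(secondary, batches, next_seq=0):
    """Apply batches in sequence-number order, stopping at the first gap."""
    while next_seq in batches:
        for sector, data in batches[next_seq]:
            secondary[sector] = data
        next_seq += 1
    return next_seq
```

Writes that share a sequence number may be applied in any order relative to one another, because they are independent. Ordering is enforced only between batches.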
However, when a system is adapted to perform remote copy using sequence numbers to achieve data consistency in a multi-node system, a node that has been issued a sequence number may not be able to issue the secondary write for that sequence number. For example, the node may fail due to hardware or software issues, or it may lose communications with the other nodes. This creates a problem because until all writes for a sequence number have completed, writes for the next sequence number cannot start. So the loss of one node prevents the system from making progress.
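The stall can be sketched as follows, under the assumption that each sequence number tracks which nodes still owe a secondary write. `SequenceTracker` and its methods are hypothetical names introduced only for this illustration.

```python
class SequenceTracker:
    """Tracks which nodes still owe a secondary write for each sequence number."""

    def __init__(self):
        self.outstanding = {}  # sequence number -> set of node ids
        self.active_seq = 0    # lowest sequence number not yet fully written

    def issue(self, seq, node):
        """Record that a node has been issued a sequence number."""
        self.outstanding.setdefault(seq, set()).add(node)

    def complete(self, seq, node):
        """Record a completed secondary write and advance if possible."""
        self.outstanding[seq].discard(node)
        # Move past every sequence number that has no writes left to complete.
        while (self.active_seq in self.outstanding
               and not self.outstanding[self.active_seq]):
            del self.outstanding[self.active_seq]
            self.active_seq += 1
```

If nodes "A" and "B" are both issued sequence number 0 and "B" fails before completing its write, `active_seq` stays at 0 regardless of how promptly "A" finishes, so no write for sequence number 1 can begin.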
In this situation, the primary writes for the stalled I/Os may already have been reported to the host as complete, so failing those I/Os and letting the hosts recover from the problem is not an option. Instead, the system must wait for the error to be fixed and then resend any secondary writes that had not completed at the time of the error, thus maintaining data consistency.
One possible way of dealing with this situation involves keeping a non-volatile bitmap that records all disk sectors that differ between the primary and secondary. When a write arrives at the primary, the bit for the relevant disk sector is set. When the secondary write completes, the corresponding bit is cleared. After recovering from an error, the bitmap can be used to reissue the secondary writes for any sectors whose bits are set.
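A minimal sketch of such a bitmap is shown below. The names `DirtyBitmap` and `recover` are assumptions for illustration, and the bitmap here lives in ordinary memory, whereas the approach described above requires it to be held in non-volatile storage.

```python
class DirtyBitmap:
    """One bit per sector; a set bit means primary and secondary differ."""

    def __init__(self, num_sectors):
        self.bits = bytearray((num_sectors + 7) // 8)

    def set(self, sector):
        self.bits[sector // 8] |= 1 << (sector % 8)

    def clear(self, sector):
        self.bits[sector // 8] &= ~(1 << (sector % 8)) & 0xFF

    def is_set(self, sector):
        return bool(self.bits[sector // 8] & (1 << (sector % 8)))

    def dirty_sectors(self):
        """Return the sectors whose bits are set, in ascending order."""
        return [s for s in range(len(self.bits) * 8) if self.is_set(s)]


def recover(bitmap, primary, secondary):
    """Copy every dirty sector from primary to secondary, clearing its bit."""
    for sector in bitmap.dirty_sectors():
        secondary[sector] = primary[sector]
        bitmap.clear(sector)
```

Note that `recover` walks the sectors in bitmap order, which carries no information about the order in which the host originally issued the writes.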
The problem with this solution is that it does not maintain data consistency during the recovery process. The bits are processed in an arbitrary order, so the system may send dependent writes out of order, thus leaving the secondary inconsistent. This could be safeguarded against by taking a snapshot of the secondary before starting the recovery, but this requires additional storage and processing overhead.
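The inconsistency can be made concrete with a small worked example. The two writes, their sector numbers, and the helper `states_after_each_write` are all hypothetical. The example assumes a host that writes a record to sector 7 and only then writes a commit marker referring to it to sector 2.

```python
def states_after_each_write(initial, writes):
    """Return every state a disk passes through as writes are applied in order."""
    state = dict(initial)
    history = [dict(state)]
    for sector, data in writes:
        state[sector] = data
        history.append(dict(state))
    return history


# Host order: the second write depends on the first (hypothetical data).
host_order = [(7, "record"), (2, "commit->7")]
primary_states = states_after_each_write({}, host_order)

# Bitmap recovery visits sectors in bitmap order (ascending here), not host order.
final = primary_states[-1]
recovery_order = [(sector, final[sector]) for sector in sorted(final)]
secondary_states = states_after_each_write({}, recovery_order)

# Any secondary state that the primary never passed through is inconsistent.
inconsistent = [s for s in secondary_states if s not in primary_states]
```

Here the secondary briefly holds the commit marker without the record it refers to, a state the primary was never in. A failure at that moment would leave the secondary unusable as a backup.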
The above solution also has the problem that new write I/Os may be setting bits in the bitmap while the system is trying to clear it to process recovery I/O. This can mean that the recovery process takes a long time to complete, leaving the secondary inconsistent for an extended period, and increasing the recovery point objective to an unacceptable length of time.
It would thus be desirable to have a technological means for efficiently managing errors in a consistent remote copy data storage system.