Critical data is often protected against disasters by copying it to a remote site. One technique in use for this purpose is known as Remote Copy.
Remote Copy is the pairing of a data storage system (or a logical volume of the data storage system) with another data storage system for use as a backup. The original data storage system is known as the primary storage and the backup data storage system is known as the secondary. Whenever data is written to the primary, the data is also to be written to the secondary, to ensure the backup stays up to date. Remote Copy may be implemented synchronously—that is, processing at the host is delayed until confirmation of the completion of the corresponding write at the secondary has been received—or it may be implemented asynchronously.
Asynchronous Remote Copy (ARC) means that the host that wrote the data to the primary is not delayed while data is copied to the secondary; as soon as the data has been written to the primary, the host is notified of completion. The data is then copied to the secondary asynchronously.
One of the main challenges when implementing ARC is maintaining consistency of the secondary disk. Herein, “maintaining consistency” means keeping the secondary data in a state that the primary data could have been in at some point during the process. In other words, the secondary data is allowed to be ‘out of date’ (i.e. a certain number of updates have not yet been applied to the secondary), but it must not be inconsistent, that is, it must never hold a state that the primary could not have held.
Table 1 below shows a sequence of events. During these events the secondary is out of date in relation to the primary, but the data it contains always matches something that the host could have read from the primary, and thus the secondary is always consistent.
TABLE 1

| Action | Primary | Secondary |
|---|---|---|
| 1. Host writes AAA to disk | AAAXXX | XXXXXX |
| 2. Write from step 1 completes to the host | AAAXXX | XXXXXX |
| 3. Host writes BBB to disk | AAABBB | XXXXXX |
| 4. Remote copy sends AAA to the secondary | AAABBB | AAAXXX |
| 5. Remote copy sends BBB to the secondary | AAABBB | AAABBB |
Table 2 below shows a sequence of events in which the updates to the secondary are applied in the wrong order. The write issued in action 3 is a “dependent write” as it is issued after the write of AAA completes. BBB may therefore only be written to the disk after AAA.
If the primary had failed after action 4, the secondary would have been left inconsistent, as the host knows that at no point did the primary contain the data XXXBBB.
TABLE 2

| Action | Primary | Secondary |
|---|---|---|
| 1. Host writes AAA to disk | AAAXXX | XXXXXX |
| 2. Write from step 1 completes to the host | AAAXXX | XXXXXX |
| 3. Host writes BBB to disk | AAABBB | XXXXXX |
| 4. Remote copy sends BBB to the secondary | AAABBB | XXXBBB |
| 5. Remote copy sends AAA to the secondary | AAABBB | AAABBB |
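The difference between Tables 1 and 2 can be checked mechanically. The following is a minimal sketch, not part of any described implementation: it models the disk as a six-character string and treats the secondary as consistent only if every state it passes through is a state the primary also held. The names `apply_writes` and `is_consistent` are hypothetical and chosen for illustration.

```python
def apply_writes(initial, writes):
    """Return every disk state seen while applying writes in the given order."""
    state = list(initial)
    states = ["".join(state)]
    for offset, data in writes:
        # Overwrite len(data) characters starting at offset.
        state[offset:offset + len(data)] = data
        states.append("".join(state))
    return states


# Host issue order: AAA at offset 0, then the dependent write BBB at offset 3.
host_writes = [(0, "AAA"), (3, "BBB")]

# Every state the primary actually held: XXXXXX, AAAXXX, AAABBB.
primary_states = set(apply_writes("XXXXXX", host_writes))


def is_consistent(secondary_states):
    """True if the secondary never held a state the primary did not hold."""
    return all(s in primary_states for s in secondary_states)


in_order = apply_writes("XXXXXX", host_writes)                       # Table 1
out_of_order = apply_writes("XXXXXX", list(reversed(host_writes)))   # Table 2

print(is_consistent(in_order))      # True
print(is_consistent(out_of_order))  # False, because of XXXBBB
```

In the out-of-order run the intermediate state is XXXBBB, which is exactly the state identified above as one the primary never contained.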
A known approach maintains consistency by grouping the writes that enter the primary into batches and assigning each batch a unique sequence number, such that if a batch of writes B arrives after a batch of writes A it will have a higher sequence number. Within each batch, writes are chosen from different I/O processors 104 and are mutually independent.
In such a batch scheme, the secondary may execute one batch of writes at a time in order to maintain data consistency. However, the primary is not subject to these constraints; it may perform I/Os in parallel, since it is the responsibility of each host to ensure it submits writes in a manner that will ensure consistency.
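The secondary side of such a batch scheme might be sketched as follows. This is an illustration under stated assumptions rather than the known approach itself: the class name `Secondary`, the method `receive`, and the choice to buffer early batches in a heap are all hypothetical. The only property taken from the description above is that batches are applied one at a time, in sequence-number order.

```python
import heapq


class Secondary:
    """Applies batches strictly in sequence-number order, one batch at a time."""

    def __init__(self, size):
        self.disk = ["X"] * size
        self.next_seq = 0   # sequence number of the next batch to apply
        self.pending = []   # min-heap of (sequence number, batch)
        self.applied = []   # sequence numbers in the order they were applied

    def receive(self, seq, batch):
        """Buffer a batch, then apply every batch that is now next in order."""
        heapq.heappush(self.pending, (seq, batch))
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            # Writes within a batch are independent, so their order is free.
            for offset, data in ready:
                self.disk[offset:offset + len(data)] = data
            self.applied.append(self.next_seq)
            self.next_seq += 1


secondary = Secondary(6)
secondary.receive(1, [(3, "BBB")])    # arrives early, so it is held back
held_back = "".join(secondary.disk)   # still XXXXXX
secondary.receive(0, [(0, "AAA")])    # releases batch 0, then batch 1
final = "".join(secondary.disk)       # AAABBB
```

Because batch 1 is held until batch 0 has been applied, the secondary never exposes the state XXXBBB from Table 2.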
Generally, if the host has to perform any processing between writes, or if the host is doing reads as well, the I/O load at the primary which is mirrored to the secondary is less than the maximum I/O load that the secondary may cope with, so the system is balanced.
However, in heavy write workloads, the system is unbalanced and the secondary builds up a queue of batches waiting to be processed. This may happen when the secondary has less capacity than the primary, when there is a transient fault in the secondary, or when a network bottleneck occurs. The primary may have only a finite number of batches (several hundred) outstanding at any given time, since it requires resources to keep track of each batch that is in progress. The primary quickly reaches its limit of outstanding batches and must wait for the secondary to complete some batches before any more may be granted. In this situation, if a RequestSequenceNumber call is made to the primary server, the primary server stalls until it receives a WriteBatchDone call from the secondary server 109.
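The stall condition can be illustrated with a small model. This is a sketch under assumptions: the class `PrimaryServer` and its fields are hypothetical, the method names merely mirror the RequestSequenceNumber and WriteBatchDone calls named above, and returning `None` stands in for a caller that would block.

```python
class PrimaryServer:
    """Grants sequence numbers while fewer than max_outstanding batches are open."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding
        self.next_seq = 0
        self.outstanding = set()   # batches granted but not yet completed

    def request_sequence_number(self):
        """Return a new sequence number, or None where a real caller would stall."""
        if len(self.outstanding) >= self.max_outstanding:
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding.add(seq)
        return seq

    def write_batch_done(self, seq):
        """The secondary reports a completed batch, freeing one slot."""
        self.outstanding.discard(seq)


server = PrimaryServer(max_outstanding=2)   # several hundred in practice
first = server.request_sequence_number()    # 0
second = server.request_sequence_number()   # 1
stalled = server.request_sequence_number()  # None: limit reached
server.write_batch_done(first)              # secondary finishes batch 0
resumed = server.request_sequence_number()  # 2
```

The third request cannot be granted until the secondary reports a completed batch, which is why the primary's response time becomes tied to the secondary's.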
As a result, the primary response time is disproportionately affected by any fluctuations in the secondary response time. Consider the following scenario. The secondary is processing a batch of 100 writes, which might normally take 1 ms to complete. The secondary has a problem or the secondary is so heavily loaded that the secondary stalls for 100 ms. During this time, 2000 write I/Os arrive at the primary and all stall waiting for their RequestSequenceNumber calls to complete. But the new writes can't complete until the secondary completes a batch and frees up a sequence number.
The secondary completes the batch (after a 100 ms delay). The RequestSequenceNumber calls complete with a latency of 100 ms. Each of the 2000 primary I/Os sees a latency of 100 ms instead of the expected ~1 ms. This greatly increases the average latency seen by the host.
Slow processing of 100 writes on the secondary causes a slow response time to the host for 2000 writes, magnifying the delay by a factor of 20. This is not the case with Synchronous Remote Copy, where one slow secondary I/O will cause one slow response time to the host.
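The factor of 20 follows directly from the figures in the scenario. The short calculation below restates it; the variable names are illustrative only, and the numbers are those given above.

```python
slow_secondary_writes = 100     # writes in the batch that stalled on the secondary
stalled_primary_writes = 2000   # host writes that waited on RequestSequenceNumber

# Asynchronous case: every stalled host write sees the slow response.
slow_host_responses_async = stalled_primary_writes

# Synchronous case: one slow secondary I/O causes one slow host response.
slow_host_responses_sync = slow_secondary_writes

magnification = slow_host_responses_async / slow_host_responses_sync
print(magnification)  # 20.0
```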