Remote replication in storage systems is used to replicate volumes from a local storage system to a remote storage system.
In asynchronous remote replication, batches of updates are periodically sent to the remote storage system. Preparing a batch of updates involves data processing. In the local storage system, the processing may include: determining the differences that have occurred in a volume to be replicated since the previous cycle; accessing storage drives or cache memory to read the data that needs to be transmitted to the remote storage system; queueing the data in a transmission queue of the communication protocol; and utilizing a communication line to transmit replication messages containing the updates. In the remote storage system, corresponding data processing is performed, such as reading the replication messages from the receive queue of the communication protocol and writing the data of the updates to the mirrored volume.
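The replication cycle described above can be illustrated with a minimal sketch. It assumes a volume modeled as a dictionary of block index to data, a snapshot of the previous cycle held in the same form, and an in-memory queue standing in for both the transmission queue and the receive queue. The names `changed_blocks`, `run_replication_cycle` and `apply_received_updates` are hypothetical and are not taken from any particular storage system.

```python
from collections import deque


def changed_blocks(volume, previous_snapshot):
    """Return {block_index: data} for blocks that differ from the previous cycle.

    Assumption: blocks are only added or overwritten; deleted blocks are not tracked.
    """
    return {
        index: data
        for index, data in volume.items()
        if previous_snapshot.get(index) != data
    }


def run_replication_cycle(volume, previous_snapshot, link):
    """Local side: determine differences, read the data, queue it for transmission."""
    updates = changed_blocks(volume, previous_snapshot)
    for index, data in sorted(updates.items()):
        link.append((index, data))  # one replication message per changed block
    return dict(volume)  # snapshot to compare against in the next cycle


def apply_received_updates(link, mirror):
    """Remote side: read messages from the receive queue, write to the mirrored volume."""
    while link:
        index, data = link.popleft()
        mirror[index] = data


# Hypothetical usage: one cycle in which a single new block is replicated.
link = deque()
mirror = {0: b"a"}
snapshot = run_replication_cycle({0: b"a", 1: b"b"}, {0: b"a"}, link)
apply_received_updates(link, mirror)
```

In an actual system the two sides run on different machines and the queue is drained over a communication line, so the pace of `apply_received_updates` is independent of the pace of `run_replication_cycle`.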
Multiple replication processes can be handled in parallel, where each replication process replicates data of a specific storage entity, e.g., a volume.
When implementing asynchronous replication, the replication processes take place in the background, concurrently with processes that handle I/O from hosts, which are performed in the foreground and do not depend on the execution of the replication processes. The replication processes are generally given a lower priority than the foreground processes so as not to affect the latency of ongoing I/O requests.
The local and remote storage systems may both manage background and foreground processes, and each of the systems may experience a different workload. One of the two systems, e.g., the local storage system, can process the batch of updates faster than the other system or faster than the communication line can handle. This unbalanced processing can cause bottlenecks, either at the communication line or at the remote storage system, and eventually the replication process will be halted until the bottlenecks are alleviated and the remote storage system responds. For example, if the local storage system transmits updates of a certain volume to the remote storage system faster than the remote storage system can handle, then the end-to-end replication process of the certain volume will be slowed down to match the pace of handling updates at the remote storage system. Thus, the faster processing at the local storage system does not contribute to hastening the replication, as the bottleneck is at the remote storage system. In addition, the faster processing at the local storage system may waste resources that could have been used by other replication processes, and may also cause unnecessary latency to foreground processes that run at the local storage system and share the system resources with the replication processes.
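The effect of such a bottleneck can be illustrated with a simplified steady-state model in which the end-to-end replication rate is bounded by the slowest of the three stages. This is a sketch under stated assumptions: queueing effects and bursts are ignored, and the function names and the rates in the usage lines are hypothetical.

```python
def end_to_end_rate(local_rate, link_rate, remote_rate):
    """Sustained replication rate, limited by the slowest stage (same units for all rates)."""
    return min(local_rate, link_rate, remote_rate)


def excess_local_rate(local_rate, link_rate, remote_rate):
    """Local processing rate that does not speed up the replication."""
    return local_rate - end_to_end_rate(local_rate, link_rate, remote_rate)


# Hypothetical rates in MB/s: the remote storage system is the bottleneck.
rate = end_to_end_rate(local_rate=400, link_rate=300, remote_rate=100)
excess = excess_local_rate(local_rate=400, link_rate=300, remote_rate=100)
```

With these illustrative figures, the local storage system prepares updates at 400 MB/s while the replication as a whole advances at 100 MB/s, so the remaining local effort consumes resources without shortening the replication.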