1. Field of the Invention
The present invention relates generally to the field of data replication within distributed computer systems.
2. Related Art
It is well known that storage devices (e.g. disk drives) may fail over time or may be lost due to theft or natural disasters such as fire. However, whilst hardware can usually be replaced with relative ease, the loss of data can be catastrophic as another copy cannot simply be purchased off the shelf. Therefore, individual users and organizations typically create backup copies of data so that in the event of a hardware loss, such as disk failure, normal operations can be resumed with minimal disruption.
Typically, a large organization will back up the contents of its disk drives onto (relatively slow) tape storage devices. However, a considerable length of time, perhaps several hours, may be required to take a full backup of a large data set and so backups often have to be made during ‘down times’ such as overnight or out of business hours. Furthermore, inconsistencies can arise if changes are made to the data while the backup is in progress and so write operations may need to be blocked while the backup is being created. However, this unavailability is not acceptable to organizations which require uninterrupted access to their data.
Therefore, it is advantageous to create an instantaneous copy of a disk's contents while applications are running. Virtualization techniques can be used within networks to create and maintain (in real-time) a replica of the data set on other storage devices, the replica being updated over time as the data changes in response to write operations. In this way, reliable access to the data may be preserved via the remotely stored replica if the local storage device becomes inoperable, whilst maintaining high availability of data and functionality. Thus, whilst a backup copy may remain unchanged for a relatively long period of time, a replica will be updated frequently as a result of applications which are running and writing updates to the data set. Several known replication techniques have been developed to copy data to other storage devices.
Mirroring
Mirroring is a known data replication technique where the contents of a logical disk volume are copied onto other storage devices. Each time a write operation occurs, the data is copied from the host server to the other storage devices. These other storage devices may be situated locally or remotely, or may sometimes be provided as a combination of both. As multiple copies of the data exist, the data can be retrieved from at least one of those copies should a hardware failure occur. Typically, the data is mirrored onto physical devices (hard drives) although logical drives may also be used. Moreover, replication may be implemented as microcode on a disk array controller or as software running on a server. FIG. 1 shows a simple illustration of a prior art mirroring arrangement.
When this process is performed over a relatively short geographical distance, the term ‘mirroring’ may often be used. However, the term ‘storage replication’ is typically used when larger geographical distances are involved. Various replication techniques are known.
Synchronous Storage Replication
Synchronous storage replication is a known data replication technique where identical copies of the data are stored on separate storage devices in communication with the host server. When performing a write operation, the server needs to know when the data has been copied to each and every storage device. Thus, each storage device sends a receipt when it has received and stored the data. The write is only considered complete when it has been performed on, and acknowledged by, all the storage devices. If one of the storage devices fails to acknowledge completion of the write operation, then the overall write operation is deemed not to have been completed.
The advantage of this approach is that high availability is possible. If one copy of the data becomes unavailable to the host server, the host server can instantly fail over and use another copy of the data, in the knowledge that the copy it is accessing contains data exactly as expected; no consistency checking of the data is necessary.
However, as applications running on the server may wait for a write operation to complete before proceeding with other operations, the overall performance of the system can decrease considerably if it takes some time for the acknowledgement to be received by the server. This latency problem increases over large geographical distances, and so synchronous replication is only really practical over smaller distances.
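By way of illustration only, the all-or-nothing acknowledgement rule described above can be sketched as follows. The `StorageDevice` class, its `online` flag and the `synchronous_write` function are hypothetical names introduced for this sketch and model devices as in-memory dictionaries; they are not taken from any particular product.

```python
class StorageDevice:
    """Hypothetical in-memory stand-in for a storage device."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.blocks = {}

    def write(self, block_id, data):
        """Store the data and return True as a receipt, or False if unreachable."""
        if not self.online:
            return False
        self.blocks[block_id] = data
        return True


def synchronous_write(devices, block_id, data):
    """Report completion only if every device has acknowledged the write."""
    acks = [device.write(block_id, data) for device in devices]
    return all(acks)
```

In this simplified sketch a write that is not acknowledged by every device is merely reported as incomplete; how the devices that did store the data are subsequently reconciled is left open. In a real deployment each acknowledgement would also incur a network round trip, which is the source of the latency discussed above.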
Asynchronous Storage Replication
Asynchronous storage replication is a known data replication technique where separate storage devices are used to store copies of the data. Although all storage devices are updated when a write operation is requested by an application, the write operation is considered complete as soon as (only) one designated storage device acknowledges it. Whilst long-distance performance is greatly increased in comparison to the synchronous approach, if the designated storage device fails then the other storage device(s) are not guaranteed to hold the current copy of the data. Thus, whilst synchronous mirroring usually achieves a Recovery Point Objective (RPO) of zero lost data, with asynchronous writing the most recent updates to the data may be lost and the stored application data may not be self-consistent. Thus, there is a problem of ‘crash-consistency’ which typically necessitates data consistency checking and repair before the copy is usable.
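The asynchronous behaviour described above may be illustrated by the following minimal sketch. The `AsyncReplicator` class, its `pending` queue and its `drain` method are hypothetical and assume a single designated (primary) device and a single secondary device, both modelled as in-memory dictionaries.

```python
from collections import deque


class AsyncReplicator:
    """Hypothetical model: a write completes once the primary device has stored it."""

    def __init__(self):
        self.primary = {}       # designated storage device
        self.secondary = {}     # other (e.g. remote) storage device
        self.pending = deque()  # updates not yet applied to the secondary

    def write(self, block_id, data):
        """Store on the primary, queue the update and report completion at once."""
        self.primary[block_id] = data
        self.pending.append((block_id, data))
        return True

    def drain(self):
        """Apply the queued updates to the secondary in their original order."""
        while self.pending:
            block_id, data = self.pending.popleft()
            self.secondary[block_id] = data
```

Any updates still held in `pending` when the primary device fails correspond to the most recent updates that may be lost, as noted above.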
Point-in-Time Replication
Point-in-time replication is a known data replication technique where snapshots of the data are taken periodically. A read-only copy of the data is taken at a particular point in time. Once the initial copy has been created, subsequent snapshots need only copy the updates (i.e. changes) which are made to the data set held on the storage device, allowing applications to continue writing data to the local storage device whilst the snapshots are being taken. This has the advantage that the snapshots can be taken at such times when applications have been quiesced, memory caches have been flushed and the copied data is guaranteed to be self-consistent.
When an application wants to perform a write operation on a block (or several blocks) of data on the local disk, a snapshot is taken of the relevant portion of data before the change is made. The pre-write data is copied into the snapshot and then the write operation is performed, updating the original data volume. This is known as the ‘copy-on-write’ approach to snapshots. The replica on the remote storage device can then be updated using the copied blocks of data which have been stored in the snapshot. The update of the replica data set can be performed periodically (for example, every half an hour).
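The copy-on-write step described above may be sketched as follows, purely as an illustration. The `CopyOnWriteVolume` class and its method names are hypothetical, and the sketch assumes that a block need only be preserved the first time it is overwritten after the snapshot is taken.

```python
class CopyOnWriteVolume:
    """Hypothetical block volume with a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # live data volume
        self.snapshot = {}          # pre-write copies of changed blocks

    def write(self, block_id, data):
        """Preserve the pre-write data once, then update the live volume."""
        if block_id not in self.snapshot:
            self.snapshot[block_id] = self.blocks.get(block_id)
        self.blocks[block_id] = data

    def read_snapshot(self, block_id):
        """Return the block as it stood when the snapshot was taken."""
        if block_id in self.snapshot:
            return self.snapshot[block_id]
        return self.blocks.get(block_id)
```

Only blocks that have actually been overwritten occupy space in `snapshot`; unchanged blocks are read from the live volume.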
By copying the soon-to-be-changed blocks of data to a snapshot on another storage device, an historical record of the data can be maintained. Should the local disk then fail, preventing access to the original data volume, the data can be retrieved from the updated replica on the remote device.
A snapshot is typically implemented using an empty data store and a system of pointers to reference the replica. Advantageously, as only the changed data is copied during replication, rather than the entire contents of the storage device, the replica can be maintained over smaller, less expensive, lower-bandwidth links than would be required for a synchronous mirror.
However, the snapshot of changes grows over time as more write operations are performed on the data. It is also known that in practice, organizations have a tendency to keep the snapshot data for an extended period of time, thus using up resources. These factors can cause the performance of replicated storage to degrade.