1. Field of the Invention
The present invention relates to replicating data for backup and disaster recovery purposes and, in particular, to synchronizing replicas of data stored in different storage areas.
2. Description of the Related Art
Information drives business. A disaster affecting a data center can cause days or even weeks of unplanned downtime and data loss that could threaten an organization's productivity. For businesses that increasingly depend on data and information for their day-to-day operations, this unplanned downtime can also hurt their reputations and bottom lines. Businesses are becoming increasingly aware of these costs and are taking measures to plan for and recover from disasters.
Two areas of concern when a failure occurs, as well as during the subsequent recovery, are preventing data loss and maintaining data consistency between primary and secondary storage areas. One strategy includes replicating data from local computer systems to backup local computer systems and/or to computer systems at remote sites. Because disk storage volumes are common types of storage areas that are replicated, the term “storage area” is used interchangeably with “storage volume;” however, one of skill in the art will recognize that the replication processes described herein are also applicable to other types of storage areas and that the use of the term “storage volume” is not intended to be limiting. Furthermore, the unit of storage in a given storage area is referred to herein as a “block,” as block terminology is typically used to describe units of storage of storage volumes. Again, one of skill in the art will recognize that the unit of storage can vary according to the type of storage area, and may be specified in units of bytes, ranges of bytes, files, or other types of storage objects. The use of the term “block” herein is not intended to be limiting and is used herein to refer generally to any type of storage object.
Some types of storage areas, such as a storage volume, store data as a set of blocks. Each block is typically of a fixed size; a block size of 512 bytes is commonly used. Thus, a volume of 1000 Megabyte capacity contains 2,048,000 blocks of 512 bytes each. Any of these blocks can be read from or written to by specifying the block number (also called the block address). Typically, a block must be read or written as a whole.
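The block arithmetic above can be checked with a short sketch. It assumes, as the 2,048,000 figure implies, that a Megabyte means 1024 × 1024 bytes; the function names are hypothetical and chosen only for illustration.

```python
BLOCK_SIZE = 512  # bytes per block, the commonly used size cited in the text


def block_count(capacity_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks that fit in a volume of the given capacity."""
    return capacity_bytes // block_size


def block_byte_range(block_number, block_size=BLOCK_SIZE):
    """Byte offsets [start, end) covered by a block address."""
    start = block_number * block_size
    return start, start + block_size


# Assumption: 1 Megabyte = 1024 * 1024 bytes.
capacity = 1000 * 1024 * 1024
print(block_count(capacity))   # 2048000
print(block_byte_range(3))     # (1536, 2048)
```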
Storage area replication is used to maintain online duplicate copies of some storage areas, such as disk volumes. The original storage area is called the primary, and the duplicate is called the replica (also referred to herein as the secondary). Replication tries to ensure that the replica volume contains the same data, block by block, as the primary volume, while the primary volume is in active use.
In case of failure of a server maintaining the primary storage area, applications using the primary storage area can be moved to a replica server under control of external failover software; this process is also referred to as a “failover.” The replica server and primary server may communicate over a network channel.
To accommodate the variety of business needs, some replication facilities provide remote mirroring of data and replication of data over a wide area or distributed network such as the Internet. However, different types of storage typically require different replication methods. Replication facilities are available for a variety of storage solutions, such as database replication products and file system replication products, although typically a different replication facility is required for each type of storage solution. Other replication facilities are available for replicating all contents of a particular type of storage device.
Replication facilities provide such functionality as enabling a primary and secondary node to reverse roles when both are functioning properly. Reversing roles involves such replication operations as stopping the application controlling the replicated data, demoting the primary node to a secondary node, promoting the original secondary node to a primary node, and re-starting the application at the new primary node. Another example of functionality of a replication facility involves determining when a primary node is down, promoting the secondary node to a primary node, enabling transaction logging and starting the application that controls the replicated data on the new primary node. In addition, when the former primary node recovers from failure, the replication facility can prevent the application from starting at the former primary node since the application is already running at the newly-promoted node, the former secondary node. The transaction log can be used to synchronize data at the former and new primary nodes.
Replication of data can be performed synchronously or asynchronously. With synchronous replication, an update is posted to the secondary node and acknowledged to the primary node before completing the update at the primary node. In the event of a disaster at the primary node, data can be recovered from the secondary node without any loss of data because the copies of the data at the primary and secondary nodes contain the same data. With asynchronous replication, updates to data are immediately reflected at the primary node and are queued to be forwarded to each secondary node. Data at the secondary node differs from data at the primary node during the period of time in which a change to the data is being transferred from the primary node to the secondary node, as explained in further detail below. The magnitude of the difference can increase with the transfer time, for example, as update activity increases in intensity. A decision regarding whether to replicate data synchronously or asynchronously depends upon the nature of the application program using the data as well as numerous other factors, such as available bandwidth, network round-trip time, the number of participating servers, and the amount of data to be replicated.
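The difference between the two modes can be sketched as follows. This is a minimal in-memory illustration, not any particular replication facility: the classes `Node`, `SyncReplicator`, and `AsyncReplicator` are hypothetical, and the network is modeled as a direct method call.

```python
from collections import deque


class Node:
    """Hypothetical node holding blocks in a dict keyed by block number."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_number, data):
        self.blocks[block_number] = data
        return True  # stands in for an acknowledgment


class SyncReplicator:
    """Posts each update to the secondary, and waits for its
    acknowledgment, before completing the update at the primary."""

    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, block_number, data):
        if not self.secondary.write(block_number, data):
            raise IOError("secondary did not acknowledge the update")
        self.primary.write(block_number, data)


class AsyncReplicator:
    """Applies each update at the primary immediately and queues it
    to be forwarded to the secondary later."""

    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary
        self.pending = deque()  # FIFO queue of updates not yet forwarded

    def write(self, block_number, data):
        self.primary.write(block_number, data)
        self.pending.append((block_number, data))

    def drain(self):
        # Forward queued updates in the order they were generated.
        while self.pending:
            block_number, data = self.pending.popleft()
            self.secondary.write(block_number, data)
```

In the asynchronous sketch, the secondary lags the primary by whatever is still in `pending`; the more update activity, the longer that queue can grow.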
Under normal circumstances, updates, also referred to herein as writes, are sent to the secondary node in the order in which they are generated at the primary node. Consequently, the secondary node represents a state of the primary node at a given point in time. If the secondary node takes over due to a disaster, the data storage areas will be consistent.
A replica that currently mirrors the primary faithfully is said to be synchronized or “in sync;” otherwise, the replica is said to be unsynchronized, or “out of sync.” An out of sync replica may be synchronized by copying some or all blocks from the primary; this process is called synchronization or resynchronization.
Whether synchronous or asynchronous replication is used, volume replication software can begin to work only after an initial set-up phase where the replica is synchronized with the primary volume. This process is called initial replica synchronization. A volume replication facility is set up to prepare a replica of a primary storage volume. Another storage volume, of the same capacity as the primary storage volume, is configured on a separate server. Data are copied from the primary storage volume to the replica storage volume via a communication network between the primary and replica servers. Initial synchronization of two storage areas can be a time-consuming process, especially for large volumes or slow networks. The following methods of initial replica synchronization are known:
In offline synchronization, a disk-level backup is performed; the backup storage media, such as tape or CD, are manually taken to a replica server or transferred over a network to the replica server using a file transfer protocol or other similar protocol; and data are restored to a storage volume for the replica.
In bulk synchronization, the entire storage area is copied block by block over a network to a replication site using replication software.
After initial replica synchronization, each subsequent write operation performed on the primary volume is trapped by the replication facility. A copy of the data being written is sent over the network to be written to the replica volume. This process keeps the primary and the replica volume synchronized as closely as possible. However, problems such as network connectivity failure or host failure may cause the replica volume to become unsynchronized. In such a case, the primary volume and replica volume must be resynchronized.
In one resynchronization process known as “smart synchronization,” each block of primary storage is read, a checksum is computed from the data, and the checksum is sent across the network to a replica server. The replica server compares the received checksum against a local checksum computed from a replica of the data. If the checksums do not match, only then are data replicated from the primary to the replica server. This technique is similar to what is used by the open-source file replication utility called “rsync.”
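The checksum comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of any product: both volumes are modeled as in-memory lists of equal length, SHA-256 is an arbitrary choice of checksum, and the function name `smart_sync` is hypothetical.

```python
import hashlib


def checksum(block):
    """Checksum of one block; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(block).digest()


def smart_sync(primary, replica):
    """Copy to the replica only those blocks whose checksums differ.

    Returns the list of block numbers that were copied.
    """
    copied = []
    for block_number, block in enumerate(primary):
        # Primary side: only this checksum would cross the network at first.
        primary_sum = checksum(block)
        # Replica side: compare against a locally computed checksum.
        if checksum(replica[block_number]) != primary_sum:
            # Mismatch: only now is the block data itself transferred.
            replica[block_number] = block
            copied.append(block_number)
    return copied


primary = [b"a" * 512, b"b" * 512, b"c" * 512]
replica = [b"a" * 512, b"x" * 512, b"c" * 512]
print(smart_sync(primary, replica))  # [1]
```

Note that every block of the primary is still read and checksummed, even though only mismatched blocks are transferred.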
However, none of the methods described above uses information that is available to the application programs that manage the data being copied and that run in conjunction with the storage area replication software. In fact, not every block of a volume contains useful data. The application that uses the volume (such as a file system or database) generally has free blocks whose contents are irrelevant and usually inaccessible. Such blocks need not be copied during synchronization.
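To illustrate why free blocks need not be copied, the following sketch copies only the blocks that an application reports as in use. It is an illustration only: the `used_blocks` set stands in for whatever allocation information a file system or database might expose, and the function name is hypothetical.

```python
def sync_used_blocks(primary, replica, used_blocks):
    """Copy only the blocks listed in used_blocks from primary to replica.

    used_blocks is a hypothetical set of block numbers that the
    application reports as holding useful data. Free blocks are skipped.
    Returns the number of blocks copied.
    """
    copied = 0
    for block_number in sorted(used_blocks):
        replica[block_number] = primary[block_number]
        copied += 1
    return copied


primary = [b"A", b"stale", b"C", b"stale"]
replica = [b"", b"", b"", b""]
print(sync_used_blocks(primary, replica, {0, 2}))  # 2
```

In this example half of the volume is never read or transferred, because the contents of blocks 1 and 3 are irrelevant to the application.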
What is needed is a solution that enables initial synchronization as well as resynchronization to be performed with as little effect on performance as possible. The solution should avoid replicating unnecessary information and enable data to be quickly synchronized across a network or locally.