1. Field of the Invention
The present invention relates to quickly enabling one of several secondary nodes to provide the functionality of a failed primary node while maintaining data consistency on all nodes.
2. Description of the Related Art
Information drives business. A disaster affecting a data center can cause days or even weeks of unplanned downtime and data loss that could threaten an organization's productivity. For businesses that increasingly depend on data and information for their day-to-day operations, this unplanned downtime can also hurt their reputations and bottom lines. Businesses are becoming increasingly aware of these costs and are taking measures to plan for and recover from disasters.
Two areas of concern when a failure occurs, as well as during the subsequent recovery, are preventing data loss and maintaining data consistency between primary and secondary storage areas. One strategy includes replicating data from local computer systems to backup local computer systems and/or to computer systems at remote sites. Because disk storage volumes are common types of storage areas that are replicated, the term “storage area” is used interchangeably with “storage volume”; however, one of skill in the art will recognize that the replication processes described herein are also applicable to other types of storage areas and that the use of the term “storage volume” is not intended to be limiting.
Storage area replication is used to maintain online duplicate copies of some storage areas, such as disk volumes. The original storage area is called the primary, and the duplicate is called the replica or secondary. Replication tries to ensure that the replica volume contains the same data, block by block, as the primary volume, while the primary volume is in active use. The replica server and primary server may communicate over a network channel.
To accommodate the variety of business needs, some replication facilities provide remote mirroring of data and replication of data over a wide area or distributed network such as the Internet. However, different types of storage typically require different replication methods. Replication facilities are available for a variety of storage solutions, such as database replication products and file system replication products, although typically a different replication facility is required for each type of storage solution. Other replication facilities are available for replicating all contents of a particular type of storage device.
In case of failure of a server maintaining the primary storage area, applications using the primary storage area can be moved to a replica server under control of external “failover” software; this process is also referred to as a “failover.” Preferably, failover is performed as quickly as possible to ensure high availability of enterprise data and continued functionality of enterprise systems.
Replication facilities provide such functionality as enabling a primary and secondary node to reverse roles when both are functioning properly. Reversing roles involves such replication operations as stopping the application controlling the replicated data, demoting the primary node to a secondary node, promoting the original secondary node to a primary node, and restarting the application at the new primary node. Another example of functionality of a replication facility involves determining when a primary node is down, promoting the secondary node to a primary node, enabling transaction logging, and starting the application that controls the replicated data on the new primary node. In addition, when the former primary node recovers from failure, the replication facility can prevent the application from starting at the former primary node, since the application is already running at the newly promoted node (the former secondary node). The transaction log can be used to synchronize data at the former and new primary nodes.
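The role-reversal steps described above can be sketched as follows. This is a minimal illustration only and does not represent the interface of any particular replication facility; the names `Role`, `Node`, `reverse_roles`, and `may_start_application` are hypothetical and are introduced solely for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass
class Node:
    """Hypothetical stand-in for a node taking part in replication."""
    name: str
    role: Role
    app_running: bool = False


def reverse_roles(primary: Node, secondary: Node) -> None:
    """Planned role reversal while both nodes are functioning properly."""
    if primary.role is not Role.PRIMARY or secondary.role is not Role.SECONDARY:
        raise ValueError("expected a (primary, secondary) pair")
    primary.app_running = False    # 1. stop the application controlling the data
    primary.role = Role.SECONDARY  # 2. demote the primary node
    secondary.role = Role.PRIMARY  # 3. promote the original secondary node
    secondary.app_running = True   # 4. restart the application at the new primary


def may_start_application(node: Node, peers: list) -> bool:
    """Return False if some other node is already primary and running the application."""
    return not any(
        p.role is Role.PRIMARY and p.app_running for p in peers if p is not node
    )
```

In this sketch, a former primary node that recovers after a role change would call `may_start_application` and be refused, because the newly promoted node is already running the application.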
Replication of data can be performed synchronously, asynchronously, or periodically. With synchronous replication, an update is posted to the secondary node and acknowledged to the primary node before notifying the initiating application at the primary node that the update is complete. In the event of a disaster at the primary node, data can be recovered from the secondary node without any loss of data because the copies of the data at the primary and secondary nodes contain the same data. However, synchronous replication of data can be unacceptably slow in many enterprises with large amounts of data, very busy networks, and/or networks with high communication overhead due to long distances between nodes.
With asynchronous replication, updates to data are immediately reflected at the primary node and are queued to be forwarded to each secondary node. The initiating application is notified that the update is complete when the update is written to a storage location at the primary node. Data at the secondary node differs from data at the primary node during the period of time in which a change to the data is being transferred from the primary node to the secondary node. A decision regarding whether to replicate data synchronously or asynchronously depends upon the nature of the application program using the data as well as numerous other factors, such as available bandwidth, network round-trip time, the number of participating servers, and the amount of data to be replicated.
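The difference between the synchronous and asynchronous modes described above can be illustrated with a minimal in-memory sketch. The classes `Volume` and `Replicator` and the method `drain` are hypothetical names used only for illustration, and the sketch assumes that the primary copy is written before the update is forwarded; an actual replication facility would also handle network transport, acknowledgments, and failures.

```python
from collections import deque


class Volume:
    """In-memory stand-in for a storage volume: block number -> data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data


class Replicator:
    """Hypothetical replicator for one primary and one secondary volume."""

    def __init__(self, primary, secondary, synchronous):
        self.primary = primary
        self.secondary = secondary
        self.synchronous = synchronous
        self.pending = deque()  # updates queued for the secondary, in write order

    def write(self, block_no, data):
        """Apply an update; returning models notifying the application of completion."""
        self.primary.write(block_no, data)
        if self.synchronous:
            # The secondary applies the update before completion is reported.
            self.secondary.write(block_no, data)
        else:
            # Completion is reported once the primary holds the data;
            # the update is queued to be forwarded later.
            self.pending.append((block_no, data))

    def drain(self, max_updates=None):
        """Forward queued updates to the secondary in the order they were generated."""
        sent = 0
        while self.pending and (max_updates is None or sent < max_updates):
            self.secondary.write(*self.pending.popleft())
            sent += 1
        return sent
```

With `synchronous=False`, the secondary lags behind the primary until `drain` has forwarded every queued update. Because updates are forwarded in order, a partially drained secondary holds the contents the primary had at an earlier point in time.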
Under normal circumstances, updates, also referred to herein as writes, are sent to the secondary node in the order in which they are generated at the primary node when replication is performed asynchronously or synchronously. Consequently, the secondary node represents a state of the primary node at a given point in time. If a secondary node takes over due to a disaster, the data storage areas on the secondary nodes can be synchronized.
A replica that currently mirrors the primary faithfully is said to be synchronized, or “in sync”; otherwise, the replica is said to be unsynchronized, or “out of sync.” An out-of-sync replica may be synchronized by copying either selected blocks or all blocks from the primary; this process is called synchronization or resynchronization.
Whether synchronous or asynchronous replication is used, volume replication software can begin to work only after an initial set-up phase in which the replica is synchronized with the primary volume. A volume replication facility is set up to prepare a replica of a primary storage volume. Another storage volume, of the same capacity as the primary storage volume, is configured on a separate server. Data are copied from the primary storage volume to the replica storage volume via a communication network between the primary server and the replica server. Initial synchronization of two storage areas can be a time-consuming process, especially for large volumes or slow networks.
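As a minimal sketch of the distinction between selective and complete resynchronization, the following example models each volume as a list of equally sized blocks. Detecting differing blocks by comparing SHA-256 digests is an assumption made purely for illustration; the text above does not specify how the blocks to be copied are identified. The function names are hypothetical.

```python
import hashlib


def block_digest(data: bytes) -> bytes:
    """Digest used to compare blocks without comparing their full contents."""
    return hashlib.sha256(data).digest()


def resynchronize(primary: list, replica: list, selective: bool = True) -> int:
    """Bring the replica in sync with the primary; return the number of blocks copied."""
    if len(primary) != len(replica):
        raise ValueError("volumes must have the same number of blocks")
    copied = 0
    for i, src in enumerate(primary):
        # In selective mode, skip blocks whose digests already match.
        if selective and block_digest(src) == block_digest(replica[i]):
            continue
        replica[i] = src
        copied += 1
    return copied
```

In a networked setting, only the digests would need to cross the network for blocks that already match, which is why selective copying can be much cheaper than copying every block.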
After initial replica synchronization, a subsequent write operation being performed on the primary volume can be copied by the replication facility while the subsequent write operation is being performed. A copy of the data being written is sent over the network to be written to the replica volume. This process keeps the primary and the replica volume synchronized as closely as possible. However, problems such as network connectivity failure or host failure may cause the replica volume to become unsynchronized. In such a case, the primary volume and replica volume must be resynchronized.
In some business-critical environments, multiple replicas are maintained on multiple secondary nodes because of the need for high availability of the software and/or data. To ensure data consistency upon primary node failure, data on the secondary nodes must be synchronized before restarting an application managing the data. Because of the uncertainty of the state of the secondary nodes, often one of the secondary nodes is selected as having the data to be used to synchronize the other nodes, and data are copied from the selected secondary node to the other secondary nodes. Unfortunately, synchronization is often performed by copying all blocks of the selected data to the other secondary nodes. Only when the synchronization is complete can the failover process be completed.
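The cost of the full-copy approach described above can be seen in a small sketch. Volumes are again modeled as lists of blocks, and the function `full_copy_synchronize` and its parameters are hypothetical; how the source secondary is selected is not specified here and is simply passed in by the caller.

```python
def full_copy_synchronize(secondaries: dict, source_name: str) -> int:
    """Copy every block of the selected secondary to all other secondaries.

    Returns the total number of blocks transferred.
    """
    source = secondaries[source_name]
    transferred = 0
    for name, volume in secondaries.items():
        if name == source_name:
            continue
        # All blocks are copied, including blocks that already match.
        volume[:] = source
        transferred += len(source)
    return transferred
```

For a volume of B blocks and N secondary nodes, this approach transfers (N - 1) × B blocks regardless of how few blocks actually differ, and failover cannot complete until the copying has finished.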
What is needed is a solution that enables a secondary node to assume the role of a failed primary node with as little effect on performance as possible. The solution should enable data on multiple secondary nodes to be quickly synchronized across a network or locally so that functionality and data provided by the failed primary node can be resumed as quickly as possible.