1. Field of the Invention
This invention relates to computer systems and, more particularly, to replication of application checkpoint data.
2. Description of the Related Art
In the data centers of many enterprises, computer servers may be organized as one or more clusters of multiple cooperating nodes, where each node of a given cluster includes a computer server, and where the nodes of a cluster cooperate with each other to perform the data accesses and computations desired. Such clustered environments may be implemented for a variety of reasons: for example, to increase application availability, to increase the total computing capacity or storage capacity that may be utilized for a given application, and/or to support centralized management of the servers. In particular, clusters may be used to implement various types of application recovery mechanisms, including several types of failover mechanisms.
Typically, cluster-based failover mechanisms may allow an application (or a component of a distributed application) that normally executes on a first node (which may be called a primary node) of the cluster to be started at a different node (which may be called a secondary node) in the event of a failure at the first node. In order to start the application at the secondary node, the state of the application may typically be recovered at the secondary node using any of a variety of techniques. From an application user's perspective, two different objectives may need to be considered with respect to application state recovery. The first objective may be termed a “recovery point” objective: that is, a point in time in the past up to which a consistent copy of application data must be recovered in the event of a service disruption or failure. The second objective may be termed a “recovery time” objective: that is, the amount of time that elapses before business functionality is restored after a service disruption. Users typically want to minimize recovery time and also want the recovery point to be as close to the point of failure detection as possible.
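By way of illustration only, the two objectives described above may be expressed as simple differences between timestamps. The following minimal sketch is not part of any described mechanism; the class name `FailoverTimeline` and its fields are hypothetical, and measuring both quantities from the moment of failure detection is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class FailoverTimeline:
    """Hypothetical timestamps (in seconds) for one failover event."""
    last_consistent_copy: float  # when the latest consistent copy of state was taken
    failure_detected: float      # when the failure at the primary node was detected
    service_restored: float      # when business functionality was restored

    def recovery_point_gap(self) -> float:
        # How far back in time the recovered state lies; smaller is better.
        return self.failure_detected - self.last_consistent_copy

    def recovery_time(self) -> float:
        # Time elapsed before functionality is restored; smaller is better.
        return self.service_restored - self.failure_detected


timeline = FailoverTimeline(last_consistent_copy=100.0,
                            failure_detected=130.0,
                            service_restored=145.0)
gap = timeline.recovery_point_gap()   # 30.0 seconds of updates may be lost
elapsed = timeline.recovery_time()    # 15.0 seconds until service resumes
```

In this example, up to 30 seconds of application updates could be lost, and the application would be unavailable for 15 seconds.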
In some traditional recovery mechanisms, the state of the application data may be stored, e.g., periodically and/or on demand, in shared persistent storage such as a disk device that is accessible from both the primary and the secondary node. When a failure at the primary node is detected, the secondary node may read the latest version of the application state from the shared persistent storage, and may start a failover version of the application using the application state information so read. For many classes of applications, however, storing application state to disk may be too expensive, e.g., in terms of the resources needed to save the state and/or the time taken to recover the state from disk. For example, the overhead of disk latency (e.g., seek and rotational latency) associated with updating application state on disk may have a significant negative impact on perceived application execution speed. Thus, such disk-based application state recovery may result in excessive application overhead, unacceptably long recovery times, and unacceptable recovery points.
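The disk-based approach described above may be sketched as follows, purely for illustration. The function names `save_state`, `load_state` and `failover_demo`, the JSON encoding, and the use of a temporary directory to stand in for shared persistent storage are all assumptions made for this sketch and are not drawn from any particular recovery product.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Primary node: write the latest application state to shared storage."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        # Force the data to the device. This synchronous step is where the
        # disk latency discussed above is incurred on every save.
        os.fsync(f.fileno())


def load_state(path):
    """Secondary node: read the most recently saved state after a failure."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def failover_demo():
    """Simulate a save at the primary followed by recovery at the secondary."""
    with tempfile.TemporaryDirectory() as shared_dir:  # stand-in for a shared disk
        path = os.path.join(shared_dir, "app_state.json")
        save_state(path, {"orders_processed": 41})
        save_state(path, {"orders_processed": 42})  # later periodic save
        # The primary is assumed to fail here; the secondary recovers.
        return load_state(path)
```

Any update made at the primary node after the last call to `save_state` would not be visible to the secondary node, which is one reason the recovery point may be unacceptable when saves are infrequent.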
As a result, some cluster-based recovery solutions have implemented in-memory checkpoint replication techniques. The term “checkpoint” may refer to one or more regions of memory that include application state information needed for application failover. For further enhancements to availability, checkpoints may be replicated to multiple nodes, from which a particular node may be chosen for failover in the event of a failure at the primary node. In order to ensure consistency of the replicated versions of the checkpoints at different nodes, checkpoint replication may be performed atomically in some environments. However, typical atomic replication protocols rely on two-phase commit or on other techniques that require multiple messages to be exchanged between the primary node and the secondary nodes for each committed set of changes, and such protocols may not scale as the number of replicating nodes is increased. In such replication protocols, for example, the messaging traffic for the replication protocol and/or the processing required for the replication protocol may become a bottleneck that adversely affects both recovery time and recovery point objectives.
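The scaling concern described above may be illustrated with a simple message count. The sketch below assumes a textbook two-phase commit in which the primary node acts as coordinator and exchanges four messages with each secondary node per committed set of changes (prepare, vote, decision, acknowledgment); actual protocols may differ, and the function names are hypothetical.

```python
def two_phase_commit_messages(num_secondaries: int) -> int:
    """Messages exchanged for one committed set of changes (textbook 2PC)."""
    prepare = num_secondaries    # primary -> each secondary: prepare request
    votes = num_secondaries      # each secondary -> primary: vote
    decision = num_secondaries   # primary -> each secondary: commit or abort
    acks = num_secondaries       # each secondary -> primary: acknowledgment
    return prepare + votes + decision + acks


def messages_per_second(num_secondaries: int, commits_per_second: int) -> int:
    """Total protocol traffic when checkpoints are committed at a given rate."""
    return two_phase_commit_messages(num_secondaries) * commits_per_second


# Traffic grows linearly with the number of replicating nodes, and every
# message passes through the primary node.
for n in (1, 2, 4, 8):
    print(n, two_phase_commit_messages(n), messages_per_second(n, 1000))
```

Under these assumptions, replicating to eight secondary nodes at 1,000 commits per second would require the primary node to send or receive 32,000 protocol messages per second, in addition to the checkpoint data itself.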