The invention relates generally to computer network servers, and more particularly to computer servers arranged in a server cluster.
A server cluster is a group of at least two independent servers connected by a network and managed as a single system. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be quickly restarted on a surviving server, with no substantial reduction in service. Indeed, clients of a Windows NT cluster believe they are connecting with a physical system, but are actually connecting to a service which may be provided by one of several systems. To this end, clients create a TCP/IP session with a service in the cluster using a known IP address. This address appears to the cluster software as a resource in the same group (i.e., a collection of resources managed as a single unit) as the application providing the service. In the event of a failure, the cluster service "moves" the entire group to another system.
Other benefits include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Dynamic load balancing is also available. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like.
While clustering is thus desirable in many situations, problems arise if the servers (nodes) of the cluster become inconsistent with one another with respect to certain persistent cluster information. For example, memory state information, properties of the cluster or its resources and/or the state and existence of components in the cluster need to be consistent among the cluster's nodes. A global update protocol is used to ensure consistency of updates to this persistent state. Moreover, if a cluster shuts down and a new cluster is later formed with no members common to the previous cluster, a situation known as a temporal partition, a potential problem exists because no new member necessarily possesses the current state information of the previous cluster.
To maintain consistency across a temporal partition, a log file is maintained. Each time a modification to the cluster state information takes place, the change is recorded in the log file. Then, when a new node forms a cluster, it unrolls any changes recorded in the log file to make its local database consistent with the last state of the previous cluster before it went down.
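By way of illustration only, the logging and unrolling just described can be sketched as follows. The sketch assumes a log stored as one JSON record per line and a cluster database modeled as a simple dictionary; the function names (log_change, unroll, demo) and the keys used are hypothetical and do not come from any actual cluster service.

```python
import json
import os
import tempfile


def log_change(log_path, key, value):
    """Append one state change to the log file as a JSON line (assumed format)."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")


def unroll(log_path, database):
    """Replay every logged change, in order, into the local database."""
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            database[entry["key"]] = entry["value"]
    return database


def demo():
    """Log three changes, then rebuild the state on a node with an empty database."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        log_change(path, "resource_owner", "node1")
        log_change(path, "resource_owner", "node2")
        log_change(path, "quorum_size", 2)
        return unroll(path, {})
    finally:
        os.remove(path)
```

Because changes are replayed in the order in which they were logged, a later change to the same key overrides an earlier one, so the node forming the cluster ends with the last recorded value for each key.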
However, different nodes can fail at different times. In one particular event, a node may fail just after it has committed a change locally and caused the change to be logged in the log file, but before any other node can find out about the change. As soon as another state change occurs and is logged by a surviving node, the previous entry in the log file is not consistent with the state of the surviving cluster. If a new cluster is later formed following a temporal partition, the node forming the cluster will unroll this inconsistent information from the log file, whereby the new cluster will be inconsistent with the previous (earlier-in-time) cluster.
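The failure scenario above may be easier to follow as a small simulation. The following is a minimal sketch under stated assumptions: two nodes, A and B, each with a local database modeled as a dictionary, and a shared log modeled as a list. The names and the keys (group_owner, disk_state) are hypothetical.

```python
def replay(log):
    """Naively unroll every entry in the log into a fresh database."""
    state = {}
    for key, value in log:
        state[key] = value
    return state


def simulate_orphaned_change():
    log = []                  # stands in for the shared log file
    node_a, node_b = {}, {}   # local databases of the two nodes

    # Node A commits a change locally and logs it, then fails
    # before node B learns of the change.
    node_a["group_owner"] = "A"
    log.append(("group_owner", "A"))

    # The surviving node B commits and logs a later change.
    node_b["disk_state"] = "online"
    log.append(("disk_state", "online"))

    # Return the survivor's actual state and the state a new
    # cluster would rebuild by replaying the whole log.
    return node_b, replay(log)
```

In this sketch the surviving cluster never held the group_owner change, yet a node that replays the entire log acquires it, so the rebuilt state differs from the last actual state of the previous cluster.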
The present invention provides a method and system for discarding change information in a server cluster that is locally committed and logged, but not replicated to other nodes. Such change information is preferably maintained and replicated as a transaction. If a transaction is not fully replicated due to a server failure or the like, and a subsequent transaction is logged, the previous transaction (referred to herein as an orphaned replicated transaction) is inconsistent with the actual state of a surviving cluster. When unrolling a log file to make a new cluster consistent with a previous cluster across a partition in time, such orphaned (logged, but not fully replicated) transactions are discarded by the present invention, whereby the new cluster becomes consistent with the actual state of the previous cluster.
Briefly, the present invention provides a method and system for recording the state data of a previous cluster and forming a new cluster of servers using that state data such that the new cluster is consistent with the state of the previous cluster. Each transaction is recorded in a log file with an associated sequence number. A local copy of the sequence number is monotonically adjusted (e.g., incremented) by each node each time that a transaction is replicated thereto. If a transaction is logged but not replicated, the next logged transaction will have the same sequence number, since no other node received the transaction and thus no other node adjusted its local copy of the sequence number. A node forming a new cluster, such as after a temporal partition, retrieves each transaction from the log file along with its associated sequence number. While unrolling the log file, whenever two logged transactions have the same sequence number, the earlier of the two is known to have been an orphaned replicated transaction. Such orphaned replicated transactions are discarded rather than used to update the state of the node forming the new cluster, whereby the new cluster becomes consistent with the actual state of the previous cluster.
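The duplicate-sequence-number test can be illustrated with a short sketch. It is not the claimed implementation; it assumes the log has already been read into an ordered list of records, and the record layout (LogRecord) and function name (unroll_discarding_orphans) are hypothetical. It also assumes that a final record with no successor is treated as valid, since nothing later in the log marks it as orphaned.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    seq: int      # sequence number recorded with the transaction
    key: str      # piece of cluster state that was changed
    value: str    # new value


def unroll_discarding_orphans(log):
    """Replay the log, skipping any record whose successor reuses its sequence number."""
    state = {}
    discarded = []
    for i, rec in enumerate(log):
        nxt = log[i + 1] if i + 1 < len(log) else None
        if nxt is not None and nxt.seq == rec.seq:
            # The next logged transaction has the same sequence number,
            # so this one was never replicated: treat it as orphaned.
            discarded.append(rec)
            continue
        state[rec.key] = rec.value
    return state, discarded
```

For example, given records with sequence numbers 1, 2, 2 and 3, the first record numbered 2 is discarded and the remaining three are applied in order.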
Other benefits and advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which: