1. Field of the Invention
The present invention generally relates to utilizing underlying cluster systems to provide the functionality of reviving and reconstituting majority node set clusters. The methods of the present invention provide ways to have the cluster continue even if the number of nodes falls below what is designated as a majority. The methods then return the cluster to normal operation once the number of available nodes increases to the level designated in advance.
2. Description of Related Art
Today, in a traditional Microsoft Cluster Service (MSCS) cluster, the cluster can continue as long as one of the nodes owns the quorum disk. MSCS is software for the creation and management of a cluster. A cluster is a group of servers and other resources in a computer system that act like a single system and enable high availability and, in some cases, load balancing and parallel processing. A node is a computer system that is a member of a cluster. Any nodes that can communicate (via heartbeats) with the quorum owner are part of the cluster and can host resources. Heartbeats in a cluster service are messages sent regularly by the Cluster Service on one node to the Cluster Service on another node across a private network connection. Any other nodes that are configured to be in the cluster but cannot communicate with the quorum owner are said to have lost quorum, and any resources that they are hosting are terminated.
By contrast, a cluster running with a majority node set quorum resource will only start up or continue running if a majority of the nodes configured for the cluster are up and running and can all communicate with each other. The majority node set is a single quorum resource from an MSCS perspective. This majority node set resource ensures that the cluster configuration data stored on the majority node set is kept consistent across the different disks. The two quorum models therefore behave differently on node failures and in partitioned or split-brain scenarios, and care must be taken when deciding whether to use a traditional MSCS cluster with a physical disk resource or a cluster that uses a majority node set as its quorum resource.
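By way of illustration only, the majority rule described above may be sketched as follows. The function name `has_majority` and its parameters are hypothetical and are not part of MSCS; the sketch assumes that a strict majority (more than half of the configured nodes) is required.

```python
def has_majority(configured_nodes, reachable_nodes):
    """Return True if the reachable nodes form a strict majority.

    Hypothetical helper for illustration; not an MSCS API.
    """
    configured = set(configured_nodes)
    # Nodes that are not configured for the cluster do not count.
    reachable = set(reachable_nodes) & configured
    # Assumption: a majority means strictly more than half.
    return len(reachable) > len(configured) // 2
```

Under this assumption, a five-node cluster continues with three reachable nodes but not with two, and a four-node cluster split evenly in two has no partition holding a majority.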
There is a single quorum resource in the cluster, and it is brought online on only one node at any one time, just like any other cluster resource. A quorum is a disk resource used to store information about the current cluster configuration. The majority node set resource is responsible for ensuring that the quorum data is kept consistent on all disks around the cluster. When a cluster is set up to use a majority node set, a file share is created on each node. Each node in a majority node set cluster thus has a file share that exports the quorum directory, so that, regardless of where the majority node set resource is hosted, it can write to all the members of the majority node set.
It is, therefore, necessary to devise a method to revive and reconstitute majority node set clusters after a node failure. This invention provides a mechanism to bring the cluster back online and, by examining the rest of the cluster nodes, to bring those nodes back into the cluster.
One related art method to which the method of the present invention generally relates is described in U.S. Pat. No. 6,115,830 entitled “Failure Recovery For Process Relationships In A Single System Image Environment”. This prior art method is a system for recovery of process relationships following node failure within a computer cluster. For relationship recovery, each node maintains a set of care relationships. Each relationship is of the form “the carer cares about the care target”. Care relationships describe process relations such as parent-child or group leader-group member. Care relationships are stored at the origin node of their care targets. Following node failure, a surrogate origin node is selected. The surviving nodes then cooperate to rebuild, at the surrogate origin node, vproc structures and care relationships for the processes that originated at the failed node. The surviving nodes then determine which of their own care targets were terminated by the node failure. For each of the terminated care targets, notifications are sent to the appropriate carers. This allows surviving processes to correctly recover from severed process relationships.
The present invention differs from the above cited related art in that the cited art focuses on recovering process relationships following a node failure, whereas the present invention detects the lack of a majority of nodes within the cluster and reorganizes the cluster membership to allow the cluster to run with less than a majority of nodes connected. In the instant invention, as nodes become alive, they are added to the cluster until the original majority is reached, and the invention then allows the cluster to return to normal operation.
Yet another related art method to which the method of the present invention generally relates is described in U.S. Pat. No. 6,401,120 entitled “Method And System For Consistent Cluster Operational Data In A Server Cluster Using A Quorum Of Replicas”. This prior art method is a method and system for increasing the availability of a server cluster while reducing its cost by requiring at a minimum only one node and a quorum replica set of storage devices (replica members) to form and continue operating as a cluster. A plurality of replica members maintain the cluster operational data and are independent from any given node. A cluster may be formed and continue to operate as long as one server node possesses a quorum (majority) of the replica members. This ensures that a new or surviving cluster has at least one replica member that belonged to the immediately prior cluster and is thus correct with respect to the cluster operational data. Update sequence numbers and/or timestamps are used to determine the most updated replica member from among those in the quorum for reconciling the other replica members.
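The reconciliation step summarized above may be illustrated by the following minimal sketch. The `Replica` class, the `reconcile` function, and their fields are hypothetical names chosen for illustration and are not taken from the cited patent; the sketch assumes that an integer update sequence number alone identifies the most updated replica member.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    sequence: int  # hypothetical update sequence number
    data: str      # stand-in for the cluster operational data


def reconcile(total_replicas, owned):
    """Bring the owned replicas up to date with the newest among them.

    Returns the newest replica, or None when the node does not hold
    a quorum (strict majority) of the replica members.
    """
    if len(owned) <= total_replicas // 2:
        return None  # no quorum of replica members: cluster cannot form
    newest = max(owned, key=lambda r: r.sequence)
    for replica in owned:
        # Copy the newest data and sequence number onto every owned replica.
        replica.sequence = newest.sequence
        replica.data = newest.data
    return newest
```

Because any two majorities of the replica members overlap, at least one owned replica belonged to the immediately prior cluster, which is why selecting the highest sequence number is sufficient in this sketch.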
The present invention differs from this related art in that the cited related art deals with storing multiple copies of the quorum on external shared disks, so that a cluster will be operational if at least one node is online and can access a majority of the quorum copies. In the instant case, by contrast, quorum data is stored locally on each node's internal drives, and the present invention relies on the underlying cluster service to maintain that data. The instant invention affects the cluster only when the number of member nodes falls below a majority. Once that occurs, the invention allows the cluster to be brought together using the working nodes. The cluster itself will find the most up-to-date data.
Yet another related art method to which the method of the present invention generally relates is described in U.S. Pat. No. 6,163,855 entitled “Method And System For Replicated And Consistent Modifications In A Server Cluster”. This prior art method is a method and system for communicating modification information to servers in a server cluster. Local changes, such as modifications to a resource requested at one node, are associated into a single transaction. A master node, such as the node that owns the set of resources corresponding to the modifications in the transaction, requests permission from a locker node to replicate the transaction. When permission to replicate the transaction is received from the locker node, the master node replicates the transaction by requesting each node in the cluster, one node at a time, to commit the transaction. Any node that does not commit the transaction is removed from the cluster, ensuring consistency of the cluster. Failure conditions of any node or nodes are also handled in a manner that ensures consistency.
The present invention differs from this related art in that the cited related art only describes a system for ensuring that copies of the quorum on each node are kept up to date. The present invention builds on a cluster system similar to that described above. The present invention allows the cluster to function even when it has fewer member nodes than it normally needs to run. The invention allows the cluster to return to normal operation when the number of available nodes returns to the expected level.
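The replication sequence summarized above may be sketched as follows. The function `replicate` and the callables `locker_grants` and `commit` are hypothetical stand-ins for the locker node and the per-node commit request; they are assumptions made for illustration and do not reproduce the cited patent's implementation or its failure handling.

```python
def replicate(transaction, nodes, locker_grants, commit):
    """Replicate a transaction to each node, one node at a time.

    locker_grants(transaction) -> bool models the locker node's permission.
    commit(node, transaction) -> bool models one node's commit request.
    Returns the surviving membership, or None if permission is refused.
    """
    if not locker_grants(transaction):
        return None  # the master node may not replicate without permission
    members = []
    for node in nodes:
        # A node that does not commit is dropped from the membership,
        # so every remaining member has applied the transaction.
        if commit(node, transaction):
            members.append(node)
    return members
```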
Yet another related art method to which the method of the present invention generally relates is described in U.S. Pat. No. 6,336,171 entitled “Resource Protection In A Cluster Environment”. This prior art method is a method of protecting volumes of a mass storage device shared by a server cluster and includes the step of transferring from (i) a first file system of a first server of the server cluster to (ii) a first filter driver of the first server, (iii) a write request packet directed to a first volume of the mass storage device. Another step of the method includes determining at the first filter driver whether the first server has ownership of the first volume. Yet another step of the method includes transferring the write request packet from the first filter driver to a lower level driver for the mass storage device only if the determining step determines that the first server has ownership of the first volume. Apparatus for carrying out the method are also disclosed.
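The ownership check performed by the filter driver in the cited art may be illustrated, purely as a sketch, by the following. The function `filter_write`, the `owners` mapping, and the `forward` callable are hypothetical; an actual filter driver operates on write request packets inside the operating system's driver stack, which this sketch does not attempt to model.

```python
def filter_write(server, volume, owners, forward):
    """Forward a write to the lower level driver only if the server owns the volume.

    owners maps a volume name to the server that currently owns it.
    forward(volume) stands in for passing the request down the driver stack.
    Returns True if the write was forwarded, False if it was blocked.
    """
    if owners.get(volume) != server:
        return False  # server does not own the volume: block the write
    forward(volume)
    return True
```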
The present invention differs from this related art in that the cited related art only describes the methods used to operate a clustered environment. The method of the present invention utilizes an underlying cluster system to also provide this functionality. The method of the present invention also provides methods to have the cluster continue even if the number of nodes falls below a majority. Finally, the present invention also returns the cluster to normal operation once the number of nodes available increases to the correct level.