As is known in the art, a computer network cluster is a collection of interconnected computers which share resources such as data storage. The individual computers, or nodes, are connected through both a physical and a software-level interconnect. The independent nodes are integrated into a single virtual computer, appearing to an end user as a single computing resource. If one node fails, the remaining nodes will handle the load previously handled by the failed node. This multiple computer environment provides many benefits to a user including high availability and increased speed of operation.
A typical network cluster configuration includes a plurality of nodes sharing one or more storage devices. The nodes are connected to each other by a high-speed network connection such as Ethernet.
A user can connect into the network cluster through any of the nodes in the network cluster. From the perspective of a user, the network cluster appears as a single computer system. Software applications run by a user are executed using the shared storage devices. An exemplary software application often executed on a computer network cluster is a database application. Typically, the database is stored on one or more shared storage devices. Inquiries or changes to the database are initiated by a user through any one of the cluster member nodes.
Successful operation of a network cluster requires coordination among the nodes with respect to usage of the shared resources as well as with respect to the communication between the nodes. Specifically, with multiple users manipulating shared data, precautions must be taken in a network cluster to ensure the data is not corrupted. In addition, instances of nodes joining and exiting the network cluster must also be coordinated to avoid a loss of system integrity. Multiple safeguards have been instituted to aid in the prevention of a loss of system integrity.
One such safeguard may be instituted by the network cluster to handle cluster partitioning. Cluster partitioning results when the cluster network degenerates into multiple cluster partitions, each including a subset of the cluster network nodes and each operating independently of the others. These partitions may be the result of one cluster partition having lost its network connection with the remaining cluster partitions, the so-called partition-in-space problem.
To resolve the partition-in-space problem, which can lead to corruption of shared data, a concept referred to as quorum is typically instituted. Quorum refers to a minimum number of nodes required to initiate or continue operation of a network cluster. In an N node cluster, N representing the maximum number of nodes allowed membership in a given cluster, quorum is given as (N+1)/2, rounded up to a whole number of nodes when N is even. That is, more than half of the total number of nodes must be available for the cluster to continue functioning. Therefore, in a four node cluster, a minimum of three nodes must be available to initiate or continue operation of the cluster. Because two disjoint cluster partitions cannot each contain more than half of the nodes, only a single cluster partition meeting such a requirement can exist. As a result, any cluster partition including a minority of nodes is forced to cease operation, thus preserving the integrity of the shared data.
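The quorum rule described above can be illustrated with a minimal sketch. The function names `quorum` and `has_quorum` are hypothetical, chosen only for this illustration, and the sketch assumes that quorum is the smallest whole number of nodes strictly greater than half of N, which matches the four node example in the text.

```python
def quorum(max_nodes: int) -> int:
    """Return the smallest node count strictly greater than half of max_nodes.

    max_nodes is N, the maximum number of nodes allowed membership in the
    cluster (an assumption of this sketch: N is a positive integer).
    """
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    # Integer division then +1 yields "more than half" for both odd and even N.
    return max_nodes // 2 + 1


def has_quorum(partition_size: int, max_nodes: int) -> bool:
    """Return True if a partition of partition_size nodes may keep operating."""
    return partition_size >= quorum(max_nodes)


if __name__ == "__main__":
    n = 4
    print("quorum for", n, "nodes:", quorum(n))
    # A four node cluster split into partitions of three nodes and one node:
    for size in (3, 1):
        print("partition of", size, "node(s) has quorum:", has_quorum(size, n))
```

Under this assumption, a four node cluster that splits evenly into two partitions of two nodes leaves neither partition with quorum, so both cease operation rather than risk corrupting the shared data.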