1. Field of the Invention
The present invention relates generally to computer node clusters, and more particularly to determining weak membership in a set of computer nodes.
2. Description of the Related Art
In cluster-based systems, such as cluster file systems, cluster-based block servers, cluster communication packages, cluster storage systems, fault-tolerant cluster-based packages, etc., it is important that a unified view of the cluster membership is presented to users and/or nodes in the cluster. For a cluster with no faults or absent nodes, this is simple: the entire cluster is presented as the membership. But cluster-based systems can experience link and node failures, in which case the cluster membership, and more particularly the membership in that subset of nodes in which every node is logically interconnected with every other node in the subset, is something less than the entire cluster. The failures and the concomitant difficulty in knowing membership become more problematic with the advent of redundant storage access.
Knowing the maximum subset of nodes that are logically interconnected with each other requires solving what is referred to as the “weak membership” problem. The problem can be described as follows. Given a set A of interconnected nodes that can potentially be a part of a cluster membership, the maximum subset B must be found in which all the nodes in B can communicate with each other.
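By way of illustration only, the problem statement above can be sketched in a few lines of Python. The sketch is an exhaustive search that is practical only for very small sets; it is meant to make the definition of the subset B concrete and is not the method of the present invention. The function name `weak_membership`, the node names, and the link list are hypothetical and do not appear elsewhere in this description.

```python
from itertools import combinations


def weak_membership(nodes, links):
    """Return a largest subset of `nodes` in which every pair can communicate.

    `links` is an iterable of (a, b) pairs, each meaning that nodes a and b
    can communicate with each other (hypothetical input format).
    Brute force: cost grows exponentially with the number of nodes.
    """
    linked = {frozenset(pair) for pair in links}
    # Try the largest candidate subsets first, so the first match is maximal.
    for size in range(len(nodes), 0, -1):
        for subset in combinations(sorted(nodes), size):
            # A subset qualifies only if every pair inside it is linked.
            if all(frozenset(p) in linked for p in combinations(subset, 2)):
                return set(subset)
    return set()


# Set A: four nodes; n4 can reach only n3, so n4 cannot be in the subset B.
A = {"n1", "n2", "n3", "n4"}
links = [("n1", "n2"), ("n1", "n3"), ("n2", "n3"), ("n3", "n4")]
print(sorted(weak_membership(A, links)))  # prints ['n1', 'n2', 'n3']
```

In this hypothetical example, node n4 is excluded because it cannot communicate with n1 or n2, even though it remains connected to n3.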
In previous cluster systems, a “boss” node is generally used to determine the subset membership and then to make the membership known to all affected nodes. Unfortunately, in a “boss” scheme each node must have two code paths, one that is used when the node is the “boss” node and one that is used when the node is a subordinate node. A relatively large amount of data must be transmitted through such a system, since the “boss” node must inform each subordinate node of the entire weak membership, node by node. Additionally, “boss” schemes require code to deal with exceptional circumstances, such as a re-elect mechanism to address the failure of the boss node and resolution mechanisms to account for multiple prospective boss nodes. These requirements complicate implementation and coding of “boss” node regimes and limit their scalability to smaller clusters, since a single “boss” node can encounter difficulty processing and distributing changes to all members of the cluster.
Even non-boss based methods that involve synchronized broadcast of membership changes can overload individual nodes with the processing of a flood of membership change messages. Furthermore, many clusters do not support the broadcast or multicast requirements imposed by such systems. Still further, asymmetric failures, that is, failures that occur when a node is connected to a given set of nodes but some nodes inside the set are not connected to all the nodes in the set, have generally not been accounted for in previous systems. This is because asymmetric failures are not likely in the context of clusters connected through a single network, but become more common in redundant networks, the possibility of which has not always been considered by prior methods.
With the above considerations in mind, the present invention critically recognizes the need to solve the weak membership problem in clustered systems in a way that is scalable, that accounts for redundant networks, and that does not require a cluster to support broadcasting or multicasting or to bear relatively high message traffic to support the solution. Accordingly, the present invention provides the solutions disclosed herein to one or more of the above considerations.