The present invention relates to multiple computers connected together in a cluster configuration, and more specifically, to a method, system and computer program product for handling a split condition within a computer cluster configuration.
In the field of computer processing, it is known to connect together a plurality of computers in a cluster having a certain configuration or topology. Each computer within a cluster is typically referred to as a node. This cluster configuration is utilized in part to divide software processing tasks among the computers in the cluster, which leads to improvements in efficiency in completing the oftentimes complex software processing tasks.
A common cluster configuration or topology is a symmetric one in which the various nodes are all connected to each other and to other devices such as, for example, a data storage device or repository. In addition, for redundancy purposes, the nodes may be connected together using more than one connection scheme, including using different types of wired or wireless media or protocols such as, for example, Ethernet, TCP/IP, TCP, a storage area network (SAN), a local area network (LAN), a wide area network (WAN), a data information service center (DISK), or a direct connection.
Nodes within a cluster commonly use “heartbeats” to communicate with each other on a regular basis (e.g., twice per second). This allows the node sending the heartbeat signal to determine if one or more receiving nodes, including the communication interfaces of the nodes and the communication medium(s) or protocol(s) between the nodes, are functioning properly. Often, a “gossip” heartbeat may be communicated which includes not only information about the sending or transmitting node (e.g., that it is active), but also includes information that the sending node has received from other nodes indicating, for example, which of the other nodes are available and the topology sensed by each of the other nodes, i.e., which of the other nodes each other node thinks are available.
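By way of illustration only, the gossip heartbeat described above may be sketched as follows. The names `Heartbeat`, `Node`, `make_heartbeat`, `receive` and the 1.5 second timeout are hypothetical and are not taken from any particular cluster implementation. The sketch assumes that a node treats a peer as available if a heartbeat from that peer arrived within the timeout, and it simply overwrites older relayed views without any version check.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    sender: str
    sent_at: float
    # Node id -> set of nodes which that node believes are available.
    # Includes the sender's own view plus views relayed from other nodes.
    views: dict


@dataclass
class Node:
    node_id: str
    timeout: float = 1.5  # assumed value: seconds of silence before a peer is dropped
    last_heard: dict = field(default_factory=dict)   # peer id -> time of last heartbeat
    peer_views: dict = field(default_factory=dict)   # peer id -> that peer's reported view

    def available(self, now):
        """Nodes this node currently believes are available, itself included."""
        alive = {n for n, t in self.last_heard.items() if now - t <= self.timeout}
        return frozenset(alive | {self.node_id})

    def make_heartbeat(self, now):
        """Build a gossip heartbeat: own view plus everything learned from peers."""
        views = dict(self.peer_views)
        views[self.node_id] = self.available(now)
        return Heartbeat(self.node_id, now, views)

    def receive(self, hb, now):
        """Record that the sender is alive and store the views it relayed."""
        self.last_heard[hb.sender] = now
        for node, view in hb.views.items():
            if node != self.node_id:  # a node's own view is never overwritten by gossip
                self.peer_views[node] = frozenset(view)


a, b, c = Node("A"), Node("B"), Node("C")
b.receive(c.make_heartbeat(0.0), 0.0)   # B hears C directly
a.receive(b.make_heartbeat(0.5), 0.5)   # A hears B, and learns of C through B
```

In this usage, node A has received no heartbeat directly from node C, yet it learns through node B's gossip that node B considers node C available.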
Although transmitting heartbeats over multiple interfaces may improve reliability, a partial loss of connectivity between one or more nodes and other nodes within the cluster may cause asymmetric topological views among the nodes, i.e., different nodes may have different views of which other nodes are connected and functioning. Asymmetric topologies may lead to cluster inoperability issues. For example, cluster-wide locks may be erroneously granted, thereby leading to repository corruption and confusion among upper network layers.
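By way of illustration only, an asymmetric topological view may be detected by comparing the availability sets reported by the nodes. The following sketch is not taken from any particular implementation. The function names `views_are_symmetric` and `one_way_links` are hypothetical, and the sketch assumes that each node's view is available as a set of node identifiers. A node that has reported no view at all is treated as seeing no other node.

```python
def views_are_symmetric(views):
    """Return True if every reporting node has the same view of the cluster."""
    distinct = {frozenset(v) for v in views.values()}
    return len(distinct) <= 1


def one_way_links(views):
    """List pairs (a, b) where node a believes b is available but b does not list a."""
    return sorted(
        (a, b)
        for a, seen in views.items()
        for b in seen
        if a != b and a not in views.get(b, frozenset())
    )


# Hypothetical three-node cluster in which node C no longer receives
# heartbeats from node A, while A and B still see every node.
views = {
    "A": {"A", "B", "C"},
    "B": {"A", "B", "C"},
    "C": {"B", "C"},
}
symmetric = views_are_symmetric(views)
broken = one_way_links(views)
```

In this example the views disagree, and the only one-way relationship is that node A believes node C is available while node C does not list node A.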