Computer cluster systems consist of individual computers, called nodes, which are connected via a communication network. The network allows any two nodes to establish a communication link or channel between them. Computer clusters often also have a shared storage device which is connected to each node of the cluster. Such shared storage devices hold data that is used by more than one node in the cluster. To prevent data inconsistency, means for coordinating data transmission between the nodes and the shared devices are required. For example, if one node in the cluster writes data to a file on the shared storage device, a second node is not allowed to read that file until the first node has finished writing. Under normal conditions the first node tells the second node that it is writing the file, thereby preventing the second node from reading the now outdated file. This coordination takes place over the working communication channel between the two nodes.
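The write notification described above can be illustrated with a minimal sketch. It is not taken from any particular cluster product: the class `Node` and its methods `connect`, `begin_write`, `end_write` and `can_read` are hypothetical names chosen for illustration, and the communication channel is modelled as a direct method call between in-memory objects.

```python
class Node:
    """Hypothetical cluster node; the channel is modelled as direct calls."""

    def __init__(self, name):
        self.name = name
        self.peers = []          # nodes reachable over a working channel
        self.busy_files = set()  # files a peer has announced it is writing

    def connect(self, other):
        # Establish a two-way communication channel between two nodes.
        self.peers.append(other)
        other.peers.append(self)

    def begin_write(self, path):
        # Tell every reachable peer that this file is being written.
        for peer in self.peers:
            peer.busy_files.add(path)

    def end_write(self, path):
        # Tell every reachable peer that the write has finished.
        for peer in self.peers:
            peer.busy_files.discard(path)

    def can_read(self, path):
        # A node may read only files that no peer has marked as being written.
        return path not in self.busy_files


first, second = Node("first"), Node("second")
first.connect(second)
first.begin_write("shared/data.txt")
blocked = not second.can_read("shared/data.txt")
first.end_write("shared/data.txt")
allowed_again = second.can_read("shared/data.txt")
```

In this sketch the second node is kept from reading only because the notification reaches it. If `connect` had never been called, `second.can_read` would return `True` while the first node is still writing.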
If one node in the computer cluster breaks down, it will normally stop using the shared device. The other nodes in the cluster can then continue to use the data on the shared device without risk of data corruption. However, if the communication channel between two nodes breaks down such that the members of the cluster are still operating yet cannot communicate with each other, data corruption on the shared devices can occur. Such a breakdown of the communication channel is called a split-brain condition, and it divides the cluster into, say, two subclusters. In this case a node in one of the resulting subclusters might write data to a file on a shared storage device while a node in the other subcluster reads or writes the same file at the same time. Thus, a breakdown of the communication channel might lead to uncoordinated activities on shared devices. Therefore, it is necessary to shut down one of the resulting subclusters completely.
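One way to picture a split-brain condition is to treat the nodes and their working communication channels as a graph. The subclusters are then the groups of nodes that can still reach one another. The following is a sketch under that assumption only; the function name `subclusters` and the node names are hypothetical.

```python
from collections import deque


def subclusters(nodes, links):
    """Group nodes into subclusters that can still reach one another.

    nodes: iterable of node names
    links: iterable of (a, b) pairs, one per working communication channel
    """
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search over the working channels from this node.
        group = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(group)
    return groups


nodes = ["n1", "n2", "n3", "n4"]
# The channel between n2 and n3 has failed; all four nodes keep running.
after_failure = subclusters(nodes, [("n1", "n2"), ("n3", "n4")])
```

Here `after_failure` contains two groups, `{"n1", "n2"}` and `{"n3", "n4"}`. Every node is still operating and still attached to the shared storage, which is exactly the situation in which uncoordinated access can occur.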
A subcluster is normally shut down by the nodes of a first subcluster sending shutdown commands to the nodes of a second subcluster. However, this can lead to a situation in which a node of one subcluster is the target of multiple shutdown requests, which may cause panic and undesired crashes among the nodes receiving those requests. Furthermore, the members of the surviving subcluster might not be known before the shutdown attempts begin. As a result, a non-optimal subcluster might survive, one that is not able to handle all necessary applications running on the cluster system.
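The problem of multiple shutdown requests can be made concrete by counting them. The sketch below assumes the simplest uncoordinated scheme, in which every node of one subcluster sends a shutdown command to every node of the other. The function name `naive_shutdown_requests` and the node names are hypothetical and serve only as an illustration.

```python
from collections import Counter


def naive_shutdown_requests(senders, targets):
    """Count the shutdown requests each target receives when every
    sender independently commands every target to shut down."""
    received = Counter()
    for _sender in senders:
        for target in targets:
            received[target] += 1
    return received


first = ["a1", "a2", "a3"]
second = ["b1", "b2"]

# Requests arriving at the second subcluster from the first.
requests_at_second = naive_shutdown_requests(first, second)

# Without coordination the second subcluster may do the same in return,
# so it is not determined in advance which subcluster survives.
requests_at_first = naive_shutdown_requests(second, first)
```

With three senders, each node of the second subcluster is the target of three separate requests, and at the same time each node of the first subcluster may be the target of two.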