The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Computers may work together in a group in many contexts. For example, two or more database servers executing on separate computers may work as a group in responding to requests to read from or write to a persistent storage mechanism, such as a database. Those in the art may use the term “cluster” to refer to a group of computers working together as a logical unit.
In a cluster of computers (or “nodes”), each node of the cluster may issue a request to write data (a “write request”) to a persistent storage mechanism. To ensure the accuracy of the data stored in the persistent storage mechanism, one or more nodes of the cluster may occasionally need to be prevented from performing write requests on the persistent storage mechanism. For example, when a network split occurs between portions of a cluster (the “split-brain problem”), a portion of the cluster may need to be prevented from performing write requests on the persistent storage mechanism to ensure one node does not write over changes made to the persistent storage mechanism by another node. Preventing a node from performing a write request on a persistent storage mechanism is called “fencing” the node.
Current approaches for performing fencing involve instructing the node(s) to be fenced to power down. Once the fenced node(s) have powered down, the other nodes of the cluster may continue with the assurance that the fenced node(s) will not issue any further write requests to the persistent storage mechanism.
To illustrate how fencing might be employed, assume that several nodes of a cluster (referred to as “the first cohort of nodes”) are located at a first location and the other nodes of the cluster (referred to as “the second cohort of nodes”) are located at a different location than the first location. Geographically separating the first cohort of nodes from the second cohort of nodes is advantageous because if a problem (such as a fire or a power outage) disrupts operation at the first location, then the second cohort of nodes (which is located at a different location than where the problem occurred) may continue to operate.
Each node of the cluster (i.e., each node in both the first cohort of nodes and the second cohort of nodes) may issue read requests and write requests to a persistent storage mechanism. To ensure the accuracy of the data within the persistent storage mechanism, only nodes of the cluster should be able to perform write requests on the persistent storage mechanism. However, there are occasions when one or more nodes of the cluster may become inoperable (for example, due to a network problem or an unexpected hardware problem occurring at a node), and therefore may lose membership in the cluster. In that case, the node that lost membership in the cluster is instructed to power down. Once the node has powered down, the cluster may be assured that the node that lost membership in the cluster will issue no further write requests, which, if processed, may corrupt the data stored in the persistent storage mechanism.
As another example, if a network connection between the first cohort of nodes and the second cohort of nodes becomes inoperable, then it would be desirable to prevent either the first cohort of nodes or the second cohort of nodes from performing write requests on the persistent storage mechanism, so that neither cohort writes over changes made by the other. In such a case, one of the two cohorts would be fenced by instructing that cohort to power down (thereby preventing nodes of that cohort from performing write operations on the persistent storage mechanism), and the other cohort of nodes would be allowed to operate as normal.
If a write request issued by a node is in transit over the network when the node is fenced, the write request may still be received by the persistent storage mechanism. In fact, the write request may be received after a point in time when the cluster considers it safe to resume normal operation. Consequently, the possibility exists that the data stored in the persistent storage mechanism may still become corrupted. Further, as the nodes of a cluster become more geographically distant from one another, the likelihood of this scenario increases, because write requests may spend a greater amount of time traversing the network from the sender to the persistent storage mechanism. Also, a malicious node might not power down when instructed to do so, and as a result, the node may continue to issue write requests to the persistent storage mechanism.
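For illustration only, the in-transit write scenario described above may be sketched as a small simulation. The sketch is a toy model under stated assumptions, not a description of any particular storage system: the names `Storage`, `Network`, `simulate`, `node_a`, `node_b`, the key `balance`, and all tick values are hypothetical. It shows that powering down a node does not recall a write request that the node has already sent.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class InFlightWrite:
    # Only arrival_time takes part in ordering, so the heap delivers
    # writes in the order they reach the storage mechanism.
    arrival_time: int
    sender: str = field(compare=False)
    key: str = field(compare=False)
    value: str = field(compare=False)


class Storage:
    """Hypothetical persistent storage that applies writes as they arrive."""

    def __init__(self):
        self.data = {}

    def apply(self, write):
        # The storage mechanism has no notion of cluster membership here,
        # so it accepts a write from a node that has since been fenced.
        self.data[write.key] = write.value


class Network:
    """Hypothetical network that delays each write by a given number of ticks."""

    def __init__(self):
        self.queue = []

    def send(self, now, delay, sender, key, value):
        heapq.heappush(self.queue, InFlightWrite(now + delay, sender, key, value))

    def deliver_until(self, now, storage):
        # Deliver every write whose arrival time is at or before `now`.
        while self.queue and self.queue[0].arrival_time <= now:
            storage.apply(heapq.heappop(self.queue))


def simulate():
    storage = Storage()
    network = Network()
    powered_on = {"node_a": True, "node_b": True}

    # Tick 0: node_b issues a write over a slow link (assumed 10-tick delay).
    network.send(0, 10, "node_b", "balance", "stale-from-b")

    # Tick 1: node_b is fenced by being powered down.
    powered_on["node_b"] = False

    # Tick 2: the cluster considers it safe to resume; node_a writes
    # over a fast link (assumed 1-tick delay).
    network.send(2, 1, "node_a", "balance", "fresh-from-a")

    network.deliver_until(5, storage)
    value_after_resume = storage.data["balance"]

    # Tick 10: the write sent by node_b before it was fenced finally arrives.
    network.deliver_until(10, storage)
    final_value = storage.data["balance"]

    return value_after_resume, final_value, powered_on["node_b"]
```

In this sketch, the value written by `node_a` after the cluster resumes is later overwritten by the delayed write from `node_b`, even though `node_b` was powered down at tick 1. Increasing the assumed delay for `node_b` widens the window in which this can occur, consistent with the observation that greater distance between nodes increases the likelihood of the scenario.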
Current approaches for performing fencing operations also have difficulty scaling to support large clusters. In a typical enterprise system, many applications executing on different nodes need to collaborate with their peers on other nodes of the cluster. Depending on the nature of the collaboration, an application may need to interact either with all nodes of the cluster or with just a subset of the nodes of the cluster. As a result, the interactions among the nodes of the cluster, which depend on the needs of the application executing on each node, must be managed, either by each application itself or by a centralized entity for the cluster. Managing these interactions requires an undesirable amount of resources.
Thus, an improved mechanism for performing fencing is desirable.