1. Field of the Invention
The present invention relates to computer systems and, more particularly, to improved methods and apparatus for maintaining full connectivity in clustered computer systems.
2. Description of the Related Art
In contrast to single mainframe computing models of the past, more distributed computing models have recently evolved. One such distributed computing model is known as a clustered computing system. FIG. 1 illustrates an exemplary clustered computing system 100 including computing nodes (nodes) A, B and C, storage devices (e.g., storage disks 102-104), and other devices 106-110 such as scanners, printers, digital cameras, etc. For example, each of the nodes A, B and C can be a computer with its own processor and memory. The collection of nodes A, B and C, storage disks 102-104, and other devices 106-110 makes up the clustered computing system 100.
Typically, the nodes in a cluster are coupled together through a “private” interconnect with redundant pathways. As shown in FIG. 1, nodes A, B and C are coupled together through private communication channels 112 and 114. For example, the private communication channels 112 and 114 can adhere to Ethernet, ATM, or Scalable Coherent Interface (SCI) standards. A client 116 can communicate with the clustered computing system 100 via a network 118 (e.g., public network) using a variety of protocols such as Transmission Control Protocol (TCP), User Datagram Protocol (UDP), etc. From the point of view of the client 116, the clustered computing system 100 is a single entity that can provide the client 116 with a variety of computer-implemented services, e.g., web-hosting, transaction processing, etc. In other words, the client 116 is not aware of which particular node(s) of the clustered computing system 100 is (are) providing it services.
The clustered computing system 100 provides a scalable and cost-efficient model where off-the-shelf computers can be used as nodes. The nodes in the clustered computing system 100 cooperate with each other to provide a distributed computing model that is transparent to users, e.g., the client 116. In addition, in comparison with single mainframe computing models, the clustered computing system 100 provides improved fault tolerance. For example, in case of a node failure within the clustered computing system 100, other nodes can take over to perform the services normally performed by the node that has failed.
Typically, nodes in the clustered computing system 100 send each other “responsive” (often referred to as “heart beat” or activation) signals over the private communication channels 112 and 114. The responsive signals indicate whether nodes are active and responsive to other nodes in the clustered computing system 100. Accordingly, these responsive signals are periodically sent by each of the nodes so that if a node does not receive the responsive signal from another node within a certain amount of time, a node failure can be suspected. For example, in the clustered computing system 100, if nodes A and B do not receive a signal from node C within an allotted amount of time, nodes A and B can suspect that node C has failed. In this case, if nodes A and B are still responsive to each other, a two-node sub-cluster (AB) results. From the perspective of the sub-cluster (AB), node C can be referred to as a “non-responsive” node. If node C has really failed, then it would be desirable for the two-node sub-cluster (AB) to take over services from node C. However, if node C has not really failed, taking over the services performed by node C could have dire consequences. For example, if node C is performing write operations to the disk 104 and node B takes over the same write operations while node C is still operational, data corruption can result.
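By way of illustration only, the timeout-based suspicion described above can be sketched in Python as follows. The class name `HeartbeatMonitor`, its methods, and the 3-second timeout are hypothetical and are not taken from any particular cluster implementation. Times are passed in explicitly so that the sketch is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks when a responsive signal last arrived from each peer node."""

    timeout: float  # hypothetical allotted time, in seconds
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        """Note that a responsive signal from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now: float) -> List[str]:
        """Return peers whose last signal is older than the allotted time."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


# Node A's view: B keeps signalling, C falls silent after time 0.
monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("B", now=0.0)
monitor.record("C", now=0.0)
monitor.record("B", now=4.0)
non_responsive = monitor.suspected(now=5.0)
```

In this sketch node C is merely suspected of failure. As the passage notes, a missing signal alone does not establish that node C has stopped providing its services.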
It should be noted that the fact that nodes A and B have not received responsive signals from node C does not necessarily mean that node C is not operational with respect to the services that are provided by node C. Other events can account for why responsive signals for node C have not been received by nodes A and B. For example, the private communication channels 112 and 114 may have failed. It is also possible that node C's program for sending responsive signals may have failed but node C is fully operational with respect to the services that it provides. Thus, it is possible for the clustered computing system 100 to get divided into two or more functional sub-clusters wherein the sub-clusters are not responsive to each other. This situation can be referred to as a “partition in space” or “split brain” where the cluster no longer behaves as a single cohesive entity. In this and other situations, when the clustered computing system no longer behaves as a single cohesive entity, it can be said that the “integrity” of the system has been compromised.
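As a rough illustration of a partition in space, the following Python sketch groups nodes into sub-clusters according to which pairs of nodes are still responsive to each other. The function name `sub_clusters` and the representation of responsive pairs as a list of tuples are assumptions made for this example only.

```python
from typing import Iterable, List, Tuple


def sub_clusters(nodes: Iterable[str],
                 responsive_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group nodes that can still reach each other, directly or indirectly."""
    neighbours = {node: set() for node in nodes}
    for a, b in responsive_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over the nodes reachable from `start`.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Nodes A and B still hear each other; node C hears no one.
partition = sub_clusters(["A", "B", "C"], [("A", "B")])
```

More than one group in the result corresponds to the split brain situation described above, here the sub-cluster (AB) and the isolated node C.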
In addition to partitions in space, there are other potential problems that need to be addressed in managing the operation of clustered computing systems. For example, another potential problem associated with operating clustered computing systems is referred to as a “partition in time” or “amnesia.” As is known to those skilled in the art, partitions in time can occur when a clustered computing system is operated with cluster configurations that vary over time.
Another problem that can affect clustered computing systems is loss of full connectivity. It is common that nodes of a clustered computing system be connected to every other node in the clustered computing system. Some software that is run on clustered computing systems even assumes, and thus requires, that the clustered computing systems have full connectivity. Hence, problems result when such clustered computing systems lose full connectivity. The loss of full connectivity means that the clustered computing system has incomplete (or partial) connectivity. Normally, the incomplete connectivity is caused by failure of an interconnect that couples nodes together. The loss of full connectivity can cause software to crash or “hang”. Accordingly, conventional approaches, such as described in U.S. Pat. No. 6,002,851, maintain full connectivity through use of sophisticated, centralized approaches that attempt to determine which nodes to shut down so that the remaining active nodes of the clustered computing system are fully connected. The disadvantage of this conventional approach is that in order to obtain an optimal solution it is overly complex. As a result of the complexity, the software implementing the optimal solution is complex and lengthy and thus prone to “bugs” (i.e., defects).
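To make the notion of full connectivity concrete, the following Python sketch checks whether every pair of active nodes still has a working link and reports the pairs that do not. The names `missing_links` and `is_fully_connected`, and the modelling of working links as node pairs, are assumptions for illustration only.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def missing_links(nodes: Iterable[str],
                  working_links: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every pair of nodes that has no working link between them."""
    up = {frozenset(link) for link in working_links}
    return [
        pair for pair in combinations(sorted(nodes), 2)
        if frozenset(pair) not in up
    ]


def is_fully_connected(nodes: Iterable[str],
                       working_links: Iterable[Tuple[str, str]]) -> bool:
    """Full connectivity: each node is linked to every other node."""
    return not missing_links(nodes, working_links)


# The interconnect between A and C has failed; B still reaches both.
broken = missing_links(["A", "B", "C"], [("A", "B"), ("B", "C")])
```

In this example the cluster is not partitioned, since nodes A and C can both still reach node B, yet it has only partial connectivity because the pair (A, C) is disconnected.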
In view of the foregoing, there is a need for improved techniques to maintain full connectivity in clustered computer systems.
Broadly speaking, the invention pertains to techniques for maintaining full connectivity in a clustered computing system. The improved techniques allow for detection of one or more disconnections that cause a loss of full connectivity and then resolution of the disconnections by shutting down one or more appropriate nodes of the clustered computing system to regain full connectivity. As a result, the clustered computing system can effectively maintain full connectivity as is often required by software running on the nodes of the clustered computing system.
The invention can be implemented in numerous ways, including a method, a system, an apparatus, or a computer readable medium. Several embodiments of the invention are discussed below.
As a method for monitoring full connectivity in a clustered computing system having more than two nodes, one embodiment of the invention includes the acts of: detecting loss of full connectivity in the clustered computing system; determining, at each of the nodes, which one or more of the nodes of the clustered computing system should be shut down to regain full connectivity in the clustered computing system; and shutting down the one or more nodes of the clustered computing system that the determining has determined should be shut down to regain full connectivity in the clustered computing system.
As a method for maintaining full connectivity in a clustered computing system having more than two nodes, one embodiment of the invention includes the acts of: detecting loss of full connectivity in the clustered computing system; determining which one or more of the nodes of the clustered computing system should be shut down to regain full connectivity in the clustered computing system based on at least one of reboot status of the nodes and votes associated with the nodes; and shutting down the one or more nodes of the clustered computing system that the determining has determined should be shut down to regain full connectivity in the clustered computing system.
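One possible way to base the determination on reboot status and votes is sketched below in Python. The specific ordering of the rules (a recently rebooted node yields first, then the node with fewer votes, then a tie-break on node name) is an assumption chosen for this example and is not stated in the text above. The names `NodeInfo` and `node_to_shut_down` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    name: str
    votes: int
    rebooted: bool  # hypothetical flag: node has recently rebooted


def node_to_shut_down(a: NodeInfo, b: NodeInfo) -> NodeInfo:
    """Pick which of two disconnected nodes should shut down.

    Illustrative policy only (an assumption, not the claimed procedure):
    1. if exactly one node has recently rebooted, it shuts down;
    2. otherwise the node with fewer votes shuts down;
    3. otherwise the node with the later name shuts down.
    """
    if a.rebooted != b.rebooted:
        return a if a.rebooted else b
    if a.votes != b.votes:
        return a if a.votes < b.votes else b
    return max(a, b, key=lambda node: node.name)


node_a = NodeInfo("A", votes=1, rebooted=False)
node_c = NodeInfo("C", votes=1, rebooted=True)
victim = node_to_shut_down(node_a, node_c)
```

Because the function depends only on the two node records and not on argument order, each node that evaluates it with the same information arrives at the same answer.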
As a clustered computing system, one embodiment of the invention includes a computing cluster having at least three computing nodes, and a connectivity monitoring manager provided within each of the computing nodes. The connectivity monitoring manager operates to detect loss of full connectivity in the clustered computing system. Then, when loss of full connectivity has been detected, the connectivity monitoring manager operates to determine which one or more of the nodes of the clustered computing system should be shut down to regain full connectivity in the clustered computing system based on at least one of reboot status of the nodes and votes associated with the nodes.
As a clustered computing system having at least three nodes, the clustered computing system includes: a detector configured to detect loss of full connectivity in the clustered computing system; a determinator configured to determine, at each of the nodes, which one or more of the nodes of the clustered computing system should be shut down to regain full connectivity in the clustered computing system; and a shutdown controller configured to shut down the one or more nodes of the clustered computing system that the determinator has determined should be shut down to regain full connectivity in the clustered computing system.
As a computer readable medium including computer program code for monitoring full connectivity in a clustered computing system having more than two nodes, the computer readable medium includes: computer program code for detecting loss of full connectivity in the clustered computing system; computer program code for determining, at each of the nodes, which one or more of the nodes of the clustered computing system should be shut down to regain full connectivity in the clustered computing system; and computer program code for shutting down the one or more nodes of the clustered computing system that the computer program code for determining has determined should be shut down to regain full connectivity in the clustered computing system.
The invention has numerous advantages. One advantage of the invention is that it provides a simplified approach to maintaining full connectivity within a clustered computing system. The simplified approach is statistically significantly less likely to have “bugs” (i.e., defects) in the implementing software. Another advantage of the invention is that determination of the particular nodes to shut down can be rapidly performed so as to maintain full connectivity. Still another advantage of the invention is that the processing is locally performed at each of the active nodes.