1. Field of the Invention
The present invention relates to computer networks and clustered computing systems. More specifically, the present invention relates to a method and an apparatus for providing per-node addresses in a clustered computing system that can tolerate failures of communication pathways between the nodes in the clustered computing system.
2. Related Art
Clustered computing systems allow multiple computing nodes to work together in accomplishing a computational task. In a clustered computing system, a plurality of computing nodes are typically coupled together through one or more computing networks so that each node in the cluster is able to communicate with every other node.
Clustered computing systems are often designed to be fault-tolerant so that a clustered computing system can continue to function if individual components within the clustered computing system fail. One particular problem in providing fault-tolerance is to design a system that can tolerate failures on the communication pathways that link together the nodes of the clustered computing system. Such failures can occur, for example, in cables, in network interface cards (NICs) within the computing nodes, and within intermediate networking equipment, such as a hub or a switch.
In designing a fault-tolerant communication mechanism, it is desirable to use an industry standard communication protocol, such as the Transmission Control Protocol/Internet Protocol (TCP/IP), so that existing components, which make use of these industry standard protocols, can be used within the clustered computing system. More specifically, it is desirable for each machine in a cluster to have its own IP address that can be used to contact the machine from any other machine in the cluster.
It is also desirable for the fault-tolerant communication mechanism to provide at least two disjoint physical communication pathways between each pair of nodes in the clustered computing system. In this way, if a single communication pathway fails, the system is able to provide an alternative communication pathway.
Providing a fault-tolerant TCP/IP network is relatively easy in the case where there are multiple redundant networks, and where all of the nodes in the computing system are attached to each of the redundant networks. In this case, fault-tolerance can be provided by assigning each machine its own IP address on a primary network. If a path fails within the primary network, the system simply moves all of the IP addresses to an alternative functioning network.
Unfortunately, in many clustered computing systems the computing nodes are not all attached to all of the networks. For example, each machine may have a point-to-point connection to every other machine, or there may exist multiple hubs or switches that each connect to only a subset of the nodes in the cluster. If this is the case, it is not possible to create a single IP network that spans all of the nodes in the cluster because of a limitation of TCP/IP. In TCP/IP, a given IP address can be hosted on at most one network adapter of a single machine at any given time. This restriction prevents configuring a single IP address on multiple network interfaces in order to span arbitrary network configurations.
Hence, what is needed is a method and an apparatus that provide a fault-tolerant communication mechanism for nodes within a clustered computing system, wherein the mechanism supports arbitrary fault-tolerant interconnection topologies and allows each node in the cluster to be accessible through its own address.
One embodiment of the present invention provides a system that facilitates communications between a cluster of nodes within a clustered computing system in a manner that tolerates failures of communication pathways between the nodes. The system operates by configuring a distinct logical pathway between each possible source node and each possible destination node in the cluster, so that each distinct logical pathway is routed across one of at least two disjoint physical pathways between each possible source node and each possible destination node. In doing so, the system configures a first logical pathway between a first node and a second node across a first physical pathway of at least two disjoint physical pathways between the first node and the second node. Upon detecting a failure of the first physical pathway, the system reroutes the first logical pathway across a second physical pathway from the at least two disjoint physical pathways between the first node and the second node.
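By way of illustration only, the configuration and rerouting described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the use of an unordered node pair as the key for a logical pathway, and the labels for physical pathways are all hypothetical and do not appear in the specification.

```python
from itertools import combinations


def configure_logical_pathways(nodes, physical_paths):
    """Choose one physical pathway for the logical pathway of every node pair.

    physical_paths maps frozenset({a, b}) to a list of labels for the
    disjoint physical pathways between a and b (hypothetical labels).
    """
    active = {}
    for a, b in combinations(nodes, 2):
        pair = frozenset((a, b))
        candidates = physical_paths[pair]
        if len(candidates) < 2:
            raise ValueError(f"need at least two disjoint pathways for {a}-{b}")
        # Route the logical pathway across the first physical pathway.
        active[pair] = candidates[0]
    return active


def reroute_on_failure(active, physical_paths, failed_path):
    """Move every logical pathway that uses failed_path to an alternative."""
    for pair, current in active.items():
        if current == failed_path:
            alternatives = [p for p in physical_paths[pair] if p != failed_path]
            active[pair] = alternatives[0]
    return active
```

In this sketch, only the logical pathways that were routed across the failed physical pathway are moved; all other logical pathways keep their current routing.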
In one embodiment of the present invention, the system associates a distinct per-node logical address with each node in the cluster. For each source node, the system associates the per-node logical address of each possible destination node with a corresponding logical pathway to the destination node. In this way, a communication from a given source node to a per-node logical address of a given destination node is directed across the corresponding logical pathway to the given destination node. In a variation on this embodiment, the distinct per-node logical address includes a distinct Internet Protocol (IP) address. In a variation on this embodiment, associating the distinct per-node logical address with each node in the cluster involves hosting the distinct per-node logical address on a loop-back interface within each node in the cluster.
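For purposes of illustration, the association between per-node logical addresses and logical pathways at a given source node can be modeled as a simple lookup table. The following Python sketch is one possible in-memory model only; the function names, node names, addresses, and pathway labels are hypothetical, and the sketch does not configure an actual loop-back interface or routing table.

```python
import ipaddress


def build_address_table(source, per_node_address, pathway_for):
    """Map each destination's per-node address to a logical pathway.

    per_node_address: node name -> per-node IP address (string).
    pathway_for: (source, destination) -> logical pathway label.
    """
    table = {}
    for dest, addr in per_node_address.items():
        if dest == source:
            continue  # the source's own address stays local to the source
        table[ipaddress.ip_address(addr)] = pathway_for[(source, dest)]
    return table


def pathway_to(table, dest_address):
    """Return the logical pathway used to reach a per-node address."""
    return table[ipaddress.ip_address(dest_address)]
```

A source node would consult such a table once per outgoing communication, so that traffic addressed to a destination's per-node address is always handed to the logical pathway leading to that destination.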
In one embodiment of the present invention, the system associates two distinct Internet Protocol (IP) addresses with each distinct logical pathway, one for each of two nodes located on opposite ends of the pathway.
In one embodiment of the present invention, the system associates a distinct Internet Protocol (IP) network with each distinct logical pathway, wherein the two distinct IP addresses associated with each distinct logical pathway are located on the associated distinct IP network.
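As an illustrative example, the assignment of a distinct IP network and two endpoint addresses to each logical pathway can be sketched with the Python standard `ipaddress` module. The base address block, the choice of /30 subnets (two usable host addresses each), and the function name are assumptions made for this sketch and are not taken from the specification.

```python
import ipaddress
from itertools import combinations


def assign_pathway_networks(nodes, base="10.0.0.0/24"):
    """Give every node pair its own /30 network and one address per endpoint."""
    pairs = list(combinations(sorted(nodes), 2))
    subnets = list(ipaddress.ip_network(base).subnets(new_prefix=30))
    if len(pairs) > len(subnets):
        raise ValueError("base block is too small for this many pathways")
    plan = {}
    for (a, b), net in zip(pairs, subnets):
        # A /30 network has exactly two usable host addresses.
        first, second = list(net.hosts())
        plan[(a, b)] = {"network": net, "endpoints": {a: first, b: second}}
    return plan
```

Under these assumptions, a three-node cluster uses three /30 networks, and each node holds one address on each of the two networks whose logical pathways terminate at that node.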
In one embodiment of the present invention, detecting the failure of the first physical pathway involves using a monitor to periodically test physical pathways in the cluster.
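One way such a monitor might be structured is sketched below in Python. The sketch assumes a caller-supplied `probe` function that returns a true value when a physical pathway is usable and an `on_failure` callback that is invoked once per failed pathway; both names, the polling interval, and the `rounds` parameter (included only so that the loop can terminate in a demonstration) are hypothetical.

```python
import time


def monitor_paths(paths, probe, on_failure, interval_s=1.0, rounds=None):
    """Periodically test physical pathways and report each failure once.

    Runs forever when rounds is None; otherwise runs the given number of
    test rounds. Returns the set of pathways still considered healthy.
    """
    healthy = set(paths)
    completed = 0
    while rounds is None or completed < rounds:
        # sorted() makes a copy, so removing from the set here is safe.
        for path in sorted(healthy):
            if not probe(path):
                healthy.discard(path)
                on_failure(path)
        completed += 1
        time.sleep(interval_s)
    return healthy
```

In a deployment, the `probe` function would exercise the actual physical pathway, and the `on_failure` callback would trigger the rerouting of the affected logical pathways.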
In one embodiment of the present invention, rerouting the first logical pathway involves: bringing down the first logical pathway over the first physical pathway without bringing down connections through the first logical pathway; and bringing up the first logical pathway over the second physical pathway.
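The two-step rerouting described above can be modeled, purely for illustration, by the following Python sketch. The class is a hypothetical in-memory model: the class name, its attributes, and the connection labels are assumptions, and no operating system interface is actually brought down or up. The point illustrated is that the logical pathway keeps its IP addresses and its record of open connections while only the underlying physical pathway changes.

```python
class LogicalPathway:
    """In-memory model of a logical pathway between two nodes."""

    def __init__(self, local_ip, remote_ip, physical):
        self.local_ip = local_ip
        self.remote_ip = remote_ip
        self.physical = physical   # physical pathway currently in use
        self.connections = []      # open connections through this pathway

    def reroute(self, new_physical):
        # Step 1: bring the logical pathway down over the failed physical
        # pathway. The connection list is deliberately left untouched.
        self.physical = None
        # Step 2: bring the same logical pathway (same addresses) up over
        # the alternative physical pathway.
        self.physical = new_physical
```

Because the endpoint addresses of the logical pathway do not change in this model, connections that are identified by those addresses remain valid after the rerouting completes.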