“Clustering” generally refers to a computer system organization where multiple computers, or nodes, are networked together to cooperatively perform computing tasks. An important aspect of a computer cluster is that all of the nodes in the cluster present a single system image; that is, from the perspective of a user, the nodes in a cluster appear collectively as a single computer, or entity.
Clustering is often used in relatively large multi-user computer systems where high performance and reliability are of concern. For example, clustering may be used to provide redundancy, or fault tolerance, so that, should any node in a cluster fail, the operations previously performed by that node will be handled by other nodes in the cluster. Clustering is also used to increase overall performance, since multiple nodes can often handle a larger number of tasks in parallel than a single computer otherwise could. Load balancing is often used as well to ensure that tasks are distributed fairly among nodes, which prevents individual nodes from becoming overloaded and thereby maximizes overall system performance. One specific application of clustering, for example, is in providing multi-user access to a shared resource such as a database or a storage device, since multiple nodes can handle a comparatively large number of user access requests, and since the shared resource is typically still available to users even upon the failure of any given node in the cluster.
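The fault tolerance and load balancing behavior described above may be illustrated by the following minimal sketch. The `Cluster` class, its methods, the node names, and the least-loaded dispatch policy are all hypothetical choices made for illustration; the text above does not prescribe any particular balancing policy.

```python
class Cluster:
    """Toy cluster that tracks how many tasks each node is handling."""

    def __init__(self, node_names):
        # Hypothetical bookkeeping: node name -> number of assigned tasks.
        self.load = {name: 0 for name in node_names}

    def dispatch(self):
        """Assign one task to the least-loaded surviving node."""
        if not self.load:
            raise RuntimeError("no nodes left in the cluster")
        # Assumed policy: least-loaded node wins; ties go to the
        # node that was listed first.
        name = min(self.load, key=self.load.get)
        self.load[name] += 1
        return name

    def fail(self, name):
        """Remove a failed node and hand its tasks to the other nodes."""
        orphaned = self.load.pop(name)
        return [self.dispatch() for _ in range(orphaned)]


def demo_cluster():
    cluster = Cluster(["n1", "n2", "n3"])
    assigned = [cluster.dispatch() for _ in range(6)]
    reassigned = cluster.fail("n2")
    return assigned, reassigned, dict(cluster.load)
```

In this sketch, the tasks held by the failed node are simply redistributed among the surviving nodes, so that the cluster as a whole continues to handle every task.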
The nodes within a clustered computer system are typically coupled to one another via some form of communication network. One type of network used to interconnect nodes, for example, is a broadcast-type network such as an Ethernet network, where nodes coupled to the network have unique addresses, and where information is transmitted in the form of packets that are addressed to the particular node or nodes that are the intended recipients of the information. With a broadcast-type network, each packet is received by all nodes, and only those nodes having appropriate addresses will process that packet. Often, broadcast-type networks rely on central hubs or switches that receive packets from sending nodes and broadcast the packets to all other nodes on the network.
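The addressing behavior of a broadcast-type network may be sketched as follows. This is a simplified model only, under the assumption of a single central hub; the `Packet`, `Node`, and `Hub` names and the single-letter addresses are hypothetical and are not drawn from any particular network implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dest: str      # address of the intended recipient
    payload: str


@dataclass
class Node:
    address: str
    seen: int = 0                                   # packets that reached this node
    processed: list = field(default_factory=list)   # packets addressed to this node

    def receive(self, packet):
        # Every node on the network receives the packet...
        self.seen += 1
        # ...but only a node with a matching address processes it.
        if packet.dest == self.address:
            self.processed.append(packet.payload)


class Hub:
    """Hypothetical central hub that repeats each packet to all other nodes."""

    def __init__(self):
        self.nodes = []

    def attach(self, node):
        self.nodes.append(node)

    def send(self, sender, packet):
        for node in self.nodes:
            if node is not sender:
                node.receive(packet)


def demo_broadcast():
    hub = Hub()
    a, b, c = Node("A"), Node("B"), Node("C")
    for node in (a, b, c):
        hub.attach(node)
    hub.send(a, Packet(dest="C", payload="hello"))
    return a, b, c
```

Here nodes B and C both receive the packet sent by node A, but only node C, whose address matches, processes it.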
Another type of network used to interconnect nodes in a clustered computer system is a point-to-point network, which includes a number of point-to-point interconnections between nodes, and where the nodes themselves assist in routing packets to appropriate nodes on the network. As with a broadcast-type network, each node is typically assigned a unique address. In contrast with a broadcast-type network, however, each interconnection effectively has a single node at each end, so that a packet that needs to be sent to a node that is several interconnects away from a sending node will be relayed by all of the intermediate nodes in the path. Typically, nodes will have multiple network ports that are directly linked with individual network ports on other nodes. While point-to-point networks are typically more complex than broadcast-type networks, such networks tend to offer comparatively greater bandwidth, since individual packets are typically only routed over a subset of the networked nodes, and as a result, multiple packets may often be communicated at the same time over different paths in the network.
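The relaying of packets over a point-to-point network may be illustrated by the sketch below. It assumes, purely for illustration, a hypothetical six-node topology (`LINKS`) and a breadth-first search with full knowledge of that topology; in practice the nodes themselves assist in routing, and no particular routing algorithm is implied by the text above.

```python
from collections import deque

# Hypothetical topology: each entry lists the nodes directly linked to a node.
LINKS = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B", "F"],
    "F": ["E"],
}


def find_path(links, src, dst):
    """Breadth-first search for a fewest-hop path from src to dst."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            # Walk the predecessor chain back to the sender.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in links.get(current, ()):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None  # no route exists


def links_used(path):
    """Return the set of interconnections a path travels over."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def demo_point_to_point():
    first = find_path(LINKS, "A", "D")    # relayed by intermediate nodes B and C
    second = find_path(LINKS, "E", "F")   # direct interconnection
    shared = links_used(first) & links_used(second)
    return first, second, shared
```

In this sketch, a packet from node A to node D is relayed by intermediate nodes B and C, while a packet from node E to node F travels over a separate interconnection. Because the two paths share no interconnection, both packets could be communicated at the same time.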
Both of the aforementioned types of networks typically require that each node be aware of the network address of every other node with which that node wishes to communicate. Such a requirement, however, presents a problem if a network is initialized to a state where none of the nodes has a network address assigned a priori or through an external agent, as nodes are initially unable to communicate with one another due to an inability to determine the addresses of the other nodes on the network.
Various distributed consensus algorithms are known for initializing a network and assigning appropriate network addresses to the nodes that are present on the network. However, in many instances, these distributed consensus algorithms still require that each node already have a unique network address that is known to all other nodes on the network. Thus, in cases where unique network addresses have not yet been established, conventional distributed consensus algorithms cannot properly initialize a network.
The inability to properly address network communications when unique addresses have not been established for all existing nodes is particularly problematic in clustering environments, and especially in clustering environments that rely on point-to-point networks. The high availability requirements of such systems often dictate that nodes be added to and removed from the system with minimal interruption of service, and often without interrupting communications between unaffected nodes. Moreover, even where a network is first being initialized, it is highly desirable for network addresses to be assigned with minimal, if any, administrator interaction. However, distributing network addresses among nodes in an automated manner, when the nodes are not yet aware of the network addresses of other nodes, is often not practicable in conventional clustering environments.
It has been found, therefore, that distributing network addresses throughout a cluster's network prior to assignment of unique addresses requires some form of communication mechanism that allows at least rudimentary information exchange between nodes without requiring the use of network addresses. A significant need has therefore arisen in the art for a manner of permitting at least limited communications between nodes without the requirement for unique network-wide addresses being known to all nodes.