1. Field of the Invention
The present invention relates to computer clusters. More specifically, the present invention relates to a method and apparatus for automatic configuration of networking for computer clusters.
2. Related Art
Corporate intranets and the Internet are coupling more and more computers together to provide computer users with an ever-widening array of tools. Many of these tools follow the client-server model, in which a client communicates with a server to have the server perform an action for the client or provide data to the client. A server may have to provide these services to many clients simultaneously and, therefore, must be fast and reliable.
In an effort to provide speed and reliability within servers, designers have developed clustering systems for the servers. Clustering systems couple multiple computers together to function as a single unit. Multiple networks can be used to couple together the individual computers, also called nodes, within a cluster. The networks that are used to couple the individual computers within a cluster are referred to as “private interconnects.” Additionally, multiple networks may be used to couple the nodes to the outside world, either by coupling the nodes to a corporate intranet or the Internet, or by coupling the nodes to computers that are not part of the cluster. These external networks are referred to as “public interconnects.”
Thus, each node in a cluster may have multiple network interfaces, which may be coupled together in complex topologies. For instance, in a two-node cluster, a network interface in the first node may be directly coupled to a network interface in the second node. In a larger cluster, however, a network interface on each node may be coupled to a hub or switch. In many cases, the nodes will be coupled using multiple hubs or switches to ensure that the failure of a single hub or switch does not cause failure of the entire cluster.
Software that controls a clustering system requires knowledge of how the network interfaces are coupled within the cluster, so that the software can direct network traffic appropriately. The clustering software also requires this knowledge in order to detect failures within the cluster. Typically, failures are detected by using a heartbeat mechanism, which periodically sends messages between nodes. Failure of these heartbeat messages to get through the network for an extended period of time can indicate a failure within the cluster. One typical requirement is that each node be connected to each other node through two private networks, so the cluster can keep operating in the event of a network failure. Other configurations are possible, such as requiring only one private network or more than two.
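By way of illustration only, the heartbeat-based failure detection described above can be sketched in Python. The class name HeartbeatMonitor, the five-second timeout, and the node names are assumptions introduced for this sketch; they do not describe any particular clustering product, and a real implementation would also send the heartbeat messages over each private interconnect.

```python
class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat received from each peer node."""

    def __init__(self, peers, timeout_s, start):
        # timeout_s: how long a peer may stay silent before it is suspected.
        self.timeout_s = timeout_s
        # Treat the start time as the last time every peer was heard from.
        self.last_seen = {peer: start for peer in peers}

    def record(self, peer, now):
        """Note that a heartbeat message from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def suspected(self, now):
        """Return the peers whose heartbeats have been missing for too long."""
        return sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Usage: node2 reports at t=4 s, node3 never reports.
monitor = HeartbeatMonitor(["node2", "node3"], timeout_s=5.0, start=0.0)
monitor.record("node2", now=4.0)
print(monitor.suspected(now=6.0))  # ['node3']
```

Times are passed in explicitly rather than read from a clock so that the sketch is deterministic; a missing heartbeat only indicates a possible failure, which is why the method reports suspects rather than confirmed failures.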
The process of cluster initialization can present a number of challenges. In a typical implementation, when a user first installs the clustering software on a cluster of computers, the user must first specify the names of the nodes making up the cluster. Next, the user must manually specify the couplings among the various nodes within the cluster and the couplings to the external networks and devices. For example, the user would specify that the first private coupling includes the network interface designated hme0 on node 1, the network interface designated hme0 on node 2, and the network interface designated hme0 on node 3, all of which are coupled to a switch designated switch 1. This process must be repeated for each private interconnect in the cluster.
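As a sketch only, the manually specified couplings in the example above might be represented as a simple mapping, together with a check of the typical requirement that every pair of nodes share two private networks. The second interconnect (switch2, hme1), the dictionary layout, and the function names are assumptions added for illustration and are not part of any described system.

```python
from itertools import combinations

# Manually entered private interconnects: switch -> {node: network interface}.
# "switch1" follows the example in the text; "switch2"/"hme1" are hypothetical.
interconnects = {
    "switch1": {"node1": "hme0", "node2": "hme0", "node3": "hme0"},
    "switch2": {"node1": "hme1", "node2": "hme1", "node3": "hme1"},
}


def shared_networks(interconnects, a, b):
    """List the private interconnects to which both node a and node b are coupled."""
    return [
        name
        for name, members in interconnects.items()
        if a in members and b in members
    ]


def under_connected_pairs(interconnects, nodes, required=2):
    """Return node pairs that share fewer than `required` private interconnects."""
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if len(shared_networks(interconnects, a, b)) < required
    ]


nodes = ["node1", "node2", "node3"]
print(under_connected_pairs(interconnects, nodes))  # []
```

If a single entry is omitted or mistyped, for instance node3 is left out of switch2, the check reports the pairs ("node1", "node3") and ("node2", "node3"), which shows how easily a manual specification can fall short of the redundancy requirement.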
This process of specifying the interconnects among the nodes is time-consuming and error-prone. As the size of a cluster grows from two or four nodes to thirty-two nodes or more, the number of interconnects increases rapidly, thereby requiring considerable effort to properly configure the cluster. In addition, it is easy for a technician to incorrectly specify the interconnects within a cluster or to incorrectly connect the physical cables, which can cause the cluster to fail or to operate at reduced capacity or reliability.
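The growth in configuration effort can be made concrete with a small calculation. The sketch below assumes two private interconnects per node and counts one manually entered interface per node per interconnect; these counting rules and the function names are assumptions for illustration, not figures taken from any particular system.

```python
def manual_entries(nodes, private_networks=2):
    """Interface entries to type: one per node for each private interconnect."""
    return nodes * private_networks


def node_pairs(nodes):
    """Distinct pairs of nodes whose connectivity must be correct."""
    return nodes * (nodes - 1) // 2


for n in (2, 4, 32):
    print(n, manual_entries(n), node_pairs(n))
# 2 4 1
# 4 8 6
# 32 64 496
```

Under these assumptions, a thirty-two-node cluster requires sixty-four interface entries and has 496 node pairs whose connectivity depends on those entries being correct.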
What is needed is a method and apparatus that eliminate this error-prone and tedious manual specification of network interconnects within a cluster of computers.