1. Technical Field
This invention generally relates to clustering computers, and more specifically relates to communications infrastructures for use on computer system clusters.
2. Background Art
Society depends upon computer systems for many types of information in this electronic age. Based upon various combinations of hardware (e.g., semiconductors, circuit boards, etc.) and software (e.g., computer programs), computer systems vary widely in design. Many computer systems today are designed to "network" with other computer systems. Through networking, a single computer system can access information stored on and processed by other computer systems. Thus, networking results in greater numbers of computer systems having access to greater numbers of electronic resources.
Networking is made possible by physical "routes" between computer systems, and the use of agreed upon communications "protocols." Which protocol is chosen depends upon factors including the number of networked computer systems, the distances separating the computer systems, and the purposes of information exchange between the computer systems. Communications protocols can be very simple if only a few computer systems are networked together in close proximity. However, communications protocols become more sophisticated as greater numbers of computer systems are added, and as computer systems are separated by greater distances.
The sophistication of communications protocols also varies with the type of information exchange. For instance, some protocols emphasize accuracy in sending large amounts of information, while others emphasize the speed of information transfer. The communications requirements of the applications running on a computer system network determine what type of protocol is chosen. An example of a computer application requiring real-time, reliable information transfer is a "cluster" management application.
Clustering is the networking of computer systems for the purpose of providing continuous resource availability and for sharing workload. A cluster of computer systems appears as one computer system from a computer system user's perspective, but actually is a network of computer systems backing each other up. In the event of an overload or failure on one computer system in a cluster, cluster management applications automatically reassign processing responsibilities for the failing computer system to another computer system in the cluster. Thus, from a user's perspective there is no interruption in the availability of resources.
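The reassignment of processing responsibilities described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each resource has one primary computer system and an ordered list of back-up computer systems. The function name `reassign`, the mapping layout, and the node names are hypothetical and are not taken from any particular cluster management application.

```python
from typing import Dict, List


def reassign(primary: Dict[str, str],
             backups: Dict[str, List[str]],
             failed: str) -> Dict[str, str]:
    """Return a new resource -> node mapping after `failed` leaves the cluster.

    Assumption: `backups[resource]` lists back-up nodes in order of preference.
    A resource with no surviving back-up is dropped from the mapping.
    """
    updated = dict(primary)  # copy so the caller's mapping is not modified
    for resource, owner in primary.items():
        if owner != failed:
            continue  # this resource is unaffected by the failure
        candidates = [n for n in backups.get(resource, []) if n != failed]
        if candidates:
            updated[resource] = candidates[0]  # first surviving back-up takes over
        else:
            del updated[resource]  # no back-up available for this resource
    return updated
```

In this sketch, a user looking up a resource in the returned mapping always finds a live computer system as long as at least one back-up survives, which corresponds to the uninterrupted availability described above.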
Clustering is made possible through cluster management application programs running on each computer system in a cluster. These applications relay cluster messages back and forth across the cluster network to control cluster activities. For instance, each computer system in a cluster continuously monitors each of the other computer systems in the same cluster to ensure that each is alive and performing the processing assigned to it. Cluster messaging is also used to distribute updates about which computer systems in the cluster have what primary and back-up responsibilities. Because of the high volume of messages needed to support a cluster, communications protocols employed for cluster messaging must support high speed, real-time, reliable information transfer.
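The liveness monitoring described above is commonly realized with periodic "heartbeat" messages. The following is a minimal sketch under that assumption, not a description of any specific cluster management application. The class name `HeartbeatMonitor`, the `timeout_s` threshold, and the node names are hypothetical. Timestamps are passed in explicitly so that the logic can be followed without a real network or clock.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer (hypothetical helper)."""
    timeout_s: float  # assumed silence threshold before a peer is suspected
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        """Note that a heartbeat message from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def suspected_failed(self, now: float) -> List[str]:
        """Return the peers that have been silent for longer than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("node_a", now=0.0)
monitor.record("node_b", now=0.0)
monitor.record("node_a", now=2.0)          # node_a keeps reporting in
print(monitor.suspected_failed(now=4.0))   # node_b has been silent for 4 s
```

Because every computer system monitors every other one, the number of such messages grows quickly with cluster size, which is one reason the underlying protocol must be fast as well as reliable.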
Unfortunately, existing protocols that provide the real-time, reliable information transfer required to support cluster communications have many limitations. For example, they are typically designed for networks located within a single localized area, also called local area networks (LANs). Clusters of computer systems that use these existing protocols have correspondingly been limited to a network contained within a localized area. Therefore, a key limitation of the clustering of computer systems is that the cluster configuration is limited to one individual LAN. Furthermore, protocols such as TCP/IP that do support wide area networking do not have the combination of reliability, speed, and efficiency necessary to effectively provide group communications between computer systems in a cluster.
As more resources become accessible across computer system networks, the demand for continuous access to such network resources will grow. The demand for clusters as a means to provide continuous availability of such network resources will grow correspondingly. Without a reliable, fast, and efficient way to communicate between complex clusters of computer systems, continuous availability of network resources will not be fully realized.