This invention is directed to applications and/or systems that require group communication among a number of distributed computers. More specifically, this invention relates to providing fast and reliable propagation of multicast messages from multiple sources to multiple destinations in a system that may experience frequent machine and/or network failures. With the rapid growth of the Internet, more and more applications are being developed for, or ported to, wide-area networks in order to take advantage of resources available at geographically disparate locations. Examples include grid computing, peer-to-peer data sharing, and computer-supported collaborative work. Group communication is a mechanism through which a single node can send a message to one or more receivers, and it efficiently delivers messages to a large number of receivers in a distributed system. In this invention, group communication is also referred to as multicast. Group communication is, therefore, a basic utility for writing distributed applications. It can be used for various purposes, including the dissemination of system monitoring events to facilitate the management of distributed systems and the propagation of updates to shared state to maintain cache consistency.
A dependable group communication protocol for large-scale, delay-sensitive, mission-critical applications should meet at least the following four basic requirements: (1) reliable message delivery, (2) fast message delivery, (3) scalable performance, and (4) efficient network resource consumption. For reliable message delivery, the system should sustain stable throughput even in the face of frequent packet losses and node failures. Systems that optimize solely for friendly environments are unacceptable. With regard to fast message delivery, messages should be delivered via an efficient path, without undue delay. Many mission-critical applications, e.g., airline control and system monitoring, have real-time constraints. When a deadline is missed, the message becomes useless, and even within the deadline, the value of the message depreciates over time. As to scalable performance, the system should be self-adaptive so that it can handle dynamic node joins and leaves. As the system grows, any degradation in efficiency, reliability, or message delay should be graceful. Efficient network resource consumption is desirable so that, when a message is multicast to a large number of receivers at the application level, the load is balanced across the available links.
Two categories of existing protocols, namely reliable multicast and gossip multicast protocols, have the potential to meet some, but not all, of the requirements above. A “reliable” multicast protocol sends messages through a multicast tree that spans all receivers and relies on retransmission of lost messages to handle failures. In a friendly environment, it propagates messages rapidly. A previous study, however, has shown that a small number of disturbed, slow nodes can dramatically reduce the throughput of the entire system. Reliable multicast, therefore, is not a scalable solution for dependable group communication. In a “gossip” multicast protocol, each node periodically chooses some random nodes to which it propagates summaries of message IDs (so-called “gossips”), and it picks up any missing messages that it learns of from the gossips of other nodes. The redundancy in gossip paths addresses both node and link failures. Gossip multicast delivers stable throughput even in an adverse environment. However, the propagation of multicast messages can be slower than in reliable multicast, since the delay is proportional to the gossip period and exchanging gossips ahead of the actual messages incurs extra delay. Moreover, because gossip protocols are oblivious to network topology, random gossips in a large system can impose extremely high loads on some underlying network links.
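To make the gossip mechanism described above concrete, the following is a minimal Python sketch that simulates gossip dissemination of a single message. It is an illustration of conventional gossip multicast, not the method of this invention. The sketch assumes a fully connected membership, a hypothetical `fanout` parameter (the number of random peers contacted per gossip period), and immediate retrieval of a missing message once its ID appears in a received gossip. The function names are illustrative only. Each loop iteration models one gossip period, so the simulated delivery delay is roughly the number of rounds multiplied by the gossip period.

```python
import random


def gossip_round(state, fanout, rng):
    """Simulate one gossip period.

    Each node sends a digest (summary of the message IDs it holds) to
    `fanout` peers chosen uniformly at random, with no regard to network
    topology. A peer that sees an unknown ID in the digest fetches that
    message (simplification: the fetch completes within the same period).
    """
    nodes = list(state)
    # Snapshot so that messages learned in this period spread only next period.
    snapshot = {n: set(ids) for n, ids in state.items()}
    for node in nodes:
        peers = rng.sample([p for p in nodes if p != node], fanout)
        for peer in peers:
            digest = snapshot[node]          # the "gossip": IDs only
            missing = digest - state[peer]   # IDs the peer has not received
            state[peer] |= missing           # peer picks up missing messages
    return state


def rounds_to_full_delivery(num_nodes, fanout=1, seed=0, max_rounds=100):
    """Return (rounds, state): gossip periods until every node holds 'm1'.

    Node 0 is the hypothetical source of message 'm1'. Returns None for
    rounds if delivery is not complete within max_rounds.
    """
    rng = random.Random(seed)
    state = {n: set() for n in range(num_nodes)}
    state[0].add("m1")
    for r in range(1, max_rounds + 1):
        gossip_round(state, fanout, rng)
        if all("m1" in ids for ids in state.values()):
            return r, state
    return None, state


if __name__ == "__main__":
    rounds, _ = rounds_to_full_delivery(50, fanout=1, seed=0)
    print("gossip periods needed:", rounds)
```

For example, `rounds_to_full_delivery(50)` returns the number of simulated gossip periods needed to reach all 50 nodes. With `fanout=1`, each informed node contacts only one peer per period, so the informed set can at most double per period and at least six periods are needed for 50 nodes. This illustrates why gossip delay grows with the gossip period. The uniform random peer choice also mirrors the topology obliviousness noted above.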
What is needed, and is an object of the present invention, is a group communication mechanism that combines the benefits of reliable multicast, including topology awareness and fast message propagation, with the benefits of gossip multicast, namely stable throughput and scalability, while avoiding the limitations of each. Such a combination would provide reliable, fast, and scalable multicast message delivery in distributed systems even in the face of frequent machine and/or network failures. Another object of the invention is to provide dependable group communication for large-scale, delay-sensitive, mission-critical applications.