1. Field of the Invention
The invention relates to the field of group communication in real-time computing systems.
2. Description of the Prior Art
Components of application systems built on a network of computing nodes (e.g., PCs and workstations) often maintain a client-server relationship among themselves. To attain high system reliability and performance, servers are often replicated. The server replicas must then keep their states strictly consistent, so each message from a client must be received consistently by all of the replicas. Also, clients and diverse servers are often tightly coupled in the sense that they interact closely, and every party should read the messages from multiple sources in the same order even if not every party reads the identical set of messages. This consistent multicast communication becomes a complicated problem when the possibility of failures of computer and communication components is not negligible.
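The ordering requirement described above can be illustrated with a minimal sketch. It is not part of any existing solution; the function name consistent_order, the node names, and the message labels are hypothetical, and each message is assumed to appear at most once in a party's log. The check passes when every pair of parties read the messages they have in common in the same relative order, even though their message sets differ.

```python
from itertools import combinations


def consistent_order(logs):
    """Return True if every pair of parties read their common messages
    in the same relative order (hypothetical helper for illustration)."""
    for log_a, log_b in combinations(logs.values(), 2):
        common = set(log_a) & set(log_b)
        # Keep only the shared messages, preserving each party's read order.
        seen_a = [m for m in log_a if m in common]
        seen_b = [m for m in log_b if m in common]
        if seen_a != seen_b:
            return False
    return True


# Parties read different sets of messages, but shared ones in the same order.
logs_ok = {
    "server_1": ["c1:m1", "c2:m1", "c1:m2"],
    "server_2": ["c1:m1", "c1:m2"],
    "client_3": ["c2:m1", "c1:m2"],
}

# Two replicas read the same two messages in opposite orders.
logs_bad = {
    "server_1": ["c1:m1", "c2:m1"],
    "server_2": ["c2:m1", "c1:m1"],
}

print(consistent_order(logs_ok))
print(consistent_order(logs_bad))
```

Such a check only detects an inconsistency after the fact; it does not by itself guarantee consistent ordering when components fail.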
Existing solutions are insufficient because, under some plausible and non-negligible occurrences of component failures, they guarantee neither timely reception and consistent processing of the multicast messages by the receivers nor a consistent understanding of the success or failure of a multicast among all nodes involved, i.e., the sender and the receivers.
Group communication in real-time computing systems has been a subject of research for almost two decades, but it is not yet a mature technological field. The main challenge in establishing group communication protocols is dealing with possible fault occurrences. There have been some proposals for using group communication protocols, in particular reliable multicast mechanisms, as basic building blocks for fault-tolerant distributed systems. The validity of this thesis cannot be established until practical reliable multicast mechanisms are available.
What is needed is a method and system to ensure that the sender and all receivers reach, without excessive delay, the same correct conclusion that all the receivers correctly received the message.
Further, what is needed is a method and system to ensure that the sender and all healthy receivers reach, without excessive delay, the same correct conclusion that at least one receiver failed to receive the message and thus that the multicast was cancelled.
Still further, what is needed is a method and system to ensure that all healthy receivers reach, without excessive delay, the same correct conclusion that the sender became permanently disabled before confirming the successful receipt of the message by every receiver and thus that the multicast was cancelled.
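The three conclusions stated above can be summarized in a minimal sketch. It is illustrative only and assumes a global view of the system that no real node has; it shows which conclusion applies in each situation, not how the nodes come to agree on it or how delay is bounded. The names Outcome, classify_multicast, and the parameters are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    DELIVERED = "all receivers correctly received the message"
    CANCELLED_RECEIVER_FAILED = "at least one receiver failed to receive"
    CANCELLED_SENDER_DISABLED = "sender disabled before confirming receipt"


def classify_multicast(receivers, acknowledged, sender_disabled_before_confirming):
    """Map one multicast attempt to the conclusion every healthy node
    should share (hypothetical helper; assumes an omniscient observer)."""
    if sender_disabled_before_confirming:
        # The sender never confirmed receipt by every receiver.
        return Outcome.CANCELLED_SENDER_DISABLED
    if set(receivers) <= set(acknowledged):
        # Every intended receiver got the message.
        return Outcome.DELIVERED
    # Some intended receiver did not get the message.
    return Outcome.CANCELLED_RECEIVER_FAILED


group = ["r1", "r2", "r3"]
print(classify_multicast(group, ["r1", "r2", "r3"], False).name)
print(classify_multicast(group, ["r1", "r3"], False).name)
print(classify_multicast(group, ["r1", "r2", "r3"], True).name)
```

The difficulty identified in this section lies in having every healthy node arrive at the same one of these conclusions, in a timely manner, using only the messages it actually receives.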