The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Certain computing systems comprise a distributed plurality of nodes. For example, a distributed system may comprise a distributed plurality of processors, wherein each processor is a node within the distributed system.
The several nodes of a distributed system may execute processes that communicate with each other to achieve some result. For example, a first node may execute a first process that communicates with a second process that is executed by a second node. A given node of a distributed system may execute multiple processes. Communication between processes that execute within a distributed system is typically achieved through messages that are sent between those processes. A message typically indicates an address for which the message is destined.
Within a distributed system, processes that execute on different nodes may have a collective identity. Such processes may share a role in the distributed system collectively. Such processes may form a group, assuming that the distributed system provides for group communication. Multiple groups may exist within a distributed system, and nodes within the distributed system may be associated with multiple groups. Two or more nodes within a distributed system may be associated with the same group. A membership service implemented within a distributed system may maintain information about which nodes are associated with a particular group. A process that executes on a node that is associated with a particular group may be called a “group member process” of the particular group. A process that executes on a node that is not associated with the particular group may be called a “non-group member process” relative to the particular group. Each group has a different group address. Typically, a message that is delivered to a particular group's group address reaches all of the particular group's group member processes and only the particular group's group member processes.
By sending a message to a single group instead of multiple nodes, a message sender does not need to possess information about the location or identity of any of the individual message receivers. In other words, by sending a message to a group, a message sender can be referentially decoupled from all of the individual message receivers. For example, a group member process may send a message to a group by instructing a transport layer, through a primitive provided by an application programming interface (API), to send the message to the group's group address. Under some approaches, a transport layer sends the message by multicasting the message. Thus, according to some approaches, a group address is sometimes called a “multicast address.” Under other approaches, a transport layer emulates multicasting the message but actually sends the message through means other than multicasting the message. In response to such an instruction from a group member process, and in conjunction with a membership layer, the transport layer makes a reasonable best-effort attempt to deliver the message to all of the group's group member processes.
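For illustration only, the referential decoupling described above might be sketched as follows. The class names MembershipService and Transport, the method send_to_group, and the group addresses are hypothetical and are not drawn from any particular system; the sketch assumes a transport layer that emulates multicasting by delivering to each member individually.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipService:
    """Hypothetical registry mapping a group address to member process ids."""
    groups: dict[str, set[str]] = field(default_factory=dict)

    def join(self, group_address: str, process_id: str) -> None:
        self.groups.setdefault(group_address, set()).add(process_id)

    def members(self, group_address: str) -> set[str]:
        # Return a copy so callers cannot mutate the registry.
        return set(self.groups.get(group_address, set()))


@dataclass
class Transport:
    """Hypothetical transport layer that emulates multicast member by member."""
    membership: MembershipService
    inboxes: dict[str, list[str]] = field(default_factory=dict)

    def send_to_group(self, group_address: str, payload: str) -> int:
        # The sender names only the group address; the transport layer,
        # together with the membership service, resolves the receivers.
        recipients = self.membership.members(group_address)
        for process_id in recipients:
            self.inboxes.setdefault(process_id, []).append(payload)
        return len(recipients)


membership = MembershipService()
for pid in ("p1", "p2", "p3"):
    membership.join("group-a", pid)
membership.join("group-b", "p4")

transport = Transport(membership)
delivered = transport.send_to_group("group-a", "hello")
```

In this sketch the sender never enumerates p1, p2, or p3, and the process p4, which belongs only to a different group, receives nothing.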
Communication between processes that execute in a distributed system is not problem-free. Due to the imperfect nature of computing systems generally, the mere sending of a message does not guarantee that the message actually will be received by any of the intended receivers. For example, a failure of a system component can cause messages to become, or appear to become, lost in transit.
Additionally, in asynchronous distributed systems, even though messages originally may be sent in a certain intended order, the messages ultimately may be received in a different order. For example, if a message sender retransmits a message, then the retransmitted message may be received out of order relative to other messages that were not retransmitted. Sometimes, message order is important. For example, if messages that represent consecutive segments of a video stream are received out of order, then it is unsatisfactory to present the segments to a user in the same order in which the messages were received.
One approach to guaranteeing that messages communicated between group member processes are communicated reliably and with sufficient information to enable a recipient to organize the messages into an intended order involves associating each group member process with its own “sequence number space.” The concept of a sequence number space is a part of group communication. When a process joins a group, all of the group members agree on the new group member process and the new group member process's sequence number space. Each message that a group member process sends is associated with a different sequence number that uniquely identifies the message and the message's intended order relative to other messages that the group member process sends. When any group member process receives a message from another group member process, the receiving group member process determines whether the message's associated sequence number matches a sequence number that is next expected from the sending group member process. Each group member process maintains a separate next expected sequence number for each other group member process. Thus, each group member process is associated with its own separate and group-recognized sequence number space.
If the message's associated sequence number matches the next expected sequence number, then the receiving group member process increments the next expected sequence number and sends an acknowledgement to the sending group member process. Because the acknowledgement identifies the message's associated sequence number, the sending group member process “knows” upon receiving the acknowledgement that the message does not need to be retransmitted. According to one approach, a transport layer keeps track of how many acknowledgements are expected as a result of sending a message to a group that has a certain number of members.
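As a minimal sketch of the receiver-side check described above, the following example keeps one next expected sequence number per sending group member process. The class name Receiver, the method names, and the starting sequence number of zero are assumptions made for illustration. A return value of None stands for "no acknowledgement is sent."

```python
from typing import Optional


class Receiver:
    """Hypothetical receiving group member process."""

    def __init__(self) -> None:
        # One next expected sequence number per known sender.
        self.next_expected: dict[str, int] = {}
        self.accepted: list[tuple[str, int, str]] = []

    def add_member(self, sender_id: str, first_seq: int = 0) -> None:
        # Called when the group agrees on a new member and on that
        # member's sequence number space.
        self.next_expected[sender_id] = first_seq

    def on_message(self, sender_id: str, seq: int, payload: str) -> Optional[int]:
        """Return the acknowledged sequence number, or None if no ack is sent."""
        if sender_id not in self.next_expected:
            return None  # no sequence number space is kept for this sender
        if seq != self.next_expected[sender_id]:
            return None  # not the next expected message: withhold the ack
        self.next_expected[sender_id] = seq + 1
        self.accepted.append((sender_id, seq, payload))
        return seq


r = Receiver()
r.add_member("A")
ack0 = r.on_message("A", 0, "first")    # matches: acknowledged
ack2 = r.on_message("A", 2, "third")    # arrives early: not acknowledged
ack1 = r.on_message("A", 1, "second")   # matches: acknowledged
ack_unknown = r.on_message("outsider", 0, "hello")  # unknown sender
```

The last call shows that a sender for which no sequence number space is maintained cannot be acknowledged by sequence number in this sketch.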
Alternatively, if the message's associated sequence number does not match the next expected sequence number, then the receiving group member process does not send an acknowledgement to the sending group member process. According to one approach, the sending group member process recognizes, due to the lack of an acknowledgement that corresponds to the message's sequence number, that the message needs to be retransmitted. Under another approach, the receiving group member process requests that the sending group member process retransmit the message that is associated with the next expected sequence number. According to one approach, the transport layer retransmits the message while awaiting the expected number of acknowledgements. The transport layer counts acknowledgements to determine whether a message has been delivered reliably. After a specified period of time has passed, the transport layer may determine that delivery has failed.
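The sender-side bookkeeping described above (counting acknowledgements, retransmitting while acknowledgements are outstanding, and declaring failure after a specified period) might look like the following sketch. The names SenderTransport, Pending, retry_interval, and give_up_after are hypothetical, and time is passed in explicitly as a plain number so that the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Pending:
    payload: str
    sent_at: float
    acked_by: set[str] = field(default_factory=set)
    retransmissions: int = 0


class SenderTransport:
    """Hypothetical sender-side bookkeeping for one group member process."""

    def __init__(self, members: set[str], retry_interval: float,
                 give_up_after: float) -> None:
        self.members = set(members)          # one ack is expected per member
        self.retry_interval = retry_interval
        self.give_up_after = give_up_after
        self.next_seq = 0
        self.pending: dict[int, Pending] = {}
        self.failed: list[int] = []

    def send(self, payload: str, now: float) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = Pending(payload, sent_at=now)
        return seq

    def on_ack(self, member: str, seq: int) -> None:
        entry = self.pending.get(seq)
        if entry is None or member not in self.members:
            return
        entry.acked_by.add(member)
        if entry.acked_by == self.members:
            del self.pending[seq]  # every expected ack has arrived

    def poll(self, now: float) -> list[int]:
        """Return sequence numbers to retransmit; record timed-out ones."""
        to_retransmit = []
        for seq, entry in list(self.pending.items()):
            age = now - entry.sent_at
            if age >= self.give_up_after:
                self.failed.append(seq)      # delivery is deemed to have failed
                del self.pending[seq]
            elif age >= self.retry_interval * (entry.retransmissions + 1):
                entry.retransmissions += 1
                to_retransmit.append(seq)
        return to_retransmit


t = SenderTransport({"p1", "p2"}, retry_interval=1.0, give_up_after=5.0)
s0 = t.send("a", now=0.0)
s1 = t.send("b", now=0.0)
t.on_ack("p1", s0)
t.on_ack("p2", s0)          # s0 is now fully acknowledged
t.on_ack("p1", s1)          # p2 never acknowledges s1
r1 = t.poll(now=1.0)        # s1 is due for retransmission
r2 = t.poll(now=1.5)        # nothing new is due yet
r3 = t.poll(now=5.0)        # s1 has exceeded the give-up period
```

A real transport layer would also place the retransmitted message back on the network; the sketch only reports which sequence numbers are due.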
The approaches described above guarantee reliable and orderly communication between group member processes, at least in part through the acknowledgement and retransmission of messages. In other words, the approaches described above guarantee reliable group communication. However, it is sometimes beneficial to associate only some, and not all, nodes of a distributed system with a particular group.
Under some circumstances, it is useful for a non-group member process to send a message to a group. For example, a non-group member process may be a client application that requests a service that is collectively offered by group member server processes that execute on nodes that are associated with the group. For another example, a non-group member process may publish data to group member processes that collectively expect the data, wherein all of the group member processes execute on nodes that are associated with the group.
Due to past definitions of group membership, in past approaches, only group member processes have been associated with group-recognized sequence number spaces. As a result, the mechanisms described above that ensure reliable and orderly communication between group member processes have not been available to guarantee reliable and orderly communication between a non-group member process and a plurality of group member processes.
Thus, when a non-group member process sends a message to a group, the intended receivers are unable to acknowledge the message according to a sequence number, because, by definition, the intended receivers do not maintain a sequence number space in connection with a non-group member process. In the absence of acknowledgements, the non-group member process may attempt to increase the probability that the message will be received by all of the group member processes by retransmitting the message to the group.
While this approach may increase the probability that the message will be received by all of the group member processes, it also introduces the possibility that one or more group member processes may receive the same message twice. In the absence of a unique sequence number per message, the group member processes are not able to ascertain if a given message duplicates a message that has already been received. Two messages with the same content may or may not be duplicates. Sometimes, it is important to be able to determine whether a message is a duplicate of a prior message. For example, if a message that represents an unrepeated segment of a video stream is received more than once, then it is unsatisfactory to present that segment to a user more than once.
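The ambiguity described above can be made concrete with a small sketch. It assumes, purely for illustration, a sender that intends to send two distinct messages with identical content and that retransmits the first one; the function names and the pairing of a sequence number with each payload are hypothetical.

```python
def deliver_as_received(payloads: list[str]) -> list[str]:
    # No identifier is available, so every arrival is treated as new.
    return list(payloads)


def drop_equal_content(payloads: list[str]) -> list[str]:
    # Guessing that equal content means "duplicate" discards genuine repeats.
    seen: set[str] = set()
    kept = []
    for payload in payloads:
        if payload not in seen:
            seen.add(payload)
            kept.append(payload)
    return kept


def drop_repeated_sequence_numbers(messages: list[tuple[int, str]]) -> list[str]:
    # With a unique sequence number per message, a retransmission is
    # recognizable even when a later message has the same content.
    seen: set[int] = set()
    kept = []
    for seq, payload in messages:
        if seq not in seen:
            seen.add(seq)
            kept.append(payload)
    return kept


# Message 0 arrives twice (original plus retransmission); message 1 is a
# distinct message that happens to carry the same content.
wire = [(0, "frame"), (0, "frame"), (1, "frame")]
payloads_only = [payload for _, payload in wire]

naive = deliver_as_received(payloads_only)         # three deliveries
by_content = drop_equal_content(payloads_only)     # one delivery
by_seq = drop_repeated_sequence_numbers(wire)      # two deliveries
```

Only the variant that has a per-message sequence number recovers the two messages the sender intended; the other two variants deliver one message too many or one too few.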
Existing distributed systems cannot guarantee that every group member process that executes on a node that is associated with a group will receive, in the intended order, all messages that a non-group member process sends to the group, especially when those messages are delivered at high rates. Because existing approaches do not provide a sequence number space for non-group member processes, existing approaches do not impose any order on messages that are sent to a group by a non-group member process.
Based on the foregoing, there is a clear need for a way to communicate messages from a non-group member process to a group in an optimal, reliable, and orderly manner, without causing the non-group member process to receive messages that are sent to the group.