The invention relates generally to computer networks and, more particularly, to networks that support multicast messaging protocols.
Known reliable multicast protocols typically require that each participant, or member, of a group buffer a received multicast message, to allow for retransmission of the message. This ensures delivery of the multicast message to all members, even if the sender of the message ultimately fails. A given member retains a received multicast message until the message is “stable,” that is, until it becomes known that the message has been delivered to every member with at least a relatively high probability.
The multicast protocols operate in three phases. First, in an initial multicast phase, a sender multicasts the message over the group and attempts to provide the message to as many members as possible. Next, in a repair phase, the members detect message losses and request retransmission of the message from the sender or other members, as appropriate. Finally, in a garbage collection phase, the members release the buffer space assigned to the message once the message becomes stable. Most multicast protocols perform the repair and garbage collection phases using a combination of positive and/or negative acknowledgement messages.
Known epidemic multicast protocols use gossiping in the repair phase. Each member periodically selects another member at random and sends to that member a gossip message that includes a list of the messages that the gossiping member has retained in its buffer and/or has delivered. The selected member then determines if it has in its buffer any messages that are not on the list, that is, any messages that are lost to the gossiping member. If so, the selected member retransmits the lost messages to the gossiping member. Also, the selected member may request that the gossiping member retransmit any messages from the list that the selected member has not yet delivered, that is, any messages that are lost to the selected member. The members thus perform point-to-point repair.
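The point-to-point repair exchange described above may be illustrated by the following minimal sketch. The function name gossip_exchange and the representation of a member's buffer as a dictionary keyed by message identifier are assumptions made for illustration only and do not appear in any particular protocol.

```python
def gossip_exchange(gossiper_msgs, selected_msgs):
    """One repair round between a gossiping member and the member it selected.

    Each argument maps message id -> payload for the messages that member holds.
    Returns (messages to retransmit to the gossiper, ids the selected member
    may request from the gossiper).
    """
    # The gossip message carries only the list of ids held by the gossiper.
    gossip_list = set(gossiper_msgs)
    # Messages the selected member holds that are lost to the gossiper.
    retransmit = {mid: payload for mid, payload in selected_msgs.items()
                  if mid not in gossip_list}
    # Ids on the list that are lost to the selected member.
    request = sorted(gossip_list - set(selected_msgs))
    return retransmit, request


a = {1: "m1", 2: "m2", 4: "m4"}   # gossiping member
b = {1: "m1", 3: "m3"}            # selected member
to_a, wanted_by_b = gossip_exchange(a, b)
# to_a == {3: "m3"}, wanted_by_b == [2, 4]
```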
The garbage collection phase of the epidemic multicast protocol is typically performed by having each member release the buffer space allocated to a given message a predetermined time after the member delivers the message. The predetermined time is based on a prediction of how long it takes to disseminate a lost message to the membership through gossiping. Accordingly, the predetermined time the message must be retained depends largely on the number of members in the group.
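A time-based release of buffer space of this kind may be sketched as follows. The class name TimedBuffer, the retention parameter and the explicit now argument are hypothetical; how the predetermined time is actually derived from the group size is not specified here and is left to the caller.

```python
class TimedBuffer:
    """Holds delivered messages and releases each one a fixed time after delivery."""

    def __init__(self, retention):
        # retention: the predetermined holding time, in the caller's time units
        self.retention = retention
        self._entries = {}  # message id -> (delivery time, payload)

    def deliver(self, message_id, payload, now):
        self._entries[message_id] = (now, payload)

    def collect(self, now):
        """Release every message whose holding time has elapsed; return their ids."""
        expired = [mid for mid, (delivered_at, _) in self._entries.items()
                   if now - delivered_at >= self.retention]
        for mid in expired:
            del self._entries[mid]
        return expired

    def __contains__(self, message_id):
        return message_id in self._entries
```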
As the number of members in the group increases, the time required both to achieve and to detect message stability increases. Further, depending on the application, the combined rate of sending may also increase as the size of the membership increases. Accordingly, each member must buffer more messages at any given time, which means that each member must maintain and operate a larger buffer. These multicast protocols thus do not scale well.
A scalable multicast protocol includes a mechanism that allows a given multicast message to be stored by a subset of the entire group membership. The buffered messages are spread over the membership, so that any given member buffers only a portion of the messages.
More specifically, a subset of “C” members buffers a multicast message, where C is selected to reduce to an acceptable level the probability that a given message will be lost before it reaches at least one of the C members. When a member receives a multicast message, the member determines whether or not it should buffer the message by manipulating a string of bytes that is unique to both the message and the member. In the exemplary system, the byte string, which consists of a message identifier that is included in the message and the member's address, is manipulated in accordance with a hash function. As discussed in more detail below, the member buffers the message if the result is less than C/n, where n is the number of known members. A member that buffers a message is hereinafter referred to as a “bufferer” of the message.
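The buffering decision may be sketched as follows. The choice of SHA-256, the separator byte, the use of the first eight digest bytes, and the name should_buffer are assumptions made for illustration; the description requires only a hash function applied to the message identifier and the member's address, with the result compared against C/n.

```python
import hashlib


def should_buffer(message_id: bytes, member_address: bytes, c: int, n: int) -> bool:
    """Return True if this member is a bufferer of the message.

    The hash of (message id, member address) is treated as a fraction in
    [0, 1) and compared against c / n. The comparison is done on integers
    (h / 2**64 < c / n  <=>  h * n < c * 2**64) to avoid rounding error.
    """
    digest = hashlib.sha256(message_id + b"|" + member_address).digest()
    h = int.from_bytes(digest[:8], "big")  # 0 <= h < 2**64
    return h * n < c * (1 << 64)
```

Because the decision depends only on the message identifier, the address, C and n, any member that knows another member's address can reproduce that member's decision without communicating with it.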
When one of the C bufferers receives a gossip message that indicates that the multicast message has been lost to the gossiping member, the bufferer retransmits the message to the gossiping member. When, however, a member that is not one of the C bufferers receives such a gossip message, the member determines which members are bufferers of the lost message and requests that one of the bufferers retransmit the message to the gossiping member.
The selected member identifies the bufferers by manipulating the message identifier associated with the lost message and the respective addresses of the members known to the selected member. The selected member then picks one of the identified bufferers at random, and sends to it a request for retransmission that identifies the message and specifies the gossiping member as the destination address. The bufferer that receives the request then retransmits the lost message to the gossiping member, assuming that the selected bufferer has not already released the buffer space allocated to the message. The members then continue to gossip and the lost message should, with very high probability, be supplied to all of the members by the bufferers through this process.
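The identification of bufferers and the construction of a retransmission request may be sketched as follows. The hash construction (SHA-256 over the identifier and address), the dictionary layout of the request, and the names is_bufferer, find_bufferers and build_retransmit_request are hypothetical and serve only to illustrate the steps described above.

```python
import hashlib
import random


def is_bufferer(message_id: bytes, address: bytes, c: int, n: int) -> bool:
    # Assumed hash rule: the first 8 digest bytes, read as a fraction of 2**64,
    # must be less than c / n (compared on integers to avoid rounding).
    digest = hashlib.sha256(message_id + b"|" + address).digest()
    return int.from_bytes(digest[:8], "big") * n < c * (1 << 64)


def find_bufferers(message_id, known_members, c):
    """Apply the hash rule to every known address; n is the number of known members."""
    n = len(known_members)
    return [m for m in known_members if is_bufferer(message_id, m, c, n)]


def build_retransmit_request(message_id, known_members, c, gossiper, rng=random):
    """Pick one bufferer at random and name the gossiping member as destination."""
    bufferers = find_bufferers(message_id, known_members, c)
    if not bufferers:
        return None  # no known member buffers this message
    return {"to": rng.choice(bufferers),
            "message_id": message_id,
            "destination": gossiper}
```

The sketch returns None when the hash rule selects no known member; how a member should proceed in that case is not addressed here.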
The system trades reduced buffer space at each member against an increase in traffic due to the sending of requests for retransmission. In a system with a low probability of lost messages, the increase in the associated traffic is relatively small and the reduction in required buffer space at each member is significant. Further, as the size of the membership increases, the buffer space required at each member decreases, since the number of members that buffer a given message, C, is essentially fixed.
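The scaling behavior may be illustrated with a short calculation. The sketch assumes that bufferers are chosen uniformly, so that each member buffers on average a fraction C/n of the messages that are not yet stable; the function name and the example figures are hypothetical.

```python
def expected_buffer_load(unstable_messages: int, c: int, n: int) -> float:
    """Average number of messages one member buffers when c of n members buffer each."""
    return unstable_messages * min(c, n) / n


# With C fixed at 6, the per-member load falls as the membership grows.
small_group = expected_buffer_load(1000, 6, 100)    # 60.0
large_group = expected_buffer_load(1000, 6, 1000)   # 6.0
```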
The multicast protocol may further include a mechanism to detect, in the initial phase of the multicast operation, “catastrophic failures” in which none or few of the members receive a multicast message. Each member includes the buffer discussed above and a relatively small, fixed-size “short-term” buffer that holds a limited number of the received messages in the order in which the messages are received. A member monitors any holes or gaps in the sequences of incoming messages, and detects a catastrophic failure when a received gossip message identifies one of the same holes or gaps, as discussed in more detail below.
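The comparison of holes may be sketched as follows. The sketch assumes that messages from a given sender carry consecutive integer sequence numbers and that a gossip message reports the sequence numbers the gossiping member has received; both assumptions, and the function names, are illustrative only.

```python
def find_holes(received_seqs):
    """Sequence numbers missing between the lowest and highest number received."""
    received = set(received_seqs)
    if not received:
        return set()
    return set(range(min(received), max(received) + 1)) - received


def shared_holes(my_received, gossip_received):
    """Holes seen both locally and in the gossip message.

    A non-empty result suggests that the message was lost widely rather than
    at one member, which is treated here as a catastrophic failure.
    """
    return sorted(find_holes(my_received) & find_holes(gossip_received))
```

For example, a member that has received messages 1, 2 and 5 and hears gossip reporting 1, 2, 4 and 5 shares hole 3 with the gossiping member, and would request multicast retransmission of message 3 only.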
When such a failure is detected, the member sends a request for multicast retransmission of the associated missing message to the sender, and the sender again multicasts the message to the group. If a catastrophic failure has occurred, the multicast retransmission request is sent relatively soon after the initial transmission, and thus, the request for multicast retransmission can be handled from the sender's short-term buffer. If the initial multicast transmission did not involve a catastrophic failure, the point-to-point repair is handled in the manner discussed above.
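A fixed-size short-term buffer from which the sender could serve such a request may be sketched as follows. The class name ShortTermBuffer, the capacity parameter and the oldest-first eviction policy are assumptions; the description states only that the buffer is small, of fixed size, and holds messages in the order received.

```python
from collections import OrderedDict


class ShortTermBuffer:
    """Keeps the most recent `capacity` messages in arrival order."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._messages = OrderedDict()  # message id -> payload, oldest first

    def add(self, message_id, payload):
        self._messages[message_id] = payload
        self._messages.move_to_end(message_id)
        # Evict the oldest entries once the fixed size is exceeded.
        while len(self._messages) > self.capacity:
            self._messages.popitem(last=False)

    def get(self, message_id):
        """Return the payload if it is still held, else None."""
        return self._messages.get(message_id)
```

A request that arrives soon after the initial multicast finds the message still present; a late request returns None and would fall back to the point-to-point repair described above.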