A number of systems have been developed for providing network communications among groups of users. These systems may employ a network configuration, such as a Totem ring network configuration, in order to provide a fault-tolerant structure for the network communications. Totem ring networks are relatively well known. They provide multicast delivery, in which a message may be transmitted and delivered to multiple locations, and they ensure that the sequence in which messages are received is maintained throughout the system.
The Totem ring protocol operates by organizing the nodes of a system into one or more virtual rings of processors around which a token rotates. When a processor receives the token, that processor may multicast any messages in its pending transmit queue. The token contains a sequence number and a retransmit list.
The sequence number provides a total order of messages so that every node orders messages in the same way. On each multicast, the token's sequence number is incremented. When the token is forwarded to the next node in the ring, it carries the sequence number that was received plus the number of messages that the forwarding node multicast.
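The sequence-number handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Totem implementation: the names `Token`, `Node`, `on_token`, and `rotate_once` are hypothetical, and message delivery is modeled as a returned list rather than a network multicast.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # Highest sequence number assigned so far on the ring.
    seq: int = 0
    # Sequence numbers whose retransmission has been requested.
    retransmit: set = field(default_factory=set)


@dataclass
class Node:
    name: str
    # Messages waiting for the token (the pending transmit queue).
    pending: list = field(default_factory=list)

    def on_token(self, token: Token) -> list:
        """Multicast every pending message, stamping each with the next sequence number."""
        sent = []
        for payload in self.pending:
            token.seq += 1  # one increment per multicast
            sent.append((token.seq, self.name, payload))
        self.pending.clear()
        return sent


def rotate_once(ring: list, token: Token) -> list:
    """Pass the token once around the ring and collect the multicast messages."""
    delivered = []
    for node in ring:
        delivered.extend(node.on_token(token))
    return delivered
```

In this sketch, the token that leaves a node carries the sequence number it arrived with plus the number of messages that node multicast, so any receiver can sort messages by sequence number and arrive at the same order.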
The retransmit list is used to request retransmission of missing messages. On receipt of a token, a processor of a node compares the token's sequence number with the sequence numbers of the messages it has received. If any message is missing, the processor adds the missing messages to the retransmit list. Also, upon receipt of the token, the processor multicasts any messages in the retransmit list for which it has a copy.
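The retransmit handling can be sketched in the same spirit. The function names `missing_seqs` and `handle_token` are hypothetical, and the sketch assumes that sequence numbers start at 1 and that a node stores its received messages in a dictionary keyed by sequence number. A real implementation tracks more state than this.

```python
def missing_seqs(received: dict, token_seq: int) -> set:
    """Sequence numbers up to the token's value that this node has not received."""
    return {s for s in range(1, token_seq + 1) if s not in received}


def handle_token(received: dict, token_seq: int, retransmit: set):
    """Process the retransmit list when the token arrives.

    Returns the messages this node should multicast again, and the
    retransmit list to place in the token before forwarding it.
    """
    # Re-send every requested message for which this node holds a copy.
    to_resend = {s: received[s] for s in retransmit if s in received}
    # Drop the requests just served, then add this node's own gaps.
    updated = (retransmit - set(to_resend)) | missing_seqs(received, token_seq)
    return to_resend, updated
```

For example, a node holding messages 1 and 3 that receives a token with sequence number 3 would add message 2 to the retransmit list, and the next node holding a copy of message 2 would multicast it again and clear the request.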
In order to support multiple rings, the Totem protocol creates gateways between the individual Totem rings. In general, each of the rings operates on separate multicast target addresses. Each gateway forwards only those messages that are required by the other ring. In this way, the Totem multi-ring protocol is more scalable. For example, it is possible for each sub-ring among a plurality of sub-rings to obtain the maximum throughput available and to reduce latency by ½ the token rotation time of a single-ring structure.
In order to determine whether to forward a particular message, the gateway nodes check a character string in each message called a “group” that identifies which messages should be sent to which nodes. Each gateway maintains a list of groups that are relevant to the rings it interfaces.
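The group-based forwarding decision can be illustrated with a short sketch. The `Gateway` class, its `targets` method, and the ring and group names used with it are hypothetical. The sketch assumes only what is described above: each message carries a group string, and each gateway keeps a list of the groups relevant to each ring it interfaces.

```python
class Gateway:
    def __init__(self, groups_by_ring: dict):
        # Map each attached ring to the set of groups relevant to that ring.
        self.groups_by_ring = {
            ring: set(groups) for ring, groups in groups_by_ring.items()
        }

    def targets(self, source_ring: str, group: str) -> list:
        """Rings, other than the source, that require messages for this group."""
        return sorted(
            ring
            for ring, groups in self.groups_by_ring.items()
            if ring != source_ring and group in groups
        )
```

With a gateway configured as `Gateway({"ring1": {"billing", "alerts"}, "ring2": {"alerts"}})`, a message for group `"alerts"` that originates on `ring1` would be forwarded to `ring2`, while a message for group `"billing"` would stay on `ring1`.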
One type of Totem ring architecture that is particularly fault tolerant is a redundant Totem ring. The nodes of a first ring are replicated in a second ring. Messages and actions are replicated across both rings using multicasting of messages such that, in the case of a failure in the first ring, a protection mechanism permits switching over to the second ring without loss of data.
One problem with existing redundant Totem rings is that a failure at a single node in one ring, for example, a failure of a network interface card (NIC), causes the whole ring to fail. In order for that ring to be “healthy,” all the NICs on that ring have to be working and active. An administrator may type in a command to reset the rings and attempt to re-enable the failed ring based on a timer. Because the Totem ring employs the multicast protocol, one or more messages sent to nodes on the ring to be re-activated while the timer is active may be blocked, and the timer then expires. Attempts have been made to overcome this problem by resetting the timer, for example, every ten seconds, and then re-enabling the ring. Unfortunately, the multicast protocol would still block the transmission of messages for five to ten seconds before the failed ring is re-enabled.