The present invention relates in general to communication systems and, more particularly, to the use of redundant communication fabrics to enhance fault tolerance in Totem communication networks.
A number of systems have been developed for providing network communications among groups of users. One such system comprises a Totem ring network in which a plurality of communication devices are connected to a bus network. Each communication device includes circuitry for interfacing with the Totem ring network (e.g., transmitting and receiving messages on the Totem ring network), and a Central Processing Unit (CPU) adapted for executing processes comprising application programs effective for managing call processing, database operations, industrial control, and the like.
A Totem network provides for multicast delivery of messages, wherein messages can be transmitted and delivered to multiple locations, with assurance that the sequence in which messages are generated is maintained as the messages are transmitted and delivered throughout the system. Totem networks are well known to those skilled in the art and are described in greater detail in various technical papers and articles, such as an article entitled "Totem: A Fault Tolerant Multicast Group Communication System" by L. E. Moser et al., published in the April 1996, Vol. 39, No. 4 Edition of Communications of the Association for Computing Machinery (ACM).
In Totem networks, message delivery is controlled using a token, similar to that used in a token ring system, that identifies which device may transmit onto the network. Periodically, such as every few milliseconds, the token is sent around the network to each device in sequence. When a device receives the token, it determines whether it has a message or data to transmit over the network. If it does, it sends that message or data before forwarding the token to the next device. If it does not, it simply forwards the token to the next device.
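By way of illustration only, the token rotation described above may be sketched as follows. The sketch is a simplified model and not the Totem protocol itself: the names `Device` and `rotate_token` are hypothetical, and the use of a single running sequence number to stamp each message is an assumption made here to show how ordering can be preserved.

```python
from collections import deque


class Device:
    """Hypothetical network device holding messages that await the token."""

    def __init__(self, name):
        self.name = name
        self.outbox = deque()

    def queue(self, message):
        """Store a message until this device next holds the token."""
        self.outbox.append(message)


def rotate_token(devices, start_seq=0):
    """Pass the token once around the ring, in device order.

    A device holding the token sends all of its pending messages and only
    then forwards the token. A device with nothing to send forwards the
    token immediately. Each message is stamped with the next sequence
    number (an assumption of this sketch), so every receiver can deliver
    messages in the same order.
    """
    delivered = []
    seq = start_seq
    for device in devices:  # the token visits each device in sequence
        while device.outbox:
            seq += 1
            delivered.append((seq, device.name, device.outbox.popleft()))
        # leaving the loop body models forwarding the token onward
    return delivered, seq


ring = [Device("A"), Device("B"), Device("C")]
ring[0].queue("a1")
ring[2].queue("c1")
ring[0].queue("a2")
delivered, last_seq = rotate_token(ring)
```

In this example, device B has nothing to send and merely forwards the token, so the delivered sequence contains both messages from device A followed by the message from device C.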
Conventionally, messages on a Totem network are transmitted and delivered over a physical medium comprising a single fabric of wires or fiber optic cable. As a consequence, while Totem networks assure that messages are transmitted and delivered in the same sequence in which they are generated, there is no assurance that the messages will be delivered at all if the fabric fails. The physical medium of a Totem network thus has no fault tolerance designed into it.
Accordingly, there is a need for a system and a method that will provide Totem networks with fault tolerance to enhance the probability that sequentially transmitted messages will be delivered across the Totem network.
The present invention accordingly provides a Totem network with multiple redundant fabrics through which messages can be transmitted and delivered. The Totem network is configured so that, if one fabric fails, another fabric can be used, thereby providing a Totem system with fault tolerance. The Totem network is also configured so that if a failed fabric has been repaired and thus becomes operational, the fabric repair can be detected and the repaired fabric declared operational so that devices on the network can use it. The Totem network is also configured so that a failure of a device on the network can be detected.
The present invention further comprises a method embodied in computer software residing on the network for controlling the use of the redundant fabrics. The computer software can be configured to detect when a fabric failure has occurred, and, after a failure has been detected, to declare the fabric to have failed so that devices on the network will use only fabrics that are operational. In the event a failed fabric has been repaired, the computer software can detect the repair and declare the formerly-failed fabric operational so that devices on the network can use it. The computer software can also be configured to detect when a device on the network has failed.
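By way of illustration only, the declaration of fabrics as failed or operational may be sketched as follows. The detection criterion is not specified above; this sketch assumes, purely for illustration, that a fabric is declared failed after a configurable number of consecutive missed token receipts and is declared operational again as soon as a token is received on it. The names `FabricMonitor`, `report`, `select_fabric`, and `failure_threshold` are hypothetical.

```python
class FabricMonitor:
    """Hypothetical tracker of which redundant fabrics are operational."""

    def __init__(self, fabric_names, failure_threshold=3):
        # failure_threshold is an assumed tuning parameter, not taken
        # from the description above.
        self.failure_threshold = failure_threshold
        self.missed = {name: 0 for name in fabric_names}
        self.operational = {name: True for name in fabric_names}

    def report(self, fabric, token_received):
        """Record whether the token arrived on the given fabric."""
        if token_received:
            # Receipt on a formerly failed fabric is treated as a repair.
            self.missed[fabric] = 0
            self.operational[fabric] = True
        else:
            self.missed[fabric] += 1
            if self.missed[fabric] >= self.failure_threshold:
                self.operational[fabric] = False  # declare fabric failed

    def usable_fabrics(self):
        """Return the fabrics currently declared operational."""
        return [name for name, ok in self.operational.items() if ok]

    def select_fabric(self):
        """Pick the first operational fabric for the next transmission."""
        usable = self.usable_fabrics()
        if not usable:
            raise RuntimeError("no operational fabric available")
        return usable[0]


monitor = FabricMonitor(["fabric0", "fabric1"], failure_threshold=2)
monitor.report("fabric0", token_received=False)
monitor.report("fabric0", token_received=False)  # fabric0 declared failed
after_failure = monitor.select_fabric()
monitor.report("fabric0", token_received=True)   # repair detected
after_repair = monitor.select_fabric()
```

In this example, transmissions move to the second fabric once the first is declared failed and return to the first fabric once its repair is detected. Detection of a failed device, as opposed to a failed fabric, is not modeled here.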