1. Field of the Invention
This invention relates to maintenance and fault tolerance of large electronic systems, such as large telecommunication switching systems.
2. Discussion of Related Art
Modern computer and communication systems typically use a multitude of cards interconnected, for example, through a backplane. Preferably the system is architected so that it is scalable, allowing other cards to be added to the system.
Changes in the operating condition of these cards need to be communicated to equipment operators so that they know whether the equipment is faulty or working. Consequently, there is a need for a mechanism to communicate information from the cards to a central point to indicate the existence of faults in a timely manner. In this fashion, corrective action can be taken to eliminate or minimize any service disruption. This mechanism must be capable of handling changes in the configuration of the system, for example, when cards are added or removed.
In addition to reporting changes in operating condition, the central point must be able to control devices on the cards for maintenance and configuration operations, for example, by accessing and setting state on the various cards. Moreover, these activities typically cannot rely on much of the functionality of the cards being operational.
To achieve high availability, card redundancy is typically employed so that one card may provide a service while another card is being maintained or repaired. The maintenance or repair operation may be to correct a fault or to upgrade the capability of the card. Card redundancy can be employed at the central point (or control card), at the cards in communication with the central point, or at both.
One feature and advantage of the invention is that it provides a scalable maintenance link system in which the cards may communicate faults, status or interrupts to the central point with deterministic latency regardless of the number of such cards.
Another feature and advantage of the invention is that the maintenance link system is highly reliable without requiring significant and sometimes scarce resources such as a significant number of back- or mid-plane traces.
In accordance with the invention, a system and method for monitoring and maintaining a plurality of modules is provided. Each module of the plurality of modules includes at least one link slave device, and a link controller is connected to the link slave devices via a plurality of individual serial, bidirectional connections. The link slave devices and the link controller include protocol logic for communicating according to a bidirectional protocol. At predefined time segments and with a predefined periodicity, the link slave devices drive the corresponding bidirectional link to the controller to provide maintenance information to the controller. At other predefined time segments, the controller drives the bidirectional link to issue commands to the link slave devices.
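The time-segmented, bidirectional exchange described above may be illustrated by the following minimal sketch. It is not the disclosed implementation: the two-segment frame layout, the class names LinkSlave and LinkController, and the integer status and command values are all assumptions introduced only for illustration. The loop over the slaves models links that, in hardware, would be driven concurrently over individual connections.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical frame layout. The text only states that the segments are
# predefined; the names and the order used here are assumptions.
FRAME_SEGMENTS = ("slave_status", "controller_command")


@dataclass
class LinkSlave:
    """Hypothetical link slave device on one module."""
    status: int = 0
    last_command: Optional[int] = None

    def drive(self) -> int:
        # The slave drives its link during the status segment.
        return self.status

    def receive(self, command: int) -> None:
        # The slave listens while the controller drives the link.
        self.last_command = command


@dataclass
class LinkController:
    """Hypothetical link controller with one dedicated link per slave."""
    slaves: List[LinkSlave]
    collected: Dict[int, int] = field(default_factory=dict)

    def run_frame(self, commands: Dict[int, int]) -> Dict[int, int]:
        # Walk the predefined segments of one frame in order.
        for segment in FRAME_SEGMENTS:
            for index, slave in enumerate(self.slaves):
                if segment == "slave_status":
                    self.collected[index] = slave.drive()
                elif index in commands:
                    slave.receive(commands[index])
        return dict(self.collected)
```

In this sketch, calling run_frame once per period corresponds to the predefined periodicity: each call gathers maintenance information from every slave and then delivers any pending commands.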
Under another aspect of the invention, the link slave devices and the link controller include protocol logic for communicating according to a protocol in which all slave devices communicate to the link controller at substantially the same period of time so that the latency for collecting information from the slave devices is independent of the number of such devices.
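The latency property of this aspect may be illustrated with a simple calculation. The sketch below is an assumption-laden model, not part of the disclosure: the segment duration SEGMENT_TIME_US is a hypothetical value, and the sequentially polled shared bus is introduced only as a point of contrast.

```python
# Hypothetical duration of one status segment, in microseconds.
SEGMENT_TIME_US = 10.0


def shared_bus_latency(num_slaves: int,
                       segment_time: float = SEGMENT_TIME_US) -> float:
    """Contrast case (an assumption): slaves polled one after another
    on a shared bus, so collection time grows with the slave count."""
    return num_slaves * segment_time


def parallel_link_latency(num_slaves: int,
                          segment_time: float = SEGMENT_TIME_US) -> float:
    """Individual links: all slaves drive during the same segment, so
    collection time does not depend on the slave count."""
    return segment_time if num_slaves > 0 else 0.0
```

Under these assumptions, the collection time with individual links is one segment whether 4 or 64 slaves are present, whereas the sequential contrast case scales linearly with the number of slaves.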
Other features and advantages will be apparent from the following disclosure, drawings, and claims.