Multiple redundant critical systems are often used in aircraft and aerospace applications where there is a need for safety, low maintenance, and reliability. A single backup system is generally not sufficient where disagreements may exist between two nominally functional systems, since the failed system may not be easily identified. For this reason any critical system, such as the avionics instrumentation package on an aircraft, typically includes three or more redundant microprocessors running in parallel. The failure of one of the microprocessors can then be detected by comparison of its output to that of the other microprocessors.
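The comparison described above can be illustrated with a minimal sketch. It assumes each redundant unit reports one comparable output value per comparison step; the function name `disagreeing_units` and the strict-majority rule are illustrative assumptions, not taken from any particular avionics design.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def disagreeing_units(outputs: Sequence[Hashable]) -> List[int]:
    """Return the indices of units whose output differs from the majority.

    Raises ValueError when no strict majority exists, which is the
    two-unit case: a disagreement is visible, but the failed unit
    cannot be identified.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    majority_value, majority_count = Counter(outputs).most_common(1)[0]
    if majority_count <= len(outputs) // 2:
        raise ValueError("no strict majority; failed unit cannot be identified")
    return [i for i, value in enumerate(outputs) if value != majority_value]
```

With three units, `disagreeing_units([7, 7, 9])` singles out the third unit, whereas two disagreeing units raise an error because neither output can be trusted over the other.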
Each microprocessor in a redundant system requires an accurate time base reference, and a separate time base clock channel is normally included for each one. Because the microprocessors operate in parallel and their outputs are synchronously compared in real time, it is important that the time bases for the microprocessors also be synchronized. A comparison of the outputs from multiple processors will indicate an error if one of the microprocessors is fetching an instruction that another microprocessor has already executed. A fault in any of the clock channels may seriously impact the synchronization of the other clock channels, and thus undermine the proper operation of the entire redundant microprocessor system.
A clock channel fault may comprise an intermittent connection, a shift in the frequency of one of the clock channels due to environmental effects, or a component failure in the circuitry of one of the clock channels. Such faults can also be caused by intermittent problems, e.g., a cold solder joint, or by changes in an electrical parameter of one of the components in a clock channel over time. In the worst case, one of the clock channels may fail completely, effectively terminating operation of the microprocessor to which it is connected as a time base. Clearly, it is desirable that the redundant clock system be able to tolerate a limited number of faults without loss of synchronization of the clock channels that continue to operate properly. Ideally, all channels should continue to produce a synchronized time base output signal even if one or more components have failed.
Initially, it might seem a simple matter to accommodate one or more faults in a redundant clock system, since the clock channels that are operating without faults can be used to synchronize the time base signals for all of the microprocessors. In fact, the problem and its solution are not trivial, particularly where the fault does not represent a catastrophic failure of one clock channel. If the fault in one clock channel is not easily detectable, it may cause different erroneous signals to be provided to the other clock channels, making synchronization virtually impossible.
The task of synchronizing clock channels is analogous to a classic exercise in logic known as the Byzantine Generals' Problem. In the Byzantine Generals' Problem, the Byzantine Army, each division of which is commanded by one of several generals, surrounds an enemy city. Communication between the generals is limited to oral messages carried by runners. One or more of the generals may be a traitor who will attempt to confuse the other generals by sending false messages. In the simple case where there are only three generals, it has been shown that a single traitor can confuse two loyal generals, leading to the theorem that more than two thirds of the generals must be loyal to guarantee that the loyal generals can properly reach agreement on a plan of battle.
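The two-thirds condition can be restated arithmetically: if n generals include f traitors, requiring more than two thirds to be loyal means n > 3f, so at least 3f + 1 generals are needed. The short sketch below only works out that arithmetic; the function names are illustrative assumptions.

```python
def max_traitors(generals: int) -> int:
    """Largest number of traitors f that still satisfies generals > 3 * f."""
    if generals < 1:
        raise ValueError("at least one general is required")
    return (generals - 1) // 3


def min_generals(traitors: int) -> int:
    """Smallest number of generals n that satisfies n > 3 * traitors."""
    if traitors < 0:
        raise ValueError("the number of traitors cannot be negative")
    return 3 * traitors + 1
```

For three generals `max_traitors(3)` is 0, matching the three-general case described above, and `min_generals(1)` is 4.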
By analogy to this classic problem, a single clock channel in which a fault appears can prevent two other clock channels from being correctly synchronized, if the fault causes a different time base signal to be conveyed to each of the properly operating clock channels during the attempted synchronization. Based on this theorem, at least four redundant clock channels are required in a clock system in order to tolerate a single fault. A more elegant solution, which apparently contradicts the theorem, permits four redundant clock channels to tolerate more than one fault. U.S. Pat. No. 4,979,191, Bond et al., Dec. 1990 (assigned to the same assignee as the present invention) discloses such a solution.
In this patent, four redundant clock channels are periodically synchronized after a counter in each of the channels has accumulated a predetermined number of clock cycles. Each clock channel includes a clock unit and an isolation port. The counter, which is in the clock unit, accumulates the predetermined number of clock cycles, disables the clock channel output signal, and produces a sync pulse that is input to a voter block connected to receive the sync pulses from all of the clock channels. In response to a second sync pulse received from one of the other clock channels, the voter block produces a load pulse signal that is input to the isolation port of that clock channel. Corresponding isolated load signals are produced by the isolation ports for each clock channel and provided to another voter block in each clock unit. When the second isolated load signal is received from the other clock channels, the other voter block produces a load enable signal that is input to the counter, causing it to reset and begin counting again, and enabling the clock channel time base output signal in synchronization with the other clock channels. Up to N simultaneous faults may be sustained in this clock system without loss of synchronization in the clock channels that continue to operate properly, so long as 2N+1 clock channels are provided. The only significant drawback to this technique is a limitation in speed: it is used only in relatively low-speed applications, such as input/output frame synchronization, and is not intended for higher frequency time base applications. It is also more complex than is desired for many applications, and it synchronizes the redundant clock channels only periodically.
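The reason a voter might act on the second sync pulse, rather than the first, can be seen in a simplified timing model. The sketch below is an assumption-laden illustration, not the circuit of the cited patent: it treats a voter as a function of pulse arrival times (in arbitrary units, with `None` meaning that a channel never produced a pulse) and ignores the isolation ports and the second voting stage. The name `second_pulse_time` is hypothetical.

```python
from typing import Optional, Sequence


def second_pulse_time(arrivals: Sequence[Optional[float]]) -> Optional[float]:
    """Return the time at which the second sync pulse is observed.

    `arrivals` holds one arrival time per clock channel, or None for a
    channel that stays silent. Returns None if fewer than two pulses arrive.
    """
    seen = sorted(t for t in arrivals if t is not None)
    return seen[1] if len(seen) >= 2 else None
```

In this model, with three fault-free channels and one faulty channel, the result always falls between the earliest and latest fault-free pulse, whether the faulty channel fires early, fires late, or stays silent. A single early pulse cannot trigger the voter by itself, and a single missing pulse cannot stall it.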
Another commonly assigned patent, U.S. Pat. No. 4,984,241, Truong, Jan. 1991, discloses a triple modular redundancy clock system. In this disclosure, three clocks can be synchronized within several nanoseconds of each other if the circuit components, e.g., oscillator trim capacitors, in each channel are carefully tuned. Since trimming one capacitor affects all three channels, this procedure must be repeated many times, until all channels are properly tuned, and is therefore extremely time consuming. Moreover, if environmental effects such as temperature cause a shift in the trimmed values of these components, oscillation synchronization of the clocks can no longer be maintained. Insufficient phase range in the feedback loop of the crystal oscillators employed in each channel limits the frequency range over which the oscillators can be pulled into synchronization. Furthermore, power-on automatic reset and automatic warm reset of the clock circuit were not implemented in this invention. It was also noted that the clocks in each channel could appear to be synchronized when, in fact, they were an integer number of cycles out of synchronization, since each crystal has its own stabilization time period during power-up.
Accordingly, a simple multiple redundant fault tolerant clock system is required that can operate at relatively high clock frequencies, be Byzantine fault tolerant, and automatically synchronize at power-on or warm reset of the system. The system should be highly integrated and be capable of continuously maintaining each of the clock channel time base outputs in synchronization without the need for careful trimming of components, and without concern for maintaining such synchronization during operation within standard operating environmental ranges.