The need for safety, low maintenance, and reliability has resulted in the development and use of multiple redundant critical systems in aircraft and aerospace applications. A single backup system is frequently not sufficient where disagreements may exist between two nominally functional systems, since the failed system may not be easily identified. For this reason, a critical system, such as the avionics instrumentation package on an aircraft, may include three or more redundant microprocessors running in parallel. Failure of one microprocessor may be detected by comparison of its output to that of the other microprocessors.
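The comparison described above can be illustrated with a minimal sketch of a majority voter. The function name `find_disagreeing_units` and the use of plain integers as processor outputs are assumptions made for illustration; they do not come from any particular avionics system.

```python
from collections import Counter

def find_disagreeing_units(outputs):
    """Compare redundant outputs and identify the units outside the majority.

    Returns (majority_value, indices_of_disagreeing_units), or (None, None)
    when no strict majority exists and the failed unit cannot be identified.
    """
    counts = Counter(outputs)
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(outputs):
        # No strict majority: the disagreement cannot be resolved.
        return None, None
    disagreeing = [i for i, out in enumerate(outputs) if out != value]
    return value, disagreeing

# Two units that disagree: the failed one cannot be identified.
print(find_disagreeing_units([7, 9]))
# Three units: the odd one out (index 2) is identified by comparison.
print(find_disagreeing_units([7, 7, 9]))
```

With only two units a disagreement is detected but cannot be attributed to either one, which is why the sketch returns no result in that case.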
Since each microprocessor in a redundant system requires an accurate time base reference, separate clock channels are normally included for each one. Because the microprocessors operate in parallel and their outputs are synchronously compared in time, it is important that the time bases for the microprocessors be at least periodically synchronized. Faults in any of the clock channels may seriously impact the synchronization of the other redundant clock channels, and thus undermine the operation of the entire redundant microprocessor system.
A clock channel fault may comprise an intermittent connection, a shift in the frequency of one of the clock channels due to environmental effects, or a component failure in the circuitry of one of the clock channels. In the worst case, one of the clock channels may fail completely, effectively terminating operation of the microprocessor to which it is connected as a time base. Clearly, it is desirable that the redundant clock system be able to tolerate a limited number of faults without loss of synchronization of the clock channels that continue to operate properly.
Initially, it might seem a simple matter to accommodate one or more faults in a redundant clock system, since the remaining properly operating clock channels could be used to synchronize the time base signals for the other microprocessors. In fact, the problem and its solution are not trivial, particularly if it is not apparent that one of the redundant clock channels has a fault. Where the fault does not represent a catastrophic failure of one clock channel, and thus is not easily detectable, the fault may cause different erroneous signals to be provided to each of the other clock channels, making their synchronization virtually impossible.
The task of periodically synchronizing the clock channels is thus analogous to the classic exercise in logic known as the Byzantine Generals' Problem. In the Byzantine Generals' Problem, a city is surrounded by the Byzantine Army, separate divisions of which are each controlled by one of N different generals. Communication between the generals is limited to oral messages carried by runners. One or more of the N generals may be a traitor who will attempt to confuse the other generals by sending false messages. In the simple case where there are only three generals, it has been shown that a single traitor can confuse two loyal generals, leading to the theorem that more than two thirds of N generals must be loyal to guarantee that the loyal generals can properly reach agreement on a plan of battle.
By analogy to this classic problem, a single clock channel in which a fault appears can prevent two other clock channels from being correctly synchronized if the fault causes a different time base signal to be conveyed to each of the properly operating clock channels during the attempted synchronization process. Based on this theorem, at least four redundant clock channels are required in a clock system in order to tolerate a single fault.
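The requirement that more than two thirds of the channels be fault free means that N channels tolerate r faults only if N is at least 3r+1. The following sketch illustrates the effect numerically. It assumes a simple resynchronization rule in which each good channel discards the r lowest and r highest readings it receives and moves to the midpoint of the rest. That rule, the function names, and the clock values are all hypothetical choices made for illustration, not a method taken from this document.

```python
def min_channels(faults):
    """Smallest N with more than two thirds of the channels fault free."""
    return 3 * faults + 1

def fault_tolerant_midpoint(readings, faults):
    """Drop the `faults` lowest and highest readings; return the midpoint of the rest."""
    ordered = sorted(readings)
    trimmed = ordered[faults:len(ordered) - faults]
    return (trimmed[0] + trimmed[-1]) / 2

def resync(good, lies, faults=1):
    """One resynchronization round.

    good: channel name -> current clock value of each good channel.
    lies: channel name -> the value the faulty channel reports to that channel
          (a different value may be sent to each good channel).
    """
    honest = list(good.values())
    return {name: fault_tolerant_midpoint(honest + [lies[name]], faults)
            for name in good}

def skew(values):
    """Spread between the fastest and slowest clock values."""
    values = list(values)
    return max(values) - min(values)

# Three channels: two good, one faulty sending 90 to A and 114 to B.
three = resync({"A": 100, "B": 104}, {"A": 90, "B": 114})
# Four channels: three good, the same faulty channel.
four = resync({"A": 100, "B": 104, "D": 102}, {"A": 90, "B": 114, "D": 102})

print(min_channels(1))       # 4
print(skew(three.values()))  # 4.0, the good channels are no closer than before
print(skew(four.values()))   # 2.0, the good channels have converged
```

In the three channel case each good channel is left with only its own view, so the initial skew of 4 is never reduced. With a fourth channel the same inconsistent fault is outvoted and the skew is halved in one round.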
Others have also recognized that providing a fault tolerant redundant clock system is not trivial. For example, in U.S. Pat. No. 4,239,982, a "Byzantine resilient" clock channel is disclosed that includes 3r+1 clock sources. Each clock source generates and distributes to the other clock sources a clock signal that is phase locked to a derived system clock signal provided by a clock receiver associated with that clock source. The derived system clock signal for each clock receiver represents the consensus of the clock signals of the other sources. Any clock receiver responsive to any 2r+1 of the clock sources can derive a correct system clock, even if up to r clock source failures occur. Thus, in this prior art solution to the problem, four clock sources are required to tolerate a single fault in the clock system, consistent with the classical mathematical solution. The present invention achieves Byzantine resilience in a more elegant manner, apparently contradicting the theorem.
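One way to see why a receiver listening to 2r+1 sources can tolerate r failures is to model the consensus as a median. This is an assumption made for illustration only: the cited patent describes phase locked clock receivers, not a software median, and the names `derived_clock` and `within_good_range` are hypothetical. When at most r of the 2r+1 readings are faulty, the median is always bounded by the readings of the good sources.

```python
from statistics import median

def derived_clock(readings):
    """Consensus value of an odd number of clock readings (median, by assumption)."""
    return median(readings)

def within_good_range(good, faulty):
    """True if the consensus lies between the slowest and fastest good reading."""
    value = derived_clock(list(good) + list(faulty))
    return min(good) <= value <= max(good)

# r = 2, so 2r+1 = 5 readings: three good sources outvote two faulty ones.
print(within_good_range([10.0, 10.2, 10.1], [-1000.0, 1000.0]))
# Three faulty sources out of five exceed r, and the bound no longer holds.
print(within_good_range([10.0, 10.2], [1000.0, 1000.0, 1000.0]))
```

The first call keeps the consensus inside the range of the good readings however extreme the faulty values are; the second shows the bound failing once the faulty sources outnumber r.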