The present invention relates to an electronic clock and, in particular, to a highly reliable, synchronized, fault-tolerant clock that employs a hot spare clock module.
Numerous applications require extremely high-reliability computing systems. One such application is airborne electronic (avionic) systems, in which fault tolerance is achieved through redundant systems, especially in flight control applications.
An essential element of a computing system is the clock, which provides the timing signals that control computer operation. Data distribution and various other functions are typically based on synchronous clock edges. Thus, to achieve high reliability, redundant clocks are often employed. However, to assure stable computer operation, it is important that the clock signal delivered to the computer be free of switching transients and of asynchronous operation that might result from switching the redundant clock signals in or out.
Existing fault-tolerant clock designs can be broadly classified as either software based or hardware based. Software-assisted architectures use large-time-frame synchronization in the microsecond or millisecond range. They further require that the individual clock modules periodically exchange their signals to re-synchronize the independent clocks. Software-assisted systems take time to read and average the skews among the clock channels before bringing the clocks into synchronous agreement, and data is ignored when the clock skew grows beyond a predetermined threshold. The software algorithm imposes a large overhead on system throughput. Because software algorithms are not bit synchronous, they are not suited to high-performance or time-critical applications.
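For illustration only, the read-average-threshold cycle described above can be sketched as follows. The function name `resync_corrections`, the use of the median as the outlier reference, and the `threshold` parameter are assumptions made for this sketch, not details of any particular prior-art system.

```python
from statistics import mean, median


def resync_corrections(readings: list[float], threshold: float) -> tuple[list[float], list[int]]:
    """Hypothetical software resynchronization step.

    readings: clock values read from each channel at (nominally) the same instant.
    threshold: maximum tolerated skew; channels beyond it are ignored (assumption).

    Returns the correction to add to each channel and the indices of ignored channels.
    """
    # Use the median as the reference so that a single runaway channel
    # does not drag the reference toward itself (an assumed design choice).
    reference = median(readings)
    excluded = [i for i, r in enumerate(readings) if abs(r - reference) > threshold]
    survivors = [r for i, r in enumerate(readings) if i not in excluded]
    if not survivors:
        raise ValueError("no channel within threshold; cannot resynchronize")
    # Average the surviving readings and pull every channel to that value.
    target = mean(survivors)
    return [target - r for r in readings], excluded


corrections, ignored = resync_corrections([100.0, 100.2, 99.9, 105.0], threshold=1.0)
print(ignored)  # [3]
```

Every pass of such a loop costs processor time on each channel, and the corrections take effect only at resynchronization intervals rather than on every clock edge, which is why this approach is not bit synchronous.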
Conventional hardware architectures using phase-locked loop techniques are complex and slow. Most existing fault-tolerant clock designs are of this type. Each clock channel receives clock signals from the other channels to create a reference signal for its own phase-locked loop. The reference signal is fed to a phase detector, where it is compared with the local clock signal, and the phase difference is converted into a voltage level that is used to adjust the local oscillator. Since each channel forms its own reference and local signals, the clocks suffer from phase jitter. Moreover, a phase-locked loop can track only slowly varying signals; it fails in the presence of abrupt changes or of signals that exceed its lock-in range.
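The loop behavior described above can be illustrated with a discrete-time, proportional-plus-integral model. This is a simplified sketch, not a circuit design: phases are measured in cycles, the oscillator tuning limit `max_tune` stands in for the lock-in range, and the gains `kp` and `ki` are arbitrary illustrative values.

```python
def simulate_pll(ref_freq: float, free_run_freq: float, max_tune: float,
                 kp: float = 0.2, ki: float = 0.02, steps: int = 2000) -> float:
    """Return the residual phase error (in cycles) after `steps` updates."""

    def clamp(x: float) -> float:
        # The local oscillator can only be pulled within +/- max_tune.
        return max(-max_tune, min(max_tune, x))

    ref_phase = local_phase = integrator = 0.0
    for _ in range(steps):
        error = ref_phase - local_phase                # phase detector output
        integrator = clamp(integrator + ki * error)    # integral path, clamped against windup
        control = clamp(kp * error + integrator)       # "voltage" applied to the oscillator
        ref_phase += ref_freq
        local_phase += free_run_freq + control
    return ref_phase - local_phase


print(abs(simulate_pll(1.0, 0.99, max_tune=0.05)) < 1e-6)  # True: offset within tuning range, loop locks
print(simulate_pll(1.0, 0.90, max_tune=0.05) > 1.0)        # True: offset exceeds tuning range, error keeps growing
```

In this model the loop removes a small, steady frequency offset, but an offset larger than `max_tune` leaves the loop unable to lock, and the phase error grows without bound.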
Both the software and the phase-locked loop architectures require 3m+1 channels to tolerate m faults.
Another class of hardware architecture, standby sparing, can tolerate m faults using only m+1 channels. Here, the master clock is switched out and a spare (one or more standby clock signals) is switched in, with switching controlled by independent monitoring for a missing clock pulse. This does not provide 100% fault detection, since missing-pulse monitoring cannot detect phase jitter, phase drift, or small changes in duty cycle. The architecture is also limited in that the receiving computer must cope with clock switchover transients.
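A minimal sketch of missing-pulse monitoring with standby switchover follows, assuming each channel is observed as a list of rising-edge timestamps. The names `missing_pulse` and `select_active` and the 50% lateness tolerance are hypothetical; the point is that a monitor of this kind reacts only to absent edges, so a drifted but still pulsing clock passes, and duty cycle is not observed at all.

```python
from typing import Optional


def missing_pulse(edges: list[int], nominal_period: int, tolerance: float = 0.5) -> Optional[int]:
    """Return the index of the first edge that arrives too late, or None if none is late."""
    limit = nominal_period * (1 + tolerance)
    for i in range(1, len(edges)):
        if edges[i] - edges[i - 1] > limit:
            return i
    return None


def select_active(channels: list[list[int]], nominal_period: int) -> Optional[int]:
    """Pick the first channel (master first, then spares) that shows no missing pulse."""
    for index, edges in enumerate(channels):
        if missing_pulse(edges, nominal_period) is None:
            return index
    return None


healthy = [0, 10, 20, 30, 40]
dropped = [0, 10, 30, 40]        # pulse at t=20 is missing
drifted = [0, 11, 22, 33, 44]    # 10% slow, but never misses a pulse

print(select_active([dropped, healthy], 10))  # 1: master switched out, spare switched in
print(select_active([drifted, healthy], 10))  # 0: drift goes undetected
```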
Techniques are known in the art for combining N-modular redundancy with standby sparing, but such techniques can be applied only to data, not to the clock. Such designs use a centralized switching network known as an iterative cell array switch. The cell array is complex, exhibits a long propagation delay through many levels of gates, and requires an external clock for synchronous switching of the modules.
To tolerate m faults, prior art teachings require 3m+1, 2m+2, or 2m+1 clock channels.
Thus, in accordance with the prior art teachings, in order to tolerate a single fault, a minimum of three modules (triple modular redundancy) is required. In order to tolerate two random faults (fail-operational/fail-operational), previous fault tolerant clock designs utilized five or more modules employing three-out-of-five voting systems. These architectures require excessive hardware.
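The channel counts quoted above can be tabulated for a given number of faults m. The function below is a hypothetical helper that simply evaluates those expressions; for the 2m+1 case it also reports the majority needed, m+1 of 2m+1, which yields the triple modular redundancy (m = 1) and three-out-of-five (m = 2) figures cited in the text.

```python
def channels_required(m: int) -> dict[str, int]:
    """Channels needed to tolerate m faults under the prior-art formulas."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return {
        "3m+1": 3 * m + 1,  # software-assisted and phase-locked loop designs
        "2m+2": 2 * m + 2,
        "2m+1": 2 * m + 1,  # majority voting
        "m+1": m + 1,       # standby sparing (incomplete fault detection)
    }


def majority_needed(m: int) -> tuple[int, int]:
    """Votes required out of 2m+1 channels for a majority decision."""
    return m + 1, 2 * m + 1


for m in (1, 2):
    print(m, channels_required(m), majority_needed(m))
```

For m = 2 the four expressions give 7, 6, 5, and 3 channels respectively.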
Designs that use four modules with three-out-of-four voting can tolerate only one random fault, with limited compensation for benign faults. Three-out-of-four voting schemes also suffer from two-two split situations: for example, there is no majority to vote on when two channels are out of phase with the other two. Reconfigurable voters that use four channels to tolerate two faults suffer from the same drawback.
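The two-two split can be seen in a minimal three-out-of-four voter applied to the logic levels of four clock channels sampled at one instant. This is an illustrative sketch; the function name `vote` and the representation of channel outputs as 0/1 samples are assumptions.

```python
from collections import Counter
from typing import Optional


def vote(samples: list[int], required: int = 3) -> Optional[int]:
    """Return the value agreed on by at least `required` channels, else None."""
    if not samples:
        raise ValueError("no channel samples")
    value, count = Counter(samples).most_common(1)[0]
    return value if count >= required else None


print(vote([1, 1, 1, 0]))  # 1: one faulty channel is outvoted
print(vote([1, 1, 0, 0]))  # None: two channels out of phase with the other two, no majority
```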