Fault-tolerant computing and communications systems having redundant or spare components are known. In such systems, one or more active primary data processing components are shadowed by one or more spare components, which are ready to take the place of the primary components in the event of failure.
Typically, the systems are adapted to effect a switch-over from a failed active component to a spare component in real time, and as quickly as possible, to avoid data loss at the failed component.
Such fault-tolerant systems, however, are premised largely on the assumption that component failures are caused by hardware failures, which are typically permanent. In many systems, computing resources are distributed among modules, with each module having its own processor under software control. Such systems are prone to software faults within the modules, as well as traditional hardware faults. Software faults, unlike hardware faults, are often aberrant, occurring only rarely and under special circumstances. Moreover, software faults are typically not remedied by replacing one active module with an identical spare module having the same software deficiency.
Accordingly, a fault-tolerant system that more effectively recognizes and handles recoverable faults is desirable.