Computers have been used in digital control systems in a variety of applications, including industrial, aerospace, medical, scientific research, and other fields. In such control systems, it is important to maintain the integrity of the data produced by a computer. In conventional control systems, a computing unit for a plant is typically designed such that the resulting closed-loop system exhibits stability, low-frequency command tracking, low-frequency disturbance rejection, and high-frequency noise attenuation. The “plant” can be any object, process, or other parameter capable of being controlled, such as aircraft, spacecraft, medical equipment, electrical power generation, industrial automation, a valve, a boiler, an actuator, or other controllable device.
It is well recognized that computing system components may fail during operation as a result of various types of faults. A “hard fault” is a fault condition typically caused by a permanent failure of the analog or digital circuitry. For digital circuitry, a “soft fault” is typically caused by transient phenomena that disrupt the computation performed by some digital circuit elements but do not permanently damage the circuitry or alter its subsequent operation. Soft faults may be caused, for example, by electromagnetic fields created by high-frequency signals propagating through the computing system. Soft faults may also result from spurious intense electromagnetic signals, such as those caused by lightning, which induce electrical transients on system lines and data buses. These transients propagate to internal digital circuitry and set latches into erroneous states.
Unless the computing system is equipped with redundant components, a single component failure normally means that the system will malfunction or cease all operation. A malfunction may cause an error in the system output. Fault-tolerant computing systems are designed to incorporate redundant components such that a failure of one component does not affect the system output. This is sometimes called “masking.”
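By way of illustration only, masking can be sketched with a simple voter over redundant outputs. The sketch below assumes three redundant channels and a mid-value (median) selection rule; neither the channel count nor the voting rule is specified above, and the function name `masked_output` is hypothetical.

```python
from statistics import median


def masked_output(channel_outputs):
    """Return the mid-value of three redundant channel outputs.

    Assumption: exactly three channels and a median voting rule.
    A single erroneous channel cannot be the middle value unless it
    lies between the two healthy values, so its error is masked.
    """
    if len(channel_outputs) != 3:
        raise ValueError("this sketch assumes exactly three redundant channels")
    return median(channel_outputs)


# All three channels healthy.
healthy = masked_output([10.0, 10.0, 10.0])

# One channel corrupted by a fault; the voted output is unchanged.
faulted = masked_output([10.0, 999.0, 10.0])
```

In this sketch the voted output is the same with and without the corrupted channel, which is the sense in which the failure is masked.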
In conventional control systems, various forms of redundancy have been used in an attempt to reduce the effects of faults in critical systems. Multiple processing units, for example, may be used within a computing system. In a system with three processing units, if one processor is determined to be experiencing a fault, that processor may be isolated and/or shut down. The fault may then be corrected by transmitting (or “transfusing”) correct data, such as the current values of various control state variables, from the remaining processors to the isolated unit. If the faults in the isolated unit are corrected, the processing unit may be re-introduced to the computing system.
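The isolate, transfuse, and re-introduce sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: the names `ProcessingUnit`, `find_faulted`, and `recover` are hypothetical, a fault is assumed to be detected by exact comparison of control state variables against the majority, and the state variables `integrator` and `mode` are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ProcessingUnit:
    name: str
    state: dict = field(default_factory=dict)  # control state variables
    online: bool = True


def _snapshot(unit):
    # Hashable view of a unit's state so that states can be counted.
    return tuple(sorted(unit.state.items()))


def find_faulted(units):
    """Return the online unit that disagrees with the majority, or None."""
    online = [u for u in units if u.online]
    counts = Counter(_snapshot(u) for u in online)
    majority_state, votes = counts.most_common(1)[0]
    if votes == len(online) or votes <= len(online) // 2:
        return None  # all units agree, or there is no clear majority
    return next(u for u in online if _snapshot(u) != majority_state)


def recover(units):
    """Isolate a faulted unit, transfuse healthy state, and re-introduce it."""
    faulted = find_faulted(units)
    if faulted is None:
        return None
    faulted.online = False                     # isolate the faulted unit
    donor = next(u for u in units if u.online)
    faulted.state = dict(donor.state)          # transfuse a copy of the state
    if faulted.state == donor.state:           # re-introduce only if corrected
        faulted.online = True
    return faulted.name


units = [
    ProcessingUnit("A", {"integrator": 1.5, "mode": 2}),
    ProcessingUnit("B", {"integrator": 1.5, "mode": 2}),
    ProcessingUnit("C", {"integrator": -7.0, "mode": 2}),  # upset by a soft fault
]
recovered = recover(units)
```

Here unit C is identified as the dissenting unit, receives a copy of the state held by a healthy unit, and is brought back online. A practical system would likely compare values within a tolerance and verify the recovered unit over several frames before re-introducing it; exact comparison is used here only to keep the sketch short.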
Functional reliability is often achieved by implementing redundancy in the system architecture such that the level of redundancy is preserved without affecting the function being provided. Availability can be achieved by allocating extra hardware resources to maintain functional operation in the presence of faulted elements. There is a need, however, to minimize the hardware resources necessary to support reliability and availability requirements in control systems.