Computers have been used in digital control systems in a variety of applications, such as in industrial, aerospace, medical, scientific research, and other fields. In such control systems, it is important to maintain the integrity of the data produced by a computer. In conventional control systems, a computing unit for a plant is typically designed such that the resulting closed loop system exhibits stability, low-frequency command tracking, low-frequency disturbance rejection, and high-frequency noise attenuation. The “plant” can be any object, process, or other parameter capable of being controlled, such as an aircraft, spacecraft, medical equipment, electrical power generation, industrial automation, valve, boiler, actuator, or other device. A control effector is used to provoke a response by the plant. For example, when the plant is an aircraft, control effectors may be in the form of flight control surfaces such as rudders, ailerons, and/or elevators.
Various types of failures or faults may be encountered by conventional computing units found in control systems. A “hard fault” is a fault condition typically caused by a permanent failure of the analog or digital circuitry. For digital circuitry, a “soft fault” is typically caused by transient phenomena that disrupt computation in some digital computing elements but do not permanently damage the circuitry or alter its subsequent operation.
Soft faults may be caused by electromagnetic fields created by high-frequency signals propagating through the computing system. Soft faults may also result from spurious intense electromagnetic signals, such as those caused by lightning, which induce electrical transients on system lines and data buses; these transients propagate to internal digital circuitry and set latches into erroneous states. In addition to lightning, other elements of the electromagnetic environment (EME) such as high-intensity radiated fields (HIRF), radio communications, radar pulses, and the intense fields associated with electromagnetic pulses (EMP) may also cause soft faults. Further, high-energy atomic particles from a variety of sources (e.g., atmospheric neutrons, cosmic radiation, weapon detonation, etc.) may deposit sufficient energy in the bulk semiconductor material of a digital device to set electronic circuits into erroneous states. With the advent of smaller integrated circuits running at high speeds, soft faults are becoming more common, for example in the radiation environment encountered by aircraft traveling at high altitudes. In such an environment, computing circuits containing state-of-the-art digital devices may be more susceptible to failure.
In conventional control systems, various forms of redundancy have been used in an attempt to reduce the effects of faults in critical systems. Multiple processing units, for example, may be used within a computing system. In a system with three processing units, for example, if one processor is determined to be experiencing a fault, that processor may be isolated and/or shut down. The fault may be corrected by transmitting (or “transfusing”) correct data, such as the current values of various control state variables, from the remaining processors to the isolated unit. If the faults in the isolated unit are corrected, the processing unit may be re-introduced to the computing system along with the other two processing units.
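The isolate, transfuse, and re-introduce sequence described above can be illustrated with a minimal Python sketch. It is not taken from any particular control system: the `Channel` class, the `find_outlier` and `transfuse` helpers, the `integrator` state variable, and the fixed comparison tolerance are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Channel:
    """One redundant processing unit (hypothetical model)."""
    name: str
    state: dict = field(default_factory=dict)  # control state variables
    online: bool = True


def find_outlier(channels, key, tolerance):
    """Return the first online channel whose value for `key` differs from
    the median of the online channels by more than `tolerance`, else None."""
    online = [c for c in channels if c.online]
    mid = median(c.state[key] for c in online)
    for c in online:
        if abs(c.state[key] - mid) > tolerance:
            return c
    return None


def transfuse(faulty, healthy):
    """Overwrite each state variable of the faulty channel with the median
    of the healthy channels' values, then re-admit the channel."""
    for key in faulty.state:
        faulty.state[key] = median(h.state[key] for h in healthy)
    faulty.online = True


# Example: channel C has been upset by a soft fault (illustrative values).
channels = [
    Channel("A", {"integrator": 1.00}),
    Channel("B", {"integrator": 1.01}),
    Channel("C", {"integrator": 7.50}),
]
bad = find_outlier(channels, "integrator", tolerance=0.1)
if bad is not None:
    bad.online = False                          # isolate the faulty unit
    healthy = [c for c in channels if c.online]
    transfuse(bad, healthy)                     # restore state, re-admit
```

The sketch relies on at least one healthy channel remaining to supply the state variables; if every channel were upset at once, `transfuse` would have no trustworthy source to copy from.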
Dissimilar computational redundancy is used to prevent the introduction of generic faults in control system architectures. Generic faults refer to common errors in system redundancies. Such errors can occur in the design and development of the hardware and software elements within general purpose computers that are used in control system architectures. As such, dissimilar computational redundancy would entail each redundant hardware element using a dissimilar microprocessor and each redundant microprocessor executing software (e.g., operating system, application, etc.) that was developed using a different programming language.
Other methods that have been used to help ensure the continued operation of control systems include the use of dissimilar technology, distributed computation redundancy, equalization, and mid-value voting. Each of these methods, however, generally requires at least one processing unit to remain operational at all times to preserve state variables. While the control systems may remain operational if all but one of the processing units experience a soft fault and the correctly-operating unit can be identified, the control system will not operate properly if all of the processors simultaneously experience soft faults. Similarly, if a lone properly-operating unit cannot be identified within the system, the system will not recover, as there would be no identifiable operating unit with correct values for all of the state variables to be transfused to the remaining units. In addition, because of the transfusion of state variables from other processing units, the system recovery may be relatively slow. It may therefore take an extended period of time for all processing units within the system to resume normal operation. In the meantime, redundant control is undesirably lost or degraded.
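As a rough illustration of mid-value voting and of the limitation noted above, the following Python sketch selects the middle of three redundant values. The function name `mid_value_select` and the sample numbers are assumptions made for this example only.

```python
def mid_value_select(a, b, c):
    """Return the middle of three redundant channel values."""
    return sorted((a, b, c))[1]


# One faulted channel: the erroneous value 99.0 is outvoted.
single_fault = mid_value_select(2.0, 2.1, 99.0)

# Two channels faulted at once: the selected value is itself erroneous,
# because no majority of healthy channels remains.
double_fault = mid_value_select(2.0, 99.0, 98.0)
```

In the first call the voter returns a healthy value; in the second it returns one of the corrupted values, which is the situation in which no correctly operating unit can be identified.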
In the aerospace field, digital flight control systems are frequently interposed between the pilot and the flight control surfaces of an aircraft. Such systems may include fly-by-wire, auto-pilot, and auto-land systems. In a fly-by-wire system, in lieu of pilot controls being mechanically coupled (e.g., via cables or hydraulics) to the various primary flight control surfaces of the aircraft (such as the ailerons, elevators, and rudder), the position and movements of a pilot's controls are electronically read by sensors and transmitted to a computing system. The computing system typically sends electronic control signals to actuators of various types that are coupled to the primary flight control surfaces of the aircraft. The actuators are typically configured to move one or more control surfaces according to inputs provided by the pilot, or in response to feedback measured by a sensor on the aircraft. Failure of the control system could thus have catastrophic effects on the aircraft. Similarly, industrial, medical, or other systems may be gravely affected by certain control system failures.
In conventional flight control system (FCS) architectures, recovery from soft faults of FCS architectural elements, particularly in the flight control computer, is either not possible, can be attempted only after a grace period, or requires cycling of power, such as rebooting the computer. Any of these circumstances can negatively impact the mean time between unscheduled removals (MTBUR). In addition, tight tolerance monitoring has been dependent on synchronous operations for tight tracking of redundant elements, and has been relatively federated and not easily scalable.