This application relates generally to control systems and more specifically to a control system architecture using redundant processing units configured with rapid recovery elements.
Control systems incorporating digital computers have been used for several years. In various applications of these computerized control systems, it is very important to maintain the integrity of the data produced by the digital computers, because the loss of data could result in the loss of a large amount of money or even the loss of life. Examples of critical applications may be found in industrial, aerospace, medical, scientific research and other fields.
A conventional control system suitable for use in high integrity applications is shown in FIG. 1. As known from conventional control theory, a computing units system for a plant is typically designed such that the resulting closed loop system exhibits stability, low-frequency command tracking, low-frequency disturbance rejection, and high-frequency noise attenuation. The plant is any object, process or other parameter capable of being controlled, such as an aircraft, spacecraft, medical equipment, electrical power generation, industrial automation, valve, boiler, actuator or other device.
The computing units system may be any analog or digital device that provides control so that plant behavior remains within specified criteria. The computing units system output (represented by vector Oc(k)), in conjunction with any external commands (represented by vector C(k)), is provided to the plant as appropriate, and an output vector (Op(k)) corresponding to plant performance is provided to the computing units system as a closed-loop feedback signal. FIG. 1 also shows a vector of error inputs (E(k)), derived in a summation process of the computing units system output vector Oc(k) and the external command vector C(k), that typically result in plant adjustments.
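The signal flow described above can be illustrated with a minimal scalar sketch. Everything specific in it is an assumption made for illustration only: the first-order plant model, the proportional feedback law, the gain values, and the function names do not come from the description, which treats Oc(k), C(k), E(k), and Op(k) as vectors.

```python
def computing_unit_output(op_k, gain=2.0):
    """Hypothetical control law (assumption): Oc(k) = -gain * Op(k)."""
    return -gain * op_k


def plant_step(op_k, e_k, a=0.9, b=0.1):
    """Hypothetical first-order plant (assumption): Op(k+1) = a*Op(k) + b*E(k)."""
    return a * op_k + b * e_k


def simulate(command, frames=200, op_0=0.0):
    """Run the closed loop for a fixed number of computing frames."""
    op_k = op_0
    for _ in range(frames):
        oc_k = computing_unit_output(op_k)  # feedback path: Op(k) -> Oc(k)
        e_k = command + oc_k                # summation of C(k) and Oc(k) gives E(k)
        op_k = plant_step(op_k, e_k)        # E(k) drives the plant
    return op_k
```

With these assumed gains the loop is stable and the plant output settles to one third of the command. A practical design would choose the control law to meet the tracking, disturbance rejection, and noise attenuation goals described earlier.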
In the aerospace field, for example, digital control systems are frequently interposed between the pilot and the flight control surfaces of an aircraft. Such units may include fly-by-wire, autopilot, and autoland systems, for example. In a fly-by-wire system, in lieu of a pilot's controls being mechanically coupled (e.g., via cables or hydraulics) to the various primary flight control surfaces of the aircraft (such as the ailerons, elevators, and rudder), the position and movements of a pilot's controls are electronically read by sensors and transmitted to a computing system. The computing system typically sends electronic control signals to actuators of various types that are coupled to the primary flight control surfaces of the aircraft. The actuators are typically configured to move one or more control surfaces according to inputs provided by a pilot, or in response to feedback measured by a sensor on the aircraft. Failure of the control system, then, could have catastrophic effects on the controlled aircraft. Similarly, industrial, medical and other systems may be gravely affected by certain control system failures.
Various types of failures or faults may be encountered by conventional computing units found in control systems. A “hard fault” is a fault condition typically caused by a permanent failure of the analog or digital circuitry. For digital circuitry, a “soft fault,” in contrast, is typically caused by transient phenomena that may affect some digital circuit computing elements, resulting in computation disruption, but that do not permanently damage or alter the subsequent operation of the circuitry. Soft faults may be caused by electromagnetic fields created by high-frequency signals propagating through the computing system. Soft faults may also result from spurious intense electromagnetic signals, such as those caused by lightning, that induce electrical transients on system lines and data buses; these transients propagate to internal digital circuitry and set latches into erroneous states. Additionally, radar pulses and the intense fields associated with electromagnetic pulses (“EMP”) may also cause soft faults. Further, high-energy atomic particles (from a variety of sources, e.g., atmospheric neutrons, cosmic radiation, weapon detonation, etc.) may deposit sufficient energy in the bulk semiconductor material of a digital device to set electronic circuits into erroneous states. With the advent of smaller integrated circuits running at high speeds, soft faults are becoming more common, for example, in the radiation environment encountered by aircraft traveling at high altitudes. In such an environment, computing circuits containing state-of-the-art digital devices may be more susceptible to failure.
An erroneous result caused by soft faults may often be mitigated by rebooting the computer (e.g., by cycling the power off, then on again to initiate a power-on self-test). Such a procedure should result in the computer resuming proper operation. Rebooting may not always be available in digital computing systems that are used to control critical functions, however, such as in computing systems used in aircraft and other aerospace vehicles where state variables (e.g., control and logic state variables) and other parameters may not be readily recoverable by a conventional restart procedure. A control state variable in an avionics setting is typically a computed parameter that is developed over a period of time, and that therefore has an associated history based upon sensor or other data. Such variables are typically developed over long-term maneuvering or control of the plant. The loss of the control state variables associated with performing flight critical functions can be dangerous. For example, loss of control state variables during a landing sequence can cause an unpredictable system response that could result in a serious failure of the aircraft. In addition, a reboot procedure may require an undesirably large amount of time to complete, thus resulting in loss or degradation of plant control as the system reboots.
In the past, various forms of redundancy have been used in an attempt to reduce the effects of faults in critical systems. Multiple processing units, for example, may be used within a computing system. In a system with three processing units, for example, if one processor is determined to be experiencing a fault, that processor may be isolated and/or shut down. The fault may be corrected by correct data (such as the current values of various control state variables) being transmitted (or “transfused”) from the remaining processors to the isolated unit. If the faults in the isolated unit are corrected, the processing unit may be re-introduced to the computing system along with the other two processing units. This process may be termed a “recovery” process.
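The conventional recovery process described above can be sketched as follows. This is an illustrative sketch only: the `ProcessingUnit` class, its fields, and the `recover_by_transfusion` function are hypothetical names, and fault detection is assumed to have already identified the faulty unit.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingUnit:
    """Hypothetical model of one redundant processing unit (assumption)."""
    name: str
    state: dict = field(default_factory=dict)  # control state variables
    online: bool = True


def recover_by_transfusion(faulty, healthy_units):
    """Isolate the faulty unit, copy state from a healthy peer, re-introduce it."""
    if not healthy_units:
        # Transfusion needs at least one correctly operating unit as a donor.
        raise RuntimeError("no healthy unit available to transfuse state")
    faulty.online = False                  # isolate the faulty unit
    donor = healthy_units[0]
    faulty.state = dict(donor.state)       # "transfuse" the state variables
    faulty.online = True                   # re-introduce the recovered unit
```

The `RuntimeError` branch reflects the dependence of this approach on a surviving donor: if no healthy unit remains, there is no source of correct state variables.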
Other methods used to help ensure the continued operation of control systems include the use of dissimilar technology, dissimilar computation redundancy, distributed computation redundancy, equalization, and mid-value voting. Each of these methods, however, generally requires at least one processing unit to remain operational at all times to preserve state variables. While the above-described system may remain operational if all but one of the processing units experience a soft fault and the correctly-operating unit can be identified, the system will not operate properly if all of the processors simultaneously experience soft faults. Similarly, if a lone properly-operating unit cannot be identified within the system, the system will not recover, as there would be no identifiable operating unit with correct values for all of the state variables to be transfused to the remaining units. In addition, because of the transfusion of state variables from other processing units, the system recovery may be relatively slow to take place. It may therefore take several computing frames (which may take on the order of one half second or longer) for all processing units within the system to resume normal operation. In the meantime, redundant control is undesirably lost or degraded.
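Mid-value voting, mentioned above, can be illustrated with a short sketch. The function name and the sample values are assumptions made for illustration; the sketch simply selects the middle of an odd number of redundant outputs, which is the common meaning of the term.

```python
def mid_value_vote(values):
    """Return the middle value of an odd number of redundant outputs."""
    if len(values) % 2 == 0:
        raise ValueError("mid-value voting here assumes an odd number of inputs")
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


# One unit disturbed by a soft fault: the voted value still comes from a healthy unit.
one_fault = mid_value_vote([1.00, 1.01, 57.3])

# Two units disturbed at once: the voted value is now erroneous.
two_faults = mid_value_vote([1.00, 57.3, 88.8])
```

The second case shows the limitation discussed above: voting masks a fault only while enough units continue to operate correctly.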
There is, therefore, a desire to have a more efficient system and technique for recovering from processor faults (such as soft faults) within a control system. More particularly, it would be desirable to have a more efficient system and technique consisting of coupling (through the use of mid-value voting and equalization) multiple processing units with the capability for rapid recovery such that effective redundancy can be preserved even if soft faults occur.
By incorporating computing units possessing processing units with the capability for rapid recovery, various embodiments of the invention use techniques such as mid-value voting, equalization, and the like to maximize the benefit of the redundancy available for the control system, thus resulting in a more stable and reliable system. An exemplary control system suitably includes a first computing unit and a second computing unit within a computing units system, with processing units that are configured to rapidly recover from soft faults. Each processing unit has the capability of running processes that generate a control signal to a plant effector. The control system may also include an adaptor coupling effector control signals, generated by the processing units within the computing units system, to an actuator or other device. The adaptor may be configured to detect when the performance (e.g., operator command unit performance) of processing units within, e.g., either the first or second computing units indicates a fault and to initiate a rapid recovery of the processing unit (within the computing unit), and, if appropriate, other units suffering from the fault (e.g., sensor units and/or operator command units). Additionally, the processing units within the first and second computing units may be configured to detect soft faults and to initiate a rapid recovery without input from the adaptor. A “fast recovery” or “rapid recovery” process is one that allows a processing unit to return to operability in a relatively short amount of time, such as within one computing frame. Additionally, such a recovery may be independent from transfused data provided by other redundant computers.
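The idea of a recovery that completes within one computing frame and does not depend on transfused data can be sketched as below. The mechanism shown, a locally held copy of the last known-good state variables, is an assumption chosen purely for illustration; the summary above does not specify how rapid recovery is implemented, and the class and method names are hypothetical.

```python
class RapidRecoveryUnit:
    """Hypothetical processing unit that keeps a local checkpoint (assumption)."""

    def __init__(self, state):
        self.state = dict(state)
        self._checkpoint = dict(state)  # last known-good state variables

    def run_frame(self, update, fault_detected=False):
        """Execute one computing frame, or recover locally if a fault is flagged."""
        if fault_detected:
            # Restore from the local checkpoint; no data from peer units is used.
            self.state = dict(self._checkpoint)
            return "recovered"
        self.state = update(dict(self.state))
        self._checkpoint = dict(self.state)
        return "ok"


def integrate(state):
    """Hypothetical frame computation: accumulate a history-dependent variable."""
    state["integrator"] += 1.0
    return state


unit = RapidRecoveryUnit({"integrator": 0.0})
first = unit.run_frame(integrate)                        # normal frame
unit.state["integrator"] = 1e9                           # simulated soft fault
second = unit.run_frame(integrate, fault_detected=True)  # recovery in one frame
```

In this sketch the unit loses at most the work of the faulted frame and resumes from its own stored history, which is the property that distinguishes it from recovery by transfusion.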
In addition to rapid recovery of processing units from soft faults, an exemplary system architecture achieves “transparent” recovery of processing units from soft faults such that full system redundancy can be restored. In other words, the recovery of a single processing unit or, when appropriate, a sensor or command unit does not adversely affect the operation of the control system and thus the control function. Various computing systems associated with this invention may also provide additional benefits such as: high integrity fault detection; actuator position and control effector position monitoring where monitor thresholds can be time/magnitude adjusted; effector position equalization; and/or rapid redundancy (including dissimilar hardware or software) recovery from soft faults. Still further, an associated control system may include processing units with analytic redundancy as an additional fault tolerance element.