1. Field of the Invention
The present invention relates in general to control system architecture, and more specifically to fault-tolerant systems.
2. Background of the Invention
Fault tolerance is a property of a system that allows the system to continue operating when some of its parts or components fail. Fault tolerance is particularly sought after in high-availability, life-critical, or mission-critical systems. Examples of such systems include the space shuttle, aircraft, and missiles.
Fault tolerance is important for mission-critical systems because it diminishes the impact of adverse circumstances that might otherwise impair a system's functionality. It is especially helpful in those situations where an unexpected fault could jeopardize or severely impair the success of a mission. While defect reduction and the ability of a controller to respond to adverse situations are necessary components of reliability, these beneficial characteristics may not be sufficient to guard against in-service malfunctions, accidents, environmental anomalies or hostile action.
Conventional design techniques have tried to achieve fault tolerance in different ways:
Replication: This approach provides multiple identical instances of the same system, directing tasks or requests to all of them in parallel, and choosing the correct result on the basis of a quorum; and
Redundancy: This approach provides multiple identical instances of the same system and switches to one of the remaining instances in case of a failure (fall-back or backup).
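By way of illustration only, the two conventional approaches above may be sketched in Python as follows. The function names (replicate_and_vote, failover) and the sample units (healthy, corrupted, crashed) are hypothetical and do not appear in any particular prior system. The sketch further assumes that a quorum means a strict majority of identical results, and that a failed instance signals its failure by raising an exception.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def replicate_and_vote(replicas: Sequence[Callable[[float], T]], x: float) -> T:
    """Replication: send the request to every replica and keep the majority result.

    Assumes each replica returns a hashable value (possibly a wrong one).
    """
    results = [replica(x) for replica in replicas]
    value, count = Counter(results).most_common(1)[0]
    # Assumed quorum rule: strictly more than half of the replicas must agree.
    if count * 2 <= len(results):
        raise RuntimeError("no quorum among replicas")
    return value


def failover(instances: Sequence[Callable[[float], T]], x: float) -> T:
    """Redundancy: try the primary first, then fall back to each backup in turn."""
    last_error = None
    for instance in instances:
        try:
            return instance(x)
        except Exception as exc:  # assumed failure signal: any raised exception
            last_error = exc
    raise RuntimeError("all instances failed") from last_error


# Hypothetical units used only to exercise the two strategies.
def healthy(x: float) -> float:
    return 2 * x


def corrupted(x: float) -> float:
    return 2 * x + 1  # returns a wrong value without raising


def crashed(x: float) -> float:
    raise RuntimeError("unit offline")


voted = replicate_and_vote([healthy, corrupted, healthy], 3.0)
backup = failover([crashed, healthy], 3.0)
```

In this sketch, replication masks a unit that silently returns a wrong value, whereas fall-back redundancy only helps when the failure is detectable. Both approaches require that every instance be physically present in the system.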
Component redundancy, when employed to achieve fault tolerance, has shortcomings. For example, redundancy usually adds weight and complexity to a system and consumes additional space, and may therefore not be a suitable alternative for missions where weight is important.
Failures in mission-critical systems often occur suddenly, leaving very little time to react. A fault-tolerant system must therefore be agile enough to accommodate such situations.
Therefore, there is a need for a fault-tolerant methodology that specifically targets those situations in which component redundancy might not be a suitable alternative.