Safety-critical control systems are increasingly required, especially with the introduction of full by-wire braking and steering systems in vehicles. These systems need to be robust to controller faults, in which an error may result in loss of a critical vehicle function. Errors may arise because a controller has a temporary fault, has been reset (e.g., due to a temporary power supply fault), or has drifted out of alignment because of a chaotic system; because glitches at an input have caused the controllers to receive different sensor information; or because electromagnetic radiation has caused a temporary internal error in memory, so that a state is misrepresented.
One main requirement of these systems is that no single point of failure exists, and as such multiple controllers and actuators are typically employed. Voting mechanisms are used to determine which output from the multiple controllers should be applied to control the system. A system with two controllers can be designed to compare the two outputs and shut down when there is any discrepancy between them. Three controllers are required for full redundancy, with a voting mechanism normally selecting the median of the controllers' outputs for control. When a fault occurs in a controller in such a system, the controller is typically shut down for the remainder of the system operation and reintroduced only when the system is restarted or re-initialized. This may leave the system operating without sufficient fault tolerance, even after the faults have subsided. This may not be an effective solution for cost-constrained systems, including automotive applications. Further, the reliability of the system increases significantly when a controller is recovered on-line rather than waiting until the end of an operating cycle to reintegrate it. This problem is especially important since studies have shown that transient faults are likely to occur 5 to 100 times more frequently than permanent faults.
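The two voting arrangements described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as the implementation used in any particular system: the function names `duplex_compare` and `triplex_median` and the `tolerance` parameter are hypothetical, and controller outputs are assumed to be single scalar commands.

```python
from statistics import median
from typing import Optional, Sequence


def duplex_compare(a: float, b: float, tolerance: float = 0.0) -> Optional[float]:
    """Two-controller scheme: compare the outputs and shut down on a discrepancy.

    Returns the agreed output, or None to signal that the system should shut
    down. `tolerance` is a hypothetical parameter; the default of 0.0 treats
    any difference between the outputs as a discrepancy.
    """
    if abs(a - b) > tolerance:
        return None  # outputs disagree: no way to tell which controller is wrong
    return a


def triplex_median(outputs: Sequence[float]) -> float:
    """Three-controller scheme: apply the median of the three outputs.

    A single faulty controller can only produce the lowest or highest value
    (or agree with a healthy one), so the median always comes from a
    controller that is bounded by the two healthy outputs.
    """
    if len(outputs) != 3:
        raise ValueError("triplex voting expects exactly three outputs")
    return median(outputs)


if __name__ == "__main__":
    print(duplex_compare(0.50, 0.50))           # outputs agree
    print(duplex_compare(0.50, 0.75))           # discrepancy: shut down
    print(triplex_median([0.50, 0.52, 9.99]))   # outlier is outvoted
```

With two controllers a disagreement can only be detected, not resolved, which is why the sketch returns a shutdown signal; with three, the median masks one faulty output and control can continue.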
The process of bringing a controller back on-line with correct functional states in real time is referred to as reintegration. Numerous methods exist to reintegrate controllers. One approach uses hardware-assisted recovery techniques. However, such systems require additional hardware, focus on transient memory faults, and are not efficient in managing transient faults that temporarily cause an entire controller to reset. Another method requires transmission of the entire controller state information from one of the other controllers, typically in real time. This approach has a number of potential drawbacks. Transmitting the controller state information requires additional communication overhead, which may introduce bus or communication errors. The controllers must also be transitioned to a different mode to transmit this information, which may trigger faults in the remaining working controllers at a time when correct operation is critical.
Therefore, what is needed is a method to reintegrate a controller into a control scheme that addresses the foregoing concerns.