The specification relates to a computer system. In particular, the specification relates to a fault-tolerant computer system.
Modern automobiles are frequently designed to utilize numerous electronic control units (“ECUs”). Some automobiles have more than seventy ECUs. An ECU is a processor-based system that controls one or more of the electrical systems or subsystems in an automobile. For example, ECUs control fuel injection and ignition timing functions in the internal combustion engine of most automobiles. These functions are critical to automobile operation, and their failure could have potentially life-threatening repercussions for the human users of the automobile.
A current trend in ECU design is to use processors based on smaller geometry transistors. Processors based on smaller geometry transistors offer numerous benefits for ECU design. For example, these processors tend to be cheaper than other processors, and thus allow ECUs to be produced at a lower cost. Furthermore, these processors operate at higher speeds and dissipate less power than other, more expensive, processors.
Unfortunately, there are numerous disadvantages associated with processors based on smaller geometry transistors. A first problem is that these processors are prone to transient errors. Transient errors are short-term errors in a processor's digital logic. Transient errors are frequently caused by energetic particles, such as alpha particles and neutrons produced by cosmic rays. These particles strike processor circuitry and deposit charge in the processor's substrate. As a result of this charge disturbance, the processor suffers short-term digital logic errors.
A second problem associated with processors based on smaller geometry transistors is that these processors are prone to persistent errors. Persistent errors are long-term errors in a processor's digital logic. Persistent errors are frequently caused by metal migration and/or overheating of the processor's digital circuitry.
A third problem associated with processors based on smaller geometry transistors is that individual component parameters, such as transistor transconductance and leakage, vary greatly with temperature and time. This variation reduces circuit tolerances, which makes the processors more susceptible to transient and persistent logic errors under normal operating conditions.
Thus, it is desirable to implement a fault-tolerant system that corrects the transient and persistent errors in the processors. However, existing systems for correcting these errors have proven deficient and undesirable for implementation in ECU design because, for example, they take a substantial amount of time to recover the processors from errors, which significantly degrades real-time performance. This excessive recovery overhead makes the existing systems impractical for implementation in ECUs.