1. Field of the Invention
The invention relates to error detection in a computer system, and more particularly, to an architecture for a computer system having self-healing functionality for detecting, mitigating and storing information about digital logic errors.
2. Background of the Invention
Modern automobiles are frequently designed to use numerous electronic control units (“ECUs”). Some automobiles include more than seventy ECUs. An ECU is a processor-based system that controls one or more of the electrical systems or subsystems in an automobile. For example, ECUs control fuel injection and ignition timing functions in the internal combustion engine of most automobiles. These functions are critical to automobile operation, and their failure could have life-threatening repercussions for the occupants of the automobile.
A current trend is to design ECUs to use processors based on smaller geometry transistors. Processors based on smaller geometry transistors offer numerous benefits for ECU design. For example, these processors tend to be cheaper than previous processors, and thus allow ECUs to be produced at lower cost. Furthermore, these processors operate at higher speeds and have lower power dissipation requirements than other, more expensive, processors.
Unfortunately, there are negative consequences associated with processors based on smaller geometry transistors. One problem is that these processors are prone to transient errors. Transient errors are short term errors in a processor's digital logic. Transient errors are frequently caused by charged alpha particles that are emitted by the sun. These particles strike processor circuitry and generate changes in the processor's substrate. As a result of this substrate change, the processor suffers short term digital logic errors.
A second problem associated with processors based on smaller geometry transistors is that these processors are prone to persistent errors. Persistent errors are long term errors in a processor's digital logic. Persistent errors are frequently caused by metal migration and/or overheating of the processor's digital circuitry.
A third problem associated with processors based on smaller geometry transistors is that individual component parameters, such as transistor transconductance and leakage, vary greatly with temperature and time. This variation reduces circuit tolerances and makes the circuitry more susceptible to transient and permanent logic errors.
Existing methods for correcting transient and permanent logic errors have proven deficient and undesirable for implementation in ECU design. These methods require triplication of all processor circuitry. As a result, processors that use these methods are significantly more complex and expensive, and therefore impractical for implementation in ECUs.
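Triplication of circuitry is commonly realized as a two-of-three majority vote over three redundant copies of a result. The text above does not specify how the triplicated circuitry resolves disagreements, so the following is only an illustrative sketch under that assumption. The names `majority_vote` and `flip_bit` are hypothetical and appear nowhere in the source.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise two-of-three majority of three redundant results.

    Assumption: the triplicated circuitry resolves disagreements by voting.
    """
    return (a & b) | (a & c) | (b & c)


def flip_bit(value: int, bit: int) -> int:
    """Model a transient error by inverting a single bit of a value."""
    return value ^ (1 << bit)


# The result every copy should produce in the absence of errors.
correct = 0b10110010

# Three copies of the same computation; the second suffers a single-bit upset.
copies = [correct, flip_bit(correct, 3), correct]

# The voter masks the faulty copy because the other two still agree.
voted = majority_vote(*copies)
print(voted == correct)
```

A voter of this kind masks an error in any single copy, but only because every computation is performed three times. That tripled logic is the source of the complexity and cost described above.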