1. Field of the Invention
This invention relates to processors and, more particularly, to logic error protection within the processor.
2. Description of the Related Art
Electronic components may fail in a variety of ways. Components that include memory arrays may have bit failures that manifest as data errors. Logic circuits may have stuck-at bits and/or delay errors. Many errors may be caused by manufacturing defects. For example, particulate contamination during manufacturing can cause hard errors to appear both immediately and during later operation. Many of these errors may be classified as hard errors since, once a failure is detected, the failure is persistent. Although many hard errors may be detected during manufacturing test and burn-in, some may be latent or may simply go undetected. Some types of errors may be more damaging than others. For example, silent errors such as those that result from corrupt memory data can be catastrophic, as there may be no way to recover unless the error is detected and is either corrected or handled by a recovery mechanism. Accordingly, many error detecting/correcting mechanisms have been developed. More particularly, error detecting and error correcting codes (EDC/ECC), as well as EDC/ECC hardware, have been built into designs. Traditionally, these techniques have been used in microprocessor designs to protect against memory errors. Since most logic errors in the past were caught during manufacturing test and burn-in, the logic has been left largely unprotected.
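By way of illustration only, the following minimal sketch shows how an error correcting code of the kind referred to above may detect and correct a single flipped bit in stored data. The choice of a Hamming(7,4) code, and the function names hamming74_encode and hamming74_decode, are assumptions made for this example; the description above does not identify any particular code or implementation.

```python
from typing import List, Tuple


def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (bit positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0 is unused so list indices match bit positions
    code[3], code[5], code[6], code[7] = d
    # Each parity bit covers the positions whose index has that parity bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(codeword: List[int]) -> Tuple[int, int]:
    """Return (data, syndrome); a nonzero syndrome is the corrected bit position."""
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # correct the single-bit error
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    stored[4] ^= 1  # simulated bit failure at position 5 of the memory word
    data, syndrome = hamming74_decode(stored)
    print(f"recovered data={data:04b}, corrected position={syndrome}")
```

A code of this type corrects only a single bit error per codeword; a double bit error would be miscorrected unless an additional overall parity bit were included.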
Soft errors, on the other hand, may be intermittent and appear random, and as such can be difficult to detect and correct. In the past, soft errors were typically isolated to systems that used cables and boards with connectors and the like. Now, however, as manufacturing technologies advance and device sizes get smaller (e.g., <90 nm), another source of soft errors is emerging, particularly in metal oxide semiconductor (MOS) devices. These new soft errors may be caused by neutron or alpha particle bombardment and may manifest as memory errors due to direct memory array bombardment, or as logic errors resulting from bombardment of a logic element (e.g., a flip-flop).
In devices such as microprocessors, which contain millions of transistors, soft errors, if not detected, may cause catastrophic results. As a result, detection methods such as conventional chip level redundancy have been developed that may detect the errors at the chip boundary. For example, two identical processor chips in a system may execute the same code simultaneously, and the final results of each are compared at the chip boundary. In many conventional chip level redundancy schemes, errors detected in this way cannot be corrected, and the system cannot recover transparently, since the errors have already corrupted the internal execution state of the processor, thus requiring a reboot. Thus, although the error may be caught, this type of arrangement may not be acceptable in high reliability and high availability systems.
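By way of illustration only, the following minimal sketch models the chip level redundancy arrangement described above. The accumulator-based core model, the instruction format, and the names run_core and lockstep_compare are hypothetical and are introduced solely for this example. Two copies of the same code are executed, a soft error is optionally injected into the internal state of one copy, and only the final results are compared.

```python
from typing import List, Optional, Tuple

Program = List[Tuple[str, int]]


def run_core(program: Program, flip_at: Optional[int] = None) -> int:
    """Hypothetical core: steps an accumulator through (op, operand) pairs."""
    acc = 0
    for step, (op, operand) in enumerate(program):
        if op == "add":
            acc += operand
        elif op == "xor":
            acc ^= operand
        else:
            raise ValueError(f"unknown op: {op}")
        if step == flip_at:
            acc ^= 1 << 3  # simulated soft error: bit 3 of internal state flips
    return acc


def lockstep_compare(program: Program,
                     flip_at: Optional[int] = None) -> Tuple[bool, int, int]:
    """Run two identical cores and compare only their final results."""
    result_a = run_core(program)                   # fault-free chip
    result_b = run_core(program, flip_at=flip_at)  # chip that may take a hit
    return result_a == result_b, result_a, result_b


if __name__ == "__main__":
    program = [("add", 5), ("xor", 3), ("add", 10)]
    print(lockstep_compare(program))             # results agree
    print(lockstep_compare(program, flip_at=0))  # mismatch seen at the boundary
```

In this sketch, the comparison reports only that the two results differ. It identifies neither the faulty copy nor the step at which the internal state was corrupted, which is consistent with the observation above that such schemes may detect an error without allowing transparent recovery.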