1. Field of the Invention
The invention relates to fault detection and/or tolerance, and in particular, to techniques for detecting and/or mitigating the effects of transient soft errors using otherwise duplicative instructions in an instruction stream.
2. Description of the Related Art
It has long been recognized that electronic circuits are vulnerable to a variety of sources of transient “soft errors.” In contrast to hard errors, which are persistent and result from physical defects in or damage to a device or circuit, the term “soft error” generally refers to a transient error in stored or propagated state. As a general matter, electronic circuits function by identifying small packets of charge as elemental bits of information. Accordingly, any perturbation of these small packets of charge may change the stored information. Sources of perturbation include electromagnetic energy, noisy power supplies and radiation. As device sizes become smaller, susceptibility to soft errors generally increases. In a typical semiconductor integrated circuit, soft errors may trace to environmental factors, to externally- or internally-driven power supply perturbations, and to design factors, including operation of an otherwise stable design outside its design envelope.
One of the important sources of soft errors is the ionizing radiation associated with radioactive decay. The semiconductor industry has, over the years, struggled with the effect of trace levels of radioactive isotopes introduced by materials, packaging and manufacturing techniques. Other sources of radiation also play a role in soft error rates. For example, naturally-occurring background radiation (such as from cosmic rays) has been shown to contribute to soft error rates, particularly at high altitudes.
In general, two major techniques have been employed to address soft errors. First, manufacturing processes have been improved to greatly reduce the introduction of radioactive isotopes into production lines. Second, error detection and correction techniques have been introduced into circuit designs. For example, memory designs often incorporate parity or error correcting code (ECC) techniques to allow detection and/or correction of at least single-bit errors. Although ECC techniques can be very effective in mitigating soft errors, they are not without cost. In particular, ECC techniques require extra storage and logic to implement, and the portion of a semiconductor chip devoted to error detection and/or correction (e.g., extra memory cells and circuitry) is not available for other purposes. As a result, given a fixed die size, a processor that employs ECC in its on-chip cache must make do with a smaller on-chip cache than one that does not. Accordingly, a need exists for techniques that allow detection and/or mitigation of soft errors without sacrificing memory or cache size and without special ECC circuitry. Alternatively, a need exists for techniques that allow detection and/or mitigation of soft errors in existing processor or system configurations that may not include facilities for ECC.
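By way of illustration only, the parity approach mentioned above may be sketched in software as follows. The sketch assumes even parity over a single stored word and models a soft error as a single bit flip. The function names (parity_bit, store, check, flip_bit) are hypothetical and are not drawn from any particular memory design. It also shows the storage cost noted above: one extra bit is kept alongside every stored word.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit for a non-negative integer word."""
    return bin(word).count("1") % 2


def store(word: int) -> tuple[int, int]:
    """Model a memory write: keep the word together with its parity bit."""
    return (word, parity_bit(word))


def check(stored: tuple[int, int]) -> bool:
    """Model a memory read: True if the stored parity matches the word."""
    word, parity = stored
    return parity_bit(word) == parity


def flip_bit(stored: tuple[int, int], bit: int) -> tuple[int, int]:
    """Model a soft error: invert one data bit, leaving the parity bit as is."""
    word, parity = stored
    return (word ^ (1 << bit), parity)


if __name__ == "__main__":
    cell = store(0b10110010)
    print(check(cell))                            # uncorrupted word
    print(check(flip_bit(cell, 3)))               # one flipped bit
    print(check(flip_bit(flip_bit(cell, 1), 5)))  # two flipped bits
```

In this sketch a single flipped bit causes the check to fail, whereas two flipped bits restore the parity and go undetected. Simple parity therefore detects, but cannot correct, single-bit errors. Correction requires additional check bits, as in ECC techniques, at a correspondingly greater storage and logic cost.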