1. Field of the Invention
This invention is related to processors and, more particularly, to error detection and correction in processors.
2. Description of the Related Art
Error codes are commonly used in electronic systems to detect and correct data errors, such as transmission errors or storage errors. For example, error codes are used to detect and correct errors in data transmitted via any transmission medium (e.g. conductors and/or transmitting devices between chips in an electronic system, a network connection, a telephone line, a radio transmitter, etc.). Error codes are also used to detect and correct errors associated with data stored in the dynamic random access memory (DRAM) of computer systems. One common use of error codes is to detect and correct errors in data transmitted on a data bus of a computer system. In such systems, error detection/correction bits, or check bits, are generated for the data prior to its transfer or storage. When the data is received or retrieved, the check bits are used to detect errors within the data (and possibly correct the errors, if the scheme supports correction).
Component failures are a common source of error in electrical systems. Faulty components include faulty memory chips or faulty data paths provided between devices of a system. Faulty data paths can result from, for example, faulty pins, faulty data traces, or faulty wires. Additionally, memory modules, which contain multiple memory chips, may fail. Circuitry which drives the data paths may also fail.
Another source of error in electrical systems is so-called “soft” or “transient” errors. Transient memory errors are caused by the occurrence of an event, rather than a defect in the memory circuitry itself. Transient memory errors occur due to, for example, random alpha particles striking the memory circuit. Transient communication errors occur due to noise on the data paths, inaccurate sampling of the data due to clock drift, etc. On the other hand, “hard” or “persistent” errors occur due to component failure.
Generally, various error detection code (EDC) and error correction code (ECC) schemes are used to detect and correct memory and/or communication errors. EDC and ECC schemes are generally referred to herein as error protection schemes, where a given scheme can be capable of only error detection, or of both detection and correction. For example, parity can be used. With parity, a single parity bit is stored/transmitted for a given set of data bits, representing whether the number of binary ones in the data bits is even or odd. The parity is generated when the set of data bits is stored/transmitted and is checked when the set of data bits is accessed/received. If the parity does not match the accessed set of data bits, then an error is detected.
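The parity scheme described above can be sketched as follows. This is a minimal illustration assuming even parity over a list of 0/1 values; the function names `parity_bit` and `check_parity` are hypothetical and do not come from the source.

```python
def parity_bit(data_bits):
    """Return the even-parity bit for a sequence of 0/1 values.

    The result is 1 when the data contains an odd number of ones, so the
    data bits plus the parity bit always hold an even number of ones.
    """
    p = 0
    for bit in data_bits:
        p ^= bit  # XOR accumulates the count of ones modulo 2
    return p


def check_parity(data_bits, stored_parity):
    """Return True if no error is detected, False otherwise."""
    return parity_bit(data_bits) == stored_parity


# Usage: generate parity on store/transmit, check it on access/receive.
data = [1, 0, 1, 1, 0, 0, 1, 0]
p = parity_bit(data)  # four ones, so p == 0

single_error = data.copy()
single_error[2] ^= 1  # one flipped bit

double_error = data.copy()
double_error[2] ^= 1
double_error[3] ^= 1  # two flipped bits

print(check_parity(data, p))          # True
print(check_parity(single_error, p))  # False: error detected
print(check_parity(double_error, p))  # True: error goes undetected
```

The last check shows the limitation of a single parity bit: because it records only whether the count of ones is even or odd, an error that flips an even number of bits leaves the parity unchanged and is not detected.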
Other error protection schemes assign several check bits per set of data bits. The check bits are encoded from various overlapping combinations of the corresponding data bits. The encodings are selected so that one or more bit errors can be detected and, in some cases, so that the erroneous bit or bits can be identified and therefore corrected (depending on the number of bits in error and the error protection scheme being used). Typically, as the number of bit errors that can be detected and/or corrected increases, the number of check bits used in the scheme increases as well.
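As one concrete instance of check bits encoded from overlapping combinations of data bits, the sketch below uses the well-known Hamming(7,4) single-error-correcting code. This code is chosen here only for illustration; the source does not name a particular scheme, and the function names are hypothetical. The helper `check_bits_needed` applies the standard bound for single-error correction, 2^r >= m + r + 1 for m data bits and r check bits, since the 2^r possible syndromes must distinguish "no error" from an error in any of the m + r bit positions.

```python
def hamming74_encode(d):
    """Encode four data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions are numbered 1..7; check bits sit at positions
    1, 2 and 4, and data bits at positions 3, 5, 6 and 7. Each check bit
    covers an overlapping subset of the data bits.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_syndrome(cw):
    """XOR the positions of all 1 bits; 0 means no error was detected.

    For a single-bit error, the syndrome equals the position of that bit.
    """
    s = 0
    for pos, bit in enumerate(cw, start=1):
        if bit:
            s ^= pos
    return s


def hamming74_decode(cw):
    """Correct up to one bit error and return (data_bits, syndrome)."""
    cw = list(cw)
    s = hamming74_syndrome(cw)
    if s:
        cw[s - 1] ^= 1  # flip the bit the syndrome points at
    return [cw[2], cw[4], cw[5], cw[6]], s


def check_bits_needed(m):
    """Smallest r with 2**r >= m + r + 1 (single-error-correction bound)."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


# Usage
data = [1, 0, 1, 1]
cw = hamming74_encode(data)  # [0, 1, 1, 0, 0, 1, 1]
received = cw.copy()
received[4] ^= 1  # corrupt position 5
print(hamming74_decode(received))  # ([1, 0, 1, 1], 5)
print(check_bits_needed(4), check_bits_needed(64))  # 3 7
```

Adding one more overall parity bit to such a code allows double-bit errors to be detected as well as single-bit errors corrected, which illustrates the trend noted above: stronger protection requires more check bits.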