1. Field of the Invention
This invention relates to error detection and correction and, more particularly, to detecting and correcting errors in systems processing data.
2. Description of the Related Art
Error codes are commonly used in electronic systems to detect and correct data errors, such as transmission errors or storage errors. For example, error codes may be used to detect and correct errors in data transmitted via any transmission medium (e.g. conductors and/or transmitting devices between chips in an electronic system, a network connection, a telephone line, a radio transmitter, etc.). Error codes may additionally be used to detect and correct errors associated with data stored in the memory of computer systems. One common use of error codes is to detect and correct errors in data transmitted on a data bus of a computer system. In such systems, error correction bits, or check bits, may be generated for the data prior to its transfer or storage. When the data is received or retrieved, the check bits may be used to detect and correct errors within the data.
Component failures are a common source of error in electrical systems. Faulty components may include faulty memory chips or faulty data paths provided between devices of a system. Faulty data paths can result from, for example, faulty pins, faulty data traces, or faulty wires. Additionally, memory modules, which may contain multiple memory chips, may fail. Circuitry which drives the data paths may also fail.
Another source of error in electrical systems may be so-called “soft” or “transient” errors. Transient communication errors may occur due to noise on the data paths, inaccurate sampling of the data due to clock drift, etc. On the other hand, “hard” or “persistent” errors may occur due to component failure.
Generally, various error detection code (EDC) and error correction code (ECC) schemes are used to detect and correct memory and/or communication errors. For example, parity may be used. With parity, a single parity bit is stored/transmitted for a given set of data bits, representing whether the number of binary ones in the data bits is even or odd. The parity is generated when the set of data bits is stored/transmitted and is checked when the set of data bits is accessed/received. If the parity doesn't match the accessed set of data bits, then an error is detected.
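The parity scheme described above may be sketched as follows. This is a minimal illustration only, assuming an even-parity convention and a list-of-bits representation; the function names parity_bit and parity_matches are hypothetical and are not taken from any particular system.

```python
def parity_bit(bits):
    """Return 1 if the number of ones in `bits` is odd, else 0.

    Assumption: even-parity convention, so that the data bits plus the
    parity bit always contain an even number of ones.
    """
    return sum(bits) % 2


def parity_matches(bits, stored_parity):
    """Recompute the parity on access and compare it with the stored bit."""
    return parity_bit(bits) == stored_parity


# Generate the parity when the data is stored/transmitted.
data = [1, 0, 1, 1, 0, 0, 1, 0]
stored = parity_bit(data)

# A single flipped bit changes the parity, so the error is detected.
one_error = list(data)
one_error[3] ^= 1

# Two flipped bits cancel out, so the error goes undetected.
two_errors = list(one_error)
two_errors[5] ^= 1

print(parity_matches(data, stored))
print(parity_matches(one_error, stored))
print(parity_matches(two_errors, stored))
```

As the last case suggests, a single parity bit can detect an odd number of bit errors but cannot locate the erroneous bit, and an even number of bit errors leaves the parity unchanged.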
Other EDC/ECC schemes may assign several check bits per set of data bits. The check bits are encoded from various overlapping combinations of the corresponding data bits. The encodings are selected such that a bit error or errors may be detected, and in some cases such that the bit or bits in error may be identified so that the error can be corrected (depending on the number of bits in error and the ECC scheme being used). For example, a commonly used EDC/ECC code is a single error correcting/double error detecting (SEC/DED) code, which as the name implies may detect two errors and correct one error. Hamming codes are one commonly used family of error codes. The check bits in a Hamming code are parity bits for portions of the data bits. Each check bit provides the parity for a unique subset of the data bits. If one data bit changes state, the change will modify one or more check bits. Because each data bit contributes to a unique group of check bits, the check bits that are modified will identify the data bit that changed state. The error may be corrected by inverting the bit identified to be erroneous.
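The way modified check bits identify an erroneous bit may be illustrated with a small single error correcting Hamming code. The sketch below assumes the conventional layout with four data bits and three check bits, the check bits occupying positions 1, 2 and 4 of a seven-bit codeword; the function names are hypothetical. In this layout the pattern of failing check bits, read as a binary number, gives the position of the bit in error, whether that bit is a data bit or a check bit.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Assumed layout: check bits at positions 1, 2, 4; data bits at 3, 5, 6, 7.
    """
    code = [0] * 8  # index 0 is unused so indices match bit positions
    code[3], code[5], code[6], code[7] = data_bits
    for p in (1, 2, 4):
        # Check bit p is the parity of every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    return code[1:]


def hamming74_syndrome(word):
    """XOR together the positions holding a one.

    The result is 0 for a valid codeword, or the position of a single
    flipped bit.
    """
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position
    return syndrome


def hamming74_correct(word):
    """Return (corrected_word, syndrome), inverting the bit the syndrome names."""
    syndrome = hamming74_syndrome(word)
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


codeword = hamming74_encode([1, 0, 1, 1])
received = list(codeword)
received[4] ^= 1  # flip the bit at position 5 (a data bit)
corrected, syndrome = hamming74_correct(received)
print(syndrome, corrected == codeword)
```

This sketch corrects one error per codeword only; two simultaneous errors would produce a non-zero syndrome that points at the wrong bit, which is why SEC/DED codes add further redundancy.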
When using error codes such as a Hamming code, as the number of bit errors that may be detected and/or corrected increases, the number of check bits used in the scheme increases as well. Generally speaking, the number of check bits must be large enough such that 2^k - 1 is greater than or equal to n, where k is the number of check bits and n is the number of data bits plus the number of check bits. Accordingly, seven check bits are required to implement a single error correcting Hamming code for 64 bits.
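The check bit requirement may be verified with a short calculation. The sketch below searches for the smallest k such that 2^k - 1 is greater than or equal to the number of data bits plus k; the function name min_check_bits is hypothetical.

```python
def min_check_bits(data_bits):
    """Smallest k with 2**k - 1 >= data_bits + k (single error correction)."""
    k = 1
    while 2 ** k - 1 < data_bits + k:
        k += 1
    return k


for m in (4, 8, 16, 32, 64, 128):
    print(m, min_check_bits(m))
```

For 64 data bits, six check bits are insufficient because 2^6 - 1 = 63 is less than 64 + 6 = 70, while seven check bits suffice because 2^7 - 1 = 127 is at least 64 + 7 = 71.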
However, although increasing the number of check bits may increase the number of errors which are detectable and/or correctable, there may be drawbacks to this approach. For example, increasing the number of check bits may increase the amount of data handled by the system, which increases the number of memory components, data traces and other circuitry necessary to handle the increased data. Further, the increased number of bits increases the probability of an error. Thus, it may be desirable to increase the error correcting capability of a system without increasing the number of check bits of the error correcting code.