1. Field of the Invention
This invention relates to error correction and detection and, more particularly, to systems that employ error codes to detect and correct bit errors.
2. Description of the Relevant Art
Error codes are commonly used in electronic systems to detect and correct data errors, such as transmission errors or storage errors. For example, error codes may be used to detect and correct errors in data transmitted via a telephone line, a radio transmitter, or a compact disc laser. Error codes may additionally be used to detect and correct errors in data stored in the memory of computer systems. One common use of error codes is to detect and correct errors in data transmitted on a data bus of a computer system. For example, error correction bits, or check bits, may be generated for data prior to transferring the data on a data bus. When the data are received, the check bits may be used to detect and correct errors within the data. Errors may be introduced either by faulty components or by noise within the computer system. Faulty components may include faulty memory devices, faulty bus interface units, or faulty data paths between devices within a system, such as faulty pins, faulty data traces, or faulty wires.
Hamming codes are a commonly used type of error code. The check bits in a Hamming code are parity bits for portions of the data bits. Each check bit provides the parity for a unique subset of the data bits. If an error occurs, i.e. one or more bits change state, one or more of the check bits will change state (assuming the error is within the class of errors covered by the code). Information regarding the particular check bits that change state may also be used to determine which data bit changed state, and to correct the error. For example, if one data bit changes state, this data bit will modify one or more check bits. Because each data bit contributes to a unique group of check bits, the check bits that are modified will identify the data bit that changed state. The error may be corrected by inverting the bit identified as erroneous.
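The parity relationship described above can be illustrated with a small sketch. It assumes the textbook (7,4) Hamming layout, in which bit positions are numbered 1 through 7 and the check bits occupy positions 1, 2 and 4; this layout and the function names are illustrative only and are not taken from the present description.

```python
# Textbook (7,4) Hamming layout (an assumption for illustration):
# positions 1..7, check bits at 1, 2, 4, data bits at 3, 5, 6, 7.
DATA_POSITIONS = (3, 5, 6, 7)
CHECK_POSITIONS = (1, 2, 4)


def encode(data):
    """Return a 7-bit codeword (list, index 0 holds position 1)."""
    word = [0] * 8  # index 0 is unused so indices match positions
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    # Check bit p is the parity of every other position whose
    # binary index has bit p set.
    for p in CHECK_POSITIONS:
        covered = [word[i] for i in range(1, 8) if i & p and i != p]
        word[p] = sum(covered) % 2
    return word[1:]


def syndrome(codeword):
    """XOR of the positions of all asserted bits; 0 means no error seen."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


def correct(codeword):
    """Invert the bit named by the syndrome, if any."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s
```

In this layout a single inverted bit at position p yields a syndrome equal to p, so the modified check bits directly name the bit to invert.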
One common use of Hamming codes is to correct single bit errors within a group of data. Generally speaking, the number of check bits must be large enough such that 2^k - 1 is greater than or equal to n, where k is the number of check bits and n is the number of data bits plus the number of check bits. Accordingly, seven check bits are required to implement a single error correcting Hamming code for 64 data bits. A single error correcting Hamming code is capable of detecting and correcting a single error.
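The check bit count can be verified numerically. The following is a minimal sketch, assuming the usual single error correcting condition that 2^k - 1 must be at least n, where n is the number of data bits plus k; the function name min_check_bits is hypothetical.

```python
def min_check_bits(data_bits: int) -> int:
    """Smallest k such that 2**k - 1 >= data_bits + k."""
    k = 1
    while (1 << k) - 1 < data_bits + k:
        k += 1
    return k
```

For 64 data bits, six check bits give 2^6 - 1 = 63, which is less than n = 70, while seven check bits give 2^7 - 1 = 127, which is at least n = 71, so min_check_bits(64) returns 7.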
In a single error correcting Hamming code, a set of syndrome bits is generated upon receipt of data. The syndrome bits are used to detect a single bit error. The syndrome bits further identify the position of the error, and thus can be used to correct the single bit error by inverting the value in the identified position. However, single bit error correcting Hamming codes fail to detect two-bit errors that may occur during the transfer of data. Further, a multiple bit error may erroneously appear as a single bit error and may cause an incorrect identification of the position of the erroneous bit. Therefore, a bit that is not erroneous may be inverted. In the latter situation, the error correction procedure creates more errors and may erroneously indicate that the data is correct.
The error detection capability of the code may be increased by adding an additional check bit. The additional check bit allows the Hamming code to detect double bit errors in addition to detecting and correcting single bit errors. A Hamming code with this additional bit is referred to as an extended Hamming code. The extended check bit is regenerated when the syndrome bits are generated, and the regenerated extended check bit is compared to the original extended check bit. If one or more syndrome bits are asserted and the regenerated extended check bit differs from the original extended check bit, a single bit error has occurred and is corrected. Alternatively, if one or more syndrome bits are asserted and the regenerated extended check bit is the same as the original extended check bit, a double bit error is detected and no correction is performed. In the latter case, an uncorrectable error may be reported to a bus interface unit or other component within the computer system. It is noted that more than two bit errors in a logical group are not within the class of errors addressed by the error correction code. Accordingly, three or more errors may go undetected, or the error correction code may interpret the errors as a single bit error and invert a data bit that was not erroneous.
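The decision procedure for an extended Hamming code may be sketched as follows. The example assumes a textbook (7,4) Hamming layout with one overall parity bit appended as the extended check bit; the layout, the function names, and the handling of an error in the extended check bit itself (zero syndrome with a parity mismatch) are assumptions made for illustration.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # assumed (7,4) layout, positions 1..7
CHECK_POSITIONS = (1, 2, 4)


def hamming_syndrome(bits7):
    """XOR of the positions (1..7) of all asserted bits."""
    s = 0
    for pos, bit in enumerate(bits7, start=1):
        if bit:
            s ^= pos
    return s


def encode_extended(data):
    """Return 7 Hamming bits followed by one extended check bit."""
    word = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    s = hamming_syndrome(word[1:])
    for p in CHECK_POSITIONS:     # asserting check bit p cancels bit p of s
        if s & p:
            word[p] = 1
    bits7 = word[1:]
    return bits7 + [sum(bits7) % 2]


def decode_extended(word8):
    """Classify the received word and correct it when possible."""
    bits7, ext = list(word8[:7]), word8[7]
    s = hamming_syndrome(bits7)
    mismatch = (sum(bits7) % 2) != ext   # regenerated vs. original bit
    if s == 0 and not mismatch:
        return "no_error", bits7
    if s == 0:
        # Assumption: only the extended check bit itself was inverted.
        return "single_error", bits7
    if mismatch:
        bits7[s - 1] ^= 1                # syndrome names the bad position
        return "single_error", bits7
    return "double_error", None          # detected, not corrected
```

With this sketch, inverting any one of the eight bits is reported as a single error and repaired, while inverting any two bits is reported as a double error. Three or more inverted bits fall outside the covered class and may be misclassified.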
It is a common design goal in computer systems to reduce the number of check bits used to detect and correct errors. The check bits increase the amount of data handled by the system, which may increase the circuitry and data paths required for transferring the data. Further, although the check bits may make an error detectable and/or correctable, increasing the number of bits within the system increases the probability of an error occurring. For at least these reasons, it is desirable to decrease the number of check bits for a given level of error detection and/or correction.
The problems outlined above are in large part solved by a technique for correcting single bit errors and detecting paired double bit errors in accordance with the present invention. In one embodiment of the present invention, a data block containing both data bits and check bits is transferred on a data bus. The data bus includes a plurality of wires, or signal paths. Each wire of the data bus transfers a pair of bits of the data block, one bit of the pair in each of two different bus cycles. Check bits are assigned to the data block to detect errors that may occur during the transfer of the data block. Each check bit is calculated from a respective subset of data bits in the data block.
When the data block is received, syndrome bits are calculated. A syndrome bit is calculated using a corresponding check bit and a selected subset of data bits that is assigned to that check bit. The plurality of syndrome bits forms a syndrome bits vector. The number of syndrome bits in the syndrome bits vector is identical to the number of check bits. Thus, each syndrome bit in the syndrome bits vector corresponds to one of the check bits.
Upon the generation of the syndrome bits vector, if the syndrome bits vector has all zero bits, no error (within the class of errors covered by the error code) is detected and the data is provided to the receiving system without correction. If at least one syndrome bit in the syndrome bits vector is asserted and the syndrome bits vector is not identical to a special vector V, a single bit error is detected. The syndrome bits may be used to locate the bit position of an erroneous bit and the error may be corrected by inverting the bit in that bit position. If the syndrome bits vector is identical to the special vector V, a paired double bit error is detected and either an uncorrectable error is reported or a signal is generated to re-operate on the data block. Accordingly, the presence of two bit errors transferred on the same wire, or data path, may be detected without the need for an additional check bit.
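The receive-side decision described above may be sketched with a toy code. The parameters below (four check bits, ten data bits, seven wires, and a special vector V of 1111) are hypothetical values chosen only to keep the example small. Each bit position is given a 4-bit assignment vector, and the two positions carried by one wire are given vectors whose XOR is V.

```python
SPECIAL_V = 0b1111   # hypothetical special vector V (toy example)
NUM_CHECK = 4

# Positions 2w and 2w+1 travel on wire w in different bus cycles.
# Their assignment vectors c and c ^ V XOR to V by construction.
COLUMNS = []
for c in range(1, 8):
    COLUMNS += [c, c ^ SPECIAL_V]


def compute_syndrome(block):
    """XOR the assignment vectors of all asserted bits."""
    s = 0
    for bit, col in zip(block, COLUMNS):
        if bit:
            s ^= col
    return s


def encode(data_bits):
    """Place 10 data bits, then set check bits so the syndrome is zero."""
    block = [0] * len(COLUMNS)
    data_positions = [i for i, c in enumerate(COLUMNS)
                      if bin(c).count("1") >= 2]
    for pos, bit in zip(data_positions, data_bits):
        block[pos] = bit
    s = compute_syndrome(block)
    for i in range(NUM_CHECK):
        if (s >> i) & 1:
            block[COLUMNS.index(1 << i)] = 1   # check bit i has vector 1<<i
    return block


def receive(block):
    """Classify the block from its syndrome; correct single bit errors."""
    s = compute_syndrome(block)
    if s == 0:
        return "no_error", list(block)
    if s == SPECIAL_V:
        return "paired_double_error", None     # report, do not correct
    fixed = list(block)
    fixed[COLUMNS.index(s)] ^= 1               # syndrome locates the bit
    return "single_error", fixed
```

In this toy code, inverting both bits carried by one wire always produces a syndrome equal to V, so the paired double bit error is reported rather than miscorrected. Two errors on different wires are outside what the sketch detects and would be treated as a single bit error.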
The special vector V includes a number of bits that is equal to the number of check bits. At least two of the bits of V must have a value of one (binary one). Generally speaking, the computation of the syndrome bits may be represented with a syndrome bits assignment table. The syndrome bits assignment table contains a number of columns that is equal to the number of bits in the data block, and a number of rows that is equal to the number of check bits in the data block. Each column in the syndrome bits assignment table represents an assignment vector that corresponds to a bit position in the data block and indicates which syndrome bits the value in that bit position is contributing to.
Assignment vectors represented within the syndrome bits assignment table have the following attributes: (1) each assignment vector that corresponds to a particular check bit position in the data block has only a single bit asserted (i.e. a single “1”) corresponding to the syndrome bit associated with that particular check bit; (2) each assignment vector that corresponds to a data bit position in the data block has at least two bits asserted (i.e. two “1s”); (3) none of the assignment vectors can be identical to the special vector V; (4) each assignment vector that corresponds to a data bit position in the data block is unique; and (5) the XOR of any two assignment vectors that correspond to paired bit positions (i.e. two bits that are transferred by the same wire) in the data block results in the special vector V.
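The five attributes listed above lend themselves to a mechanical check. The following sketch represents each assignment vector as an integer bit mask and reports which attributes a candidate table violates. The function name, the argument layout, and the toy table in the usage note are hypothetical and are not taken from the present description.

```python
def check_attributes(columns, check_positions, wire_pairs, v):
    """Return the numbers of the violated attributes (empty if none).

    columns[p]         assignment vector of bit position p (int mask)
    check_positions[i] position in the block of check bit i
    wire_pairs         (a, b) position pairs carried on the same wire
    v                  the special vector V
    """
    def weight(x):
        return bin(x).count("1")

    data_positions = [p for p in range(len(columns))
                      if p not in check_positions]
    data_columns = [columns[p] for p in data_positions]
    violated = []
    # (1) check bit i contributes only to syndrome bit i
    if any(columns[p] != 1 << i for i, p in enumerate(check_positions)):
        violated.append(1)
    # (2) every data bit contributes to at least two syndrome bits
    if any(weight(c) < 2 for c in data_columns):
        violated.append(2)
    # (3) no assignment vector equals V
    if v in columns:
        violated.append(3)
    # (4) data bit assignment vectors are unique
    if len(set(data_columns)) != len(data_columns):
        violated.append(4)
    # (5) paired positions XOR to V
    if any(columns[a] ^ columns[b] != v for a, b in wire_pairs):
        violated.append(5)
    return violated
```

As a usage example under the same assumptions, a toy table with V = 0b1111, columns [1, 14, 2, 13, 3, 12, 4, 11, 5, 10, 6, 9, 7, 8], check bits at the positions holding 1, 2, 4 and 8, and wires pairing positions (0, 1), (2, 3), and so on, passes all five checks. Replacing the vector 3 with 15 in that table violates attributes (3) and (5).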
Broadly speaking, the invention contemplates a system for detecting and correcting errors in a data block. The data block includes data bits and check bits. The system comprises: a transmitter configured to generate the check bits and to transfer the data block, a bus coupled to the transmitter that includes a plurality of data paths for conveying the data block such that paired bits of the data block are conveyed on each of the data paths, and a receiver coupled to the bus and configured to receive the data block and to generate a syndrome vector. The syndrome vector includes a plurality of syndrome bits and is used to detect and correct single bit errors and to detect paired double bit errors in the data block.