1. Field of the Invention
This invention relates to error correction and, more particularly, to error codes that correct bit errors in computer memory systems.
2. Description of the Relevant Art
Error codes are commonly used in electronic systems to detect and/or correct data errors, such as transmission errors or storage errors. For example, error codes may be used to detect and correct errors within data transmitted via a telephone line, a radio transmitter or a compact disc laser. Another common use of error codes is to detect and correct errors within data stored in a memory of a computer system. For example, error correction bits, or check bits, may be generated for data prior to storing the data to one or more memory devices. When the data are read from the memory device, the check bits may be used to detect or correct errors within the data. Errors may be introduced by either faulty components or noise within the computer system. Faulty components may include faulty memory devices or faulty data paths between devices within the computer system, such as faulty pins.
Hamming codes are one commonly used error code. The check bits in a Hamming code are parity bits for portions of the data bits. Each check bit provides the parity for a unique subset of the data bits. If an error occurs, i.e. one or more bits change state, one or more syndrome bits will be asserted (assuming the error is within the class of errors covered by the code). Generally speaking, syndrome bits are generated by regenerating the check bits and comparing the regenerated check bits to the original check bits. If the regenerated check bits differ from the original check bits, an error has occurred and one or more syndrome bits will be asserted. The pattern of asserted syndrome bits may also be used to determine which data bit changed state, which enables correction of the error. For example, if one data bit changes state, this data bit will modify one or more check bits. Because each data bit contributes to a unique group of check bits, the check bits that are modified will identify the data bit that changed state. The error may be corrected by inverting the bit identified to be erroneous.
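The syndrome procedure described above can be illustrated with a minimal sketch. It assumes the classic (7,4) Hamming layout, with check bits at positions 1, 2 and 4 and data bits at positions 3, 5, 6 and 7; that layout and the function names are illustrative assumptions, not something this description specifies.

```python
# Sketch only: (7,4) Hamming layout is an assumption made for illustration.
CHECK_POSITIONS = (1, 2, 4)
DATA_POSITIONS = (3, 5, 6, 7)


def hamming74_encode(data_bits):
    """Return a 7-bit codeword (list of 0/1) for 4 data bits."""
    code = [0] * 8  # index 0 unused so indexes match 1-based positions
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    for p in CHECK_POSITIONS:
        # Check bit p is the parity of every other position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def syndrome(codeword):
    """Recompute each parity group; the failing groups form the syndrome."""
    s = 0
    for p in CHECK_POSITIONS:
        if sum(codeword[i - 1] for i in range(1, 8) if i & p) % 2:
            s |= p
    return s


def correct_single_error(codeword):
    """Invert the bit named by a non-zero syndrome (single-bit errors only)."""
    fixed = list(codeword)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return fixed


stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1  # flip the bit at position 5
restored = correct_single_error(damaged)
```

In this layout the syndrome value equals the 1-based position of the flipped bit, which is one concrete way in which each bit can contribute to a unique group of check bits.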
One common use of Hamming codes is to correct single bit errors within a group of data. Generally speaking, the number of check bits must be large enough such that 2^k - 1 is greater than or equal to n, where k is the number of check bits and n is the number of data bits plus the number of check bits. Accordingly, seven check bits are required to implement a single error correcting Hamming code for a 64-bit data block (six check bits give 2^6 - 1 = 63, which is less than 64 + 6 = 70, whereas seven give 2^7 - 1 = 127, which exceeds 64 + 7 = 71). A single error correcting Hamming code is able to detect and correct a single error. The error detection capability of the code may be increased by adding an additional check bit. The use of an additional check bit allows the Hamming code to detect double bit errors and correct single bit errors. A Hamming code to which a bit has been added to increase its error detection capability is referred to as an extended Hamming code.
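The check bit count follows directly from the inequality above. The following short sketch (function names are hypothetical) finds the smallest k satisfying 2^k - 1 >= data bits + k, and adds one bit for the extended code.

```python
def sec_check_bits(data_bits):
    """Smallest k such that 2**k - 1 >= data_bits + k (single error correction)."""
    k = 1
    while 2 ** k - 1 < data_bits + k:
        k += 1
    return k


def secded_check_bits(data_bits):
    """Extended Hamming code: one extra parity bit on top of the SEC code."""
    return sec_check_bits(data_bits) + 1


for width in (8, 16, 32, 64):
    print(width, sec_check_bits(width), secded_check_bits(width))
```

For a 64-bit data block this yields seven check bits for single error correction and eight for the extended code.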
In a single error correction code, such as a Hamming code, multiple bit errors may cause one or more syndromes to be non-zero. However, multiple bit errors may erroneously appear as a single bit error in a different bit position. For example, in a single error correcting Hamming code with six check bits, one bit error may cause two check bits to change states. Another bit error may cause two other check bits to change state. Accordingly, if these two errors occur, four check bits will change state. Unfortunately, a one-bit error in still another bit position may cause those same four check bits to change state. The error correction procedure may assume the bit that affects all four check bits changed state and invert the data bit. If the check bit changes were actually caused by two bit errors, the error correction procedure has inverted a non-erroneous bit. Accordingly, the error correction procedure has created more errors, and may erroneously indicate that the data is error free.
The addition of an extended parity bit resolves this problem. When the data are read from memory, the check bits and extended parity bit are regenerated and compared to the original check bits and extended parity bit. If the regenerated check bits differ from the original check bits, the extended parity bit may be used to determine whether one or two bit errors occurred. If one error occurs, the regenerated extended parity bit will differ from the original extended parity bit. If two errors occur, the regenerated extended parity bit will be the same as the original extended parity bit. If one or more check bits change state and the regenerated extended parity bit is different, a single bit error has occurred and is corrected. Alternatively, if one or more check bits change state and the extended parity bit is the same, two bit errors are detected and no correction is performed. In the latter case, an uncorrectable error may be reported to a memory controller or other component within the computer system. It is noted that more than two bit errors in a logical group are not within the class of errors addressed by the error correcting code. Accordingly, three or more errors may go undetected, or the error correcting code may interpret the errors as a single bit error and invert a data bit that was not erroneous.
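The decision made from the two comparisons above can be summarized in a small sketch. The function name and the returned labels are hypothetical. The case in which only the extended parity bit differs is not discussed above; treating it as a single error in the extended parity bit itself is an assumption of this sketch.

```python
def classify_read(check_bits_differ, extended_parity_differs):
    """Classify a read from the two regenerate-and-compare results."""
    if not check_bits_differ and not extended_parity_differs:
        return "no error"
    if extended_parity_differs:
        # Odd number of flipped bits: handled as a single, correctable error.
        # If the check bits match, the flipped bit is assumed to be the
        # extended parity bit itself (assumption, not stated in the text).
        return "single-bit error: correct"
    # Check bits differ but overall parity matches: two bits flipped.
    return "double-bit error: report uncorrectable"


for check_diff in (False, True):
    for parity_diff in (False, True):
        print(check_diff, parity_diff, classify_read(check_diff, parity_diff))
```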
Parity checking is a commonly used technique for error detection. A parity bit, or check bit, is added to a group of data bits. The check bit may be asserted depending on the number of asserted data bits within the group of data bits. If even parity is used, the parity bit will make the total number of asserted bits, including the data bits and check bit, equal to an even number. If odd parity is used, the parity bit will make the total number of asserted bits, including the data bits and check bit, an odd number. Parity checking is effective for detecting an odd number of errors. If an even number of errors occurs, however, parity checking will not detect the error.
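A brief sketch of parity generation and checking follows; the function names are hypothetical. It also shows the limitation noted above: one flipped bit is detected, while two flipped bits are not.

```python
def parity_bit(data_bits, even=True):
    """Return the check bit that makes the total count of 1s even (or odd)."""
    ones = sum(data_bits) % 2
    return ones if even else ones ^ 1


def parity_ok(data_bits, check_bit, even=True):
    """True if the stored check bit still matches the regenerated one."""
    return parity_bit(data_bits, even) == check_bit


data = [1, 0, 1, 1]
check = parity_bit(data)          # three 1s, so the even-parity bit is 1

one_error = [0, 0, 1, 1]          # first bit flipped
two_errors = [0, 1, 1, 1]         # first and second bits flipped

print(parity_ok(data, check))        # True: no error
print(parity_ok(one_error, check))   # False: single error detected
print(parity_ok(two_errors, check))  # True: double error goes undetected
```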
One common use of error codes is to detect and correct bit errors of data stored in a cache of a computer memory system. Generally speaking, a cache is a buffer between a processor and relatively slow memory devices. The cache is typically smaller and faster than main memory, and stores data recently accessed by the processor. Because of the repetitive nature of computer programs, the processor is more likely to access recently accessed information than other information in the memory. Accordingly, by storing recently used data in the faster cache, the average access time of data may be reduced. Reducing the access time of data reduces the time in which the processor is waiting for data from memory, which increases the overall speed of the processor.
Turning now to FIG. 1, portions of a computer system that implements a cache are shown. Computer system 100 includes processor 102, cache 104, memory controller 106, and system memory 108. Other portions of computer system 100 are omitted for simplicity. Processor 102 is coupled to cache 104. Cache 104 is coupled to memory controller 106, which is in turn coupled to system memory 108. It is noted that the computer system of FIG. 1 is for illustrative purposes only. Other configurations of a processor, cache and system memory are contemplated.
Processor 102 requests data from system memory 108 by initiating a memory read request on processor bus 110. Cache 104 receives the memory read request and determines whether the requested data are stored in cache. If the requested data are stored in cache, cache 104 supplies the data to processor 102. Alternatively, if the requested data are not stored in cache, cache 104 initiates a memory read request to memory controller 106 to read the data. In one embodiment, memory controller 106 accesses the data from system memory 108 and stores the data to cache 104, which in turn supplies the data to processor 102. Alternatively, the data from memory controller 106 may be conveyed to processor 102 in parallel with storing the data to cache 104. When processor 102 writes to data stored in cache 104, several techniques for maintaining coherency may be implemented. For example, the data may be written to both cache 104 and memory 108, or the data may be invalidated in cache 104 and written to memory 108 only. The above described operation of computer system 100 is for illustrative purposes only and is not intended to limit the scope of the claims.
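The read and write behavior described above can be modeled with a toy sketch. The class and method names are hypothetical, a dictionary stands in for system memory 108, and the memory controller is reduced to a direct lookup; this is an illustration of the flow, not a model of any particular hardware.

```python
class ToyCache:
    """Illustrative stand-in for cache 104 in front of system memory 108."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> data (system memory)
        self.lines = {}       # dict: address -> data currently cached
        self.misses = 0

    def read(self, addr):
        """Serve from the cache on a hit; otherwise fetch, store, then supply."""
        if addr not in self.lines:
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write_through(self, addr, value):
        """Coherency option 1: write both the cache and system memory."""
        self.lines[addr] = value
        self.memory[addr] = value

    def write_invalidate(self, addr, value):
        """Coherency option 2: invalidate the cached copy, write memory only."""
        self.lines.pop(addr, None)
        self.memory[addr] = value


cache = ToyCache({0x10: "A", 0x20: "B"})
cache.read(0x10)                 # miss: fetched from memory
cache.read(0x10)                 # hit: served from the cache
cache.write_invalidate(0x10, "C")
cache.read(0x10)                 # miss again, returns the updated value
```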
It is a common design goal of computer systems to reduce the number of check bits used to detect and correct errors. The check bits increase the amount of data handled by the system, which may increase the number of memory components, data paths and other circuitry. Further, although the check bits may make an error detectable and/or correctable, each additional bit is itself subject to error, so increasing the number of bits within the system increases the probability of an error occurring. For at least these reasons, it is desirable to decrease the number of check bits for a given level of error detection and/or correction.
The present invention reduces the number of check bits required to correct errors in a data block that includes a plurality of sub-blocks. Each sub-block includes a sub-block check bit that may be used to detect the presence of a bit error within the sub-block. A composite sub-block is generated, which is the column-wise exclusive-or of the bits of each sub-block. In other words, a first bit of the composite sub-block is the exclusive-or of all the bits in a first column position of the sub-blocks. The second bit of the composite sub-block is the exclusive-or of all the bits in a second column position of the sub-blocks, etc. In one embodiment, the composite sub-block is not stored, but rather used for computational purposes only. A plurality of composite check bits is generated to detect a bit position of an error within the composite sub-block. If a bit error within the data block occurs, the sub-block check bits may be used to detect in which sub-block the error occurred. The composite check bits may be used to determine which bit position of the composite sub-block is erroneous. The erroneous bit position of the composite sub-block also identifies the bit position of the erroneous bit in the sub-block identified by the sub-block check bits. Accordingly, the sub-block and the bit position within the sub-block may be detected by using the sub-block check bits and the composite check bits.
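The column-wise exclusive-or can be sketched as follows. The sub-block width, the sample values and the function name are illustrative assumptions. The sketch also shows why the composite sub-block is useful: a single flipped bit in any sub-block flips the composite bit in the same column position.

```python
def composite_sub_block(sub_blocks):
    """Column-wise XOR of equal-width sub-blocks (each a list of 0/1)."""
    composite = [0] * len(sub_blocks[0])
    for sub_block in sub_blocks:
        composite = [c ^ b for c, b in zip(composite, sub_block)]
    return composite


blocks = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
before = composite_sub_block(blocks)

blocks[1][2] ^= 1  # single bit error: second sub-block, third column
after = composite_sub_block(blocks)

# Columns in which the composite changed; only the erroneous column appears.
changed_columns = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
```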
Broadly speaking, the present invention contemplates a method of correcting a bit error in a data block comprising: partitioning the data block into a plurality of sub-blocks, wherein each sub-block includes a plurality of bit positions; generating a first sub-block check bit for a first sub-block, wherein the first sub-block check bit is configured to detect an error within the first sub-block; generating a composite sub-block, wherein each bit of the composite sub-block corresponds to a bit position in the plurality of sub-blocks; generating composite check bits for the composite sub-block, wherein the composite check bits are configured to detect and locate a bit error in the composite sub-block; detecting an erroneous bit in the first sub-block using the first sub-block check bit and determining a bit position of the erroneous bit using the composite check bits; and inverting the erroneous bit.
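The method steps above can be sketched end to end under stated assumptions. The sketch assumes even parity for each sub-block check bit, a composite sub-block that covers data bits only, and Hamming-style column codes (non-power-of-two integers) for the composite check bits; none of these choices is fixed by the description above, and all function names are hypothetical. Only a single error in a data bit is handled.

```python
def column_codes(width):
    """Unique non-zero code per column, skipping powers of two (assumption)."""
    codes, n = [], 3
    while len(codes) < width:
        if n & (n - 1):  # true when n is not a power of two
            codes.append(n)
        n += 1
    return codes


def composite_check(sub_blocks):
    """Composite check word: XOR of the codes of all set composite bits."""
    check = 0
    for column, code in zip(zip(*sub_blocks), column_codes(len(sub_blocks[0]))):
        if sum(column) % 2:  # composite bit = XOR of the column
            check ^= code
    return check


def encode(sub_blocks):
    """Return (sub-block check bits, composite check word)."""
    return [sum(sb) % 2 for sb in sub_blocks], composite_check(sub_blocks)


def correct(sub_blocks, sub_checks, comp_check):
    """Locate and invert a single erroneous data bit, if any."""
    bad_rows = [i for i, sb in enumerate(sub_blocks) if sum(sb) % 2 != sub_checks[i]]
    syndrome = comp_check ^ composite_check(sub_blocks)
    if not bad_rows and syndrome == 0:
        return [list(sb) for sb in sub_blocks]
    codes = column_codes(len(sub_blocks[0]))
    if len(bad_rows) != 1 or syndrome not in codes:
        raise ValueError("error pattern is outside the single data-bit case")
    fixed = [list(sb) for sb in sub_blocks]
    fixed[bad_rows[0]][codes.index(syndrome)] ^= 1  # invert the erroneous bit
    return fixed


original = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
sub_checks, comp_check = encode(original)

read_back = [list(sb) for sb in original]
read_back[2][1] ^= 1  # inject one bit error
repaired = correct(read_back, sub_checks, comp_check)
```

The sub-block check bits identify the row and the composite syndrome identifies the column, so the two together name exactly one bit to invert.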
The present invention further contemplates a computer memory that corrects a bit error in a data block. The computer memory includes one or more storage devices and an error correction circuit coupled to the one or more storage devices. The one or more storage devices are configured to store a plurality of sub-blocks of the data block. Each of the sub-blocks includes a plurality of bit positions. The error correction circuit is configured to receive the data block, to generate sub-block check bits for each of the sub-blocks, to generate a composite sub-block, and to generate composite check bits to detect a bit position of an erroneous bit within the composite sub-block. The sub-block check bits and the composite check bits are stored in the one or more storage devices. When a data block with an erroneous bit is read from the one or more storage devices, the error correction circuit uses the sub-block check bits to determine a sub-block that includes the erroneous bit and the composite check bits to determine a bit position of the erroneous bit within the sub-block that includes the erroneous bit.
The present invention still further contemplates a cache that corrects a bit error in a data block. The cache includes one or more storage devices and an error correction circuit coupled to the one or more storage devices. The one or more storage devices are configured to store a plurality of sub-blocks of the data block. Each of the sub-blocks includes a plurality of bit positions. The error correction circuit is configured to receive the data block, to generate sub-block check bits for each of the sub-blocks, to generate a composite sub-block, and to generate composite check bits to detect a bit position of an erroneous bit within the composite sub-block. The sub-block check bits and the composite check bits are stored in the one or more storage devices. When a data block with an erroneous bit is read from the one or more storage devices, the error correction circuit uses the sub-block check bits to determine a sub-block that includes the erroneous bit and the composite check bits to determine a bit position of the erroneous bit within the sub-block that includes the erroneous bit.