1. Field of the Invention
This invention relates in general to the field of computer systems and, in particular, to error detection and correction of transmitted data to and from a memory controller.
2. Discussion of the Prior Art
Computer systems generally consist of one or more processors that execute program instructions stored within a mass storage medium. This medium is most often constructed of the lowest-cost-per-bit, yet slowest, storage technology, typically magnetic or optical media. To increase system performance, a faster but smaller and more costly memory, known as the main memory, is first loaded with information from the mass storage so that the processors can access it directly and more efficiently. Program instructions are read from main memory, and program data may be both read and written. Error detecting and correcting codes can be used to detect errors in the information as it is read from main memory and, where possible, to correct them.
Parity checks and error correction codes (ECCs) are commonly used to ensure that data is properly transferred between system components. For example, a magnetic disk (a non-volatile memory device) typically records not only the data to be retrieved for processing but also an error correction code for each file, which allows the processor, or a controller, to determine whether the retrieved data is valid. ECCs are also used with volatile memory devices, such as DRAM; the ECC for data stored in DRAM can be analyzed by a memory controller, which provides an interface between the processor and the DRAM array. If a memory cell fails during the reading of a particular memory word, due to some external force or internal deficiency, the failure can at least be detected. ECCs can further be used to reconstruct the proper data stream.
Some error correction codes can only be used to detect single-bit errors; if two or more bits in a particular memory word are invalid, then the ECC might not be able to determine what the proper data stream should actually be. Other ECCs are more sophisticated and allow detection or correction of double errors, and some ECCs further allow the memory word to be divided into clusters of bits, or symbols, which can then be analyzed for errors in more detail, such as the ECC in commonly-owned U.S. Pat. No. 5,757,823, incorporated by reference herein. ECCs commonly use parity-check matrices to define the mathematical formula for deriving the check bits from the data bits.
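To illustrate how a parity-check matrix defines the check bits, the following sketch uses the textbook Hamming (7,4) single-error-correcting code rather than the matrix of this invention or of U.S. Pat. No. 5,757,823, neither of which is reproduced here. The names `P`, `check_bits`, `syndrome`, and `correct` are illustrative only. Each check bit is the GF(2) sum (XOR) of the data bits selected by one row of the matrix, and the syndrome of a word with one flipped data bit equals that bit's column.

```python
# Generic Hamming (7,4) example: NOT the (146,130) matrix of this invention.
# Parity-check matrix H = [P | I3]; row r of P selects the data bits whose
# XOR (GF(2) sum) produces check bit r.
P = [
    [1, 1, 0, 1],  # c0 = d0 ^ d1 ^ d3
    [1, 0, 1, 1],  # c1 = d0 ^ d2 ^ d3
    [0, 1, 1, 1],  # c2 = d1 ^ d2 ^ d3
]


def check_bits(data):
    """Derive the 3 check bits from 4 data bits (each a 0/1 int)."""
    return [sum(p & d for p, d in zip(row, data)) % 2 for row in P]


def syndrome(data, checks):
    """H times the received word: recomputed check bits XOR stored check bits."""
    return [a ^ b for a, b in zip(check_bits(data), checks)]


def correct(data, checks):
    """Correct a single-bit error and return the data bits."""
    s = syndrome(data, checks)
    columns = [[row[j] for row in P] for j in range(len(data))]
    if s in columns:  # the syndrome names the failing data bit
        j = columns.index(s)
        return [d ^ (1 if k == j else 0) for k, d in enumerate(data)]
    return list(data)  # zero syndrome, or the error hit a check bit
```

For example, `correct([1, 0, 0, 1], [0, 1, 0])` returns `[1, 0, 1, 1]`: the syndrome `[0, 1, 1]` matches the third column of `P`, so data bit 2 is flipped back.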
For a memory array having a “b-bit-per-chip” configuration, the proper ECC is one that is capable of correcting all single-symbol errors and detecting all double-symbol errors, where a symbol error is any one of the 2^b − 1 error patterns generated from a failure of an array chip. Using this single-symbol-correction double-symbol-detection capability, the memory may continue to function as long as there is no more than one chip failure in the group of array chips covered by the same ECC word. All errors generated from a single chip failure are automatically corrected by the ECC regardless of the failure mode of the chip. Some time later, when a second chip in the same chip group fails, double-symbol errors may be present. These double-symbol errors would be detected by the ECC. To prevent data loss in this case, a proper maintenance strategy is executed to ensure that the number of symbol errors does not accumulate beyond one.
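The symbol-level guarantee can be made concrete with a small model. This Python sketch assumes 4-bit symbols and a 36-symbol word (the sizes used by the preferred embodiment). It does not decode anything; it only counts how many symbols an error pattern touches and reports what a single-symbol-correcting double-symbol-detecting (SSC-DSD) code is guaranteed to do with it. The function names are hypothetical.

```python
B = 4           # bits per symbol: one symbol per b-bit-wide array chip (assumed b = 4)
N_SYMBOLS = 36  # symbols per ECC word (assumed: 32 data + 4 check)


def symbol_error_patterns(b=B):
    """Every nonzero b-bit pattern a single failing chip can produce."""
    return list(range(1, 2 ** b))


def symbols_in_error(error_mask, n_symbols=N_SYMBOLS, b=B):
    """Indices of the b-bit symbols containing at least one flipped bit."""
    symbol_mask = (1 << b) - 1
    return [s for s in range(n_symbols) if (error_mask >> (s * b)) & symbol_mask]


def ssc_dsd_guarantee(error_mask, n_symbols=N_SYMBOLS, b=B):
    """What an SSC-DSD code guarantees for this error pattern (a model, not a decoder)."""
    count = len(symbols_in_error(error_mask, n_symbols, b))
    if count == 0:
        return "no error"
    if count == 1:
        return "corrected"       # any pattern confined to one chip
    if count == 2:
        return "detected"        # a second failing chip in the same group
    return "not guaranteed"      # beyond the SSC-DSD guarantee
```

With `b = 4`, `symbol_error_patterns()` yields the 15 = 2^4 − 1 nonzero patterns a failing chip can produce. An error confined to one symbol falls under the correction guarantee however many of its bits flip, while an error spread over two symbols falls under the detection guarantee only.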
In addition to data errors, computer systems are subject to a separate class of errors arising from failures in memory addressing. Memory addressing errors can be caused by the same types of phenomena that cause data errors internally in a memory chip. For example, such a failure can cause data that was intended to be written to address location 0 to be written to address location 10 instead, corrupting the proper data that was contained at address 10. A (78,66) ECC that corrects single-symbol errors, detects any combination of a single-symbol error and a single-bit error from a second symbol, and also detects address errors is discussed in U.S. Pat. No. 5,768,294.
It would be highly desirable to provide a single-symbol correcting double-symbol detecting ECC system that detects address errors and, beyond the capability of the above-referenced U.S. Pat. No. 5,768,294, detects all combinations of bit errors in the second error symbol.
It would further be desirable to provide a (146,130) single-symbol correcting double-symbol detecting ECC which detects address errors.
It would additionally be desirable to provide a (146,130) single-symbol correcting double-symbol detecting ECC, capable of detecting address errors, that can be implemented using industry-standard DIMMs and, advantageously, in such a way as to achieve the more desirable 8-bit symbol width even though the ECC is designed for 4-bit symbols.
It is an object of the present invention to provide a (146,130) single-symbol correcting double-symbol detecting ECC which detects address errors and provides the ability to detect all combinations of bit errors in a second error symbol.
It is a further object of the present invention to provide a (146,130) single-symbol correcting double-symbol detecting ECC that is designed for 4-bit symbols and that may be implemented in such a way as to correct 8-bit-wide symbols.
In accordance with a preferred embodiment of the present invention, digital signals are encoded and decoded using a parity-check matrix and two parity bits generated from the system address bits of a computing system, with thirty-six (36) symbols and four (4) bits per symbol. To encode data symbols that are four bits in length, the address parity bits are first generated from the system address bits. The two address parity bits are then used in conjunction with the data bits to generate sixteen check bits. The data bits and check bits are then stored in the memory array of the computer.
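The encoding flow can be sketched in Python under loudly stated assumptions. This section does not give the actual parity-check matrix or say which address bits feed each address parity bit, so `column()` is a hypothetical stand-in matrix (distinct, nonzero columns, but not an SSC-DSD code) and `address_parity()` is a hypothetical rule that takes the parity of the even-numbered and odd-numbered address bits. Data is a 128-bit integer (32 four-bit symbols). Only the data and the 16 check bits are returned for storage, since the address parity bits are regenerated from the address on every read.

```python
N_DATA = 128       # 32 data symbols x 4 bits
N_ADDR_PARITY = 2  # address parity bits folded into the 16 check bits


def address_parity(address):
    """HYPOTHETICAL rule: parity of even-numbered and of odd-numbered address bits."""
    bits = bin(address)[2:][::-1]  # bits[k] is address bit k
    return bits[0::2].count("1") & 1, bits[1::2].count("1") & 1


def column(i):
    """HYPOTHETICAL 16-bit check-matrix column for input bit i (0..129).

    Columns are distinct, nonzero and have weight >= 2, but they are NOT
    the patent's SSC-DSD matrix; they only illustrate the data flow.
    """
    return 0x8000 | (((i + 1) * 0x9E37) & 0x7FFF)


def check_bits(data, address):
    """16 check bits from 128 data bits plus the 2 address parity bits."""
    if not 0 <= data < 1 << N_DATA:
        raise ValueError("data must be a 128-bit unsigned integer")
    p_even, p_odd = address_parity(address)
    inputs = data | (p_even << N_DATA) | (p_odd << (N_DATA + 1))
    check = 0
    for i in range(N_DATA + N_ADDR_PARITY):
        if (inputs >> i) & 1:
            check ^= column(i)  # GF(2) sum of the selected columns
    return check


def encode(data, address):
    """Return what is written to the array: data bits and check bits only."""
    return data, check_bits(data, address)
```

Because each check bit is an XOR of selected input bits, flipping one stored data bit changes the check bits by exactly that bit's column, which is what makes syndrome decoding possible.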
A similar but reversed methodology is used to decode the electrical signals and correct errors in symbols that are four (4) bits in length. First, the previously stored data bits and check bits are retrieved from the memory array. The address parity bits are generated from the system address of the data. Using the data retrieved from memory and the address parity bits, new check bits are generated. The 16-bit syndrome vector is the exclusive-OR of the new check bits and the retrieved check bits, and it is decoded to determine whether any of the thirty-two (32) data symbols, four (4) check symbols, or two (2) address parity symbols are in error. If an error is detected, it is either corrected or deemed uncorrectable, depending on the type of error.
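The decoding flow can also be sketched in Python. This self-contained sketch rests on hypothetical assumptions: `column()` is an illustrative check matrix rather than the patent's, `address_parity()` takes the parity of the even-numbered and odd-numbered address bits, and data symbol k is assumed to occupy data bits 4k to 4k+3. `decode()` regenerates the address parity and check bits from the data read back and the address used for the read, XORs them with the stored check bits to form the 16-bit syndrome, and diagnoses only single-bit patterns; a real SSC-DSD decoder would instead classify the syndrome per 4-bit symbol.

```python
N_DATA = 128
N_ADDR_PARITY = 2


def address_parity(address):
    """HYPOTHETICAL rule: parity of even-numbered and of odd-numbered address bits."""
    bits = bin(address)[2:][::-1]
    return bits[0::2].count("1") & 1, bits[1::2].count("1") & 1


def column(i):
    """HYPOTHETICAL check-matrix column for input bit i; not the patent's matrix."""
    return 0x8000 | (((i + 1) * 0x9E37) & 0x7FFF)


def check_bits(data, address):
    p_even, p_odd = address_parity(address)
    inputs = data | (p_even << N_DATA) | (p_odd << (N_DATA + 1))
    check = 0
    for i in range(N_DATA + N_ADDR_PARITY):
        if (inputs >> i) & 1:
            check ^= column(i)
    return check


def decode(data, stored_check, address):
    """Return (syndrome, diagnosis) for a word read back from `address`."""
    # New check bits XOR retrieved check bits gives the 16-bit syndrome.
    syndrome = check_bits(data, address) ^ stored_check
    if syndrome == 0:
        return syndrome, "no error"
    for i in range(N_DATA + N_ADDR_PARITY):
        if column(i) == syndrome:
            if i < N_DATA:
                return syndrome, f"data bit {i} (data symbol {i // 4})"
            return syndrome, "address parity mismatch (address error)"
    if syndrome & (syndrome - 1) == 0:  # weight-1 syndrome: a single check bit
        return syndrome, f"check bit {syndrome.bit_length() - 1}"
    return syndrome, "uncorrectable"  # flagged, not located, by this sketch
```

For example, data written with `stored = check_bits(data, 0x40)` and read back with `decode(data, stored, 0x00)` is reported as an address parity mismatch, because address bit 6 changes the even-bit parity. Two parity bits cannot distinguish every pair of addresses, however: under this hypothetical even/odd grouping, addresses 0 and 10 produce the same parity bits, so that particular misdirected write would go unnoticed.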