1. Field
The present invention generally relates to error detection and correction mechanisms in computer memories. More specifically, the present invention relates to a technique that facilitates error detection and error correction after a failure of a memory component in a computer system.
2. Related Art
Computer systems routinely employ error-detecting and error-correcting codes to detect and/or correct data errors, which can be caused, for example, by noisy communication channels and unreliable storage media. Some error codes, such as SECDED (single-error-correcting, double-error-detecting) Hamming codes, can be used to correct single-bit errors and detect double-bit errors. Other codes, which are based on Galois fields, can be used to correct a special class of multi-bit errors caused by the failure of an entire memory component. (For example, see U.S. Pat. No. 7,188,296, entitled “ECC for Component Failures Using Galois Fields,” by inventor Robert E. Cypher, filed 30 Oct. 2003, referred to as “the '296 patent.”) After a memory component fails, it is desirable to be able to detect and correct additional errors that arise during subsequent computer system operation. The technique described in the '296 patent can correct subsequent single-bit errors. However, this technique cannot detect subsequent double-bit errors. It is also desirable to reduce the number of additional “checkbits” that this technique uses to provide such error correction and detection.
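To make the SECDED property concrete, the following is a minimal Python sketch of an extended Hamming (8,4) code: 4 data bits protected by 3 Hamming checkbits plus 1 overall-parity checkbit. The function names `encode` and `decode`, the bit layout, and the 8-bit word size are illustrative assumptions, not part of the techniques described here. The sketch shows only generic SECDED behavior and does not model the Galois-field component-failure codes of the '296 patent.

```python
from typing import List, Optional, Tuple

DATA_POSITIONS = (3, 5, 6, 7)  # hypothetical layout: data in non-power-of-two slots


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    if len(data) != 4:
        raise ValueError("expected exactly 4 data bits")
    word = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    # Checkbit at position p (1, 2, 4) makes the parity of every position
    # whose index has bit p set even.
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in range(1, 8) if i & p) % 2
    # Position 0 is an overall-parity checkbit covering positions 1..7.
    word[0] = sum(word[1:]) % 2
    return word


def decode(word: List[int]) -> Tuple[str, Optional[List[int]]]:
    """Return (status, data); status is 'ok', 'corrected', or 'uncorrectable'."""
    word = list(word)
    # XOR of the indices of all set bits in positions 1..7; zero for a valid codeword.
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    overall = sum(word) % 2  # parity of all 8 bits
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error. The syndrome gives
        # its position, and a zero syndrome means the overall checkbit flipped.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: a double-bit error.
        # It is detected, but its location cannot be determined.
        return "uncorrectable", None
    return status, [word[i] for i in DATA_POSITIONS]


codeword = encode([1, 0, 1, 1])
single = list(codeword)
single[5] ^= 1
print(decode(single))  # ('corrected', [1, 0, 1, 1])
double = list(codeword)
double[3] ^= 1
double[6] ^= 1
print(decode(double))  # ('uncorrectable', None)
```

The small word size only keeps the example readable. With three or more flipped bits, a decoder of this kind may miscorrect or miss the error entirely, which is the usual limit of SECDED protection.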
Hence, what is needed is a method and an apparatus for detecting and correcting errors that arise after a memory component has failed, without the above-described shortcomings of existing techniques.