1. Field
The present invention generally relates to error-detection and error-correction mechanisms in computer memories. More specifically, the present invention relates to a computer system memory that supports probabilistic component-failure correction with partial-component sparing.
2. Related Art
Computer systems routinely employ error-detecting and error-correcting codes to detect and/or correct data errors caused, for example, by noisy communication channels and unreliable storage media. Some of these codes, such as single-error-correction, double-error-detection (SECDED) Hamming codes, can correct single-bit errors and detect double-bit errors. Other codes, which are based on Galois fields, can correct a special class of multi-bit errors caused by the failure of an entire memory component. (For example, see U.S. Pat. No. 7,188,296, entitled “ECC for Component Failures Using Galois Fields,” by inventor Robert E. Cypher, filed 30 Oct. 2003, referred to as “the '296 patent.”)
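To make the SECDED behavior concrete, the following is a minimal sketch of a textbook extended Hamming (8,4) code: three Hamming parity bits locate any single flipped bit, and one extra overall-parity bit distinguishes a single-bit error (correctable) from a double-bit error (detectable only). It is an illustration of the general SECDED idea, not the Galois-field technique of the '296 patent or the method of the present invention. The function names `encode`, `decode`, and `flip` are hypothetical, and the 4-bit data width is chosen only for readability.

```python
def encode(data):
    """Encode a 4-bit value (0..15) as an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 hold a Hamming(7,4)
    codeword with parity bits at positions 1, 2, 4 and data at 3, 5, 6, 7.
    """
    d = [(data >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # makes total parity of all 8 bits even
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected', or 'double-error'."""
    code = list(code)  # work on a copy so the caller's word is not modified
    # XOR of the positions of all set bits in 1..7; zero for a valid codeword,
    # and equal to the flipped position after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2  # 1 means an odd number of bits flipped

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume a single-bit error; syndrome 0 means the overall parity bit flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return None, "double-error"

    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


def flip(code, *positions):
    """Return a copy of `code` with the bits at `positions` inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))              # (11, 'ok')
    print(decode(flip(word, 6)))     # (11, 'corrected')
    print(decode(flip(word, 2, 5)))  # (None, 'double-error')
```

Note that this scheme protects individual bits only: if one memory component supplies several bits of the same codeword, its failure produces a multi-bit error that SECDED cannot correct, which is the class of failure the Galois-field codes cited here address.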
After a memory component fails, it is desirable to be able to detect and correct additional errors that arise during subsequent computer system operation. The technique described in the '296 patent can correct subsequent single-bit errors. However, it cannot correct subsequent errors that arise from the failure of an additional memory component.
Hence, what is needed is a method and an apparatus that can correct errors caused by the failure of an additional memory component.