Electronic data storage utilizing commonly available memories (such as dynamic random access memory (DRAM)) can be problematic. Specifically, there is a probability that, when data is stored in memory and subsequently retrieved, the retrieved data will suffer some corruption. For example, DRAM stores information in relatively small capacitors that may suffer a transient corruption due to a variety of mechanisms. Additionally, data corruption may occur as the result of hardware failures such as loose memory modules, blown chips, wiring defects, and/or the like. The errors caused by such failures are referred to as repeatable errors, since the same physical mechanism repeatedly causes the same pattern of data corruption.
A variety of error detection and error correction mechanisms have been developed to mitigate the effects of data corruption. For example, error detection and correction algorithms may be embedded in a number of components in a computer system to address data corruption. Frequently, error correction code (ECC) algorithms are embedded in memory controllers, such as coherent memory controllers in distributed shared memory architectures.
In general, error detection algorithms employ redundant data added to a string of data. The redundant data is calculated utilizing a checksum or cyclic redundancy check (CRC) operation. When the string of data and the original redundant data are retrieved, the redundant data is recalculated utilizing the retrieved data. If the recalculated redundant data does not match the original redundant data, data corruption in the retrieved data is detected.
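The store, retrieve, and compare sequence described above may be illustrated by the following minimal sketch. It assumes a 32-bit CRC computed with the Python standard library function `zlib.crc32`; the function names `store` and `retrieve_is_intact` and the sample payload are hypothetical and are used only for illustration.

```python
import zlib


def store(data: bytes) -> tuple[bytes, int]:
    """Return the data together with its redundant data (a CRC-32 value)."""
    return data, zlib.crc32(data)


def retrieve_is_intact(data: bytes, stored_crc: int) -> bool:
    """Recalculate the CRC over the retrieved data and compare it
    with the CRC that was stored alongside the data."""
    return zlib.crc32(data) == stored_crc


# Hypothetical payload standing in for a string of data written to memory.
payload, crc = store(b"example data word")

# Simulate a transient corruption by flipping one bit of the retrieved data.
corrupted = bytearray(payload)
corrupted[3] ^= 0x04

clean_ok = retrieve_is_intact(payload, crc)
corrupt_ok = retrieve_is_intact(bytes(corrupted), crc)
```

In this sketch the mismatch only indicates that corruption occurred; it does not identify which bit changed, so the data cannot be repaired from the CRC alone.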
Error correction code (ECC) algorithms operate in a manner similar to error detection algorithms. When data is stored, redundant data is calculated and stored in association with the data. When the data and the redundant data are subsequently retrieved, the redundant data is recalculated and compared to the retrieved redundant data. When an error is detected (e.g., the original and recalculated redundant data do not match), the original and recalculated redundant data may be used to correct certain categories of errors. An example of a known ECC scheme is described in “Single Byte Error Correcting-Double Byte Error Detecting Codes for Memory subsystems” by Shigeo Kaneda and Eiji Fujiwara, published in IEEE TRANSACTIONS on COMPUTERS, Vol. C31, No. 7, July 1982.
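The manner in which original and recalculated redundant data may be used to correct an error can be illustrated by the following minimal sketch. It assumes a Hamming(7,4) single-bit error correcting code, which is substituted here for brevity and is not the byte-oriented code of the Kaneda and Fujiwara reference. The function names `check_bits` and `correct` are hypothetical.

```python
def check_bits(nibble: int) -> int:
    """Calculate three check bits for a 4-bit data value.

    Codeword positions 1..7 are assumed to hold p1 p2 d0 p4 d1 d2 d3,
    so each check bit covers the positions whose index has that bit set.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    return p1 | (p2 << 1) | (p4 << 2)


# Maps a syndrome (codeword position) to the data bit stored at that position.
_DATA_BIT_AT_POSITION = {3: 0, 5: 1, 6: 2, 7: 3}


def correct(nibble: int, stored_check: int) -> tuple[int, int]:
    """Recalculate the check bits, compare them with the stored check bits,
    and repair a single flipped data bit if the comparison identifies one.

    Returns the (possibly corrected) data and the syndrome. A syndrome of 0
    means no mismatch; 1, 2, or 4 means a check bit itself was corrupted.
    """
    syndrome = check_bits(nibble) ^ stored_check
    if syndrome in _DATA_BIT_AT_POSITION:
        nibble ^= 1 << _DATA_BIT_AT_POSITION[syndrome]
    return nibble, syndrome


# Store a data value with its check bits, then flip data bit 2 on retrieval.
original = 0b1011
stored = check_bits(original)
retrieved = original ^ 0b0100
repaired, syndrome = correct(retrieved, stored)
```

Under these assumptions the exclusive-OR of the stored and recalculated check bits (the syndrome) names the position of the corrupted bit. A code of this kind corrects only a single-bit error per word; errors affecting more bits fall outside the category it can correct.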