Memory failures in digital systems can take many forms, but they all have one thing in common: they can result in catastrophic system failure, wreaking havoc in infrastructure such as telecommunications, information processing, traffic control, etc. Because of the potentially serious consequences of memory failure, techniques have been developed to correct errors that develop in digital memory.
In some prior art memories, memory failures are recovered using parity checking or ECC (error correction code or error checking and correction) algorithms. With any algorithm, it is important that the algorithm be robust in the sense that it can recover from different types of memory errors. For example, with one type of error, memory I/O (input/output) ports can fail, corrupting an entire memory device and causing the loss of large amounts of data. Another type of memory failure may involve a single bit error, corrupting only one byte of data. Despite the disparity in the amount of data corrupted, either type of memory failure can cause devastating results in the system relying on the memory. Robustness is therefore an important property of the error correction technique used by a system.
Of known error handling techniques, parity checking is one of the simplest. It involves appending one or more parity bits to a data word. The parity bits are typically generated by performing an exclusive OR (XOR) operation over the bits of a data word. In some parity checking implementations, a single parity bit is computed for every data byte by XORing the bits in the data byte. In other implementations, parity words are generated by performing a bitwise XOR operation on two or more data words. The parity word has the same bit width as the data words, and each bit in the parity word corresponds to the data bits having the same position in the data words. Single-bit parity checking alone can detect only certain types of errors, i.e., single-bit errors and other odd numbers of bit errors; an even number of bit errors leaves the parity unchanged and goes undetected. This limits the robustness and usefulness of simple parity checking in some memory applications.
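The two parity variants described above can be illustrated with a minimal Python sketch. The function names `parity_bit` and `parity_word` and the sample values are assumptions chosen for illustration and do not come from any particular memory implementation. The last step, rebuilding one data word from the parity word, relies on the general property that XOR is its own inverse and assumes the location of the lost word is already known.

```python
from functools import reduce
from operator import xor


def parity_bit(byte):
    """Return the XOR of the eight bits of a byte (1 if the count of 1 bits is odd)."""
    return bin(byte & 0xFF).count("1") & 1


def parity_word(words):
    """Return the bitwise XOR of a sequence of equal-width data words."""
    return reduce(xor, words, 0)


# Per-byte parity: a single flipped bit changes the parity bit,
# but two flipped bits cancel out and go unnoticed.
data = 0b10110010                      # hypothetical data byte with four 1 bits
stored_parity = parity_bit(data)
single_flip = data ^ 0b00000100        # one bit corrupted
double_flip = data ^ 0b00000110        # two bits corrupted
detects_single = parity_bit(single_flip) != stored_parity
detects_double = parity_bit(double_flip) != stored_parity

# Parity word: each bit is the XOR of the same bit position in every data word.
words = [0x3C, 0xA5, 0x0F]             # hypothetical 8-bit data words
pw = parity_word(words)

# If words[1] is known to be lost, XORing the survivors with the
# parity word yields the missing word.
recovered = parity_word([words[0], words[2], pw])
```

In this sketch `detects_single` is true and `detects_double` is false, which mirrors the limitation of single-bit parity noted above.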
Many ECC techniques can detect multiple bit errors, but can only correct a small number of bit errors. Often used with computer memory, ECC involves special circuitry and/or software to test data and ensure their accuracy. Error control methods can be as simple as performing a cyclic redundancy check (CRC) in order to detect errors, or can add multiple parity bits to both detect and correct errors. More sophisticated techniques, such as Hamming codes, can correct single-bit errors and, when extended with an additional parity bit, can also detect double-bit errors. In some fault tolerant memories, SEC/DED (Single Error Correct/Double Error Detect) ECC is used. However, when catastrophic memory failures occur, many known ECC schemes are generally ineffective in correcting the failures. Accordingly, there is a need for an improved memory error correction scheme.
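As an illustration of SEC/DED behavior, the following Python sketch uses an extended Hamming (8,4) code, which protects a 4-bit value with three Hamming parity bits plus one overall parity bit. This particular code, the function names `encode` and `decode`, and the bit layout are assumptions chosen for brevity; the text above does not specify which SEC/DED code a given memory uses, and practical memories typically protect much wider words.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming positions, with parity bits at 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1  # makes the total number of 1 bits even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    # The syndrome is the XOR of the positions of all 1 bits; it is zero
    # for a valid codeword and equals the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treated as a single-bit error at index `syndrome`
        # (index 0 means the overall parity bit itself was flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "double error detected", None
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, nibble


word = encode(0b1011)
one_error = list(word)
one_error[5] ^= 1                 # single-bit fault
two_errors = list(word)
two_errors[2] ^= 1                # double-bit fault
two_errors[6] ^= 1
single_result = decode(one_error)
double_result = decode(two_errors)
```

Here `single_result` is `("corrected", 11)` and `double_result` is `("double error detected", None)`. Three or more flipped bits can be silently miscorrected by this sketch, which is consistent with the observation above that such schemes handle only a small number of bit errors.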