Computer systems generally require robust and reliable data storage. For example, some multiprocessor computer systems may have up to 10,000 dual in-line memory modules (DIMMs) for executing a complex task in real time. A failure rate of one percent per DIMM per day therefore could cause about one hundred DIMM failures, and the resulting data errors, every day. Such an error rate is unacceptable for many applications.
The art has responded to this problem by developing memory whose errors can be corrected during run time. One widely used type of such memory is known as “error checking and correcting memory” (“ECC memory”). Specifically, ECC memory implements algorithms that detect and correct memory errors by generating and processing specialized correction bits. For example, the well-known SECDED (single error correcting, double error detecting) algorithm generally can correct single-bit errors and detect (but not correct) double-bit errors.
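To make the SECDED behavior concrete, the following is a minimal Python sketch of one common SECDED construction, an extended Hamming code. It assumes a layout in which Hamming check bits sit at power-of-two positions 1, 2, 4, 8, ..., data bits fill the remaining positions, and an extra overall parity bit sits at index 0. The function names `encode`, `decode`, `num_check_bits`, and `is_check_position` are hypothetical illustrations, not any particular memory controller's design; real ECC memory implements SECDED in hardware, often with other constructions such as Hsiao codes.

```python
from typing import List, Tuple


def num_check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1: Hamming check bits needed for k data bits."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def is_check_position(pos: int) -> bool:
    """Hamming check bits occupy power-of-two positions 1, 2, 4, 8, ..."""
    return (pos & (pos - 1)) == 0


def encode(data: List[int]) -> List[int]:
    """Return a codeword: index 0 is overall parity, 1..n is the Hamming code."""
    r = num_check_bits(len(data))
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if not is_check_position(pos):
            code[pos] = next(bits)
    # Each check bit p makes the XOR over all positions having bit p set equal 0.
    for i in range(r):
        p = 1 << i
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                code[p] ^= code[pos]
    code[0] = sum(code[1:]) % 2  # overall parity makes the whole word even
    return code


def decode(code: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all 1 bits
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome <= n:
        # Single-bit error at `syndrome`; syndrome 0 means the parity bit itself.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # e.g. two flipped bits: parity stays even but the syndrome is nonzero.
        status = "uncorrectable"
    data = [code[pos] for pos in range(1, n + 1) if not is_check_position(pos)]
    return status, data


if __name__ == "__main__":
    word = [1, 0, 1, 1, 0, 0, 1, 0]
    cw = encode(word)
    cw[6] ^= 1                    # one bad bit
    print(decode(cw))             # ('corrected', [1, 0, 1, 1, 0, 0, 1, 0])
    cw[9] ^= 1                    # a second bad bit in the same word
    print(decode(cw)[0])          # uncorrectable
    print(len(encode([0] * 64)))  # 72
```

With 64 data bits the sketch needs seven Hamming check bits plus the overall parity bit, giving a 72-bit codeword, the same (72,64) width commonly used on ECC DIMMs.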
Single-bit error correcting algorithms such as SECDED often provide sufficient results when used with “X1-type” memory chips (i.e., arrays of memory chips that each store one bit of a data word). Many current systems, however, use “X4-type” memory chips (i.e., arrays of memory chips that each store four bits of a data word). Consequently, failure of a single X4-type memory chip can corrupt four bits of a single data word, an error that the SECDED algorithm cannot correct. This deficiency is even more acute in computer systems having memory chips that store more than four bits of a data word.
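The following minimal Python sketch illustrates why a four-bit chip failure defeats SECDED. It assumes an extended Hamming (72,64) decoder (Hamming positions 1 through 71 plus an overall parity bit at position 0) and a hypothetical layout in which X4 chip c stores codeword positions 4c through 4c+3; the names `decoder_response` and `chip_failure` are illustrative only. Because the code is linear, the decoder's reaction depends only on which bits flipped: the syndrome is the XOR of the flipped positions, and the overall parity changes only when an odd number of bits flip.

```python
from functools import reduce
from operator import xor
from typing import List

HAMMING_BITS = 71  # positions 1..71; position 0 is the overall parity bit
CHIP_WIDTH = 4     # an X4 chip holds four bits of each word
NUM_CHIPS = (HAMMING_BITS + 1) // CHIP_WIDTH  # 18 chips for a 72-bit word


def decoder_response(flipped: List[int]) -> str:
    """How an extended-Hamming SECDED decoder reacts to a set of flipped positions."""
    if not flipped:
        return "no error"
    syndrome = reduce(xor, flipped, 0)  # position 0 contributes nothing
    parity_changed = len(flipped) % 2 == 1
    if not parity_changed:
        # Even number of flips: the decoder never attempts a correction.
        return "undetected" if syndrome == 0 else "detected, uncorrectable"
    if syndrome <= HAMMING_BITS:
        # The decoder flips the bit at `syndrome`; right only for a single error.
        return "corrected" if len(flipped) == 1 else "miscorrected"
    return "detected, uncorrectable"


def chip_failure(chip: int) -> List[int]:
    """Hypothetical layout: chip c stores codeword positions 4c .. 4c+3."""
    return list(range(chip * CHIP_WIDTH, (chip + 1) * CHIP_WIDTH))


if __name__ == "__main__":
    print(decoder_response([37]))     # corrected
    print(decoder_response([5, 40]))  # detected, uncorrectable
    for c in range(NUM_CHIPS):
        print(c, decoder_response(chip_failure(c)))  # every chip: undetected
```

Any four-bit error leaves the overall parity unchanged, so the decoder never attempts a correction. Under this particular layout the four positions of each chip also XOR to zero, so every chip failure passes silently as valid data. A different bit-to-chip mapping may instead produce a nonzero syndrome, which SECDED flags as uncorrectable but still cannot repair.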