The invention relates generally to error correction codes and, in particular, to error correction codes optimized for memory chips that have multiple bit outputs.
In recent years, computer systems have been designed and built with ever increasing size, particularly in their memory capabilities. However, the cost of these larger computer systems has remained constant or even decreased because of the use of VLSI memory chips. More bits of memory can be packed on a VLSI chip for much the same cost as the much smaller memory chips of previous generations. The increased memory size has not been accomplished without additional problems, however. With the increased number of memory bits, reliability has become a severe problem. No longer can the occasional failure of a memory bit be allowed to cause the system to crash or become inoperative. The multiplicity of memory bits is too large and the overall probability of failure too high to permit the failure of a single bit or a single chip to render the computer system inoperative until the offending chip is replaced. Furthermore, a VLSI memory chip is more prone to errors because of the markedly reduced size of the individual components of the memory elements. That is, the increased complexity of modern memory systems has been purchased at the cost of decreased reliability of the memory components.
One solution to the reliability problem has been the increasing use of error correction codes (ECC). An example of ECC is provided by Kustedjo et al. in U.S. Pat. No. 4,360,916. ECC provides the capability of correcting occasional errors that occur in memory chips. For instance, if Hamming ECC is used, eight additional bits of check code can detect and correct a single bit of error in a 32 bit data word. A Hamming code is very effective for isolated soft errors, that is, the random reversal of bits in a memory. The occasional occurrence of a bit reversal can thus be corrected, and the probability of a bit reversal is so low that the joint probability of two reversals in a single 32 bit word becomes vanishingly small. The Hamming type of ECC is also useful for hard errors, or permanent errors, in which an isolated bit of memory becomes permanently inoperative. The code provides the capability of correcting this single error.
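The single-error correction described above can be illustrated with a minimal sketch of a textbook Hamming code. It is not the code of the cited patent. The function names are hypothetical, and this basic construction happens to need only six check bits for a 32 bit data word because it corrects single errors without any additional detection capability.

```python
def hamming_encode(data_bits):
    """Encode a list of 0/1 data bits into a textbook Hamming codeword.

    Positions are numbered from 1. Check bits sit at the power-of-two
    positions; data bits fill the remaining positions in order.
    """
    n_data = len(data_bits)
    r = 0
    while (1 << r) < n_data + r + 1:  # smallest r with 2**r >= n_data + r + 1
        r += 1
    n = n_data + r
    code = [0] * (n + 1)  # index 0 is unused so indices match positions
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # nonzero means pos is not a power of two
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity  # makes the parity over this group even
    return code[1:]


def hamming_syndrome(codeword):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


def hamming_correct(codeword):
    """Return (corrected codeword, syndrome), assuming at most one bit error."""
    s = hamming_syndrome(codeword)
    fixed = list(codeword)
    if 0 < s <= len(fixed):
        fixed[s - 1] ^= 1  # the syndrome names the position of the bad bit
    return fixed, s
```

When exactly one bit of a codeword is reversed, the syndrome equals the position of that bit, so the decoder can flip it back without knowing the original data.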
Error correction codes incur a relatively small penalty in memory space as long as only a single error is required to be corrected. For soft errors, this condition is generally satisfied. In large computer systems, separate memory chips are usually provided for each bit of a word. For instance, if a 64 bit word length is being used, there are 64 separate memory chips, one chip per bit. If one chip goes bad, so that its output becomes unreliable, only a single bit of any word is affected, and a single-error-correcting code can correct it. ECC in large systems can therefore dramatically increase the overall reliability of the memory system against hard failures.
In smaller systems, however, the situation is somewhat different. Because the memory requirements of a smaller computer system are usually rather small, it is typical to use memory chips that have multi-bit outputs. That is, more than one bit of the typical data word is stored on a single memory chip. For instance, a total of 8 memory chips, each having a 4 bit output, can be used for storing a 32 bit data word. The data space may be small enough and the individual memory chips big enough that a single bank of 4-bit wide data chips may be sufficient to support the memory requirements of the system. The difficulty with ECC in such a system is that one entire memory chip may go bad, with the result that the 4 bits contributed by that chip become unreliable. The error correction code required to provide reasonable reliability in such a system thus must be able to correct at least 4 errors in a 32 bit word. Although such error correction codes are available, they incur a substantial memory size penalty. Needless to say, smaller computer systems are cost sensitive, and the additional memory necessary to support effective error correction for multi-bit output memory chips is inconsistent with the design philosophy of a smaller computer system. Furthermore, error correction codes that can correct a large number of errors in a single data word usually incur a computational penalty that would degrade the performance of the computer system.
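The difficulty can be made concrete with a short sketch in which a single-error-correcting code is applied to a word stored on 4-bit wide chips. It is an illustration under stated assumptions: the code is a textbook Hamming code with six check bits, the function names are hypothetical, each group of four consecutive codeword positions is assumed to share one chip, and the failure is modeled as the worst case in which every output bit of the chip is inverted.

```python
def syndrome(codeword):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


def encode(data_bits, r=6):
    """Textbook Hamming encoding; r=6 check bits suffice for 32 data bits."""
    n = len(data_bits) + r
    word = [0] * (n + 1)  # index 0 unused so indices match positions
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: a data position
            word[pos] = next(it)
    s = syndrome(word[1:])
    for i in range(r):
        word[1 << i] = (s >> i) & 1  # check bits cancel the data syndrome
    return word[1:]


def correct_single(codeword):
    """Decoder that assumes at most one bit is in error."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if 0 < s <= len(fixed):
        fixed[s - 1] ^= 1
    return fixed


def fail_chip(codeword, chip, width=4):
    """Assumed failure model: invert every bit supplied by one chip."""
    bad = list(codeword)
    for i in range(chip * width, min((chip + 1) * width, len(bad))):
        bad[i] ^= 1
    return bad


# Usage: one failed 4-bit chip is beyond what the decoder can repair.
data = [(0x12345678 >> i) & 1 for i in range(32)]
stored = encode(data)
read_back = correct_single(fail_chip(stored, chip=1))
wrong_bits = sum(a != b for a, b in zip(stored, read_back))
```

In this sketch the decoder does not restore the stored word. The four inverted bits produce a nonzero syndrome that points at an unrelated position, so the attempted correction leaves `wrong_bits` greater than zero.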