Encoding techniques are used in digital systems to provide for detection and correction of errors occurring during data processing. Such encoding techniques include, for example, the use of Gray codes, Huffman encoding, or block codes. Block codes subdivide an input or source data stream into discrete blocks, and perform a particular encoding procedure on each block of input data. During message encoding, a fixed number of check digits or bits is added to the input data to form a transmittable codeword. These check bits are added so that errors occurring during transmission can be detected and possibly corrected. Upon receiving the transmitted codeword, a syndrome is calculated using a parity check matrix and the received codeword. The syndrome indicates which digit, if any, in the received codeword is in error, so that the digit may be corrected.
One such block encoding procedure involves the use of Hamming codes. Hamming codes are binary codes which use predefined parity check matrices to provide single bit error correction capability. Hamming codes are generally not used to provide multiple bit error correction.
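The syndrome calculation described above can be illustrated with a small sketch. It assumes the textbook (7,4) Hamming code with check bits at positions 1, 2, and 4; the function names `encode`, `syndrome`, and `correct` are hypothetical and chosen only for this illustration, not taken from any particular memory system.

```python
# Parity check matrix H for a (7,4) Hamming code (an illustrative choice).
# Column j (1-based) is the binary representation of j, so a nonzero
# syndrome directly names the position of a single flipped bit.
H = [
    [1, 0, 1, 0, 1, 0, 1],  # least significant bit of the position
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],  # most significant bit of the position
]


def encode(data):
    """Place 4 data bits at positions 3, 5, 6, 7 and fill check bits 1, 2, 4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # makes positions 1, 3, 5, 7 have even parity
    p2 = d1 ^ d3 ^ d4  # makes positions 2, 3, 6, 7 have even parity
    p4 = d2 ^ d3 ^ d4  # makes positions 4, 5, 6, 7 have even parity
    return [p1, p2, d1, p4, d2, d3, d4]


def syndrome(word):
    """Multiply H by the received word (mod 2) and read the result as an integer."""
    bits = [sum(h * w for h, w in zip(row, word)) % 2 for row in H]
    return bits[0] + 2 * bits[1] + 4 * bits[2]


def correct(word):
    """Flip the bit named by the syndrome, if any; return (word, syndrome)."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s


sent = encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # corrupt position 5 (1-based)
fixed, s = correct(received)
print(s, fixed == sent)  # prints: 5 True
```

A syndrome of zero means every parity check is satisfied; a nonzero syndrome is interpreted as the position of a single bit in error, which is exactly why this scheme cannot, by itself, repair more than one bit.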
With respect to computer memory structures in modern computer systems, the use of Hamming codes to implement memory systems having single bit error correcting, double bit error detecting capabilities is nearly universal in the computer industry. For example, a 32 bit computer word can be stored with 7 Hamming check bits, forming a 39 bit codeword, to correct all single bit errors and detect all double bit errors in the stored word. However, these memory systems have only single bit error correction capabilities.
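The figure of 7 check bits for a 32 bit word follows from the standard counting argument for Hamming codes: r check bits can distinguish "no error" from a single error in any of the m + r codeword positions only if 2^r >= m + r + 1, and one further overall parity bit is added for double bit error detection. A minimal sketch of that arithmetic, where the helper name `secded_check_bits` is hypothetical:

```python
def secded_check_bits(data_bits):
    """Check bits needed for single error correction, double error detection.

    Finds the smallest r with 2**r >= data_bits + r + 1 (single error
    correction), then adds one overall parity bit (double error detection).
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


for width in (8, 16, 32, 64):
    print(width, secded_check_bits(width))
```

For a 32 bit word this gives 6 + 1 = 7 check bits, and for a 64 bit word it gives 8.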
Single bit, non-recurrent errors, also known as "soft errors", may be caused by relatively rare radiation effects, such as cosmic rays or trace radioactive elements in the material surrounding the memory device. Computing systems which operate in severe environments, such as outer space, can be subjected to random upset of memory bits, as well as total failures of individual memory devices. Without the shielding provided by the Earth's atmosphere, such upsets can be very common in outer space, potentially thousands per day in a 64 Mbit dynamic memory chip.
If more than two bits of a codeword are in error, a Hamming code may falsely indicate that zero or one bits were in error, or may correctly indicate that multiple bits were in error. An odd number of bits in error will generally cause either a single bit (correctable) error indication or a multiple bit (uncorrectable) error indication. For example, if 5 bits were actually in error, a conventional error correction system based on Hamming codes may erroneously indicate that there was only 1 bit in error. Further, when an even number of bits are actually in error, a conventional error correction system based on Hamming codes could falsely indicate that there is no error. Even if the Hamming code properly indicates the number of bits in error, it can only be used to correct single bit errors.
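These failure modes can be reproduced on a small scale. The sketch below assumes an extended (8,4) Hamming code, that is, a (7,4) Hamming code plus one overall parity bit, as a stand-in for the larger codes used in memory systems; the function names and the wording of the status strings are hypothetical.

```python
def encode(data):
    """Extended (8,4) Hamming code: 7 bit Hamming word plus overall parity."""
    d1, d2, d3, d4 = data
    word = [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def classify(word):
    """Report what a conventional decoder would conclude about the word."""
    # Syndrome: XOR of the 1-based positions of all set bits in the first 7.
    s = 0
    for pos in range(1, 8):
        if word[pos - 1]:
            s ^= pos
    # Overall parity across all 8 bits (0 for any valid codeword).
    parity = 0
    for bit in word:
        parity ^= bit
    if s == 0 and parity == 0:
        return "no error"
    if parity == 1:
        return "single bit error (correctable)"
    return "double bit error (uncorrectable)"


def flip(word, positions):
    """Return a copy of word with the given 1-based positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos - 1] ^= 1
    return out


sent = encode([1, 0, 1, 1])
for errors in ([5], [2, 5], [1, 2, 4], [1, 2, 4, 7]):
    print(len(errors), "bit(s) flipped:", classify(flip(sent, errors)))
```

With these particular error patterns, the one bit and two bit cases are reported correctly, the three bit case is reported as a correctable single bit error, and the four bit case is reported as no error at all. Other multiple bit patterns may instead be flagged as uncorrectable; the point is only that the indication cannot be trusted once more than two bits are wrong.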
What is needed is a fault tolerant memory system having reliable multiple bit error detection and multiple bit error correction capabilities for use in a computer system operable in severe environments.