Digital computer devices store and transmit data in the form of bits of binary code in which each bit is typically represented in code as a "1" or "0". The devices occasionally produce an error in the storage, retrieval or transmission of data whereby a bit coded as a "1" is erroneously made a "0", or a bit coded as a "0" is erroneously made a "1".
There are many systems for detecting, and in some cases correcting, a data bit error utilizing error correction codes. The error correction codes are normally a set of parity bits that are determined by the data bits; therefore, an error in the data bits will result in a detectable change in the parity bits. The change in the parity bits indicates that an error has occurred in the data bits and, depending on the nature and extent of the error, the change in the parity bits may be analyzed to identify the error precisely. Once the error is identified precisely, it can easily be corrected: because a bit can only be a "0" or a "1", an erroneous bit is corrected by simply inverting it so that a "0" becomes a "1" or a "1" becomes a "0".
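As an illustration of parity bits that are determined by the data bits, the following is a minimal sketch of a single bit error correcting code, assuming the textbook Hamming(7,4) layout. That particular code and the function names `encode`, `syndrome` and `correct` are assumptions chosen for illustration; they are not taken from any specific system discussed here.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Assumed layout (textbook Hamming(7,4)): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def syndrome(codeword):
    """Recompute the parity checks; a nonzero result is the 1-based
    position of a single erroneous bit."""
    c = codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    return s1 + 2 * s2 + 4 * s4


def correct(codeword):
    """Correct a single bit error by inverting the bit the syndrome names."""
    fixed = list(codeword)
    position = syndrome(fixed)
    if position:
        fixed[position - 1] ^= 1
    return fixed


sent = encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                  # simulate a single bit error at position 5
assert syndrome(received) == 5    # the changed parity checks locate the error
assert correct(received) == sent  # inverting that bit restores the codeword
```

A code this small corrects one bit error only; two bit errors produce a nonzero syndrome that points at the wrong position, which is the kind of limitation the longer codes are designed to address.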
In theory, any error in any combination of bits should be both detectable and correctable, but in practice the detectability and correctability of errors are limited by the extent of the parity code and the way in which the parity code is produced and manipulated. Ideally, a parity code should be simple and compact to avoid increasing the complexity or decreasing the speed of the system, and at the same time should detect and correct a high percentage of the possible errors or at least the errors most likely to occur.
Error correction codes can be categorized by the number of bits in error that can be corrected and the number of bits in error that can be detected but not corrected. The major classes of error correction codes and their acronyms are:
single bit error correcting (SEC)
single bit error correcting, double bit error detecting (SEC-DED)
single bit error correcting, double bit error detecting, single byte error detecting (SEC-DED-SBD)
single byte error correcting, double byte error detecting (SBC-DBD)
double bit error correcting, triple bit error detecting (DEC-TED)
The term "byte" as used in the names of these various categories of error correction codes does not normally have its usual meaning of 8 bits. Instead, "byte" refers to the width of a memory chip, which is usually 4 bits in common DRAM memories.
The reason that many error correction codes are designed to detect an error in a 4-bit byte is that such a design allows detection of an error if the error is due to a failure in an entire chip or section of a chip in a memory. However, because such error correction codes are primarily intended to detect chip failures, they are designed to detect only those 4-bit errors that are contiguous and in the same 4-bit "nibble" of memory. Thus, they do not correct or even detect those 4-bit errors that are in non-contiguous bits or those 4-bit errors that are in contiguous bits but that span adjacent nibbles. For example, in a 32-bit memory system where the bits are numbered from the most significant to the least significant, as in 31 to 0, the 4-bit nibbles would be 31-28, 27-24, 23-20, 19-16, 15-12, 11-8, 7-4 and 3-0. A 4-bit error in contiguous bits 3, 2, 1 and 0 would be detected because those bits are in a single nibble. However, a 4-bit error in contiguous bits 5, 4, 3 and 2 would not be detected because some of those bits are in one nibble (the nibble of bits 7, 6, 5 and 4) while others of those bits are in a different nibble (the nibble of bits 3, 2, 1 and 0).
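The nibble boundaries in this example can be checked mechanically. The following is a small sketch, assuming a 32-bit word with bit 0 as the least significant bit and an error described by a bit mask; the helper names `mask_from_bits`, `nibbles_touched` and `confined_to_one_nibble` are hypothetical and introduced only for illustration.

```python
WORD_WIDTH = 32    # bits numbered 31 (most significant) down to 0
NIBBLE_WIDTH = 4   # the 4-bit "byte" of the codes described above


def mask_from_bits(bits):
    """Build an error mask with a 1 at every erroneous bit position."""
    mask = 0
    for b in bits:
        mask |= 1 << b
    return mask


def nibbles_touched(error_mask):
    """Return the indices of the aligned nibbles that contain an error bit.

    Nibble 0 holds bits 3-0, nibble 1 holds bits 7-4, and so on.
    """
    return [
        i
        for i in range(WORD_WIDTH // NIBBLE_WIDTH)
        if (error_mask >> (i * NIBBLE_WIDTH)) & 0xF
    ]


def confined_to_one_nibble(error_mask):
    """True when every erroneous bit lies in a single aligned nibble."""
    return len(nibbles_touched(error_mask)) == 1


in_one_nibble = mask_from_bits([3, 2, 1, 0])   # 0x0F
spans_two = mask_from_bits([5, 4, 3, 2])       # 0x3C

assert nibbles_touched(in_one_nibble) == [0]
assert confined_to_one_nibble(in_one_nibble)

assert nibbles_touched(spans_two) == [0, 1]
assert not confined_to_one_nibble(spans_two)
```

The check only classifies where an error falls relative to the nibble boundaries; it does not itself implement any error correction code.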
This requirement in normal SEC-DED-SBD error correction codes, that a detectable 4-bit error be in contiguous bits in a single nibble, prevents the detection of many common errors. For example, system errors such as multiple shorted or open traces on a circuit board, a failure in memory support circuitry, and bad cables are all capable of producing 4-bit errors in contiguous bits that span two nibbles. Thus, these errors are undetectable by common SEC-DED-SBD error correction codes.
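To give a sense of how many contiguous 4-bit error patterns fall outside the single-nibble case, the following sketch enumerates every run of four adjacent bits in a 32-bit word and sorts the runs into those aligned with a nibble and those spanning two nibbles. The word width and the function name `classify_bursts` are assumptions for illustration, and the count says nothing about how often each pattern occurs in practice.

```python
WORD_WIDTH = 32
NIBBLE_WIDTH = 4


def classify_bursts(word_width=WORD_WIDTH, burst=NIBBLE_WIDTH):
    """Split all contiguous `burst`-bit runs into nibble-aligned runs and
    runs that span two adjacent nibbles.

    Each run is reported as (highest bit, lowest bit).
    """
    aligned, spanning = [], []
    for low in range(word_width - burst + 1):
        run = (low + burst - 1, low)
        if low % NIBBLE_WIDTH == 0:
            aligned.append(run)
        else:
            spanning.append(run)
    return aligned, spanning


aligned, spanning = classify_bursts()
assert (3, 0) in aligned      # bits 3-0 sit in one nibble
assert (5, 2) in spanning     # bits 5-2 straddle two nibbles
print(len(aligned), len(spanning))   # prints: 8 21
```

Under these assumptions, only 8 of the 29 possible contiguous 4-bit runs coincide with a nibble; the remaining 21 span two nibbles.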
Additional background on error correction codes can be found in Worley et al. U.S. Pat. No. 4,958,350 and Hillis U.S. Pat. No. 4,993,028, and in D. C. Bossen, "b-Adjacent Error Correction," IBM Journal of Research and Development, Vol. 14, pp. 402-408 (July 1970); S. M. Reddy, "A Class of Linear Codes for Error Control in Byte-per-package Organized Memory Systems," IEEE Transactions on Computers, Vol. C-27, pp. 455-458 (May 1978); C. L. Chen, "Error Correcting Codes with Byte Error Detection Capability," IEEE Transactions on Computers, Vol. C-32, pp. 615-621 (July 1983); and C. L. Chen and M. Y. Hsiao, "Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review," IBM Journal of Research and Development, Vol. 28, pp. 124-134 (March 1984).