The present invention relates to error detection and correction, and more specifically to a 9 bit error correcting code (ECC) for correcting single bit errors, detecting double bit errors, and detecting nibble errors in a 144 bit field (135 data bits).
A number of schemes exist for correcting errors and detecting corruption of data during transport, for example, data transmitted between agents over a network or between an external memory and a processor's internal memory cache. One example of a scheme for detecting errors in a data field is parity. When data is transmitted, a parity generator appends an additional parity bit to the data, for example a 9th bit (parity) to an 8-bit byte, such that the overall parity of the 9-bit field is odd or even. The receiving entity checks the parity of the 9-bit field and an error is detected if the parity does not match the predetermined parity (odd or even). Parity reliably detects only single bit errors (more generally, an odd number of bit errors) in a word; an even number of bit errors leaves the parity unchanged and goes undetected.
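The parity scheme described above can be sketched as follows. This is a minimal illustration only; the function names (parity_bit, append_parity, check_parity) and the choice to store the parity bit as bit 8 of the 9-bit field are assumptions made for the example and are not taken from any particular implementation.

```python
def parity_bit(byte: int, odd: bool = False) -> int:
    """Return the parity bit for an 8-bit byte (hypothetical helper)."""
    ones = bin(byte & 0xFF).count("1")
    bit = ones & 1              # this value makes the 9-bit total even
    return bit ^ 1 if odd else bit


def append_parity(byte: int, odd: bool = False) -> int:
    """Build a 9-bit field: parity in bit 8 (assumed layout), data in bits 7..0."""
    return (parity_bit(byte, odd) << 8) | (byte & 0xFF)


def check_parity(field: int, odd: bool = False) -> bool:
    """Return True if the 9-bit field has the predetermined parity."""
    ones = bin(field & 0x1FF).count("1")
    return (ones & 1) == (1 if odd else 0)


if __name__ == "__main__":
    field = append_parity(0b10110010)             # even parity
    print(check_parity(field))                    # field as sent
    print(check_parity(field ^ 0b000000100))      # one bit flipped
    print(check_parity(field ^ 0b000000110))      # two bits flipped
```

A field with one flipped bit fails the check, while a field with two flipped bits passes it, which is why parity alone is suited only to single bit error detection.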
Another example of an error detection scheme is a CRC (cyclic redundancy check) checksum. When receiving a data field for the first time, a CRC generator divides the data bits by a generator polynomial G(x). The remainder of the division is the CRC checksum, which is written in two bytes and appended to the data. When the bit field is retrieved another time, the complete sequence of bits, including the CRC bits, will be read by a CRC checker. The complete sequence should be exactly divisible by the generator polynomial G(x). If it is not, an error has been detected. One example of a standard generator polynomial is G(x) = x^16 + x^12 + x^5 + 1, which has the binary value 10001000000100001. This value has been defined by the CCITT and is often called CRC-CCITT. Implemented in hardware, the CRC check is an exclusive OR (XOR) of each bit position.
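The generate-and-check procedure described above can be sketched in software as a bitwise polynomial division. The sketch below assumes a zero initial remainder, most-significant-bit-first processing, and big-endian placement of the two CRC bytes; these conventions, and the names crc16, append_crc and check_crc, are assumptions made for the example, since practical CRC-CCITT variants differ in such details.

```python
POLY = 0x1021  # x^16 + x^12 + x^5 + 1 with the implicit x^16 term dropped


def crc16(data: bytes) -> int:
    """Remainder of data(x) * x^16 divided by G(x), computed bit by bit."""
    crc = 0  # assumed initial value; some variants start from 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF  # subtract G(x) via XOR
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(data: bytes) -> bytes:
    """Append the two CRC bytes (big-endian, an assumed convention)."""
    return data + crc16(data).to_bytes(2, "big")


def check_crc(frame: bytes) -> bool:
    """The complete sequence is valid if it is exactly divisible by G(x)."""
    return crc16(frame) == 0


if __name__ == "__main__":
    frame = append_crc(b"example data")
    print(check_crc(frame))                      # frame as written
    corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
    print(check_crc(corrupted))                  # frame with one bit flipped
```

Each conditional XOR with POLY corresponds to one subtraction step of the long division, which is the same operation a hardware shift register with XOR feedback performs.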
Closely related to the CRC are ECC codes (error correcting or error checking and correcting). ECC codes are sometimes referred to as EDC codes for error detecting and correcting. ECC codes are in principle CRC codes whose redundancy is so extensive that they can restore the original data if an error occurs that is not too disastrous. ECC codes are used, for example, for magnetic data recording with floppy or hard disk drives as well as for fail-safe RAM memory systems. A memory controller with embedded ECC logic, for example, is able to repair soft errors in DRAM chips caused by natural radioactivity in the air or tiny amounts of radioactive substances in the chip substrate. The ionizing effect of alpha-particles causes additional charges in the storage area of a DRAM memory cell which may distort the held value.
FIG. 1 depicts an example of a memory system 10 using embedded ECC logic (or CRC logic) for error detection and correction. Memory system 10 includes bus interface 20, memory 25 and memory controller 30. Memory 25 is any memory device such as a floppy or a hard drive, for example. Memory system 10 is useful for transferring data between memory 25 and main memory or RAM (not shown), which is usually one or more banks of DRAM chips, for example. Data is transferred through controller 30 to and from bus interface 20 and controller chip 35. Bus interface 20 provides the connection to the main memory. Controller chip 35 determines the ECC (or CRC) bytes and provides any necessary formatting such as converting parallel submitted data into serial data and vice versa. ECC logic 40 (or CRC) generates and/or checks ECC bytes (or CRC bytes) being transmitted between bus interface 20 and memory 25. If an error is detected, ECC (CRC) logic 40 generates an error detect signal to controller chip 35, and if the error is correctable, ECC logic 40 handles correction. Microprocessor 50 provides overall control, including synchronization, of controller chip 35, ECC (CRC) logic 40 and memory interface 60 of memory controller 30. Microcode ROM 55 provides the necessary instructions for microprocessor 50, and memory interface 60 provides the necessary interface to memory 25, depending on the memory type.
Conventional and modified Hamming SEC-DED codes (single error correction, double error detection codes) have been widely used to increase computer memory reliability. These codes generally require a large number of check bits and often require extensive circuitry to handle a complicated and lengthy decoding process. An improvement to the conventional Hamming SEC-DED codes that provides a faster and better error-detection algorithm is given in Hsiao, M. Y., "A Class of Optimal Minimum Odd-weight-column SEC-DED Codes", IBM Journal of Research and Development, Vol. 14, No. 4, July 1970, which is hereby incorporated by reference. The Hsiao algorithm demonstrates a new way of constructing a class of SEC-DED codes that use the same number of check bits as the Hamming SEC-DED code but which are superior in cost, performance and reliability.
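The odd-weight-column idea can be illustrated with a toy code of 4 data bits and 4 check bits. The sketch below is not the 144 bit code addressed by the present invention and is not taken from the Hsiao paper; the column assignment and the names DATA_COLS, CHECK_COLS, encode and decode are assumptions chosen for illustration. Every column of the parity check matrix has odd weight, so a single bit error yields an odd-weight syndrome equal to the column of the failing bit, while a double bit error yields a non-zero even-weight syndrome.

```python
# Columns of the parity check matrix, one 4-bit integer per code bit.
DATA_COLS = [0b0111, 0b1011, 0b1101, 0b1110]    # weight 3 (data bits)
CHECK_COLS = [0b0001, 0b0010, 0b0100, 0b1000]   # weight 1 (check bits)
COLUMNS = DATA_COLS + CHECK_COLS


def syndrome(word):
    """XOR together the columns of every bit position that holds a 1."""
    s = 0
    for bit, col in zip(word, COLUMNS):
        if bit:
            s ^= col
    return s


def encode(data):
    """Append 4 check bits so that the syndrome of the 8-bit word is zero."""
    s = syndrome(list(data) + [0, 0, 0, 0])
    checks = [(s >> i) & 1 for i in range(4)]
    return list(data) + checks


def decode(word):
    """Return (status, data); status is 'ok', 'corrected', 'double' or 'uncorrectable'."""
    s = syndrome(word)
    word = list(word)
    if s == 0:
        return "ok", word[:4]
    if bin(s).count("1") % 2 == 1:          # odd weight: treat as single bit error
        if s in COLUMNS:
            word[COLUMNS.index(s)] ^= 1     # flip the failing bit
            return "corrected", word[:4]
        return "uncorrectable", word[:4]
    return "double", word[:4]               # even weight: double bit error detected


if __name__ == "__main__":
    code = encode([1, 0, 1, 1])
    code[2] ^= 1                            # inject a single bit error
    print(decode(code))
```

Because the single versus double decision depends only on the parity of the syndrome weight, the decoder can classify an error with one XOR tree over the syndrome bits.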
What is needed in the art is an algorithm for detecting and correcting single bit errors, detecting double bit errors, and detecting multiple bit errors within a nibble for 135 data bits and 9 check bits.