The present invention relates to the use of error correction codes (ECC) for detecting and correcting errors during data transport, and specifically to a method and apparatus that ensures faster data transport when there is no error and that corrects any correctable errors.
A number of schemes exist for correcting errors and detecting corruption of data during transport, for example, data transmitted between agents over a network or between an external memory and a processor's internal memory cache. One example of a scheme for detecting errors in a data field is parity. When data is received, the parity of the data field is checked and an error is detected if the parity does not match the predetermined parity (odd or even). This works well for detecting single-bit errors. Another example of an error detection scheme is a CRC (cyclic redundancy check) checksum. When data is received, the complete data sequence, which includes CRC bits appended to the end of the data field, is read by a CRC checker. The complete sequence should be exactly divisible by a CRC polynomial. If it is not, an error has been detected. Implemented in hardware, the CRC check is an exclusive OR (XOR) of each bit position.
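By way of illustration only, the parity and CRC checks described above can be sketched in software as follows. The function names, the use of even parity, and the 4-bit generator polynomial in the usage note are assumptions chosen for brevity; they are not taken from any particular implementation.

```python
def parity_bit(data: int) -> int:
    """Return the bit that makes the total number of 1 bits even (even parity assumed)."""
    return bin(data).count("1") & 1


def parity_ok(data: int, stored_bit: int) -> bool:
    """Recompute the parity of the received data and compare it with the stored bit."""
    return parity_bit(data) == stored_bit


def gf2_mod(value: int, poly: int) -> int:
    """Remainder of polynomial division over GF(2); subtraction is XOR."""
    degree = poly.bit_length() - 1
    while value.bit_length() > degree:
        # Align the generator with the highest set bit and cancel it.
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def crc_append(data: int, poly: int) -> int:
    """Append CRC bits so that the complete sequence is divisible by poly."""
    degree = poly.bit_length() - 1
    shifted = data << degree
    return shifted | gf2_mod(shifted, poly)


def crc_check(codeword: int, poly: int) -> bool:
    """An error is detected when the complete sequence is not exactly divisible."""
    return gf2_mod(codeword, poly) == 0
```

For example, with the hypothetical generator `0b1011`, `crc_check(crc_append(data, 0b1011), 0b1011)` holds for any `data`, and flipping any single bit of the codeword makes the check fail. The parity check also catches any single flipped bit, but it does not catch two flipped bits, because the parity of the data field is then unchanged.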
Closely related to the CRC are ECC codes (error correcting or error checking and correcting). ECC codes are sometimes referred to as EDC codes for error detecting and correcting. ECC codes are in principle CRC codes whose redundancy is so extensive that they can restore the original data, provided the error that occurs is not too severe. ECC codes are used, for example, for magnetic data recording with floppy or hard disk drives as well as for fail-safe RAM memory systems. A memory controller with embedded ECC logic, for example, is able to repair soft errors in DRAM chips caused by natural radioactivity in the air or tiny amounts of radioactive substances in the chip substrate. The ionizing effect of alpha particles causes additional charges in the storage area of a DRAM memory cell, which may distort the held value.
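As a minimal sketch of how added redundancy permits correction rather than mere detection, the following uses a Hamming(7,4) code, which corrects any single-bit error in a 7-bit word carrying 4 data bits. The choice of this particular code and the function names are assumptions made for illustration; the text above does not specify which ECC code a given memory controller employs.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit word (bit positions 1..7, parity at 1, 2, 4)."""
    bits = [0] * 8  # index 0 unused so that indices match bit positions
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def hamming74_decode(word: int) -> tuple[int, int]:
    """Return (corrected data, syndrome); a syndrome of 0 means no error was seen."""
    bits = [0] + [(word >> (p - 1)) & 1 for p in range(1, 8)]
    syndrome = 0
    for p in range(1, 8):
        if bits[p]:
            syndrome ^= p  # XOR of the positions of all set bits
    if syndrome:
        bits[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, syndrome
```

When the syndrome is zero the data bits can be used as received; only a nonzero syndrome requires the correction step. Two or more flipped bits exceed what this small code can correct and would be decoded incorrectly.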
FIG. 1 depicts an example of a memory system 10 using embedded ECC logic (or CRC logic) for error detection and correction. Memory system 10 includes bus interface 20, memory 25 and memory controller 30. Memory 25 is any memory device such as a floppy or a hard drive, for example. Memory system 10 is useful for transferring data between memory 25 and main memory or RAM (not shown), which is usually one or more banks of DRAM chips, for example. Data is transferred through memory controller 30 to and from bus interface 20 and controller chip 35. Bus interface 20 provides the connection to the main memory. Controller chip 35 determines the ECC (or CRC) bytes and provides any necessary formatting such as converting parallel submitted data into serial data and vice versa. ECC (or CRC) logic 40 generates and/or checks ECC bytes (or CRC bytes) being transmitted between bus interface 20 and memory 25. If an error is detected, ECC (CRC) logic 40 generates an error detect signal to controller chip 35, and if the error is correctable, ECC logic 40 handles correction. Microprocessor 50 provides overall control, including synchronization, of controller chip 35, ECC (CRC) logic 40 and memory interface 60 of memory controller 30. Microcode ROM 55 provides the necessary instructions for microprocessor 50, and memory interface 60 provides the necessary interface to memory 25, depending on the memory type.
Modern CPUs use embedded ECC logic, such as ECC logic 40 in FIG. 1, in an attempt to detect and correct certain data errors occurring during data transport. Of particular importance in CPUs is the ability to detect and correct errors in data transported from an on-chip or off-chip memory cache to certain performance-critical on-chip caches, such as prefetch, write, data and instruction caches. Such correction usually requires extra cycles in the data path, thus increasing data access latency and decreasing the CPU's performance. Accordingly, what is needed in the art is a method and apparatus for detecting and correcting errors that adds no extra latency to the data when there is no error, but that gracefully corrects correctable errors when an error does occur.