1. Field of the Invention
The present invention relates to error detection and correction of data transferred between a CPU and system memory.
2. Description of Related Art
Errors can occur during the transmission of digital signals because of defective devices, faulty transmission lines, etc. Such errors typically invert the signal so that a data bit is changed from a binary 1 to a binary 0, or from a binary 0 to a binary 1. The device reading the data does not know what the signal is supposed to be; it merely assumes the data is correct. If the device is a central processing unit (CPU), reading invalid data could produce an incorrect output, and an incorrect output can be fatal. For example, a CPU that processes money transactions or bank accounts must be very reliable. Even one error in the data stream into the CPU could result in a $1.00 deposit being logged as a $1,000,000.00 deposit. It is therefore becoming increasingly important to have a means of detecting and correcting errors in data transferred to a CPU.
Error correction codes (ECC) are presently employed in the read/write operations of disk drives, particularly hard disk drives, which supply a large amount of data. By their very nature, disk drives and the means for reading the disk are susceptible to errors in the data stream. Over the years a variety of ECC schemes have been used to detect and correct data errors. One such scheme is called a block code. In a block code the data is assembled into an array having x rows and n columns. The array also includes a column of horizontal parity bits and a row of vertical parity bits. Each horizontal parity bit is set (given a binary 1 or 0) so that the modulo-2 sum of the horizontal parity bit and the data bits within a single row equals either a binary 0 (even parity) or a binary 1 (odd parity). If even parity is used, the modulo-2 sum of the horizontal parity bit and data bits for each row will be a binary 0. Likewise, the modulo-2 sum of the vertical parity bit and data bits for each column must also equal 0. The vertical parity bits are therefore set so that the column sums are always equal to 0.
Most disk drive controllers have logic means that add the data and horizontal parity bits of each row to determine if there is a horizontal parity error. The controller also adds the data and vertical parity bits of each column to determine if there is a vertical parity error. Each single bit error produces both a horizontal and a vertical parity error. The data error can therefore be located at the intersection of the row and the column that produced the parity errors. Once located, the invalid data bit can be inverted to the correct state. A block code ECC is somewhat slow, because an entire array of data must be accumulated and analyzed before being sent to the system. Although error detection and correction decreases the data rate from the disk, disks are read relatively infrequently, so the reduction in speed does not slow down the system. Thus for disk drives, the increase in reliability is considered to outweigh the penalty from the reduction in the data rate.
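The row and column parity check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the logic of any particular disk controller: even parity is assumed, and the function names `parity`, `encode`, and `correct` are hypothetical names chosen for this example.

```python
from functools import reduce
from operator import xor

def parity(bits):
    """Return the modulo-2 sum (XOR) of a sequence of bits."""
    return reduce(xor, bits, 0)

def encode(data):
    """Append an even-parity bit to each row, then a row of column parity bits."""
    rows = [list(row) + [parity(row)] for row in data]
    rows.append([parity(col) for col in zip(*rows)])
    return rows

def correct(block):
    """Locate and invert a single flipped bit in place.

    Returns the (row, column) of the corrected bit, or None if no error is seen.
    """
    bad_rows = [r for r, row in enumerate(block) if parity(row) != 0]
    bad_cols = [c for c, col in enumerate(zip(*block)) if parity(col) != 0]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        block[r][c] ^= 1  # invert the bit at the failing row/column intersection
        return (r, c)
    raise ValueError("more than one bit in error; cannot correct")

# Usage: a 3 x 4 data array (assumed values), one bit flipped in transit.
data = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
block = encode(data)
block[1][2] ^= 1               # simulate a single bit error
location = correct(block)      # (1, 2)
```

Note that the whole array must be present before `correct` can run, which is the source of the speed penalty mentioned above.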
Current architectures often use Hamming codes to provide error correction between system memory and the CPU. A Hamming code protects a block of data, typically 8 bytes or larger. To protect an 8 byte data block, 8 Hamming code bits are calculated, where each code bit is calculated as the parity of a different group of 16 bits out of the 8 bytes of data. Thus, each single data bit is used to calculate at least two Hamming code bits. The Hamming code bits are stored with the data that is written from the CPU. When data is read from memory back to the CPU, a new set of Hamming code bits is calculated and compared with the stored original Hamming code bits. When a single bit error occurs in the received data, some of the newly calculated Hamming code bits will differ from the stored Hamming code bits, and the pattern of differing bits can be used to identify and correct the erroneous bit. To perform the error detection and correction, the entire block of data must again be read into the CPU. This requirement greatly reduces the speed of the CPU-to-memory interface.
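The compare-and-correct step can be sketched as follows. This is an illustrative sketch only: the assignment of data bits to parity groups (`PATTERNS`) is an assumption made for this example, and its groups are not all 16 bits wide as in the description above. The only properties it relies on are that every data bit feeds at least two code bits and that no two data bits feed the same set of code bits. The names `build_patterns`, `check_bits`, and `read_and_correct` are hypothetical.

```python
def build_patterns(n_data=64, n_check=8):
    """Give each data bit a distinct mask of code bits with at least two bits set."""
    masks = [m for m in range(1, 1 << n_check) if bin(m).count("1") >= 2]
    return masks[:n_data]

PATTERNS = build_patterns()

def check_bits(data):
    """Code bit k is the parity of all data bits whose mask includes bit k."""
    code = 0
    for i, mask in enumerate(PATTERNS):
        if (data >> i) & 1:
            code ^= mask
    return code

def read_and_correct(data, stored_code):
    """Recompute the code bits, compare with the stored ones, fix a single-bit error."""
    syndrome = check_bits(data) ^ stored_code
    if syndrome == 0:
        return data                                   # no error detected
    if syndrome in PATTERNS:
        return data ^ (1 << PATTERNS.index(syndrome)) # flip the identified data bit
    if bin(syndrome).count("1") == 1:
        return data                                   # a stored code bit was hit; data intact
    raise ValueError("uncorrectable error")

# Usage: write a 64-bit word, corrupt one bit, then read it back.
word = 0x0123456789ABCDEF
code = check_bits(word)
corrupted = word ^ (1 << 37)
recovered = read_and_correct(corrupted, code)
```

The sketch handles single-bit errors only; errors in two or more bits may be miscorrected. It also shows why the full block is needed on every read: `check_bits` must see all 64 data bits before the comparison can be made.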
Present architectures typically incorporate a cache between the CPU and system memory. More recently, a dynamic random access memory (DRAM) device that contains a cache line has been developed. Such a device is sold by Rambus, Inc. The present Rambus chip incorporates a horizontal parity bit but not a vertical parity bit. Although horizontal parity bits can be used to detect an error, the exact location of the error cannot be determined. Therefore there is no way to correct the invalid data bit. Typically a CPU that detects a horizontal parity error disregards that byte of data. The CPU must then resubmit a request for the same data, which slows down the system. It would therefore be desirable to have a method and apparatus that provides error detection and correction between a CPU and system memory, without significantly decreasing the data rate of the bus. It would also be desirable to have a DRAM with a cache line that stores and upgrades parity bits for subsequent error detection and correction.