The computer system can be divided into three basic blocks: a central processing unit (CPU), memory, and input/output (I/O) units. These blocks are coupled to each other by a bus. Input devices such as a keyboard, mouse, disk drive, analog-to-digital converter, etc., are used to input instructions and data to the computer system via an I/O unit. These instructions and data can be stored in memory. The CPU retrieves the data stored in the memory and processes the data as directed by the stored instructions. The results can be stored back into memory or output via the I/O unit to any output device such as a printer, cathode-ray tube (CRT) display, digital-to-analog converter, etc.
Computer memories usually fall into two classes: read-only memories (ROMs) and random access memories (RAMs). ROM is commonly used for storing information that is not subject to change; the CPU can only retrieve the data stored at a particular address and cannot alter it. RAM, on the other hand, allows stored data to be both accessed (reading) and altered (writing). RAMs fall into two categories. Static random access memories (SRAMs) traditionally store binary data using flip-flop logic gate configurations. In contrast, dynamic random access memories (DRAMs) store data as charge on storage capacitors with drive transistors. One problem associated with both SRAMs and DRAMs is data integrity. Stored data can be corrupted by faults in the SRAM, the DRAM, or the associated controller circuitry (hard errors). Additionally, transient errors (soft errors) occur randomly over time and cannot be predicted. They are mainly caused by alpha-particle radiation, which can discharge memory capacitors in DRAMs and cause flip-flops in SRAMs to change state. Soft and hard errors can also be caused by noise on the transmission media, shorted buses, power surges, faulty bus drivers, etc.
The occurrence of data corruption can have a significant detrimental impact on the overall performance of the computer system. A single error may not only lead to an incorrect result but can even cause the computer program to crash. Thus, methods have been developed to indicate the occurrence of errors.
A simple error detection code (EDC) can be implemented by appending a single parity bit to a byte (8 bits) of data. Even or odd parity can be specified. For even parity, the parity bit added to the dataword is set to "0" if the number of 1s in the byte is even; otherwise, it is set to "1". For odd parity, the parity bit is set to "0" if the number of 1s in the byte is odd; otherwise, it is set to "1". Consequently, the total number of 1s in the byte, including the parity bit, should be even for even parity and odd for odd parity. A more sophisticated approach involves not only detecting the error but also correcting it. Codes that do so are referred to as error correction codes (ECCs). A commonly used ECC is the Hamming code or the modified Hamming code. The Hamming code principle consists of using several check bits to refine error detection to the point where it is possible not just to detect single-bit errors but also to pinpoint their locations. Once the erroneous bit is located, it can be corrected by complementing that bit. The number of check bits required to perform error correction depends on the length of the data string. Given n check bits, a data string having 2^(n-1) - 1 bits can be protected with double-bit error detection and single-bit error correction. By using ECC, the mean time between failures is extended, which improves the reliability of the overall computer system.
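By way of illustration only, the single-parity-bit scheme described above may be sketched as follows. The function names `parity_bit` and `check_parity` are hypothetical and are not taken from any particular system; the sketch assumes an 8-bit dataword.

```python
def parity_bit(byte: int, even: bool = True) -> int:
    """Return the parity bit to append to an 8-bit dataword.

    Even parity: 0 if the byte already holds an even number of 1s, else 1.
    Odd parity:  0 if the byte already holds an odd number of 1s, else 1.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    ones = bin(byte).count("1")
    bit = ones % 2              # 1 when the count of 1s is odd
    return bit if even else bit ^ 1


def check_parity(byte: int, stored_bit: int, even: bool = True) -> bool:
    """Return True if the stored parity bit is consistent with the byte."""
    return parity_bit(byte, even) == stored_bit


word = 0b10110010                      # four 1s
p = parity_bit(word)                   # even parity bit for the dataword
flipped_once = word ^ 0b00000100       # one bit corrupted
flipped_twice = word ^ 0b00000110      # two bits corrupted
print(check_parity(word, p))           # consistent
print(check_parity(flipped_once, p))   # inconsistent: error detected
print(check_parity(flipped_twice, p))  # consistent: error goes unnoticed
```

As the last case suggests, a single parity bit detects any odd number of flipped bits but cannot detect an even number of flipped bits, and it cannot locate the erroneous bit. This is the limitation that motivates the use of several check bits.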
Most often, error correction codes are employed when data is transferred between a transmitting subsystem, such as an I/O device or a processor, and the memory of a receiving subsystem. ECC codewords (i.e., ECC parity bits derived from the original codewords prior to being transmitted plus the original codewords) are transmitted and stored in the memory of this receiving subsystem. The ECC codewords are stored at the address specified by the transmitting subsystem. When the memory of the receiving subsystem is subsequently accessed at the same address, the data portion of the accessed ECC codewords is utilized to generate a subsequent set of parity bits. The original parity bits are then compared to the subsequent set of parity bits by performing an exclusive-or (XOR) logic operation on the corresponding bits. The resulting code from the XOR operation is referred to as the error syndrome code. The error syndrome code indicates whether an error has occurred in the data or parity bit portion of the subsequently accessed ECC codewords. When the error syndrome code is decoded, the bit location in the ECC codeword in which the error has occurred is identified. Thus, the system detects whether any errors have occurred in the transmission of the ECC codewords, including errors that may have occurred in the parity bits themselves.
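The syndrome procedure described above may be sketched, under stated assumptions, as a small modified Hamming code. The sketch assumes an 8-bit dataword protected by four Hamming check bits plus one overall parity bit; the bit layout and the names `DATA_POS`, `check_bits`, `encode`, and `decode` are hypothetical choices made for illustration and do not represent any particular prior art implementation.

```python
# Assumed layout: codeword positions 1..12, with check bits at the
# power-of-two positions (1, 2, 4, 8) and data bits at the remaining ones.
DATA_POS = (3, 5, 6, 7, 9, 10, 11, 12)


def _parity(x: int) -> int:
    """Return 1 if x contains an odd number of 1s, else 0."""
    return bin(x).count("1") & 1


def check_bits(data: int) -> int:
    """Generate the four check bits by XOR-ing the positions of set data bits."""
    checks = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            checks ^= pos
    return checks


def encode(data: int):
    """Return (data, check bits, overall parity bit) for storage."""
    checks = check_bits(data)
    return data, checks, _parity(data) ^ _parity(checks)


def decode(data: int, checks: int, overall: int):
    """Regenerate the check bits, XOR them with the stored ones, and decode."""
    syndrome = checks ^ check_bits(data)       # error syndrome code
    overall_err = _parity(data) ^ _parity(checks) ^ overall
    if syndrome == 0 and overall_err == 0:
        return data, "ok"
    if overall_err == 1:                       # odd number of bits flipped
        if syndrome in DATA_POS:               # single error in a data bit
            return data ^ (1 << DATA_POS.index(syndrome)), "corrected"
        if syndrome in (0, 1, 2, 4, 8):        # error in a parity/check bit
            return data, "corrected"
    return None, "uncorrectable"               # e.g. a double-bit error


d, c, p = encode(0xA5)
print(decode(d, c, p))            # no error
print(decode(d ^ 0x10, c, p))     # one data bit flipped
print(decode(d, c ^ 0x02, p))     # one check bit flipped
print(decode(d ^ 0x11, c, p))     # two data bits flipped
```

In this sketch a nonzero syndrome with an overall parity mismatch names the position of the single bit in error, whereas a nonzero syndrome with consistent overall parity indicates a double-bit error that can be detected but not corrected.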
In the prior art, error detection and correction using ECC codes is performed before the data is used by the system. In other words, where data is transferred from the memory system to a device for use, the error detection and correction occurs before the data is actually utilized by the system. Therefore, performing the error detection and correction introduces a delay before the data being transferred is ready for use. However, given a 16 megabyte DRAM memory system using 256 K DRAMs with a 0.30% per thousand soft error rate, a single-bit soft error occurs only once in approximately 24-48 days. Thus, the occurrence of an error is very rare. Because errors are so rare, the delay that is incurred before the data may be used is an undue delay. It is desirable to be able to receive data and use the data as soon as it is received while still compensating for any possible errors in the data that may be detected and corrected.
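The order of magnitude of the error interval stated above may be checked with a short calculation. The text does not state the unit of the soft error rate; the sketch below assumes, purely for illustration, that "0.30% per thousand" means 0.30% per thousand hours of operation per DRAM device, and that "256 K DRAMs" are devices of 256 kilobits each. The function name `days_between_soft_errors` is hypothetical.

```python
def days_between_soft_errors(total_bytes: int, device_bits: int,
                             pct_per_1000_hours: float) -> float:
    """Estimate the mean interval, in days, between single-bit soft errors.

    ASSUMPTION: the rate is a percentage per thousand hours per device;
    the source text does not state the unit.
    """
    devices = total_bytes * 8 // device_bits
    errors_per_hour = devices * (pct_per_1000_hours / 100.0) / 1000.0
    return 1.0 / errors_per_hour / 24.0


devices = 16 * 2**20 * 8 // (256 * 2**10)   # data devices only, no check bits
days = days_between_soft_errors(16 * 2**20, 256 * 2**10, 0.30)
print(devices, round(days, 1))
```

Under these assumptions the memory comprises 512 data devices and the estimated interval is roughly 27 days, which falls within the range of approximately 24-48 days given above. Additional devices used to hold check bits would shorten the estimate somewhat.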
The present invention provides for performing error correction on data that is transferred to a device, such that the device is able to use the data immediately upon receiving it. The present invention provides for performing error correction on the data transferred from the level two (L2) cache memory.