As is known, the basic architecture of a computer includes a central processing unit (CPU), memory units, and buses that interface the memory units with the CPU. The memory units, which may be read-only memory (ROM) or random access memory (RAM), store programs, or algorithms, to be executed by the CPU. The memory units also store data that is being operated upon by the CPU and intermediate data that is in the process of being operated on by the CPU.
The simplified description of a computer architecture presented above is common to every type of computer. For example, a personal computer, a workstation, and a mainframe system each include the above-described basic elements. In any of these computers, data stored in the RAMs, which may be dynamic RAMs (DRAMs), may occasionally encounter a storage error. Storage errors are a somewhat natural occurrence in a DRAM due to the DRAM design. To overcome such storage errors, most memory units that have a DRAM architecture include an error correction circuit. The error correction circuit utilizes stored check bits. When data is to be read from the memory unit, the circuit calculates new check bits based on the data being read and compares the new check bits with the stored check bits. When the stored check bits match the newly calculated check bits, the data is free of storage errors. If, however, the newly calculated check bits do not match the stored check bits, a storage error exists.
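The read-time comparison described above can be illustrated with a minimal sketch. The check-bit function used here (one even-parity bit per byte of an assumed 32-bit word) and the names `check_bits`, `StoredWord`, `write_word`, and `read_word` are hypothetical and chosen only for illustration; an actual error correction circuit would typically use a stronger code.

```python
from dataclasses import dataclass


def check_bits(data: int) -> int:
    """Hypothetical check-bit function: one even-parity bit per byte."""
    bits = 0
    for byte_index in range(4):
        byte = (data >> (8 * byte_index)) & 0xFF
        parity = bin(byte).count("1") & 1
        bits |= parity << byte_index
    return bits


@dataclass
class StoredWord:
    data: int   # the stored 32-bit data word (assumed width)
    check: int  # check bits computed when the word was written


def write_word(data: int) -> StoredWord:
    """Store the data together with check bits computed at write time."""
    data &= 0xFFFFFFFF
    return StoredWord(data, check_bits(data))


def read_word(cell: StoredWord):
    """Recompute check bits on read and compare them with the stored ones.

    Returns (data, ok), where ok is False when a storage error exists.
    """
    recomputed = check_bits(cell.data)
    return cell.data, recomputed == cell.check


cell = write_word(0x12345678)
print(read_word(cell))   # check bits match: no storage error
cell.data ^= 1 << 9      # simulate a single-bit storage error in the cell
print(read_word(cell))   # check bits differ: a storage error exists
```

In this sketch a mismatch only signals that an error exists; locating and correcting the faulty bit requires a code that carries more information than simple parity.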
A storage error may or may not be correctable, depending on its severity and on the error correction circuitry within the memory unit. For example, a circuit that uses a seven-bit error correction code (ECC) may detect two bit errors and correct one. If the number of bits used in the ECC is increased, the number of correctable errors may also be increased. Thus, using the seven-bit ECC, if more than one bit-level storage error is detected, the data includes an uncorrectable storage error.
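One way to obtain the "detect two, correct one" behavior with seven check bits is an extended Hamming code, sketched below. This is an assumption made for illustration: the text does not name the code used, and the 32-bit data width, the bit layout, and the names `encode` and `decode` are hypothetical. Six Hamming check bits plus one overall parity bit give the seven check bits.

```python
DATA_BITS = 32       # assumed data word width
HAMMING_BITS = 6     # Hamming check bits at positions 1, 2, 4, 8, 16, 32
CODE_LEN = DATA_BITS + HAMMING_BITS  # Hamming positions 1..38

# Positions that are not powers of two carry data bits.
DATA_POSITIONS = [p for p in range(1, CODE_LEN + 1) if p & (p - 1)]


def encode(data):
    """Return a list of bits: index 0 is overall parity, 1..38 the Hamming word."""
    word = [0] * (CODE_LEN + 1)
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> i) & 1
    for k in range(HAMMING_BITS):
        p = 1 << k
        # Even parity over every position whose index has bit k set.
        word[p] = sum(word[pos] for pos in range(1, CODE_LEN + 1) if pos & p) & 1
    word[0] = sum(word[1:]) & 1  # seventh check bit: overall parity
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, CODE_LEN + 1):
        if word[pos]:
            syndrome ^= pos
    parity_mismatch = sum(word) & 1
    fixed = list(word)
    if syndrome == 0 and not parity_mismatch:
        status = "ok"
    elif parity_mismatch and syndrome <= CODE_LEN:
        # Single-bit error: the syndrome is the failing position
        # (a syndrome of 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with consistent overall parity: two bits flipped.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= fixed[pos] << i
    return data, status


stored = encode(0xDEADBEEF)
stored[17] ^= 1                  # one storage error: correctable
print(decode(stored))
stored[5] ^= 1                   # a second storage error: detectable only
print(decode(stored)[1])
```

Under these assumptions, a single flipped bit is located by the syndrome and repaired, while two flipped bits leave the overall parity consistent but the syndrome nonzero, so the word can only be reported as uncorrectable. Three or more flipped bits may be misclassified by a code of this strength.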
When data that includes an uncorrectable storage error is transmitted to the CPU, the CPU attempts to utilize such data. When the CPU utilizes the data, a system error occurs which, in many cases, requires the computer system to be rebooted. Granted, uncorrectable storage errors occur very infrequently, in the range of once a month to once a year, but this is still too often for many computer systems. For example, in heavy-use systems, such as workstations, minicomputers, or mainframes, the rebooting process may take up to 24 hours. This, in many applications, is an unacceptable delay.
In addition to storage errors, the computer system described above is susceptible to transmission errors. As mentioned, the memory is coupled to the CPU via data buses. Occasionally, the data transmitted between the memory and the CPU is corrupted, thereby producing transmission errors. The CPU includes transmission error detection circuitry to identify such errors. Depending on the severity of the transmission error, the data may be corrupted beyond repair such that if the CPU utilizes the data, a system error will occur.
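As a minimal sketch of transmission error detection, the example below attaches a single even-parity bit to each word sent over the bus and checks it at the receiving end. The text does not specify the detection scheme the CPU uses, so the parity scheme and the names `send`, `receive`, and `TransmissionError` are assumptions made for illustration.

```python
class TransmissionError(Exception):
    """Raised when a received bus word fails its parity check."""


def parity(word: int) -> int:
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def send(word: int):
    """Memory side: place the word and its parity bit on the bus."""
    return word, parity(word)


def receive(word: int, parity_bit: int) -> int:
    """CPU side: recompute parity and refuse to pass on corrupted data."""
    if parity(word) != parity_bit:
        raise TransmissionError("parity mismatch on received word")
    return word


word, p = send(0xCAFE)
print(hex(receive(word, p)))       # clean transfer is accepted
try:
    receive(word ^ 0x10, p)        # one bit corrupted on the bus
except TransmissionError as err:
    print("detected:", err)
```

A single parity bit can reveal that a bit was flipped but not which one, so in this sketch the corrupted word cannot be repaired and is instead withheld from the CPU.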
Therefore, a need exists for a method and apparatus that detect when an uncorrectable error is being presented to the CPU, such that the CPU may appropriately handle the data to avoid system errors.