Despite the presence of error-control coding (ECC) in computer systems, uncorrectable errors can still occur. For example, in many systems a two-bit ECC error (a "2×ecc" error) can be detected but not corrected. If the processor consumes data containing such an error, the result may be incorrect computation, or the operating system (OS) may even go down, e.g., through a machine check abort (MCA).
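To illustrate why a two-bit error can be detected but not corrected, here is a minimal sketch of a single-error-correcting, double-error-detecting (SEC-DED) code. It assumes an extended Hamming code over only 4 data bits. The text does not say which code a given system uses, the functions secded_encode and secded_decode are hypothetical, and real memory ECC applies the same principle to much wider words.

```python
from typing import List, Optional, Tuple


def secded_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) codeword.

    Index 0 holds an overall parity bit; indices 1..7 use the classic
    Hamming(7,4) layout, with parity bits at positions 1, 2 and 4.
    """
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected exactly 4 bits")
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    for bit in code[1:]:
        code[0] ^= bit  # overall parity over positions 1..7
    return code


def secded_decode(code: List[int]) -> Tuple[str, Optional[List[int]]]:
    """Return ("ok" | "corrected" | "uncorrectable", data bits or None)."""
    code = list(code)  # work on a copy; leave the caller's word untouched
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions; 0 for a valid word
    overall = 0
    for bit in code:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume a single flip located by the syndrome
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: a two-bit error is
        # detected, but its location is unknown, so it cannot be corrected.
        return "uncorrectable", None
    return status, [code[3], code[5], code[6], code[7]]
```

Flipping any one bit of an encoded word yields "corrected" with the original data bits. Flipping any two bits yields "uncorrectable", which corresponds to the 2×ecc case described above.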
One way of handling such uncorrectable errors is for the processor to assert a global MCA as soon as it detects one. However, a global MCA brings down the entire system, which reduces the availability and reliability of the computer system.
A refinement of this approach is to detect uncorrectable data errors and mark the data that contains them, a technique known as "data poisoning." If the processor then detects that data it is about to consume has been "poisoned," it can invoke an MCA instead of consuming the poisoned data. Although this gives the processor a more convenient way to detect such uncorrectable errors, it improves availability and reliability only incrementally, because the system must still be brought down.
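The data-poisoning flow described above can be sketched as a small model. This is an illustration under stated assumptions, not any particular processor's mechanism: the CacheLine type, its poisoned flag, and the functions fill_from_memory and consume are hypothetical, and MachineCheckAbort merely stands in for the processor's MCA. The sketch shows only that the error is recorded when it is detected and acted on when the data is consumed.

```python
from dataclasses import dataclass


class MachineCheckAbort(Exception):
    """Hypothetical stand-in for a machine check abort (MCA)."""


@dataclass
class CacheLine:
    data: int
    poisoned: bool = False  # hypothetical poison marker carried with the data


def fill_from_memory(data: int, uncorrectable: bool) -> CacheLine:
    # On an uncorrectable ECC error, mark the data as poisoned and pass it
    # along instead of signalling an error at fill time.
    return CacheLine(data=data, poisoned=uncorrectable)


def consume(line: CacheLine) -> int:
    # The poison marker is checked only when the data is about to be used;
    # poisoned data triggers the (simulated) MCA instead of being consumed.
    if line.poisoned:
        raise MachineCheckAbort("attempted consumption of poisoned data")
    return line.data
```

For example, consume(fill_from_memory(0x2A, uncorrectable=False)) returns 42, while consuming a line filled with uncorrectable=True raises MachineCheckAbort. Poisoned data that is never consumed triggers nothing in this sketch, but once it is consumed the MCA still occurs, which mirrors why poisoning alone yields only an incremental improvement.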