1. Field of the Invention
The present invention relates to providing fault-tolerance within computer systems. More specifically, the present invention relates to a method and an apparatus for providing fault-tolerance for temporary results within a central processing unit (CPU) before the temporary results are committed to the architectural state of the CPU.
2. Related Art
Rapid advances in semiconductor technology make it possible to incorporate increasingly large amounts of circuitry into a microprocessor chip. Unfortunately, memory elements within this circuitry are susceptible to random bit errors. Hence, as more circuitry is incorporated into a microprocessor chip, random bit errors are more likely to occur.
In order to remedy this problem, some microprocessor systems use error-correcting codes to protect data stored in cache memories within a microprocessor chip. Although cache memory accounts for a considerable portion of the memory within a microprocessor chip, many other memory elements remain unprotected.
Some of the remaining unprotected memory elements are located within an annex (also called a result buffer or working register file) in the microprocessor system. An annex stores temporary results of computational operations that are waiting to be committed to the architectural state of the central processing unit (CPU). For example, the annex may store the result of an addition operation before the result is ready to be written to a destination register in the CPU. When the result is ultimately written to the destination register, which is located in a register file defined by the instruction set architecture, it becomes “architecturally visible.” In some processors, an annex can include hundreds of registers, which makes it likely that a random bit error will eventually occur within the annex.
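The relationship between an annex and the architectural register file can be illustrated with a minimal software sketch. The class name `Cpu`, its methods, and the FIFO commit order are assumptions made purely for illustration; they are not drawn from any particular processor design.

```python
from collections import deque


class Cpu:
    """Toy model (hypothetical): an annex in front of a register file."""

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs  # architectural register file
        self.annex = deque()        # temporary results awaiting commit

    def execute_add(self, dest, a, b):
        # The sum is held in the annex; it is not yet architecturally visible.
        self.annex.append((dest, a + b))

    def commit_oldest(self):
        # Writing to the destination register makes the result visible.
        dest, value = self.annex.popleft()
        self.regs[dest] = value


cpu = Cpu()
cpu.execute_add(dest=3, a=2, b=5)
before = cpu.regs[3]  # register 3 is unchanged while the result waits in the annex
cpu.commit_oldest()
after = cpu.regs[3]   # register 3 now holds the committed sum
```

In this sketch, a bit error that strikes an entry of `annex` between `execute_add` and `commit_oldest` would be silently written into the register file, which is the exposure described above.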
Using error-correcting codes to protect temporary results within an annex presents two problems. First, the process of generating the error-correcting code, and the subsequent process of detecting an error, can take a significant amount of time. Because temporary results exist only for a short period of time, this delay makes error-correcting codes impractical for protecting them. Second, error-correcting codes require additional circuitry, which can increase the size and complexity of a CPU.
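To make the cost of the encoding and checking steps concrete, the following sketch uses a Hamming(7,4) code, chosen here only as a small, well-known example of an error-correcting code. The source does not specify a particular code, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data bits, error position); position 0 means no error found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome gives the 1-based bit position
    if position:
        c[position - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]
stored = hamming74_encode(data)
stored[4] ^= 1  # simulate a random bit error at position 5
recovered, position = hamming74_correct(stored)
```

Even in this small case, every write requires computing three parity bits and every read requires computing a three-bit syndrome, and the stored word grows from 4 bits to 7. In hardware, these steps correspond to additional logic levels and storage on the path of each temporary result.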
Hence, what is needed is a method and an apparatus for fixing random bit errors that occur in temporary results without the above-described problems of using error-correcting codes.