1. Field of the Invention
The present invention relates to providing fault-tolerance within computer systems. More specifically, the present invention relates to a method and an apparatus for providing error correction within a register file of a central processing unit (CPU).
2. Related Art
Rapid advances in semiconductor technology presently make it possible to incorporate large register files onto a microprocessor chip. These large register files can be used to improve microprocessor performance. For example, the technique of vertical multi-threading relies on the replication of thread state, such as register files, to improve microprocessor performance. Hence, a four-way vertical multi-threaded processor requires four copies of the register file for efficient operation.
Unfortunately, large on-chip register files are susceptible to random bit errors. For example, assume each processor has four register files, and each register file has 128 registers that are eight bytes in size. Each processor then contains 4×128×8=4,096 bytes (4K bytes) of register file memory. If there are eight processors on a chip, each chip contains 32K bytes of register file memory that is susceptible to random bit errors.
One solution to this problem is to use error-correcting codes to detect and correct these errors. Semiconductor memories located outside a microprocessor chip often include additional space for storing a syndrome for each dataword. When a dataword is first stored into memory, a syndrome is calculated from the dataword, and this syndrome is stored along with the dataword in the memory. The dataword and the syndrome collectively form a codeword in the error-correcting code. When the dataword is subsequently retrieved from the memory, the syndrome is also retrieved. At the same time, a new syndrome is calculated for the retrieved dataword. If the new syndrome differs from the retrieved syndrome, a bit error has occurred in either the dataword or the syndrome. In this case, information from the syndrome and the dataword is used to correct the bit error. Note that simply maintaining parity bits does not suffice for a register file: parity can detect a single-bit error but cannot correct it, because there exists no backup copy of the data within the register file from which the correct value could be restored.
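By way of illustration only, the store-and-compare scheme described above can be sketched in Python using a single-error-correcting Hamming code over a 64-bit dataword. This is a minimal sketch under stated assumptions, not the circuit of any particular memory: the choice of a Hamming code, the 64-bit width, and the names compute_check_bits, store, and load are all hypothetical. The value called the "syndrome" in the text corresponds to the check bits computed here.

```python
DATA_BITS = 64  # assumed width: one eight-byte register


def _data_positions(n):
    """Return the first n codeword positions that are not powers of two.

    In a Hamming code, power-of-two positions (1, 2, 4, ...) hold check
    bits and every other position holds a data bit.
    """
    positions = []
    pos = 1
    while len(positions) < n:
        if pos & (pos - 1):  # nonzero only when pos is not a power of two
            positions.append(pos)
        pos += 1
    return positions


POSITIONS = _data_positions(DATA_BITS)
POSITION_TO_BIT = {pos: i for i, pos in enumerate(POSITIONS)}


def compute_check_bits(dataword):
    """XOR together the codeword positions of all data bits that are set."""
    check = 0
    for i, pos in enumerate(POSITIONS):
        if (dataword >> i) & 1:
            check ^= pos
    return check


def store(dataword):
    """Return the (dataword, check bits) pair that would be written."""
    return dataword, compute_check_bits(dataword)


def load(dataword, stored_check):
    """Recompute the check bits, compare, and correct a single-bit error."""
    diff = compute_check_bits(dataword) ^ stored_check
    if diff == 0:
        return dataword, "ok"
    if diff & (diff - 1) == 0:
        # Exactly one bit differs: a stored check bit flipped, data intact.
        return dataword, "corrected check bit"
    if diff in POSITION_TO_BIT:
        # The difference names the codeword position of the flipped data bit.
        return dataword ^ (1 << POSITION_TO_BIT[diff]), "corrected data bit"
    return dataword, "uncorrectable"


word = 0x0123456789ABCDEF
data, check = store(word)
flipped = data ^ (1 << 17)      # simulate a random bit error in the data
print(load(flipped, check))     # recovers the original word
```

Under these assumptions, seven check bits accompany each 64-bit dataword. The sketch corrects only single-bit errors; an error affecting two bits may be miscorrected unless an additional overall parity bit is maintained.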
One problem with using conventional techniques to incorporate error-correcting codes into a register file is that extra time is required to perform the computational operations involved in detecting and correcting errors. This added delay, caused by longer cycle times or additional pipeline stages, can seriously degrade system performance because the register file is located on a main critical path in the computer system.
Hence, what is needed is a method and an apparatus for fixing bit errors in an on-chip register file without significantly degrading system performance.