“Soft error” is a term used to describe random corruption of data in computer memory. Such corruption may be caused, for example, by particles in normal environmental radiation; alpha particles, for instance, may cause individual bits of electronic data to “flip” randomly in value, introducing errors into the data.
Modern computer processors have tended to have increasingly large caches and, because a larger cache holds more bits that can be struck, a correspondingly greater probability of encountering soft errors. Methods of handling soft errors in caches are known. In some methods, the soft error is detected, but no attempt is made to recover from it; instead, operation is simply shut down. For example, in known processors, parity checking is performed to detect soft errors in the instruction cache. If a soft error is detected, a “machine check error” is signaled to retirement logic, which uses this indication either to shut down the processor at the next instruction boundary or to divert execution to the non-recoverable machine check exception handler. Consequently, the computer system must be rebooted before it can continue working.
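The detect-only behavior described above can be illustrated with a minimal Python sketch, assuming a single even-parity bit stored with each word. The class `ParityCacheLine`, the helper `flip_bit`, and the exception `MachineCheckError` are hypothetical names used for illustration only; `MachineCheckError` merely stands in for the machine check signaled to retirement logic. The sketch shows why parity alone leads to a shutdown: a mismatch reveals that some bit changed, but not which one.

```python
class MachineCheckError(Exception):
    """Hypothetical stand-in for the machine check signaled to retirement logic."""


def parity(word: int) -> int:
    """Return 1 if an odd number of bits are set in a non-negative word, else 0."""
    return bin(word).count("1") & 1


class ParityCacheLine:
    """Toy cache entry: a data word plus one parity bit computed at fill time."""

    def __init__(self, word: int):
        self.word = word
        self.parity = parity(word)

    def read(self) -> int:
        # Recompute parity on every read; a mismatch means an odd number of bits flipped.
        if parity(self.word) != self.parity:
            raise MachineCheckError("parity mismatch in cache line")
        return self.word


def flip_bit(line: ParityCacheLine, bit: int) -> None:
    """Simulate a soft error by flipping one stored data bit (parity bit untouched)."""
    line.word ^= 1 << bit


line = ParityCacheLine(0b1011_0010)
flip_bit(line, 3)  # simulate a particle strike on bit 3
try:
    line.read()
except MachineCheckError as err:
    # The error is detected, but the flipped bit cannot be located or repaired.
    print("detected:", err)
```

Because a single parity bit records only whether the number of set bits is odd or even, two flips in the same word cancel out and go undetected; and even a detected single flip cannot be repaired, since nothing identifies its position.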
In other methods, an effort is made to recover from soft errors without shutting down. One such known method uses ECC (error correction circuitry): additional hardware logic built into a cache that is able to detect soft errors and execute a hardware algorithm to correct them. A disadvantage of ECC, however, is that the additional hardware occupies silicon area and requires time to perform its computations, imposing further area and timing constraints on the overall design. Moreover, an additional cycle is usually added to the cache access time to accommodate the ECC's correction logic, degrading processor performance even when no soft errors occur.
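As an illustration of the kind of algorithm such correction logic may implement, the following sketch uses a Hamming(7,4) code, a standard textbook single-error-correcting code. The source does not specify which code is used, and practical cache ECC typically protects wider words (for example, with SEC-DED codes); the function names here are hypothetical. Three check bits are stored alongside every four data bits, and on each read a syndrome is recomputed: a nonzero syndrome gives the position of the flipped bit, so it can be flipped back.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (bit i-1 holds position i)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # data bits go to positions 3, 5, 6, 7
    code = [0] * 8  # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = d
    # Each check bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return sum(code[pos] << (pos - 1) for pos in range(1, 8))


def hamming74_decode(word: int) -> tuple[int, int]:
    """Return (corrected nibble, error position or 0), assuming at most one flipped bit."""
    code = [0] + [(word >> (pos - 1)) & 1 for pos in range(1, 8)]
    # The syndrome is the XOR of the positions of all set bits; it is 0 for a valid codeword.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1  # the syndrome names the flipped position; flip it back
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome


codeword = hamming74_encode(0b1011)
corrupted = codeword ^ (1 << 4)  # soft error at position 5
print(hamming74_decode(corrupted))  # (11, 5)
```

In hardware, the syndrome computation corresponds to XOR trees evaluated on every access, which illustrates where the extra silicon area and the extra access latency can come from, even on reads that turn out to be error-free.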
An approach is needed for handling soft errors in view of the foregoing considerations.