1. Field
The present disclosure pertains to the field of information processing, and more particularly, to the field of error mitigation in information processing systems.
2. Description of Related Art
As improvements in integrated circuit manufacturing technologies continue to provide for greater levels of integration and lower operating voltages in microprocessors and other data processing apparatuses, makers and users of these devices are becoming increasingly concerned with the phenomenon of soft errors. Soft errors arise when alpha particles and high-energy neutrons strike integrated circuits and alter the charges stored on the circuit nodes. If the charge alteration is sufficiently large, the voltage on a node may be changed from a level that represents one logic state to a level that represents a different logic state, in which case the information stored on that node becomes corrupted. Generally, soft error rates increase as the level of integration increases, because the likelihood that a striking particle will hit a voltage node in a die increases when more circuitry is integrated into a single die. Likewise, as operating voltages decrease, the difference between the voltage levels that represent different logic states decreases, so less energy is needed to alter the logic states on circuit nodes and more soft errors arise.
Blocking certain types of particles that cause soft errors may be difficult, so data processing apparatuses often include techniques for detecting, and sometimes correcting, soft errors. These error mitigation techniques include redundancy. With redundancy, two or more hardware contexts execute identical copies of a program or instruction stream. Each hardware context may consist of any hardware capable of executing the instruction stream, such as a logical processor in a multithreaded processor, a core in a multicore processor, a full processor in a multiprocessor system, or a full system including a processor, system memory, and possibly input/output (I/O) devices. The outputs from the two or more hardware contexts are compared, and, if they differ, an error handling mechanism may be invoked to determine if an error has occurred and/or handle the error.
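The redundancy scheme described above may be illustrated with a minimal software sketch. The sketch is a simulation only and does not represent any particular hardware implementation; the names flip_bit, run_redundant, outputs_match, and square_plus_one are hypothetical and chosen for illustration, and the soft error is modeled, by assumption, as a single inverted bit in one context's output.

```python
from typing import Callable, List, Optional


def flip_bit(value: int, bit: int) -> int:
    """Model a soft error by inverting one bit of a stored value."""
    return value ^ (1 << bit)


def run_redundant(
    program: Callable[[int], int],
    program_input: int,
    num_contexts: int = 2,
    faulty_context: Optional[int] = None,
) -> List[int]:
    """Run identical copies of `program` on each simulated hardware context.

    If `faulty_context` is given, bit 0 of that context's output is flipped
    to imitate a particle strike corrupting a circuit node (an assumption
    made for this sketch).
    """
    outputs = []
    for ctx in range(num_contexts):
        result = program(program_input)
        if ctx == faulty_context:
            result = flip_bit(result, 0)
        outputs.append(result)
    return outputs


def outputs_match(outputs: List[int]) -> bool:
    """Compare the outputs; a mismatch is what would invoke the error handler."""
    return all(out == outputs[0] for out in outputs)


def square_plus_one(x: int) -> int:
    """Hypothetical stand-in for the replicated instruction stream."""
    return x * x + 1


if __name__ == "__main__":
    clean = run_redundant(square_plus_one, 6)
    struck = run_redundant(square_plus_one, 6, faulty_context=1)
    print(outputs_match(clean))   # no fault injected, outputs agree
    print(outputs_match(struck))  # one context corrupted, outputs differ
```

In this sketch a mismatch only indicates that the contexts disagree. With two contexts the comparison alone cannot identify which output is corrupted, which is one reason an error handling mechanism is invoked rather than a result simply being selected.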
In some implementations of redundancy, the two or more hardware contexts operate in lockstep, meaning that they each execute the same instruction in the stream simultaneously. In other implementations of redundancy, the two or more hardware contexts may execute the identical copies of the instruction stream, but not in lockstep or synchrony with each other, so that they may each be executing a different instruction in the stream at the same time. Delivery of an input or an interrupt at a time when the hardware contexts are not in synchrony may cause an output from one context to differ from an output from another context, which may result in the error handler being invoked, even if the output mismatch did not result from an actual error.
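The effect of delivering an interrupt to contexts that are not in synchrony may be illustrated with a minimal sketch. The sketch is a simulation under stated assumptions: each context accumulates a list of operands, and a hypothetical interrupt handler modifies the accumulator in a way that depends on the point in the stream at which the interrupt arrives. The name run_context and the handler behavior are illustrative assumptions and are not taken from any particular implementation.

```python
from typing import List


def run_context(instructions: List[int], interrupt_value: int, interrupt_at: int) -> int:
    """Execute the stream on one simulated context and return its output.

    `interrupt_at` is the index of the instruction the context is about to
    execute when the interrupt is delivered. The hypothetical handler
    doubles the accumulator and adds `interrupt_value`, so the final output
    depends on where in the stream the interrupt lands.
    """
    acc = 0
    for index, operand in enumerate(instructions):
        if index == interrupt_at:
            acc = acc * 2 + interrupt_value  # hypothetical interrupt handler
        acc += operand
    return acc


if __name__ == "__main__":
    stream = [1, 2, 3, 4]

    # Lockstep: both contexts are at instruction 2 when the interrupt arrives.
    lockstep_a = run_context(stream, interrupt_value=10, interrupt_at=2)
    lockstep_b = run_context(stream, interrupt_value=10, interrupt_at=2)
    print(lockstep_a == lockstep_b)

    # Not in lockstep: the second context lags by one instruction, so the
    # same interrupt lands at instruction 1 in its copy of the stream.
    leading = run_context(stream, interrupt_value=10, interrupt_at=2)
    lagging = run_context(stream, interrupt_value=10, interrupt_at=1)
    print(leading == lagging)
```

No fault is injected in either case. The mismatch in the second case arises solely from the timing of the interrupt relative to each context's position in the stream, which corresponds to the situation in which the error handler may be invoked without an actual error.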