In recent decades, microprocessor performance has been increasing exponentially due in large part to smaller and faster transistors enabled by improved fabrication technology. While such transistors yield performance enhancements, they are more susceptible to transient faults. Transient faults are caused by external particle strikes or process-related parametric variation. Transient faults do not cause permanent damage to a microprocessor, but may manifest as soft errors by altering signal transfers and stored values, resulting in incorrect program execution.
Software-only approaches to fault detection and recovery have been shown to significantly improve reliability. These approaches are attractive to designers because they require no hardware modifications, making them significantly cheaper and easier to deploy. They can also be applied to systems that have already been manufactured and that require higher reliability than the hardware alone can offer. This need can arise from a poor estimate of the severity of the soft error problem or from uncertainty in the usage conditions. Changes to the operating environment of the hardware can also have a noticeable effect on reliability.
Prior approaches to software-only error mitigation have relied primarily on static compilation techniques, which require alterations to the compilation process and access to the application's source code. To use them, a user would have to collaborate with the software vendor to acquire the source code, which makes these techniques impractical for many applications.