Transient faults due to neutron and alpha particle strikes are emerging as a significant obstacle to increasing processor transistor counts in future process technologies. Although fault rates of individual transistors may not rise significantly, incorporating more transistors into a device makes that device more likely to encounter a fault. As a result, it is expected that maintaining processor error rates at acceptable levels will require increasing design efforts.
Single-bit upsets from transient faults have emerged as one of the key challenges in microprocessor design today. These faults arise from energetic particles, such as neutrons from cosmic rays and alpha particles from packaging materials. When such a particle strikes the chip, it generates electron-hole pairs, and transistor source and diffusion nodes can collect the resulting charge. A sufficient amount of accumulated charge may invert the state of a logic device, such as an SRAM cell, a latch, or a gate, thereby introducing a logical fault into the circuit's operation. Because this type of fault does not reflect a permanent failure of the device, it is known as a soft or transient error.
Soft errors are an increasing burden for microprocessor designers as the number of on-chip transistors continues to grow exponentially. The raw error rate per latch or SRAM bit is projected to remain roughly constant or decrease slightly for the next several technology generations. Thus, unless additional error protection mechanisms are added or a more robust technology (such as fully-depleted SOI) is used, a microprocessor's error rate may grow in direct proportion to the number of devices added to a processor in each succeeding generation.
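The proportionality described above can be illustrated with a minimal sketch. It assumes that faults in individual bits are independent, so that per-bit rates simply add. The function name `chip_error_rate` and all numeric values are hypothetical and chosen only for illustration; they are not measured data.

```python
import math


def chip_error_rate(per_bit_rate: float, bit_count: int) -> float:
    """Aggregate raw error rate, assuming independent per-bit faults whose rates add."""
    return per_bit_rate * bit_count


# Hypothetical values: a constant per-bit rate and a device count that doubles
# with each succeeding generation.
PER_BIT_RATE = 0.001           # arbitrary units, illustrative only
BASE_BIT_COUNT = 1_000_000     # illustrative only

rates = [chip_error_rate(PER_BIT_RATE, BASE_BIT_COUNT * 2 ** gen) for gen in range(4)]

# With the per-bit rate held constant, each doubling of the device count
# doubles the aggregate rate.
for previous, current in zip(rates, rates[1:]):
    assert math.isclose(current, 2 * previous)
```

Under these assumptions, the aggregate rate tracks the device count exactly, so any reduction must come from added protection or from a lower per-bit rate.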
Soft errors in microprocessors and computer systems caused by high-energy particles can complicate, if not thwart, error detection/prevention techniques, such as redundant multi-threading (RMT) processors and computing systems. In general, RMT refers to a technique in which a program is executed at least twice by either the same or different instruction execution logic. As instructions in the program are executed or committed, the results of the redundant executions are compared to determine whether they are the same. If the results differ, an error is deemed to have occurred and appropriate recovery techniques can be performed.
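The compare-on-commit behavior described above can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any particular RMT hardware: the toy instruction format, the names `alu`, `with_bit_flip`, `run_rmt`, and `ResultMismatch`, and the way a fault is injected are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical toy instruction format: (opcode, operand_a, operand_b).
Instruction = Tuple[str, int, int]
Executor = Callable[[Instruction], int]


class ResultMismatch(Exception):
    """Raised when the redundant executions of an instruction disagree."""


def alu(instr: Instruction) -> int:
    """Fault-free execution of a toy instruction."""
    op, a, b = instr
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown opcode: {op}")


def with_bit_flip(execute: Executor, target_index: int, bit: int) -> Executor:
    """Wrap an executor so the result of one instruction has a single bit inverted."""
    count = 0

    def faulty(instr: Instruction) -> int:
        nonlocal count
        result = execute(instr)
        if count == target_index:
            result ^= 1 << bit  # model a transient single-bit upset
        count += 1
        return result

    return faulty


def run_rmt(program: List[Instruction], leading: Executor, trailing: Executor) -> List[int]:
    """Execute each instruction twice and commit a result only if both copies agree."""
    committed = []
    for index, instr in enumerate(program):
        first = leading(instr)
        second = trailing(instr)
        if first != second:
            # The comparison detects the error but cannot tell which copy is wrong.
            raise ResultMismatch(f"instruction {index}: {first} != {second}")
        committed.append(first)
    return committed
```

In this sketch, `run_rmt(program, alu, alu)` commits every result, whereas passing `with_bit_flip(alu, 1, 0)` as one of the executors raises `ResultMismatch` at the second instruction. The exception carries both values but gives no indication of which one is correct.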
In the case of soft errors, however, it is often difficult to discern which result contains the error and which does not. One prior art technique to handle this problem is to execute program instructions an odd number of times, often on redundant processing logic, and deem the most commonly occurring result to be the correct one. However, executing the instructions in a program an odd number of times, especially when using redundant hardware, increases system cost and power consumption and degrades performance.
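The voting technique described above can be sketched as follows. The function name `vote` is hypothetical, and two behaviors are assumptions of this sketch rather than part of the prior art description: the requirement of at least three results, and the rejection of cases in which no single result is most common.

```python
from collections import Counter
from typing import List


def vote(results: List[int]) -> int:
    """Return the most commonly occurring result among an odd number of executions."""
    if len(results) < 3 or len(results) % 2 == 0:
        raise ValueError("expected an odd number of results, at least 3")
    # most_common() orders the distinct results by descending count.
    ranked = Counter(results).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        # Assumption of this sketch: a tie for first place is reported as an
        # error, since no result can be deemed correct.
        raise ValueError("no single most common result")
    return ranked[0][0]
```

For example, `vote([12, 13, 12])` returns 12, masking the single corrupted result. Each additional vote, however, requires another full execution of the instruction, which is the source of the cost, power, and performance penalties noted above.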