1. Field of the Invention
The present invention generally relates to data processing systems, and more particularly to a method and apparatus for detecting and correcting soft errors that arise in storage circuits such as latches.
2. Description of the Related Art
Data processing systems such as general-purpose computers or special-purpose devices have many different storage elements, including memory arrays for mass storage of program instructions and operand data, and registers which temporarily store values used by execution units during the functional operation of the device. A typical microprocessor contains many storage elements that represent the current state or operating mode of the machine at any given time. These storage elements are critical for correct operation of the processor, and any error in the data stored in these elements can easily lead to machine failure. Microprocessors also use latches to store ancillary information, including scan latches that are employed in level-sensitive scan design (LSSD) type systems. These systems generally comply with the Institute of Electrical and Electronics Engineers (IEEE) standard 1149.1 pertaining to a test access port and boundary-scan architecture, which allows information to be read from or written to the scan latches during operation of the data processing system.
Information stored in scan latches may include control, status or mode bits. For example, a data processing system might provide different mode configurations for clock control logic, and clock control latches can account for a significant portion of a microprocessor latch count. Microprocessors typically use control logic in local clock buffers to adjust the duty cycle and edge stressing of various clock pulses in the system and thereby meet the requirements of the local logic circuits. These clock buffer modes are set at system power-on using a scan controller, and often must maintain their logical value for days or months to ensure proper performance of the local logic circuits. However, these values can be upset during microprocessor operation due to a soft error caused by stray radiation or electrostatic discharge. The upset may be correctable by scanning in a new value, but systems may only allow scanning in a limited manner such as at power-on, meaning that the system must be restarted if a clock control latch becomes incorrectly set.
Soft errors have become a primary reliability concern in scaled technologies. These errors are often caused by strikes from alpha particles emitted by packaging materials or from neutrons originating from cosmic radiation. The soft-error rate (SER) of a data processing system can exceed the combined failure rate of all hard-reliability mechanisms (gate oxide breakdown, electromigration, etc.). Built-in soft-error protection has thus become a necessity for meeting robustness targets in advanced computer systems. All storage elements (random-access memory, latches, etc.) are highly susceptible to soft-error-induced failures, but memory arrays are usually protected by error-correction codes (ECCs) while latches are usually not so protected. Soft errors in latches are accordingly the primary contributors to overall system SER.
In one typical latch design, data is stored in a cross-coupled inverter circuit. The state of this circuit is easily flipped by an alpha particle strike, and in simple latches the data corruption occurs without detection. Once flipped, the original state of the latch cannot be recovered. Combinational logic is typically more robust than sequential elements, i.e., static logic will eventually recover from an alpha strike, but a downstream error will arise if the temporary error induced in the logic arrives at a destination latch within the setup and hold time of that latch.
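The capture condition described above can be sketched as a simple interval-overlap test. This is only an illustrative model under stated assumptions: the function name `transient_is_captured`, its parameters, and the picosecond values are hypothetical, and a real latch's behavior also depends on factors such as pulse width and amplitude that are not modeled here.

```python
def transient_is_captured(pulse_start: int, pulse_end: int,
                          clock_edge: int, setup: int, hold: int) -> bool:
    """Return True if a transient pulse overlaps the latch sampling window.

    All times are in the same unit (picoseconds in this sketch).
    The sampling window spans [clock_edge - setup, clock_edge + hold].
    """
    window_start = clock_edge - setup
    window_end = clock_edge + hold
    # Two intervals overlap when each one starts before the other ends.
    return pulse_start < window_end and pulse_end > window_start


# Hypothetical timing values, for illustration only.
CLOCK_EDGE = 1000
SETUP = 50
HOLD = 30

# A glitch that dies out well before the sampling window is harmless.
early_glitch = transient_is_captured(100, 180, CLOCK_EDGE, SETUP, HOLD)

# A glitch that is still present inside the window may be latched.
late_glitch = transient_is_captured(940, 990, CLOCK_EDGE, SETUP, HOLD)
```

In this sketch the early glitch is not captured because the logic has recovered before the sampling window opens, while the late glitch overlaps the window and can therefore propagate as a stored error.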
Conventional techniques for SER reduction in latches rely on three primary approaches: extra capacitance at the storage nodes, redundancy, and upset tolerance. Adding extra capacitance at a storage node improves soft-error resistance, but only by a minor amount. The extra capacitance also has the unfortunate side effect of introducing additional delay, which can present significant difficulties for timing of the overall logic circuit. Robust latches have been designed with error-correction circuitry which relies on redundancy at either the latch level or the device (transistor) level. For example, two latches can be used with a comparator to provide error detection, and three latches can be used with a majority voting circuit for both detection and correction. The majority voting circuit indicates a set state for the redundant latch circuit based upon a majority of the latches being in the set state, or otherwise indicates a reset state. These latch designs reduce but do not eliminate the problem of upsets. For instance, in a redundant latch structure with a majority voting circuit that holds a logical state for an extended period, it is possible to have two separate upsets, i.e., two of the three latches being set to an incorrect value, which then generates an incorrect output at the voting circuit. As a related issue, full redundancy in latch designs may be too costly in terms of physical size (chip area), speed, and power consumption. In modern, leakage-power-dominated designs, it becomes increasingly important to reduce or eliminate any unnecessary redundancies. Upset-tolerant latches have been devised having more complex designs which are technically not redundant but still require many additional devices such as transistor pairs (p-type field effect transistors and/or n-type field effect transistors) having interwoven gates and output nodes. As with the redundancy approach, however, these designs can only recover from single event upsets (SEUs).
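The behavior of the redundant-latch schemes described above, and their failure under two separate upsets, can be illustrated with a short behavioral sketch. It models latch contents as single bits and is not a circuit-level description; the helper names `mismatch`, `majority`, and `flip` are hypothetical and chosen only for this illustration.

```python
def mismatch(a: int, b: int) -> bool:
    """Two-latch comparator: flags a disagreement between the copies.

    It detects an upset but cannot tell which copy is wrong.
    """
    return a != b


def majority(a: int, b: int, c: int) -> int:
    """Three-latch voter: 1 (set) if at least two copies are 1, else 0 (reset)."""
    return (a & b) | (b & c) | (a & c)


def flip(copies: list, index: int) -> list:
    """Return a new list with one latch copy inverted, modeling a soft error."""
    upset = list(copies)
    upset[index] ^= 1
    return upset


stored_value = 1
latches = [stored_value] * 3          # three redundant copies of one bit

one_upset = flip(latches, 0)          # a single event upset
two_upsets = flip(one_upset, 2)       # a second, separate upset later on

detected = mismatch(one_upset[0], one_upset[1])  # comparator sees the first upset
vote_after_one = majority(*one_upset)            # voter still returns the stored value
vote_after_two = majority(*two_upsets)           # voter now returns the wrong value
```

With a single upset the voter masks the error, but once two of the three copies have been flipped the majority itself is incorrect, which is the limitation noted above for values that must be held over long periods.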
In light of the foregoing, it would be desirable to devise an improved circuit technique for detecting and correcting errors in storage elements such as latches that provides robust performance at a relatively low cost in terms of area, power and delay. It would be further advantageous if the circuit could provide soft-error immunity against multiple soft-error events.