1. Field of the Invention
This invention relates to the detection of single event upset errors, such as those arising from neutron and alpha particle strikes, within sequential storage circuitry of integrated circuits.
2. Description of the Prior Art
A Single Event Upset (SEU), also known as a soft error, is a change in state or a transient voltage pulse at sensitive nodes in integrated circuits, such as processors. SEUs are caused by high energy neutron or alpha particle strikes on the silicon substrate of the integrated circuit. An SEU leads to an error if it causes a state flip in a state-holding element that determines the architectural state of the integrated circuit, such as the RAM arrays, the Register File, the architectural-state registers of a processor, or the flip-flops and latches within control logic. In the context of this document, an error due to an SEU is considered to be a state flip in any state-holding element inside the integrated circuit. As device geometries shrink, integrated circuits are becoming more prone to soft errors, as discussed, for example, in the article “Logic Soft Errors in Sub-65 nm Technologies Design and CAD Challenges” by S Mitra et al, Design Automation Conference, 2005.
The traditional technique for protecting RAM arrays and architectural-state registers against particle strikes on their state-holding nodes is the use of Error Correcting Codes (ECC). In accordance with this technique, an ECC block applies a standard algorithm to the block of data to be written in order to generate a so-called “code”. The data and its corresponding code are then both written into the memory. During a read operation, both the data and its code are read out. The ECC block then recomputes the code for the data that was read out and compares it with the code that was read. If the “recomputed” code does not match the “read” code, this indicates a state flip inside the RAM array, and the event is flagged as an error.
With standard ECC algorithms, it is possible to correct an erroneous block of data, albeit with additional computational and storage overhead. If the error occurs in only a single bit of the data, then the overhead of correction using ECC is reasonable. However, the ability to detect and correct errors in multiple bits requires fairly sophisticated ECC algorithms with prohibitive computational and storage overhead. Hence, the typical practice is to use ECC for double-bit error detection and single-bit error correction.
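The write/read flow described above, together with the single-bit-correct, double-bit-detect (SECDED) practice, can be illustrated with a small extended Hamming code. The following Python sketch is illustrative only: it assumes 4-bit data words and an (8,4) extended Hamming code, and the function names encode and decode are hypothetical. Practical RAM ECC typically protects wider words (for example 64 data bits with 8 check bits), but the principle is the same: the syndrome computed on a read is the difference between the stored code and the recomputed code.

```python
from typing import List, Optional, Tuple

# Hamming(7,4) layout: positions 1, 2, 4 hold check bits; 3, 5, 6, 7 hold data.
# Index 0 holds an extra overall-parity bit, giving SECDED behaviour.
CHECK_POSITIONS = (1, 2, 4)
DATA_POSITIONS = (3, 5, 6, 7)


def encode(nibble: int) -> List[int]:
    """Generate the 8-bit codeword (data plus code) stored on a write."""
    if not 0 <= nibble < 16:
        raise ValueError("expected a 4-bit value")
    word = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (nibble >> i) & 1
    for p in CHECK_POSITIONS:
        # Check bit p makes the parity of all positions k with (k & p) even.
        word[p] = sum(word[k] for k in range(1, 8) if k & p and k != p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity over the whole codeword
    return word


def decode(word: List[int]) -> Tuple[Optional[int], str]:
    """Recompute the code on a read and compare it with the stored code.

    Returns (data, status) where status is "ok", "corrected" or
    "double_error"; data is None when the error cannot be corrected."""
    # For a single flipped bit in positions 1..7, the syndrome equals
    # the position of that bit.
    syndrome = 0
    for p in CHECK_POSITIONS:
        if sum(word[k] for k in range(1, 8) if k & p) % 2:
            syndrome |= p
    overall = sum(word) % 2  # 1 if an odd number of bits flipped
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume a single-bit error; syndrome 0 means the overall-parity bit flipped.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected, not correctable.
        return None, "double_error"
    data = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status
```

For example, flipping any single bit of encode(9) still decodes to (9, "corrected"), whereas flipping any two bits yields (None, "double_error"). Three or more flipped bits are outside what this code can handle, which mirrors the overhead argument above.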
Since Single Event Upsets inside RAM memories typically cause single bit flips, and because of the high density of RAM memory in current generation integrated circuits, ECC is a very effective error detection and correction technique for RAM memory protection. However, the same technique cannot be applied to sequential storage circuitry such as latches and flip-flops, because such circuits are spatially distributed throughout the integrated circuit rather than grouped into words that can share a code, and because neutron and alpha particle strikes in such distributed logic (as opposed to RAM memory) can cause independent bit failures in several of the sequential storage circuits, which ECC cannot correct.
Over the last few years many techniques have been proposed for detecting soft errors in sequential storage circuitry, see for example the earlier-mentioned article “Logic Soft Errors in Sub-65 nm Technologies Design and CAD Challenges” by S Mitra et al, Design Automation Conference, 2005, the article “Measurements and Analysis of SER-tolerant latch in a 90-nm Dual VT CMOS Process”, by P Hazucha et al, IEEE Custom Integrated Circuits Conference (CICC) 2003, and the article “Robust System Design with Built-In Soft Error Resilience”, by S Mitra et al, IEEE Computer, February 2005.
Generally, these approaches use alternative flip-flop architectures designed for SEU robustness. The main concept in these known designs is to reduce the likelihood of state corruption in latching elements by one of: increasing the overall capacitance on the latching node, so that more charge is required to upset the stored state; overdriving the latching nodes through active devices to counteract state flips; or providing redundant latching elements and additional voting circuitry to choose between their outputs (for example, in a “Triple Modular Redundancy” (TMR) scheme, the latching elements are replicated three times and a vote is taken to output the data value held by the majority).
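The majority voting used in a TMR scheme can be sketched as a bitwise two-out-of-three vote. The following Python model is a hypothetical behavioural illustration (the class and method names are not taken from any cited design); note that it stores every bit three times.

```python
class TMRRegister:
    """Behavioural model of a register whose latching elements are triplicated."""

    def __init__(self, width: int = 8) -> None:
        self.mask = (1 << width) - 1
        self.copies = [0, 0, 0]  # three replicas of the stored value

    def write(self, value: int) -> None:
        self.copies = [value & self.mask] * 3

    def upset(self, replica: int, bit: int) -> None:
        """Model an SEU by flipping one bit in one replica."""
        self.copies[replica] ^= 1 << bit

    def read(self) -> int:
        """Output the value held by the majority, bit by bit."""
        a, b, c = self.copies
        return (a & b) | (b & c) | (a & c)
```

A single upset in any one replica is masked by the vote, as are upsets of different bits in different replicas; upsets of the same bit in two replicas are not.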
These known circuit-based techniques for SEU robustness require duplication or triplication of state-holding elements within a flip-flop in order to reduce the likelihood of an error occurring due to an SEU. These known techniques hence disadvantageously increase the amount of circuit area needed and the power consumed.
An SEU tolerant flip-flop has been discussed in co-pending commonly owned U.S. patent application Ser. No. 11/636,716, in connection with a design technique sometimes referred to as “Razor”. The Razor technique is described in US Published Patent Application 2004-0199821 and allows the voltage safety margins that would otherwise be provided to cover uncertainties in silicon and ambient conditions to be reduced or eliminated. In general, the Razor technique involves adjusting the operating parameters of an integrated circuit, such as the clock frequency, the operating voltage, the body bias voltage, temperature and the like, so as to maintain a finite non-zero error rate in a manner that increases overall performance. Errors are detected in the processing stages by comparing a non-delayed data value with a delayed data value, the two values being captured at slightly different times. US Published Patent No. 2005/246613 describes a transition detecting flip-flop that can be used to detect timing errors when employing Razor techniques. Timing errors occur in a Razor-like system when the propagation delay through combinatorial logic causes the data input to an edge-triggered sequential element, such as a flip-flop, to violate its setup requirements. The data may change state within the setup or hold window, causing metastability in the sequential element, or it may transition after the positive clock edge, so that the sequential element captures incorrect data. Razor error detection is provided by augmenting each timing-critical sequential element with a transition detector, which flags any transition on the data input of the sequential element during the setup time window or during the positive phase of the clock.
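The two detection mechanisms described above can be modelled behaviourally. The following Python sketch is a hypothetical, simplified model and not the circuit of the cited applications: signals are represented as time-sorted lists of transitions, and the parameter names delay, t_setup and t_high are illustrative assumptions standing for the delay of the second sample, the setup window and the duration of the positive clock phase.

```python
from bisect import bisect_right
from typing import List, Tuple

# A waveform is a time-sorted list of (time, value) pairs. The first entry gives
# the initial value; every later entry is assumed to be a transition.
Waveform = List[Tuple[float, int]]


def value_at(wave: Waveform, t: float) -> int:
    """Return the signal value at time t."""
    i = bisect_right([time for time, _ in wave], t)
    if i == 0:
        raise ValueError("time precedes the start of the waveform")
    return wave[i - 1][1]


def delayed_sample_mismatch(d: Waveform, t_edge: float, delay: float) -> bool:
    """Compare the non-delayed sample (taken at the clock edge) with a sample
    taken `delay` later; a mismatch indicates late-arriving data."""
    return value_at(d, t_edge) != value_at(d, t_edge + delay)


def transition_in_window(d: Waveform, t_edge: float, t_setup: float, t_high: float) -> bool:
    """Transition-detector style check: flag any transition on D from the start
    of the setup window to the end of the positive clock phase."""
    return any(t_edge - t_setup <= t <= t_edge + t_high for t, _ in d[1:])
```

For example, with d = [(0.0, 0), (1.2, 1)] and a clock edge at 1.0, delayed_sample_mismatch(d, 1.0, 0.3) returns True because the data settled after the edge, and the transition detector flags the same event.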
The SEU tolerant flip-flop described in the above-mentioned U.S. patent application Ser. No. 11/636,716 stores a sampled input signal within a sequential storage element and then uses combinatorial logic to detect, as an error, any transition of the signal stored by that sequential storage element which occurs outside a valid transition period. The technique recognises that a single event upset error can be detected at the circuit level using techniques similar to the Razor techniques described above. In particular, a sequential storage element that samples an input signal and then holds it as a stored signal has only a relatively short window of time in which a valid transition of the stored signal is expected to occur. Transitions of the stored signal outside this valid transition period can be detected as errors, and single event upset errors have a high probability of manifesting themselves in this way.
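The error criterion described above can be expressed as a simple check on the times at which the stored signal changes. In this hypothetical Python sketch, valid_window is an assumed parameter standing for the valid transition period after each capturing clock edge (roughly the clock-to-output delay of the storage element); the function name and the time-stamped representation are illustrative only and do not describe the combinatorial logic of the cited application.

```python
from typing import List, Sequence


def out_of_window_transitions(
    q_transitions: Sequence[float],
    clock_edges: Sequence[float],
    valid_window: float,
) -> List[float]:
    """Return the times of stored-signal (Q) transitions that fall outside every
    valid transition period [edge, edge + valid_window].

    Each returned time would be flagged as a suspected single event upset."""
    flagged = []
    for t in q_transitions:
        in_valid_period = any(edge <= t <= edge + valid_window for edge in clock_edges)
        if not in_valid_period:
            flagged.append(t)
    return flagged
```

For example, out_of_window_transitions([0.4, 10.7, 14.3], [0.0, 10.0, 20.0], 1.0) returns [14.3]: the first two transitions follow a capturing edge, while the third occurs mid-cycle and is flagged.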
Whilst such an approach can operate well in a specialised Razor-type system, it is not readily applicable to more general system designs. In particular, the minimum delay constraint (the requirement that the input signal must not change during the positive phase of the clock) has to be met for every SEU tolerant flip-flop employing the above design, which places a stringent requirement on the hold constraint. This has knock-on effects; for example, it is difficult to use a scan chain with such flip-flops unless a clock with an asymmetric duty cycle is used. Accordingly, whilst this approach can work well in certain systems, its associated constraints will not be acceptable in many systems, which limits its general applicability.
Accordingly, it would be desirable to provide an improved technique for detecting SEUs in sequential storage circuitry of an integrated circuit, which can be used in a wide variety of systems.