1. Field of the Invention
This invention relates to the field of integrated circuits. More particularly, this invention relates to the detection of single event upset errors within integrated circuits, such as those which arise from alpha particle strikes.
2. Description of the Prior Art
A Single Event Upset (SEU) is a change in state or a transient voltage pulse at sensitive nodes in integrated circuits, such as processors. Single Event Upsets occur due to high energy particle strikes on the silicon substrate of processors. An error can occur due to an SEU if it leads to a state flip in the sequential storage elements that determine the architectural state of the processor, such as the RAM arrays, the Register File and the architectural-state registers. In the context of this document, we consider an error due to an SEU to be a state flip in any state-holding element inside the processor.
The traditional technique for protecting RAM arrays and architectural state registers against particle strikes on their state-holding nodes is the use of Error Correcting Codes, or ECC. The conceptual representation of how ECC works is shown in FIGS. 9(a) and 9(b) of the accompanying drawings in the context of RAM arrays. The ECC block implements a standard algorithm on the block of data to be written in order to generate a so-called “code”. The code corresponding to the input data and the data itself are then both written into the memory. During a read operation, both the data and its code are read out. The ECC block then recomputes the code for the data and compares it with the code already read out. If the “recomputed” code does not match the “read” code, then this is indicative of a state flip inside the RAM array. Such an event is flagged as an error.
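The write and read flow described above can be illustrated with a minimal sketch. The names EccMemory, compute_code and flip_bit are hypothetical, and the check code used here (the XOR of the positions of all set bits) is only an illustrative stand-in for whichever standard ECC algorithm an actual design would employ.

```python
def compute_code(data, width=8):
    """Illustrative check code (an assumption, not a specific standard ECC):
    XOR of the 1-based positions of all set bits in the data word."""
    code = 0
    for pos in range(width):
        if (data >> pos) & 1:
            code ^= pos + 1
    return code


class EccMemory:
    """Hypothetical memory model that stores each data word with its code."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        # Both the data and the code generated from it are written.
        self.cells[addr] = [data, compute_code(data)]

    def read(self, addr):
        # Both are read out; the code is recomputed and compared.
        data, stored_code = self.cells[addr]
        error = compute_code(data) != stored_code
        return data, error

    def flip_bit(self, addr, bit):
        # Models a particle strike flipping one stored data bit.
        self.cells[addr][0] ^= 1 << bit


mem = EccMemory()
mem.write(0, 0b10110010)
print(mem.read(0))   # no mismatch expected before the upset
mem.flip_bit(0, 3)
print(mem.read(0))   # mismatch between recomputed and read code is flagged
```

In this sketch a mismatch is only flagged; correcting the erroneous word requires a code with additional structure.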
With standard ECC algorithms, it is possible to correct an erroneous block of data, albeit with additional computational and storage overhead. If the error occurs in only a single bit of the data, then the overhead of correction using ECC is reasonable. However, the ability to detect and correct errors in multiple bits requires fairly sophisticated ECC algorithms with prohibitive computational and storage overhead. Hence, the typical practice is to use ECC for double-bit error detection and single-bit error correction.
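As an illustration of single-bit error correction combined with double-bit error detection, the following sketch uses an extended Hamming code over a 4-bit data word. The choice of this particular code, the 8-bit word layout and the function names are assumptions made for the example; practical memories typically protect wider words.

```python
def secded_encode(nibble):
    """Encode 4 data bits into an 8-bit word.
    Index 0 holds the overall parity bit; indices 1..7 are Hamming positions,
    with parity bits at 1, 2, 4 and data bits at 3, 5, 6, 7."""
    word = [0] * 8
    data_positions = (3, 5, 6, 7)
    for i, pos in enumerate(data_positions):
        word[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        parity = 0
        for i in range(1, 8):
            if i != p and (i & p):
                parity ^= word[i]
        word[p] = parity
    word[0] = sum(word[1:]) % 2
    return word


def secded_decode(word):
    """Return (nibble, status); nibble is None for an uncorrectable word."""
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    overall = sum(word) % 2      # 0 when the overall parity is consistent
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself was hit).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detect only.
        return None, "double_error"
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, status


word = secded_encode(0b1011)
word[5] ^= 1                     # single upset
print(secded_decode(word))
word[2] ^= 1                     # second upset in the same word
print(secded_decode(word))
```

The sketch stores eight bits for every four data bits, which reflects the storage overhead noted above; the relative overhead falls as the protected word becomes wider.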
Since Single Event Upsets inside memories typically cause single bit flips, ECC is a very effective error detection and correction technique for memory protection. For Single Event Upsets inside registers, another popular technique is the use of alternative flip-flop architectures that are designed for SEU robustness. The main concept in these known designs is to reduce the likelihood of state corruption in latching elements by one of: increasing the overall capacitance on the latching node so as to decrease the likelihood of a state flip (the charge required to upset the state is greater with increased capacitance); overdriving the latching nodes through active devices to fight against state flips; and providing redundancy in the latching elements together with additional voting circuitry to choose between their outputs. The redundancy-based techniques require duplication or triplication of state-holding elements within a flip-flop in order to reduce the likelihood of an error occurring due to an SEU. These known techniques disadvantageously increase the amount of circuit area needed and the power consumed.
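The redundancy and voting approach can be sketched as a bitwise majority vote over three copies of a stored value. The function name and the use of integers to model the latching elements are assumptions made purely for illustration.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three redundant copies: each output bit takes
    the value held by at least two of the three copies."""
    return (a & b) | (b & c) | (a & c)


stored = 0b1011
copies = [stored, stored, stored]   # triplicated state-holding elements
copies[1] ^= 0b0100                 # an upset flips one bit in one copy
print(bin(majority_vote(*copies)))  # the vote masks the single upset
```

The vote masks an upset in any one copy, but it holds three copies of every bit plus the voting logic, which is the area and power cost referred to above.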
All the techniques described above (ECC and the robust flip-flops) are designed for protection against particle strikes on the state-holding nodes of the processor. They are ineffective against particle strikes on combinatorial logic feeding into storage elements. Typically, a particle strike on combinatorial logic leads to a transient pulse on the incident node that can be captured at multiple storage elements in its fan-out, thereby corrupting their state.
A known technique for protecting the processor core against particle strikes on the combinatorial logic, and/or pipeline registers, is to duplicate the core and vote between the replicas. Thus, the identical processor cores run in lock-step and an error is flagged when their outputs disagree. However, this technique incurs significant power and area overhead (both area and power consumption are typically at least doubled). There is also the added difficulty of suitably protecting the voting circuit.
Also known within the field of integrated circuits are the design techniques described in US Published Patent Application 2004-0199821 (the content of this published patent application is incorporated herein in its entirety by reference). The techniques described in this published patent application are sometimes referred to as “Razor” and correspond to a design technique that allows voltage safety margins providing for uncertainties in silicon and ambient conditions to be eliminated or reduced. As a consequence, it is possible that timing violations can occur due to a combination of worst-case voltage and temperature conditions. Razor provides a relatively low overhead detection and recovery mechanism to suitably flag the rare worst-case timing errors and recover correct state. In general, the Razor technique involves adjusting the operating parameters of an integrated circuit, such as the clock frequency, the operating voltage, the body bias voltage, temperature and the like so as to maintain a finite non-zero error rate in a manner that increases overall performance. Errors are detected in the processing stages by comparison of a non-delayed data value with a delayed data value. These data values are captured at slightly different times.
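The comparison of a non-delayed data value with a delayed data value can be modelled behaviourally as follows. This is a simplified sketch rather than the circuit of the referenced application: the signal representation, the function names and the time units are all assumptions.

```python
def signal_value(transitions, t, initial=0):
    """Value of a signal at time t, given time-sorted (time, new_value)
    transitions and an initial value."""
    value = initial
    for when, new_value in transitions:
        if when <= t:
            value = new_value
        else:
            break
    return value


def razor_sample(transitions, clock_edge, delay, initial=0):
    """Capture the data input at the clock edge (non-delayed value) and again
    `delay` time units later (delayed value); a mismatch is a timing error."""
    non_delayed = signal_value(transitions, clock_edge, initial)
    delayed = signal_value(transitions, clock_edge + delay, initial)
    return non_delayed, delayed, non_delayed != delayed


# Data settles before the clock edge at t=10: both captures agree.
print(razor_sample([(8, 1)], clock_edge=10, delay=3))
# Data arrives late at t=11: the two captures differ and an error is flagged.
print(razor_sample([(11, 1)], clock_edge=10, delay=3))
```

In the late-arrival case the delayed capture holds the value that the logic eventually settled to, which is what allows correct state to be recovered once the error has been flagged.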
One mechanism for implementing the above described techniques is described in US Published Patent No. 2005/246613 (the content of which is incorporated herein in its entirety by reference). This patent application describes a transition detecting flip-flop that can be used to detect timing errors when employing Razor techniques. Timing errors occur in a Razor-like system when propagation delay through combinatorial logic causes the data input to an edge-triggered sequential element, such as a flip-flop, to violate setup requirements. Data may change state in the setup or the hold window, causing metastability in the sequential element, or it may transition after the positive edge such that the sequential element captures incorrect state data. Razor error detection is provided by augmenting each timing critical sequential element with a transition detector. The transition detector flags any transition on the data input of the sequential element in the setup time window and during the positive phase of the clock, as shown in the timing diagram of FIG. 8 of the accompanying drawings.
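The detection window of such a transition detector can be sketched as a simple check on the times at which the data input changes. The function name, the parameter names and the inclusive window boundaries are assumptions made for illustration and are not taken from the referenced application.

```python
def transition_flagged(transition_times, clock_edge, setup_time, high_phase):
    """Return True if the data input changes inside the detection window,
    which spans the setup time before the positive clock edge and the
    positive phase of the clock that follows it."""
    window_start = clock_edge - setup_time
    window_end = clock_edge + high_phase
    return any(window_start <= t <= window_end for t in transition_times)


# Positive edge at t=100, setup time of 5, clock high for 50 time units.
print(transition_flagged([80], 100, 5, 50))    # well before the setup window
print(transition_flagged([97], 100, 5, 50))    # inside the setup window
print(transition_flagged([120], 100, 5, 50))   # during the positive phase
```

A transition that completes before the setup window opens is not flagged, whereas one arriving in the setup window or while the clock is high is reported as an error.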