1. Field of the Invention
The field of the invention relates to the correction of soft errors or single event upset errors, such as those that arise from neutron or alpha particle strikes, within sequential storage circuitry of integrated circuits.
2. Description of the Prior Art
A soft error or Single Event Upset Error (SEU) is a disturbance of a node within a circuit due to a hit on the silicon substrate by either high energy neutrons or alpha particles. Such a hit can produce a change of state or a transient voltage pulse at sensitive nodes in integrated circuits such as processors. Errors can occur when an SEU changes the state of the state-holding elements that determine the architectural state of the integrated circuit, such as the RAM arrays, the register file, the architectural-state registers of a processor, or the flip-flops and latches within control logic. In the context of this document we consider an error due to an SEU to be a state flip in any state-holding element inside the integrated circuit.
The traditional technique for protecting elements such as SRAM arrays is the use of error correction techniques. For example, error correcting codes (ECCs) can be used to flag errors and then correct them. Due to the high density of SRAM, such techniques are very effective, as SEUs inside such memories generally cause only a single bit flip, which is easy to correct with ECCs. However, because flip-flops and latches are spatially distributed throughout the integrated circuit, and because neutron and particle strikes in such distributed logic can cause separate bit failures within several of the sequential storage circuits, ECCs are not appropriate for correcting these errors.
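By way of illustration only, the single-bit correction property of an ECC can be sketched with a textbook Hamming(7,4) code. This particular code, the function names and the bit ordering are assumptions made for the example; they are not taken from any specific SRAM design.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(c):
    """Return (codeword after correction, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome read as a bit position
    if pos:
        c[pos - 1] ^= 1  # flip the bit the syndrome points at
    return c, pos


def hamming74_decode(c):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [c[2], c[4], c[5], c[6]]
```

With this sketch, a single flipped bit in a codeword is located and repaired, whereas two flipped bits in the same codeword yield a non-zero syndrome that points at the wrong position, so the decoded data is wrong. This reflects why a code of this kind suits the single bit flips described above and not multiple separate bit failures.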
As process technology shrinks, the susceptibility of latches and flip-flops to such SEUs is increasing, and indeed at 45 nm it can be shown to be very similar to that of SRAMs. FIG. 1 shows that latches are just 20% more robust to such SEUs than SRAM at a nominal voltage.
With this increase in the susceptibility of such devices to SEUs, many techniques have recently been proposed to address this problem. There are usually trade-offs between area, performance and power overhead on the one hand and such additional robustness on the other.
There are three main ways in which this problem can be addressed.
Firstly, the flip-flop/latch can be designed so that it is more difficult to flip the cell due to an SEU.
Secondly, a device that detects such a soft error can be designed. In such devices, if the data is corrupted by an SEU, this is detected and the error can then be rectified by the system by flushing the error from it.
Thirdly, the device can be designed so that the output does not change even if an SEU flips one of the cells. Generally this involves adding redundancy inside the device to detect and correct the error.
The first technique has associated overheads, such as resizing, adding capacitance to nodes and adding feedback.
The second technique involves adding redundancy to the system and comparing the stored value with the redundant value, thereby detecting the errors. An error detecting flip-flop is described in commonly assigned co-pending US application U.S. Ser. No. 12/078,189 filed on 27 Mar. 2008, the entire contents of which are hereby incorporated by reference.
There are two main problems with this technique.
Firstly, the technique detects the error but does not correct it. Thus, a system level reset is needed to flush the error. This can be costly in terms of performance and power.
Secondly, the technique can produce false positives. For example, an SEU occurring in the redundant element will generate an error signal even though the stored value is intact, and there is no way to discriminate between real and false errors.
The present invention addresses both of these problems.
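The two problems of the detection-only technique can be illustrated with a minimal behavioural sketch. The class and attribute names below are hypothetical, and the model is only the generic "compare the stored value with a redundant value" idea described above; it is not the circuit of the referenced application.

```python
from dataclasses import dataclass


@dataclass
class DetectingFlop:
    """Behavioural model: a main storage bit plus a redundant shadow bit."""

    main: int = 0    # value that drives the output
    shadow: int = 0  # redundant copy used only for comparison

    def capture(self, d: int) -> None:
        """Latch the same data bit into both elements."""
        self.main = d
        self.shadow = d

    @property
    def q(self) -> int:
        """Output of the flop; taken from the main element only."""
        return self.main

    @property
    def error(self) -> bool:
        """Error flag: raised whenever the two copies disagree."""
        return self.main != self.shadow
```

In this sketch, flipping `main` after a capture raises `error` while `q` stays corrupted (detection without correction). Flipping `shadow` raises the same flag although `q` is still correct, and nothing in the flag itself distinguishes the two cases.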
The third technique has the advantage of both detecting and correcting the errors, avoiding the need for system level resets.
Such an error correcting technique is described in S. Mitra, N. Seifert, M. Zhang, Q. Shi and K. S. Kim, “Sequential Element Design with Built-In Soft Error Resilience,” IEEE Transactions on VLSI Systems, Vol. 14, No. 12, December 2006.
FIG. 2 shows an example of such a flip-flop. The idea is to replicate the master and slave latches and use a C-element to compare the two outputs. When new data is latched at Latch PH2, it also gets latched in Latch LA. Similarly, when the data is latched in Latch PH1, it also gets latched in Latch LB. In other words, the latches LA/LB are shadows of the actual data in latches PH2/PH1. A C-element has the property that it lets new data propagate only if the two inputs (O1 and O2) match; otherwise it retains the old data. So, if there is a soft error in any one of the latches, the error is not propagated and the previous data is retained at the output by the keeper latch.
The design of FIG. 2 has five sequential elements (four master-slave latches and a keeper latch) in addition to an asynchronous C-element.
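The behaviour of the C-element and keeper described above can be sketched as follows. This is a simplified logical model under stated assumptions: the class name is hypothetical, signal polarity is ignored (a real C-element may be inverting), and timing is not modelled.

```python
class CElement:
    """Behavioural C-element with a keeper holding the last agreed value."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial  # value held by the keeper

    def update(self, o1: int, o2: int) -> int:
        """Pass the inputs through only when O1 and O2 match.

        On a mismatch (for example, one latch flipped by an SEU) the
        keeper retains the previous output.
        """
        if o1 == o2:
            self.out = o1
        return self.out
```

For example, after `update(1, 1)` the output is 1; a subsequent `update(0, 1)`, modelling a flip in one of the two latches, leaves the output at 1, and only a matching pair such as `update(0, 0)` changes it.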
An alternative error correcting technique is a triple modular redundancy scheme, in which the storage elements are replicated three times and a vote is performed to output the data value held by the majority. Such a technique is described in A. Drake, A. J. Kleinosowski, A. K. Martin, “A Self-Correcting Soft Error Tolerant Flop-Flop,” 12th NASA Symposium on VLSI Design, 2005 and is illustrated in FIG. 3. The Drake scheme is an example of feedback correction, where the data is latched back into the design to perform the correction.
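The majority vote and the idea of feeding the voted value back can be sketched in a few lines. The names `majority`, `TmrCell` and `scrub` are hypothetical, and the sketch models only the logical behaviour; it does not model the clocking used in the cited scheme.

```python
def majority(a: int, b: int, c: int) -> int:
    """Return the bit value held by at least two of the three inputs."""
    return (a & b) | (b & c) | (a & c)


class TmrCell:
    """Three redundant copies of one stored bit with a majority voter."""

    def __init__(self, value: int = 0) -> None:
        self.copies = [value, value, value]

    def write(self, value: int) -> None:
        """Store the same bit in all three copies."""
        self.copies = [value, value, value]

    def read(self) -> int:
        """Output the majority value; masks a flip in any single copy."""
        return majority(*self.copies)

    def scrub(self) -> int:
        """Feedback correction: write the majority value back to all copies."""
        value = self.read()
        self.copies = [value, value, value]
        return value
```

A single flipped copy is outvoted on `read`, and `scrub` restores all three copies so that a later flip in a different copy can still be tolerated. Two flipped copies would outvote the correct one, which is the limit of this scheme.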
In Drake et al. the clock signal is modified to lead the majority value back into the system in case of an SEU. Modifying the clock path can lead to other issues such as set-up violations. For example, where the SEU occurs near the rising edge of the clock, the flip-flop might not capture the new data while it is busy correcting the error.
It would be desirable to have a sequential storage element that could both detect and correct SEUs while limiting additional overheads such as increases in area and power.