There is increasing interest in the use of field programmable gate arrays (FPGAs) for many space-based computing operations. Although FPGAs are generally slower than their Application Specific Integrated Circuit (ASIC) counterparts, cannot handle as complex a design, and draw more power, they do offer several important advantages. These include a shorter time to market, the ability to be re-programmed in the field to correct errors, and lower engineering costs. Since these advantages are well suited to spacecraft applications, the space community has actively evaluated radiation effects for most new FPGAs being introduced. Unfortunately, while FPGAs offer several benefits for space-based electronics, they are generally sensitive to Single Event Effects (SEEs).
SEEs are caused by ionization as a consequence of the impact of a heavy ion (cosmic ray) or proton. The ionization induces a current pulse in a p-n junction. Single Event Effects include those effects which permanently damage circuitry, such as Single Event Latch-Up (SEL), Single Event Gate Rupture (SEGR), or Single Event Burnout (SEB), as well as “soft errors” referred to as Single Event Transients (SETs), which do not permanently damage circuitry.
SETs are caused by charged particles depositing charge on circuit elements through ionization. These deposited charges cause elevated local voltage levels in the circuit elements, which can non-destructively change the state of a bi-stable element. In a combinational logic element, the charge will leak away (typically over several hundreds of picoseconds) and the element will return to the correct state. However, when synchronous logic is disturbed by an SET on a clock edge, the temporarily incorrect logic value is latched into the register. This incorrect value can then propagate through the rest of the circuit. SETs that are latched into a register are called Single Event Upsets (SEUs).
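The latching condition described above can be illustrated with a minimal timing sketch. It assumes an idealized register with zero setup and hold time and a transient that is a simple rectangular pulse; the function name and all parameters are hypothetical and chosen only for illustration.

```python
def set_is_latched(pulse_start_ps, pulse_width_ps, clock_period_ps, first_edge_ps=0):
    """Return True if an active clock edge falls inside the transient pulse.

    Idealized model (assumption): the register samples instantaneously on
    each edge, so a transient becomes an upset only if an edge occurs at or
    after pulse_start_ps and before pulse_start_ps + pulse_width_ps.
    """
    # Number of whole periods to the first edge at or after the pulse start
    # (ceiling division, clamped so we never look before the first edge).
    offset = pulse_start_ps - first_edge_ps
    periods = max(0, -(-offset // clock_period_ps))
    next_edge_ps = first_edge_ps + periods * clock_period_ps
    return next_edge_ps < pulse_start_ps + pulse_width_ps


# A 300 ps transient with a 10 ns clock: only a pulse that overlaps an
# edge is captured; a pulse in mid-cycle leaks away harmlessly.
overlaps_edge = set_is_latched(9_900, 300, 10_000)
mid_cycle = set_is_latched(2_000, 300, 10_000)
```

In this simplified model the chance of capturing a transient grows with the ratio of pulse width to clock period, which is why faster clocks are generally more exposed to latched transients.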
In a satellite computer, for example, a bit-flip caused by an SEU could randomly change critical data, randomly change program data, or randomly change a register value. The changes can cause the software to perform unintended commands and thus cause the software to “crash”.
SEUs in an FPGA may affect the user design flip-flops, the FPGA configuration memory, as well as any hidden FPGA registers, latches, or internal state. Configuration memory upsets are especially problematic because such upsets affect both the state and operation of the design. Configuration upsets may perturb the routing resources and logic functions in a way that changes the operation of the circuit. The effects of single event upsets in the device configuration memory are not limited to modifications in the memory elements, but may also produce modifications in the interconnections inside Configurable Logic Blocks (CLBs) and among different CLBs, thus giving rise to totally different circuits from those intended.
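As a rough illustration of how a configuration upset changes the logic function itself, the sketch below models a look-up table (LUT) as a list of configuration bits indexed by the input values. This is a simplified assumption about how FPGA logic is commonly organized, not a description of any particular device; the names are hypothetical.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a LUT: config_bits[i] is the output for input combination i.

    The inputs are packed most-significant-bit first to form the index.
    """
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config_bits[index]


# Configuration for a 2-input AND gate: only inputs (1, 1) give 1.
and_config = [0, 0, 0, 1]

# A single upset in configuration bit 1 turns the truth table into
# [0, 1, 0, 1], which simply passes the second input through.
upset_config = list(and_config)
upset_config[1] ^= 1

truth_before = [lut_eval(and_config, (a, b)) for a in (0, 1) for b in (0, 1)]
truth_after = [lut_eval(upset_config, (a, b)) for a in (0, 1) for b in (0, 1)]
```

Unlike a transient in user logic, the altered truth table stays in place for every subsequent evaluation until the configuration bit is rewritten.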
Flash and Antifuse FPGAs have configuration memories that are insensitive to SETs, and any SET present in the user logic will be temporary. However, in FPGAs with volatile configuration memories, in particular SRAM-based FPGAs, configuration memory errors are the major contributor to the errors caused by SETs. Configuration memory errors in SRAM-based FPGAs do not dissipate, but persist until a power reset reloads the configuration memory of the FPGA or a scheduled configuration memory reset occurs. SEUs can become Single Event Functional Interrupts (SEFIs) when they upset control circuits, such as state machines, placing the device into an undefined state, a test mode, or a halt, which would then need a reset or a power cycle to recover.
From the above it is apparent that some kind of single event upset mitigation scheme is crucial for the successful deployment of FPGAs and even ASICs for space-based applications. Single event upset mitigation can also be important for safety-critical terrestrial applications.
Double Modular Redundancy (DMR) SEU mitigation solutions rely on duplication of the combinational circuit and a comparison of the outputs of the duplicated circuits. Most DMR solutions, however, are generally only able to detect SEUs, not mask or correct them.
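The detect-only nature of DMR can be seen in a minimal behavioral sketch. The helper below is hypothetical: it runs two copies of a combinational function, optionally injects a bit-flip into one copy's output, and compares the results.

```python
def dmr_compare(module, inputs, upset_copy=None, upset_mask=0):
    """Run two copies of `module` and flag any disagreement.

    upset_copy (0 or 1) and upset_mask are illustrative parameters used to
    inject a bit-flip into one copy's output.
    """
    outputs = [module(inputs), module(inputs)]
    if upset_copy is not None:
        outputs[upset_copy] ^= upset_mask
    error_detected = outputs[0] != outputs[1]
    return outputs[0], outputs[1], error_detected


def increment(x):
    """Example combinational function: a 4-bit incrementer."""
    return (x + 1) & 0xF


clean = dmr_compare(increment, 4)
upset = dmr_compare(increment, 4, upset_copy=1, upset_mask=0b0100)
```

When the two outputs differ, the comparison shows that an error occurred, but with only two copies there is no way to tell which output is the correct one.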
The most common mitigation scheme for correcting SEU errors in sequential circuits in orbit is Triple Modular Redundancy (TMR) plus scrubbing. TMR is a spatial redundancy technique that compares three signal values by means of a voting circuit, where the output is equal to the value on which at least two of the three inputs agree. Any single event upsets will be removed through scrubbing and the bad state will either be masked or fixed by the triple modular redundancy (depending on the implementation). TMR is often exploited for hardening digital logic against single event upsets in safety-critical applications. For instance, TMR is often exploited to design fault-tolerant memory elements to be employed in sequential digital logic. The main disadvantage of TMR is the excessive area overhead. The hardened design, with triplication of the combinational circuit and additional voting circuitry, can have between 4 and 7 times more area and power consumption than the original circuit, which limits its usage to reliability-critical applications.
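The voting step can be sketched as a bitwise majority function. This is a minimal behavioral model, assuming word-wide signals represented as Python integers; the `tmr` helper and its upset-injection parameters are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit follows the two inputs that agree."""
    return (a & b) | (b & c) | (a & c)


def tmr(module, inputs, upset_copy=None, upset_mask=0):
    """Run three copies of `module`, optionally upset one, and vote.

    upset_copy (0, 1 or 2) and upset_mask are illustrative parameters used
    to inject a bit-flip into one copy's output.
    """
    outputs = [module(inputs) for _ in range(3)]
    if upset_copy is not None:
        outputs[upset_copy] ^= upset_mask
    return majority_vote(*outputs)


# A single upset copy is outvoted by the two correct copies.
masked = tmr(lambda x: x + 1, 4, upset_copy=1, upset_mask=0b0001)

# If two copies suffer the same bit-flip, the voter passes the wrong value.
not_masked = majority_vote(0b1010 ^ 1, 0b1010 ^ 1, 0b1010)
```

The second case shows why TMR is usually combined with scrubbing: an uncorrected upset in one copy leaves the design vulnerable to a second upset in another copy.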
Radiation tolerant FPGAs for space and military applications are available, but these tend to be orders of magnitude more expensive than their off-the-shelf counterparts. Furthermore, while radiation tolerant FPGAs, non-volatile FPGAs, or ASICs are capable of masking the effects of single event upsets in the configuration memory, triple modular redundancy is generally still required in the user logic circuitry for critical applications.
In the applicant's own PCT application number PCT/IB2011/000640, a method and circuit for mitigating SEUs is presented which relies on double modular redundancy and a voter circuit between each pair of outputs, where the voter circuit is able to indicate the presence of an SEU if the two outputs are not identical. The voter outputs are all compared by a multiple input voter circuit and, if any one or more of the voter outputs indicates the presence of an SEU, the state memory latch elements (such as flip-flops) are all disabled until the presence of the single event upset has disappeared. In this way, the circuit “freezes” for the duration of the single event upset. While this method is effective in mitigating SEUs, the voters require a significant amount of additional circuitry and complexity, double modular redundancy is still required, and a small additional delay is introduced during the time in which the circuit is “frozen”. Furthermore, configuration memory errors are identified by detecting that the circuit has remained frozen for more than a predetermined time period and then reconfiguring the configuration memory. This wait time introduces much longer time delays in the case of configuration memory errors.
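The "freeze" behavior described above can be loosely modelled at the clock-cycle level. The sketch below is an illustrative simplification and not the circuit of the referenced application: it assumes two copies of a next-state function, a per-bit comparison standing in for the pairwise voters, and a register enable that is withheld whenever any bit disagrees. All names are hypothetical.

```python
def simulate_freeze(next_state, initial_state, glitches, cycles):
    """Cycle-level model of a DMR design that freezes on any mismatch.

    glitches maps a cycle number to an XOR mask applied to the output of
    copy B during that cycle (an injected transient).
    Returns a list of (cycle, state_after_edge, frozen) tuples.
    """
    state = initial_state
    history = []
    for cycle in range(cycles):
        out_a = next_state(state)
        out_b = next_state(state) ^ glitches.get(cycle, 0)
        mismatch_flags = out_a ^ out_b      # one flag per output pair
        frozen = mismatch_flags != 0        # any flag set disables all registers
        if not frozen:
            state = out_a                   # registers enabled: latch new state
        history.append((cycle, state, frozen))
    return history


def counter(s):
    """Example next-state function: a 4-bit up-counter."""
    return (s + 1) & 0xF


# A transient hits copy B during cycle 2; the state holds for that cycle.
trace = simulate_freeze(counter, 0, {2: 0b0100}, 5)
```

In this model the counter reaches 4 rather than 5 after five cycles, which corresponds to the small additional delay noted above: the corrupted value is never latched, at the cost of one stalled cycle per transient.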
It would be advantageous to have a means for suppressing single event transients or glitches in digital electronic circuits that does not have the circuit area and power requirements of triple modular redundancy (or even of double modular redundancy), does not require expensive radiation tolerant circuitry, results in less time delay, requires less circuitry than the applicant's previous circuit and method, but which nevertheless offers substantial immunity against the errors caused by glitches and single event transients.