1. Field of the Invention
The present invention relates generally to circuitry which may be operated in environments in which the circuitry is subject to single event upsets (SEU) and/or single event transients (SET) and, more specifically, to circuitry which is reconfigurable for adjusting the SEU/SET tolerance thereof.
2. Description of Related Art
The Field Programmable Gate Array (FPGA) is a type of programmable logic device (PLD). The FPGA may comprise an array of programmable tiles or programmable functional elements such as, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), look up tables (LUTs), dedicated random access memory blocks (BRAM), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay lock loops (DLLs), multi-gigabit transceivers (MGTs), and/or the like.
Another type of PLD is the complex programmable logic device, or CPLD. A CPLD may include two or more programmable functional elements connected together and also connected to input/output (I/O) resources by an interconnect switch matrix. Each programmable function block of the CPLD may include a two-level AND/OR structure similar to those used in programmable logic arrays (PLAs) and programmable array logic (PAL) devices. In some CPLDs, configuration data may be stored on-chip in non-volatile memory. In other CPLDs, configuration data may be stored off-chip in non-volatile memory, and then downloaded to volatile memory as part of an initial configuration sequence.
The above paragraphs describe a non-limiting list of various types of PLDs. PLDs may be utilized to form the electronic circuits for many different types of applications. A non-limiting list of applications may comprise telecommunications, networking, consumer, automotive, industrial applications, signal processing, LiDAR, image processing for crew display, pattern recognition, and the like. Future devices which utilize the present invention may be based on other technologies such as nanotechnologies.
There is a growing use of PLDs in applications subject to radiation and/or other interference which may cause a single event upset (SEU) and single event transient (SET). For example, FPGAs are being utilized more often in space and military applications. Accordingly, there is an increasing need for efficient SEU/SET mitigation techniques.
SEU/SET mitigation methods for PLD circuits fall into two broad groups: manufacturer designed built-in circuit techniques and end-user designed firmware techniques. Built-in circuit techniques can be utilized to more quickly and more reliably provide SEU/SET mitigation. End-user designed techniques can be used to provide tailor made solutions which are more efficient with higher data capacity but may be less reliable due to greater difficulty in providing reliable SEU/SET mitigation.
Thus, presently available devices are normally committed to a fixed SEU/SET mitigation configuration, i.e., the entire device is either redundant or is not redundant. Built-in redundancy in the hardware provides high performance and greater assurance of reliable operation. However, many applications only need to be partly protected from SEU/SET and would preferably also permit high capacity if possible. Capacity, the amount of data flow per unit of time and/or the total algorithmic complexity, is reduced in proportion to the amount of redundancy utilized. Redundancy is usually provided as 2× or 3×, where 2× redundancy may require two data flow lines and 3× redundancy may require three data flow lines. For example, SEU mitigation of logic circuits may be accomplished by implementing triple modular redundancy (TMR) and other techniques. However, economical alternatives to TMR have long been sought.
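The voting step at the heart of TMR, and the capacity cost of redundancy described above, can be illustrated with a minimal sketch. The function names and the simple divide-by-redundancy capacity model are assumptions made for illustration only; they are not taken from any particular device.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies of a word.

    Each output bit takes the value held by at least two of the inputs,
    so an upset confined to one copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


def effective_capacity(raw_capacity: float, redundancy: int) -> float:
    """Illustrative model (an assumption): usable capacity falls in
    proportion to the redundancy factor (1 = none, 2 = 2x, 3 = 3x)."""
    return raw_capacity / redundancy


if __name__ == "__main__":
    good = 0b1011
    upset = good ^ 0b1000          # one copy suffers a single-bit upset
    print(bin(majority_vote(good, good, upset)))   # the upset is outvoted
    print(effective_capacity(300.0, 3))            # one third of raw capacity
```

The voter masks an error only while the other two copies agree; if upsets reach two copies of the same bit, the vote returns the wrong value.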
Large-capacity, high-performance reprogrammable FPGAs marketed for use in space, and having latch-up and total dose hardness, but without built-in SEU/SET tolerance, have required designers to program SEU/SET mitigation into the FPGA as part of the application. Having the SEU/SET mitigation under user control allows partitioning of a design into protected and unprotected sections. However, there are many papers on the pitfalls of taking an FPGA and programming SEU/SET mitigation into it by adding redundancy through firmware. Some problems can be very subtle. For example, there may be some underlying common source of error that is unknown due to the underlying structure of the chip. Moreover, the circuitry is expensive to test due to the requirement for testing within an environment with sufficient radiation to cause errors.
As an alternative to hardware techniques, redundancy may be provided via the software programming of the device, rather than in pre-wired hardware. However, software programming techniques may be less efficient, may take more time, and may intrude upon the application design. Typically, hardware redundancy has the advantage of being transparent to the application.
For SRAM-based FPGAs having their configuration stored in SEU-susceptible SRAM, SEU mitigation requires protecting the configuration memory from the accumulation and indefinite retention of errors, usually by scrubbing. One purpose of scrubbing is to protect the TMR mechanism, which would otherwise eventually fail as multiple errors accumulate over time and affect multiple voting domains. Without TMR, scrubbing reduces the time period (potentially indefinite) during which the device is functioning erroneously.
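Scrubbing can be pictured as periodically comparing the configuration memory against a known-good copy and rewriting any word that differs. The sketch below is a software analogy under that assumption; the `scrub` function and the list-of-words memory model are hypothetical and do not represent any vendor's scrubbing interface.

```python
def scrub(config: list[int], golden: list[int]) -> int:
    """Rewrite every configuration word that differs from the golden copy.

    Returns the number of words repaired in this pass.
    """
    repaired = 0
    for i, good in enumerate(golden):
        if config[i] != good:
            config[i] = good
            repaired += 1
    return repaired


if __name__ == "__main__":
    golden = [0xA5, 0x3C, 0xFF]    # reference configuration (made-up values)
    config = list(golden)          # live, upset-susceptible copy
    config[1] ^= 0x04              # simulate a single event upset
    print(scrub(config, golden))   # number of words repaired
    print(config == golden)        # live copy matches the reference again
```

Running the pass often enough, relative to the upset rate, keeps errors from accumulating in more than one voting domain between passes.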
SEU/SET tolerances may be quantified in terms of error rates. Error rates from SEU/SET are often expressed as errors per bit-day. What constitutes a “lower” or “higher” SEU/SET tolerance is highly dependent on a subject environment. Stated otherwise, the error rate for a particular device will be a function of the environment in which it is operated, including the total amount of radiation, and the composition (e.g., protons, heavy ions, etc.) and energy of that radiation. Each type of radiation particle, at a specific energy, deposits a characteristic amount of energy per unit length of travel through silicon. This is called Linear Energy Transfer (LET), measured in units such as MeV·cm²/mg (the energy lost by the particle to the material per unit path length, in MeV/cm, divided by the density of the material, in mg/cm³). In ground tests using particle accelerators, circuits are characterized by the upset rate for a given particle flux at a given LET. Using models of radiation in various environments, such as Low Earth Orbit or Deep Space, error rates can be estimated from the test data. Occasionally data is directly obtained by placing test specimens in Low Earth Orbit, but rarely in other instances due to impracticality. What constitutes an “acceptable” error rate is heavily dependent on, for example, the application; the duration of use of the application; the size of the application; the criticality of the application; and the radiation environment. Errors per bit-day multiplied by the total number of bits in an application (including configuration bits if it is a PLD) give an estimate of aggregate error rate for the application. The inverse of error rate is Mean Time Between Failure (MTBF). As an example, if an application is designated as “safety or mission critical,” and should not experience an error, then MTBF should be much larger than the period of use of the application.
Critical applications might be used only for seconds, as when a thruster is firing, or for the entire life of a multi-year deep-space mission, such as a human mission to Mars. If a 99.9999% probability of success is desired, then the MTBF would need to be 1,000,000 times the period of use. But for a non-critical application, the MTBF might be far less than the period of use, according to the number of errors that were considered tolerable (i.e., a SEU/SET tolerance). When considering generally the difference between protected (also called mitigated) and unprotected (non-critical) applications, many orders of magnitude difference in error rates are implied, with the protected application having an MTBF similar to or much greater than the period of use, and the unprotected application (or one protected by other means) having an MTBF lower than the period of use.
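The arithmetic relating errors per bit-day, aggregate error rate, MTBF, and probability of success can be sketched as follows. The sketch assumes a constant error rate (an exponential failure model), which the text does not state explicitly, and the numeric inputs are made-up values chosen only to show the calculation.

```python
import math


def aggregate_error_rate(errors_per_bit_day: float, total_bits: int) -> float:
    """Errors per day for the whole application (rate per bit times bits)."""
    return errors_per_bit_day * total_bits


def mtbf_days(error_rate_per_day: float) -> float:
    """MTBF is the inverse of the aggregate error rate."""
    return 1.0 / error_rate_per_day


def success_probability(period_days: float, mtbf: float) -> float:
    """Probability of no error during the period (assumed exponential model)."""
    return math.exp(-period_days / mtbf)


def required_mtbf(period_days: float, target_probability: float) -> float:
    """MTBF needed to reach the target probability of error-free operation."""
    return -period_days / math.log(target_probability)


if __name__ == "__main__":
    # Hypothetical inputs: 1e-7 errors per bit-day over one million bits.
    rate = aggregate_error_rate(1e-7, 1_000_000)
    print(rate, mtbf_days(rate))
    # Ratio of required MTBF to period of use for a 99.9999% success target.
    print(required_mtbf(1.0, 0.999999))
```

Under this model the required MTBF for a 99.9999% target comes out at very nearly one million times the period of use, consistent with the figure given above.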
The following patents describe some of the efforts made in the field of SEU/SET error mitigation:
U.S. Pat. Nos. 7,250,786 and 7,250,786, to S. Trimberger, issued Jul. 31, 2007 and Sep. 4, 2007, respectively, disclose a method and apparatus to provide triple modular redundancy (TMR) in one mode of operation, while providing multiple context selection during a second mode of operation. Intelligent voting circuitry facilitates both modes of operation, while further enhancing the robustness of the design when used in a TMR mode of operation. Various addressing schemes are provided, which allow dual use of the configuration data lines as selection signals using one addressing scheme, while allowing for dual use of the configuration address lines as selection signals using the second addressing scheme.
U.S. Pat. Nos. 7,310,759 and 7,512,871, to Carmichael et al., issued Dec. 18, 2007 and Mar. 31, 2009, respectively, disclose SEU mitigation, detection, and correction techniques. Mitigation techniques include: triple redundancy of a logic path extended the length of the FPGA; triple logic module and feedback redundancy provides redundant voter circuits at redundant logic outputs and voter circuits in feedback loops; enhanced triple device redundancy using three FPGAs is introduced to provide nine instances of the user's logic; critical redundant outputs are wire-ANDed together; redundant dual port RAMs, with one port dedicated to refreshing data; and redundant clock delay locked loops (DLL) are monitored and reset if each DLL does not remain in phase with the majority of the DLLs. Detection techniques include: configuration memory readback wherein a checksum is verified; separate FPGAs perform readbacks of configuration memory of a neighbor FPGA; and an FPGA performs a self-readback of its configuration memory array. Correction techniques include reconfiguration of partial configuration data and “scrubbing” based on anticipated SEUs.
U.S. Pat. No. 5,931,959, to K. Kwiat, issued Aug. 3, 1999, discloses computing modules which can cooperate to tolerate faults among their members. In a preferred embodiment, computing modules couple with dual-ported memories and interface with a dynamically reconfigurable Field-Programmable Gate Array (“FPGA”). The FPGA serves as a computational engine to provide direct hardware support for flexible fault tolerance between unconstrained combinations of the computing modules. In addition to supporting traditional fault tolerance functions that require bit-for-bit exactness, the FPGA engine is programmed to tolerate faults that cannot be detected through direct comparison of module outputs. Combating these faults requires more complex algorithmic or heuristic approaches that check whether outputs meet user-defined reasonableness criteria. For example, forming a majority from outputs that are not identical but may nonetheless be correct requires taking an inexact vote. The FPGA engine's flexibility extends to allowing for multiprocessing among the modules where the FPGA engine supports message passing. Implementing these functions in hardware instead of software makes them execute faster. The FPGA is reprogrammable, and only the functions required immediately need be implemented. Inactive functions are stored externally in a Read-Only Memory (ROM). The dynamically reconfigurable FPGA gives the fault-tolerant system an output stage that offers low gate complexity by storing the unused “gates” as configuration code in ROM. Lower gate complexity translates to a highly reliable output stage, prerequisite to a fault tolerant system.
U.S. Pat. No. 7,124,347, to W. Plants, issued Oct. 17, 2006, discloses a method for detecting an error in data stored in configuration SRAM and user assignable SRAM in an FPGA. The method comprises providing a serial data stream into the FPGA from an external source, loading data from the serial data stream into the configuration SRAM in response to address signals generated by row and column counters, loading data from the serial data stream into the user assignable SRAM in response to address signals generated by row and column counters, loading a seed and signature from the serial data stream into a cyclic redundancy checking circuit, cycling data out of the configuration SRAM and the user assignable SRAM by the row and column counters, performing error checking on the data that has been cycled out of the configuration SRAM and out of the user assignable SRAM by the cyclic redundancy checking circuit, and generating an error signal when an error is detected by the error checking circuit.
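The general idea of checking memory contents against a seed and a stored signature can be illustrated in software. The sketch below is only an analogy and is not the circuit of the cited patent: it substitutes the CRC-32 routine from Python's standard `zlib` module for the checking circuit, and the function names and memory contents are hypothetical.

```python
import zlib


def crc_signature(memory: bytes, seed: int = 0) -> int:
    """CRC-32 of the memory contents, starting from the given seed value."""
    return zlib.crc32(memory, seed) & 0xFFFFFFFF


def error_detected(memory: bytes, seed: int, signature: int) -> bool:
    """True when the recomputed CRC no longer matches the stored signature."""
    return crc_signature(memory, seed) != signature


if __name__ == "__main__":
    memory = bytes(range(16))               # stand-in for SRAM contents
    seed = 0
    signature = crc_signature(memory, seed) # computed when the data is loaded

    upset = bytearray(memory)
    upset[3] ^= 0x01                        # simulate a single-bit upset

    print(error_detected(memory, seed, signature))        # contents intact
    print(error_detected(bytes(upset), seed, signature))  # upset is flagged
```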
U.S. Pat. No. 6,963,217, to Samudrala et al., issued Nov. 8, 2005, discloses a method for reducing circuit sensitivity to single event upsets in programmable logic devices. The method involves identifying single event upset sensitive gates within a single event upset sensitive sub-circuit of a programmable logic device as determined by the input environment and introducing triple modular redundancy and voter circuits for each single event upset sensitive sub-circuit so identified.
U.S. Pat. No. 7,200,822, to K. McElvain, issued Apr. 3, 2007, discloses digital circuits with time multiplexed redundancy and methods and apparatuses for their automated designs generated from single-channel circuit designs. A digital circuit detects or corrects transitory upsets through time-multiplexed resource sharing. Time-multiplexed resource sharing is used to reduce the die area for implementing modular redundancy. This patent also discloses automatically synthesizing multi-channel hardware for time-multiplexed resource sharing by automatically generating a time-multiplexed design of multi-channel circuits from the design of a single-channel circuit, in which at least a portion of the channels are allocated for modular redundancy.
The above approaches do not solve the aforementioned problems. The complexity and difficulty of end-user-designed mitigation is encountered over and over through the life cycle of the application. Ideally, the application could assume the hardware was performing correctly by means of redundancy built into the hardware. However, because some applications, such as signal processing, may better handle errors through their existing protocol techniques, it would be desirable to be able to select capacity over redundancy.
Those of skill in the art will appreciate the present invention that addresses the above and other problems.