The invention relates to an efficient fault recovery system that allows a Field Programmable Gate Array (FPGA) user circuit to operate through faults using triple modular redundancy (TMR). The bitstream translation program (BTP) provides passive redundancy and supports the replacement of modules without interrupting the correct operation of the user circuit. The BTP correctly translates partial bitstreams and can be implemented on an embedded microprocessor to perform internal partial reconfiguration.
The basic FPGA is an integrated circuit consisting of logic blocks, interconnects, and I/O blocks. Logic blocks can be individually configured to perform various functions and are connected using programmable interconnects. An FPGA configuration, including the function each logic block implements and its connections, is determined when the FPGA is programmed. This programmable architecture means today's FPGAs can implement large and complex functions.
Field programmable gate arrays are digital integrated circuits that can be programmed and reprogrammed post-fabrication by a user to implement a custom circuit. FPGAs are a valuable tool not only for rapid prototyping and testing, but also for implementing actual production systems. The submicron scale of improved FPGAs has increased the number of transistors on each device, making them more powerful. As the transistor size has been reduced, the current density in the devices has increased, making them more vulnerable to errors. Gamma particle radiation may cause errors in the state of a transistor. In order to use FPGAs in space systems, fault-tolerance techniques to improve the reliability and dependability of FPGAs are needed. Fault tolerance has traditionally been provided by building redundancy into a design. In FPGAs, designs may be hardened by replicating components and using techniques such as Triple Modular Redundancy.
Fault tolerant circuits continue to provide dependable results even if a fault occurs during operation. In an environment where multiple faults can be expected, such as space applications, systems may be required to tolerate multiple faults before the system malfunctions.
U.S. Pat. No. 7,216,277, “Self-Repairing Redundancy for Memory Blocks in Programmable Logic Devices,” Ngai et al., adds structure built into the FPGA to allow self-repair. The present invention alters bitstreams stored in memory to create a new bitstream which implements the faulty module in a new location to avoid the fault.
U.S. Pat. No. 6,973,608, “Fault Tolerant Operation of Field Programmable Gate Arrays,” Abramovici et al., uses an external controller to perform partial reconfiguration, allowing for a more robust method of incremental reconfiguration which attempts to minimize the effects of the reconfiguration on the performance of the FPGA. The present invention uses internal partial reconfiguration, relying on the microprocessor within the FPGA to calculate a new bitstream and apply it through the internal configuration access port. In the present invention, the column-based layouts provide fault tolerance, allowing for continued operation. Instead of using incremental changes, a layout is used that allows large sections of the FPGA to be reconfigured without disrupting the performance of the user's circuit.
Unlike previous fault tolerance approaches, the approach below includes detection, diagnosis and repair. To prevent faults from propagating through the system, TMR masks faults and reconfiguration replaces modules that have suffered an error.
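By way of illustration only, the masking and diagnosis roles of TMR can be sketched in software as a bitwise two-of-three majority vote over the outputs of three replicated modules. The function names `majority_vote` and `disagreeing_modules` are hypothetical and are not taken from the invention; in an FPGA the voter would be realized in logic rather than in Python.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs.

    Each output bit takes the value held by at least two replicas,
    so a fault confined to one replica is masked.
    """
    return (a & b) | (a & c) | (b & c)


def disagreeing_modules(a: int, b: int, c: int) -> list[int]:
    """Return the indices (0, 1, 2) of replicas that differ from the voted value.

    A non-empty result marks candidate modules for replacement
    (illustrative diagnosis step only).
    """
    voted = majority_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


if __name__ == "__main__":
    # Replica 2 has one flipped bit; the voted output is unaffected.
    outputs = (0b1010, 0b1010, 0b0010)
    print(bin(majority_vote(*outputs)))
    print(disagreeing_modules(*outputs))
```

In this sketch the voted value remains correct as long as no two replicas are faulty in the same bit position, and the list of disagreeing replicas indicates which module a reconfiguration step might replace.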