1. Field of the Invention
This invention relates to processes and microcontrollers for scrubbing, or removing errors from, a computer memory's content by re-writing it periodically with correct values.
2. Background of the Invention
A Field Programmable Gate Array (FPGA) is an integrated circuit that is configurable after it has been manufactured. Other integrated circuits are designed to be application specific (ASICs) and act as finite state machines which do not have the configurability of an FPGA. However, radiation sources are known to damage configurable integrated circuits.
Radiation in the form of gamma rays (photons, x-rays), electrons, protons, and heavy ions can damage the silicon of an integrated circuit. When radiation hits an integrated circuit, a bit flip can occur, thus destroying its binary functionality, i.e. by changing a 0 to a 1 or a 1 to a 0. Changing a bit changes the encoding of the 8-bit byte, changes the meaning of the logic, and renders the chip useless for its intended function. Further, the lost electron can create a voltage that pulls electrons away into another component and destroys it. High energy ions can penetrate any thickness of material that is practical to put in space. If they penetrate a microcircuit, they leave a trail of holes and electrons in their wake. These charge carriers can be collected by one or more junctions and result in current transients that can produce several different, undesirable phenomena labeled Single Event Effects (SEE). These include:
   1. SEU (single event upset): a state change in a memory element (latch, flip flop, memory cell) that causes the information stored in the bit to be lost;
   2. SET (single event transient): a propagating transient with sufficient amplitude and duration to be mistaken as data, which may or may not result in information loss depending on its temporal relationship to the clock;
   3. SEL (single event latchup): a potentially damaging, high current state resulting from a four-layer (PNPN) path being triggered into conduction by the current transient from the ionization strike;
   4. SESB (single event snapback): a potentially damaging, high current state resulting from a three-layer (NPN or PNP) path being triggered into conduction by the current transient from the ionization strike;
   5. SEDR (single event dielectric rupture): catastrophic damage to the gate oxide due to a strike by an ionized particle.
Since all these phenomena are the result of a single ion strike, they are referred to as single event effects (SEE).
SEUs occur a number of times per day in semiconductor chips in space, but they can also occur near terrestrial radiation sources. Scrubbing is a term that denotes the process of removing errors from a memory's content by re-writing it periodically with correct values fetched from a master or “gold” copy.
The main weakness of a scrubbing solution is its own susceptibility to SEUs. An upset in the scrubber circuitry may result in wrong data being written to the configuration memory of the device. This is particularly dangerous as it may result in the destruction of the device.
FPGAs are susceptible to bit flips, but since they are reconfigurable, their configuration can be corrected by conforming it to a master copy. Typical FPGAs comprise three types of configurable elements: configurable logic blocks (CLBs), input/output blocks, and interconnects. FPGAs that rely on static latches for their programming elements, also known as SRAM FPGAs, are reconfigurable, meaning they can be reprogrammed with the same or different configuration data; application specific integrated circuits (ASICs) cannot be reconfigured.
An FPGA's configuration memory is a special case since the data in the configuration memory dictates the logic the FPGA implements. Scrubbing of an FPGA is possible in Xilinx devices due to their glitchless configuration memory cell. This feature basically allows a write process to happen in a configuration memory cell without generating glitches in the underlying logic as long as no change in state is produced. Thus, periodic scrubbing (re-write of the configuration memory) will not affect the circuit the FPGA implements, and will only correct upset bits.
The glitchless characteristic of the memory cell has two exceptions: when Look Up Tables (LUTs) are configured in Shift Register Mode or as a RAM. If a frame that contains configuration bits for a LUT configured in either of these two ways is modified or just read back, then the LUT RAM or shift register will be interrupted and it will lose its state. This has to be taken into consideration by the designer implementing a scrubbing solution. A workaround is to place the LUTs configured in this way into columns (frames) that will not be subject to scrubbing.
The configuration memory can be accessed within the FPGA fabric using the Internal Configuration Access Port (ICAP). This is useful for scrubbers implemented within the FPGA logic, which are thus labeled “self-scrubbers”. The ICAP can be instantiated with either an 8-bit bus or a 32-bit bus. Assuming a maximum operation frequency of 100 MHz for this port, a scrubber using the 32-bit bus could access the configuration memory at a maximum speed of 3.2 Gb/sec (32 bits × 100 MHz).
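The throughput figure above is simply the bus width multiplied by the port clock. The following Python sketch reproduces that arithmetic for both bus widths; the function name `port_throughput_bps` is hypothetical, and the calculation assumes one bus word is transferred on every clock cycle with no protocol overhead.

```python
def port_throughput_bps(bus_width_bits: int, clock_hz: float) -> float:
    """Peak throughput in bits per second.

    Assumption: one full bus word is transferred per clock cycle.
    """
    if bus_width_bits <= 0 or clock_hz <= 0:
        raise ValueError("bus width and clock frequency must be positive")
    return bus_width_bits * clock_hz


# The two bus widths named in the text, at the assumed 100 MHz maximum.
for width in (8, 32):
    gbps = port_throughput_bps(width, 100e6) / 1e9
    print(f"{width}-bit bus at 100 MHz: {gbps:.1f} Gb/s")
```

Under these assumptions the 8-bit bus peaks at 0.8 Gb/sec and the 32-bit bus at 3.2 Gb/sec.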
The configuration memory can also be accessed externally through the SelectMAP interface or the JTAG interface. The SelectMAP interface can be declared with either an 8-bit bus or a 32-bit bus. The JTAG interface is serial. Scrubbers using these interfaces are generally labeled “external scrubbers”.
The scrubbing process also has variants. The first solutions reported used a method that is now known as “blind scrubbing”. In this case the configuration memory is re-written periodically regardless of whether it has an upset bit or not. This approach is simple since it doesn't require the scrubber to perform a readback of the configuration memory content and an error detection procedure. The disadvantage of this approach is that it has significant overhead in terms of power consumption and performance, and is more susceptible to Single Event Functional Interrupts (SEFIs).
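Blind scrubbing as described above can be sketched in a few lines of Python. This is an illustrative model only: the configuration memory and the gold copy are represented as ordinary lists of frames, and the name `blind_scrub` is hypothetical. A real scrubber would write through a configuration port and repeat the pass on a timer.

```python
def blind_scrub(config_memory: list, gold_copy: list) -> int:
    """Rewrite every frame from the gold copy without any readback.

    Returns the number of frames written, which is always the full
    device regardless of how many frames were actually upset.
    """
    if len(config_memory) != len(gold_copy):
        raise ValueError("memory and gold copy must have the same frame count")
    for index, frame in enumerate(gold_copy):
        config_memory[index] = frame  # unconditional write
    return len(gold_copy)


# Hypothetical device with 8 frames of 4 bytes each.
gold = [bytes([i] * 4) for i in range(8)]
memory = list(gold)

# Simulate a single event upset: flip one bit in frame 3.
memory[3] = bytes([memory[3][0] ^ 0x01]) + memory[3][1:]

writes = blind_scrub(memory, gold)
print(f"frames written: {writes}, memory matches gold: {memory == gold}")
```

The sketch makes the overhead visible: one upset frame is repaired at the cost of writing all eight.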
Later solutions implemented a readback-detect-error method. In this more selective approach the scrubber must read back the configuration memory and detect errors through some kind of error detection code. Scrubbing happens only when an upset is found in the configuration. This approach saves power compared to blind scrubbing and allows for scrubbing strategies where critical regions are differentiated from less important sections. In order to detect errors in the contents of any memory, some kind of parity bits or error detection and correction codes have to be embedded within the memory contents. In this approach, a code book is calculated using a parity or hash function (e.g. CRC) by reading back the memory contents (e.g. an FPGA's configuration memory). This code book is stored locally and is used in subsequent iterations for comparison against the codes generated from the memory contents read afterwards. A discrepancy signals an upset occurrence.
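The code book procedure above can be illustrated with a short Python sketch that uses CRC-32 from the standard `zlib` module as the error detection code. The per-frame granularity, the list-of-frames memory model, and the function names `build_code_book`, `find_upset_frames`, and `readback_scrub` are assumptions made for illustration and are not taken from any particular scrubber.

```python
import zlib


def build_code_book(frames):
    """Compute one CRC-32 per frame from a known-good readback."""
    return [zlib.crc32(frame) for frame in frames]


def find_upset_frames(frames, code_book):
    """Read back every frame and return indices whose CRC no longer matches."""
    return [i for i, frame in enumerate(frames)
            if zlib.crc32(frame) != code_book[i]]


def readback_scrub(memory, gold, code_book):
    """Rewrite only the frames flagged as upset; return their indices."""
    upset = find_upset_frames(memory, code_book)
    for i in upset:
        memory[i] = gold[i]
    return upset


# Hypothetical device with 8 frames of 16 bytes each.
gold = [bytes([i]) * 16 for i in range(8)]
memory = list(gold)
code_book = build_code_book(memory)  # taken while the memory is known good

# Simulate a single event upset: flip one bit in frame 5.
corrupted = bytearray(memory[5])
corrupted[2] ^= 0x10
memory[5] = bytes(corrupted)

repaired = readback_scrub(memory, gold, code_book)
print(f"repaired frames: {repaired}, memory matches gold: {memory == gold}")
```

Only the frame with a CRC discrepancy is rewritten, in contrast to blind scrubbing, where every frame is written on every pass.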
Another variant of the scrubbing process concerns its granularity. The first scrubbers performed scrubbing on the full device. However, these solutions add unnecessary power consumption, triple the logic resources, and impose area restrictions on the chip.
Later scrubbers allow selective scrubbing or frame-based scrubbing. Frame-based scrubbing allows the user to implement different scrubbing strategies based on the priority of the different parts of a system in an FPGA. Frame-based scrubbers also have the advantage that less data is written to the configuration memory overall. This reduces the possibility of a malfunctioning scrubber or a corrupted gold copy of the bitstream introducing errors in the configuration instead of correcting them.
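One possible frame-based strategy is to give each frame its own scrub period, so that critical frames are checked on every cycle, less important frames are checked less often, and some frames (for example those holding LUTs in shift register or RAM mode) are never touched. The Python sketch below is a hypothetical illustration of that idea; the `periods` table, the function names, and the use of CRC-32 for detection are all assumptions.

```python
import zlib


def frames_due(cycle, periods):
    """Frames to check on this cycle.

    `periods` maps frame index to scrub period in cycles; frames that
    are absent from the table are never scrubbed.
    """
    return [i for i, period in sorted(periods.items()) if cycle % period == 0]


def scrub_cycle(cycle, memory, gold, code_book, periods):
    """Check only the frames that are due and rewrite those found upset."""
    rewritten = []
    for i in frames_due(cycle, periods):
        if zlib.crc32(memory[i]) != code_book[i]:
            memory[i] = gold[i]
            rewritten.append(i)
    return rewritten


def flip_bit(frame, bit):
    """Return a copy of `frame` with one bit inverted (simulated upset)."""
    data = bytearray(frame)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)


# Hypothetical device with 6 frames of 16 bytes each.
gold = [bytes([i]) * 16 for i in range(6)]
code_book = [zlib.crc32(frame) for frame in gold]  # built here from the gold copy for brevity
memory = list(gold)

# Frames 0 and 1 are critical, frames 4 and 5 are low priority,
# frames 2 and 3 are excluded from scrubbing altogether.
periods = {0: 1, 1: 1, 4: 3, 5: 3}

memory[1] = flip_bit(memory[1], 0)  # upset in a critical frame
memory[4] = flip_bit(memory[4], 9)  # upset in a low-priority frame

history = [scrub_cycle(c, memory, gold, code_book, periods) for c in (1, 2, 3)]
print(history)
```

In this sketch the upset in the critical frame is repaired on the first cycle, while the upset in the low-priority frame waits until that frame's period comes around on the third cycle.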
Accordingly, some of the problems in the prior art may be summarized as follows. Several scrubbing solutions have been reported in the literature, including Xilinx's application notes with recommended approaches. All these alternatives are either microprocessor-based scrubbers (for instance Xilinx's reference designs based on PicoBlaze) or finite state machines whose sole functionality is to read from and/or write to the configuration memory. Processor-based scrubbers tend to increase the overall system's cross-section when implemented in a self-scrubbing configuration. State machine-based scrubbers, on the other hand, tend to have a rigid deployment, severely limiting their flexibility to support extra features such as reporting, statistics gathering or adaptation to new configurations.