Among the existing reconfigurable hardware technologies, Field Programmable Gate Arrays (FPGAs) are a breakthrough in terms of performance and flexibility. They constitute a tradeoff between ultra-flexible but inefficient processors and powerful but fixed-function Application Specific Integrated Circuits (ASICs). Moreover, reconfigurable hardware is one of the leading choices when one wants to implement a given function in hardware, because of the prototyping costs induced by an ASIC: millions of dollars for the masks, the associated tools and the engineering team.
FPGAs can be seen as a “sea” of logic elements glued together by a dense routing network assuring the data links. Because of the overhead needed by fully programmable logic elements to undertake the implementation of a complex function, FPGA manufacturers often add dedicated “black boxes” in the circuit. These cells perform specific tasks more efficiently, such as arithmetic calculations (multiplications, additions, etc.), signal processing or interfacing the chip with incoming and outgoing signals.
Logic elements, as well as complex cells and the routing network, can be electrically programmed thanks to memory cells tied to each transistor realizing a connection on the logic fabric. Although they are evenly distributed across the circuit, all of these memory cells can be seen as a single memory layer. This configuration memory is intended to receive a bitstream, i.e. the set of bits determining the state of every configurable element of the FPGA.
FIG. 1A is a simplified view of the logic fabric of an “island-style” FPGA, indicated as a whole by reference PLC, comprising an array of logic blocks LB surrounded by an interconnection network ICN, the latter being constituted by interconnecting wires IW and four-way configurable switches SW (replaced by three-way switches on the edges of the FPGA). Interconnecting wires IW, crossing the circuit from edge to edge, are grouped in “X” (horizontal) and “Y” (vertical) channels comprising W≧1 (and preferably W>1) wires each, which cross forming a square-mesh network.
Each line (resp. column) of the logic fabric comprises alternately X (resp. Y) routing channels and logic blocks LB. A given routing channel is made of multiple successive segments (one horizontal segment is shown in bold on FIG. 1A) and each segment runs from one configurable switch to another.
A switch SW is disposed at each crossing of two wires; a group of “W” switches disposed at the crossing of two W-wire channels, i.e. at the intersection of an “X” and a “Y” channel, forms a “switchbox” SWB (in some embodiments, switchboxes could provide only part of the possible connections between wires). Each four-way switch comprises six independent transistors, while only three transistors are required to implement a three-way switch. Additional switches (not represented on the figure) ensure the connections between input/output ports of the logic blocks LB and the interconnecting wires IW.
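The transistor counts given above are consistent with one pass transistor per pair of wire ends meeting at a switch. The following Python sketch assumes this pairwise model, which the text does not state explicitly, and enumerates the pairs. The function name `switch_transistors` and the side labels are illustrative only.

```python
from itertools import combinations

def switch_transistors(sides):
    """Return one (side_a, side_b) pair per pass transistor.

    Assumption: one transistor per unordered pair of wire ends
    meeting at the switch.
    """
    return list(combinations(sides, 2))

# Four-way switch inside the fabric, three-way switch on an edge.
four_way = switch_transistors(["north", "east", "south", "west"])
three_way = switch_transistors(["north", "east", "south"])

print(len(four_way), len(three_way))
```

Under this assumption a four-way switch yields six pairs and a three-way switch yields three, matching the figures quoted in the text.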
Although this does not appear on FIG. 1A, a fully reconfigurable interconnection network can account for up to 90% of the FPGA surface, compared to 10% for the logic blocks.
FIG. 1B illustrates a possible implementation of a reconfigurable logic block LB, which is a particular case of reconfigurable hardware block used in FPGAs.
The logic block LB of FIG. 1B has four input ports IN and one output port OUT, each port being 1-bit wide. A four-input look-up table 4-LUT allows implementing any combinatorial logic function of the four input bits. The input port of a D-type flip-flop DFF is connected to the output port of the look-up table. A multiplexer MUX, driven by a command signal SEL, is used to provide, at the output port of the logic block, either the output signal of the flip-flop or that of the look-up table. CK is the clock signal of the flip-flop.
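The behaviour of such a logic block can be illustrated by a minimal Python model. This is only a sketch: the class name `LogicBlock`, the polarity of SEL (1 selects the flip-flop) and the bit ordering of the look-up table address are assumptions made for the example, not features taken from FIG. 1B.

```python
class LogicBlock:
    """Behavioural sketch of a 4-LUT followed by a D flip-flop and an output mux."""

    def __init__(self, lut_bits, sel):
        if len(lut_bits) != 16:
            raise ValueError("a 4-input LUT needs 16 configuration bits")
        self.lut_bits = list(lut_bits)  # truth table, indexed by the input word
        self.sel = sel                  # assumed: 1 -> flip-flop, 0 -> LUT
        self.q = 0                      # current flip-flop state

    def lut(self, inputs):
        # Assumed ordering: inputs[0] is the least significant address bit.
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.lut_bits[index]

    def output(self, inputs):
        """Value on OUT: registered or combinatorial, depending on SEL."""
        return self.q if self.sel else self.lut(inputs)

    def clock(self, inputs):
        """Active edge of CK: the flip-flop captures the LUT output."""
        self.q = self.lut(inputs)


# Example configuration: a 4-input AND (only address 15 holds a 1).
and4 = [0] * 15 + [1]

comb = LogicBlock(and4, sel=0)   # combinatorial output
reg = LogicBlock(and4, sel=1)    # registered output
before = reg.output([1, 1, 1, 1])
reg.clock([1, 1, 1, 1])
after = reg.output([0, 0, 0, 0])
```

In this model the 16 look-up table bits and the SEL bit are the only configuration data of the block; the flip-flop state is run-time data and is not part of the configuration.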
A simple FPGA may only comprise interconnection networks and logic blocks of the kind illustrated on FIG. 1B, and possibly dedicated input/output blocks on the edges of the logic fabric. However, more complex devices may also comprise different kinds of reconfigurable hardware blocks such as memories and arithmetic units. These blocks might be implemented by suitably interconnected logic blocks of the kind of FIG. 1B, but this would be rather inefficient.
A so-called “bitstream” file, stored in the FPGA configuration memory (which is usually a static RAM, or SRAM), contains all the data required to fully configure the interconnection network and the hardware blocks of the FPGA. In the simplified case of an island-style FPGA (FIG. 1A), wherein all the hardware blocks are logic blocks of the kind illustrated on FIG. 1B, this comprises the status of all the transistors implementing all the switches of the interconnection network, the 16 bits defining the content of each 4-input 1-output look-up table 4-LUT and the value of the “SEL” bit for each logic block. In practice, raw data are often compressed to reduce the time required to load the bitstream file on the FPGA and the size of the external memory storing the bitstream before its loading into the configuration memory. [Dandalis2001] describes a format- and architecture-independent method of compressing raw bitstreams; U.S. Pat. No. 6,507,943 describes a less general bitstream compression method, based on particular hardware features of the target FPGA.
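As a rough illustration of the amount of raw configuration data involved, the following Python sketch counts the bits listed above for the simplified island-style case. It assumes one configuration bit per switch transistor and ignores the additional switches connecting logic block ports to the wires; the function name and the example counts are hypothetical and do not describe any real device.

```python
def raw_bitstream_bits(n_logic_blocks, n_four_way, n_three_way, lut_inputs=4):
    """Estimate the uncompressed bitstream size, in bits.

    Assumptions: one configuration bit per switch transistor,
    2**lut_inputs bits per look-up table, one SEL bit per logic block.
    """
    bits_per_block = 2 ** lut_inputs + 1          # LUT content + SEL
    logic_bits = n_logic_blocks * bits_per_block
    routing_bits = 6 * n_four_way + 3 * n_three_way
    return logic_bits + routing_bits


# Hypothetical fabric: 100 logic blocks, 400 four-way and 80 three-way switches.
total = raw_bitstream_bits(100, 400, 80)
print(total)  # 4340
```

With these hypothetical counts, routing accounts for 2640 of the 4340 bits, which is in line with the observation that the interconnection network dominates the fabric.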
Several approaches can be used to insert the bitstream file into the FPGA:
serial insertion, where the configuration bits are inserted one after the other;
word-parallel insertion, which is still a serial insertion but with N>1 bits inserted at a time;
word-addressing, where the configuration memory is addressed like a RAM;
hybrid insertion, wherein part of the bitstream data is first serially inserted into a shift-register memory, and then loaded into specific bits of the configuration memory.
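The difference between the first three insertion modes can be sketched in Python by modelling the configuration memory as a list of bits. This is a deliberately simplified model under stated assumptions: the function names are hypothetical, one loop iteration stands for one load cycle, and the hybrid mode is not modelled.

```python
def load_serial(bits):
    """Serial insertion: one configuration bit per cycle."""
    memory, cycles = [], 0
    for bit in bits:
        memory.append(bit)
        cycles += 1
    return memory, cycles


def load_word_parallel(bits, n):
    """Word-parallel insertion: still sequential, but n bits per cycle."""
    memory, cycles = [], 0
    for start in range(0, len(bits), n):
        memory.extend(bits[start:start + n])
        cycles += 1
    return memory, cycles


def write_word(memory, address, word):
    """Word addressing: overwrite one word in place, as in a RAM."""
    n = len(word)
    memory[address * n:(address + 1) * n] = word


bitstream = [1, 0, 1, 1, 0, 0, 1, 0]
serial_mem, serial_cycles = load_serial(bitstream)
parallel_mem, parallel_cycles = load_word_parallel(bitstream, 4)

# Only word addressing can modify part of the memory without reloading it all.
write_word(parallel_mem, 1, [1, 1, 1, 1])
```

In this model both sequential modes end with the same memory content and differ only in the number of cycles, whereas word addressing can rewrite a single word, which is the property that matters for partial reconfiguration.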
Bitstream files are generated by CAD tools and have proprietary formats, which are defined by the FPGA manufacturers and are not publicly released. Very little information is available regarding these formats, which makes it almost impossible to retrieve useful data from a bitstream file in order to interoperate with other CAD tools.
In some applications it is necessary to perform partial reconfiguration of an FPGA, i.e. to reconfigure only a portion of its logic fabric while maintaining the rest of it unchanged.
Document EP 1 303 913 describes an FPGA architecture allowing partial reconfiguration.
Document U.S. Pat. No. 5,946,219 describes a method of performing partial reconfiguration of an FPGA comprising generating a “partial bitstream” describing only the logic blocks and interconnections which have to be modified. The partial bitstream is generated starting from a full bitstream defining the FPGA initial configuration. This method is mostly suitable for small, local changes of the configuration. It does not allow loading several fully independent bitstreams on the same reconfigurable device.
A particular application of partial reconfiguration, and a very demanding one, is reconfigurable computing, wherein some hardware modules (or “tasks”) of an FPGA or another programmable logic device are reconfigured while others keep working (dynamic reconfiguration). Reconfigurable computing requires, inter alia, determining at runtime the position of a particular, predefined hardware task within the logic fabric of the FPGA. This is a difficult problem: indeed, since the bitstream format is unknown, it is not always possible to “relocate” a task by directly altering its partial bitstream. Instead it is necessary:
either to use a CAD tool to generate, on request, a new partial bitstream for a specific location of the hardware task;
or to store in an external module a plurality of partial bitstreams for each hardware task, one for each possible location of the task (see e.g. U.S. Pat. No. 6,678,646).
The first option introduces unacceptable delays, in particular for reconfigurable computing; the second one can lead to very high memory consumption.
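The memory cost of the second option can be estimated with simple arithmetic, as in the Python sketch below. The function name and all the figures (number of tasks, number of locations, partial bitstream size) are hypothetical and chosen only to show how the storage requirement scales.

```python
def prestored_bitstream_bytes(n_tasks, n_locations, partial_bitstream_bytes):
    """Storage needed when one partial bitstream is kept per task and per location."""
    return n_tasks * n_locations * partial_bitstream_bytes


KIB = 1024

# Hypothetical system: 8 tasks, 16 possible locations, 64 KiB per partial bitstream.
per_location = prestored_bitstream_bytes(8, 16, 64 * KIB)

# For comparison: a single stored copy per task.
single_copy = prestored_bitstream_bytes(8, 1, 64 * KIB)

print(per_location // KIB, single_copy // KIB)  # 8192 512
```

With these hypothetical figures the external memory grows from 512 KiB to 8 MiB, i.e. linearly with the number of possible locations.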
Some tools have been developed to allow hardware task relocation by manipulating the associated partial bitstream, e.g. “REPLICA” (see [Kalte 2005]), REPLICA 2 PRO (see [Kalte 2006]) and PARBIT (see [Horta 2001]). They act as filters to allow a single source bitstream to be placed at various Partially Reconfigurable Regions (PRRs) of the FPGA. However, these tools are limited to a particular FPGA architecture, namely the “Virtex” family of Xilinx. Moreover, relocation is only performed in one dimension, as each PRR occupies an entire column of the logic fabric. Even more importantly, this approach is no longer applicable to modern FPGAs, wherein addresses in the bitstreams are encoded using the AES algorithm.