FPGAs are general-purpose programmable devices that are customized by end users. FPGAs include an array of configurable logic blocks (CLBs) that are programmably interconnected. As shown in FIG. 1, the basic device architecture of an FPGA 100 comprises an array of CLBs embedded in a configurable interconnect structure 103 and surrounded by configurable I/O blocks (IOBs). A CLB and its associated interconnect structure form a tile 108 that is repeated in rows and columns across the FPGA. The configurable interconnect structure 103 allows users to implement multi-level logic designs in which the output signal of one CLB provides input to another CLB, the output of that CLB provides input to another CLB, and so forth. An IOB allows signals to be optionally driven off-chip or brought into the FPGA. The IOB can typically also perform other functions, such as tri-stating outputs and registering incoming or outgoing signals.
An FPGA can support tens of thousands of gates of logic operating at system speeds of tens of megahertz. The FPGA is programmed by loading programming data into memory cells (not shown in FIG. 1) controlling the CLBs, IOBs, and interconnect structure 103. One type of FPGA is the XC4000™ family of devices from Xilinx, Inc. Further information about the XC4000 family of FPGAs appears on pages 4-5 to 4-69 of "The Programmable Logic Data Book 1998", published in 1998 and available from Xilinx, Inc. at 2100 Logic Drive, San Jose, Calif. 95124, which pages are incorporated herein by reference. (Xilinx, Inc., owner of the copyright, has no objection to copying these and other pages referenced herein but otherwise reserves all copyright rights whatsoever.)
Each CLB in the FPGA can include configuration memory cells (not shown in FIG. 1) for controlling the functions performed by that CLB. For example, a typical CLB may include several programmable lookup tables, multiplexers, and memory elements. A lookup table stores a truth table that implements the combinational logic function corresponding to the truth table. The multiplexer is a special-case one-directional routing structure that is controlled by one or more configuration memory cells. The memory elements may, for example, be programmable as flip-flops or latches. The configuration memory cells control the functionality of each of these elements and the interconnections between these elements within the CLB.
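The behavior of a lookup table can be illustrated with a minimal sketch. The function name make_lut, the bit ordering (first input as least significant address bit), and the example functions are assumptions chosen for illustration; they are not taken from any particular device.

```python
from typing import Callable, Sequence


def make_lut(truth_table: Sequence[int]) -> Callable[..., int]:
    """Build a k-input lookup table from 2**k stored bits.

    The stored bits stand in for configuration memory cells. The inputs
    form an address into the table, with the first input treated as the
    least significant address bit (an assumption of this sketch).
    """
    size = len(truth_table)
    k = size.bit_length() - 1
    if size == 0 or (1 << k) != size:
        raise ValueError("truth table length must be a power of two")
    bits = tuple(1 if b else 0 for b in truth_table)

    def lut(*inputs: int) -> int:
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs, got {len(inputs)}")
        address = 0
        for position, value in enumerate(inputs):
            address |= (1 if value else 0) << position
        return bits[address]

    return lut


# Two-input XOR: addresses 0b00, 0b01, 0b10, 0b11 hold 0, 1, 1, 0.
xor2 = make_lut([0, 1, 1, 0])

# Three-input majority: output is 1 when at least two inputs are 1.
maj3 = make_lut([0, 0, 0, 1, 0, 1, 1, 1])
```

Changing the stored bits changes the logic function without changing the structure, which is the sense in which the configuration memory cells control the function performed by the CLB.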
Interconnect structure 103 includes programmable interconnect points (PIPs, not shown in FIG. 1) that control the interconnection of wiring segments in the programmable interconnect network of FPGA 100. Each PIP may, for example, be a pass transistor controlled by a configuration memory cell. Wire segments on each side of the pass transistor are either connected or not connected together, depending on whether the transistor is turned on by the corresponding configuration memory cell.
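The effect of PIPs on connectivity can be sketched in software as follows. This is a simplified model that assumes each PIP is an ideal switch between two numbered wire segments; the function name nets_from_pips and the segment numbering are hypothetical.

```python
from typing import List, Sequence, Tuple


def nets_from_pips(
    num_segments: int,
    pips: Sequence[Tuple[int, int]],
    config_bits: Sequence[int],
) -> List[int]:
    """Return a net label for every wire segment.

    pips[i] names the two segments on either side of pass transistor i,
    and config_bits[i] is the configuration memory cell controlling it.
    Segments joined by turned-on PIPs receive the same label.
    """
    if len(pips) != len(config_bits):
        raise ValueError("one configuration bit is required per PIP")
    parent = list(range(num_segments))

    def find(segment: int) -> int:
        # Follow parent links to the representative, compressing the path.
        while parent[segment] != segment:
            parent[segment] = parent[parent[segment]]
            segment = parent[segment]
        return segment

    for (seg_a, seg_b), bit in zip(pips, config_bits):
        if bit:  # transistor on: the two segments are connected
            parent[find(seg_a)] = find(seg_b)

    return [find(segment) for segment in range(num_segments)]


# Four segments in a row with a PIP between each neighboring pair.
# The middle PIP is off, so segments 0-1 and 2-3 form separate nets.
labels = nets_from_pips(4, [(0, 1), (1, 2), (2, 3)], [1, 0, 1])
```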
Configuration is the process of loading a bitstream (configuration data file) containing the program data into the configuration memory cells that control the CLBs, IOBs, and interconnect structure of the FPGA. (Other structures in the FPGA may also be configured by the bitstream, e.g., global clock buffers and phase-locked loops.) The bitstream is typically stored in an external memory device 106 such as programmable read-only memory (PROM). The bitstream is loaded into the FPGA through a configuration loading circuit 104. (For clarity, configuration loading circuit 104 is shown in FIG. 1 as external to the CLB and IOB array 102. However, loading circuit 104 may be implemented as a block of logic located within array 102, or may be distributed throughout array 102.) The bitstream is often loaded into the FPGA serially to minimize the number of pins required for configuration and to reduce the complexity of the interface to external memory. The bitstream is broken into packets of data called frames. As each frame is received, it is shifted through a frame register until the frame register is filled. The data in the frame register of the FPGA are then loaded in parallel into one row or column of configuration memory cells. Following the loading of the first frame, subsequent frames of bitstream data are shifted into the FPGA, and another row or column of configuration memory cells is designated to be loaded with a frame of bitstream data. One configuration circuit is described in detail by Hung et al in U.S. Pat. No. 5,430,687, issued Jul. 4, 1995, entitled "Programmable Logic Device Including a Parallel Input Device for Loading Memory Cells", which is incorporated herein by reference.
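The frame-by-frame loading sequence described above can be modeled with a short sketch. The function name load_bitstream, the fixed frame length, and the left-to-right column order are assumptions made for illustration; an actual device may order bits and columns differently.

```python
from typing import Iterable, List


def load_bitstream(
    bits: Iterable[int], frame_length: int, num_frames: int
) -> List[List[int]]:
    """Model serial shifting into a frame register with parallel loads.

    Bits arrive one at a time. When the frame register is full, its
    contents are copied in one step into the next column of
    configuration memory, and the register is reused for the next frame.
    """
    bits = list(bits)
    if len(bits) != frame_length * num_frames:
        raise ValueError("bitstream length does not match the frame layout")

    config_memory = [[0] * frame_length for _ in range(num_frames)]
    frame_register: List[int] = []
    column = 0
    for bit in bits:
        frame_register.append(bit)  # serial shift of one bit
        if len(frame_register) == frame_length:
            config_memory[column] = list(frame_register)  # parallel load
            frame_register.clear()
            column += 1  # designate the next column
    return config_memory


# Six bits arranged as two frames of three bits each.
memory = load_bitstream([1, 0, 1, 1, 0, 0], frame_length=3, num_frames=2)
```

In this model the number of serial shift steps equals the total number of bits, while the number of parallel loads equals the number of frames.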
The step of loading the bitstream into the FPGA limits the speed of configuration. This "bitstream bottleneck" has become increasingly apparent as the number of configuration bits has increased over the past several years from thousands to tens and hundreds of thousands, even millions, of bits. This dramatic increase in bitstream size has resulted in a corresponding increase in the time required for configuration and reconfiguration. Therefore, what is needed is a system and method for efficiently loading bitstream data into an FPGA for rapid configuration and reconfiguration of the configuration memory cells.
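The scaling of load time with bitstream size can be estimated with simple arithmetic. The sketch below assumes an idealized interface that transfers a fixed number of bits on every clock cycle with no overhead; the clock rate and bit counts in the example are hypothetical values, not measurements of any device.

```python
def serial_load_time_seconds(
    num_config_bits: int, clock_hz: float, bits_per_clock: int = 1
) -> float:
    """Estimate the time to transfer a bitstream over an ideal interface."""
    if num_config_bits < 0 or clock_hz <= 0 or bits_per_clock <= 0:
        raise ValueError("arguments must be positive")
    return num_config_bits / (clock_hz * bits_per_clock)


# Hypothetical figures: a 10 MHz serial configuration clock.
small = serial_load_time_seconds(10_000, 10e6)     # thousands of bits
large = serial_load_time_seconds(1_000_000, 10e6)  # millions of bits
```

Under these assumptions the load time grows in direct proportion to the number of configuration bits, so a hundredfold increase in bitstream size yields a hundredfold increase in load time.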
Very large bitstreams can cause other problems, as well. For example, PROMs are limited in their storage capacity. To store a single bitstream for the largest FPGAs available today, several PROMs may be required, increasing system costs and using excessive space on the printed circuit board on which the PROMs are typically mounted. This problem is exacerbated when several different bitstreams are provided for the FPGA. Therefore, it is desirable to provide a system and method for reducing the amount of space needed to store an FPGA bitstream.
Cliff et al describe one such system and method in U.S. Pat. No. 5,563,592, issued Oct. 8, 1996 and entitled "Programmable Logic Device Having a Compressed Configuration File and Associated Decompression", which is incorporated herein by reference. Applying the method described by Cliff et al to FPGA 100 of FIG. 1 results in the system shown in FIG. 2. First, the bitstream is compressed on a computer (not shown). The compressed bitstream is then loaded into external memory device 206 via memory input bus 212. (In other systems described by Cliff et al, the bitstream is compressed via a compression circuit included in memory device 206.) Memory device 206 includes a memory array 210, in which the compressed bitstream is stored, and a decompression circuit 208. The bitstream is decompressed by decompression circuit 208, and the decompressed bitstream is loaded into FPGA 100.
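The compress, store, and decompress sequence of FIG. 2 can be sketched as follows. The sketch uses the general-purpose zlib compressor from the Python standard library purely as a stand-in; it is not the compression scheme described by Cliff et al, and the function names and sample data are hypothetical.

```python
import zlib


def compress_for_storage(bitstream: bytes) -> bytes:
    """Compress the bitstream on the host before writing it to memory."""
    return zlib.compress(bitstream, 9)


def decompress_for_loading(stored: bytes) -> bytes:
    """Recover the original bitstream before it is loaded into the FPGA."""
    return zlib.decompress(stored)


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Return original size divided by compressed size."""
    return len(original) / len(compressed)


# Hypothetical sample: mostly zero bytes with a short non-zero region.
sample = bytes(1000) + b"\x5a" * 24 + bytes(1000)
stored = compress_for_storage(sample)
restored = decompress_for_loading(stored)
ratio = compression_ratio(sample, stored)
```

The ratio obtained for the sample reflects only its highly repetitive content and says nothing about the ratios achievable on actual configuration data.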
In another system also described by Cliff et al, the decompression circuit is included in the FPGA. This modification allows the final step of transmitting the bitstream to the FPGA to be performed on a compressed bitstream, thereby reducing the amount of time necessary for the transmittal. This system is applied to the FPGA of FIG. 1 as shown in FIG. 3. In FIG. 3, FPGA 300 comprises CLB and IOB array 102, loading circuit 104, and decompression circuit 208.
Although their system and method overcome to some extent the problems previously described, Cliff et al state that the system and method permit "a ratio of as much as 2 to 1", i.e., the compressed bitstream is at least half the size of the original bitstream. It is desirable to provide a system and method for compressing and decompressing FPGA configuration data that gives a higher compression ratio, with the corresponding benefits of reduced storage requirements and faster transfer of configuration data.