1. Field of the Invention
The present invention relates to the field of programmable integrated circuit devices, and specifically to loading configuration information into a field programmable gate array.
2. Background Technology
Programmable integrated circuits, such as field programmable gate arrays (FPGAs), are programmed to perform a particular task by loading configuration information into the FPGA. The configuration information can be viewed as strings of binary bits. The configuration information is loaded into the FPGA during programming cycles which are performed before the FPGA is used for its intended function. The configuration information is used to initialize the configurable logic circuits (CLCs) of the FPGA and also to program the programmable interconnect structures of the FPGA to provide the required connections between CLCs. The configuration information can be stored in a nonvolatile memory (e.g., ROM) and loaded into the FPGA upon device power-up, or the configuration information can be permanently programmed (e.g., one-time only) into an FPGA having antifuse material. In either instance, the configuration information must be loaded into the FPGA.
The configuration information is loaded into the FPGA in data frames ("frames"). The frame length is the number of bits in a frame for a specific embodiment, and this number varies from one embodiment to another. Consecutive frames are used to load the configuration information into a particular FPGA during initialization. During initialization in prior art designs, a frame of data is serially loaded into a configuration register of an FPGA using successive programming cycles. Once in the configuration register, the bits of the frame (e.g., a data word) are stored into locations of a memory unit using conventional memory access cycles which store the entire data word. These programming cycles repeat until all of the data frames are loaded into the FPGA. Typical programming frequencies can run 4 MHz or more.
FIG. 1A illustrates one prior art mechanism for loading configuration information within a prior art FPGA 100. Bits of configuration information data frames are serially loaded, bit by bit, over a serial input port 30 into a receiving shift register 15. The receiving shift register 15 is composed of N serially coupled one-bit shift registers, 15(1 to n), each clocked by the same clock signal. The receiving shift register 15 contains a one-bit shift register for each of the N bits of the frame of configuration information. As the configuration information is serially bit shifted into the serial input port 30, the bits of the receiving shift register 15 are bit shifted downstream in synchronization.
The first bit of each configuration data frame contains a frame start flag, so that when the frame start flag reaches shift register 15(n), the last one-bit shift register, the data frame has been completely loaded into the receiving shift register 15. A data frame having N bits is completely loaded in N programming cycles using the mechanism of FIG. 1A. There are N data lines 20(1 to n), and each shift register 15(1 to n) is coupled to an associated data line. Each data line 20(1 to n) is also coupled to a memory unit 10. During a write cycle, all of the bits stored in the receiving shift register 15 are simultaneously loaded as a data word into the memory unit 10 via data lines 20(1 to n).
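By way of illustration only, the serial loading behavior described for FIG. 1A can be modeled in software. The following Python sketch is not part of the prior art circuit; the function name `load_frame_serial`, the use of a logic 1 as the frame start flag, and the assumption that the register is cleared before each frame are all hypothetical choices made for this example.

```python
def load_frame_serial(frame_bits):
    """Behavioral model of the serial load of FIG. 1A (illustrative only).

    frame_bits: list of 0/1 values; frame_bits[0] is assumed to be the
    frame start flag (logic 1), and the register is assumed to start cleared.
    Returns the register contents and the number of programming cycles used.
    """
    n = len(frame_bits)
    register = [0] * n  # index 0 models stage 15(1), index n-1 models stage 15(n)
    cycles = 0
    for bit in frame_bits:
        # One programming cycle: every stage passes its bit downstream
        # while the incoming bit enters the first stage.
        register = [bit] + register[:-1]
        cycles += 1
        if register[-1] == 1:
            # The start flag has reached the last stage: the frame is loaded.
            break
    return register, cycles


def write_word(memory, register):
    """Model of the write cycle: all N bits are stored at once as one word."""
    memory.append(list(register))
```

In this model an 8-bit frame occupies the register after 8 programming cycles, with the start flag in the last stage, which matches the N-cycle load described above.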
FIG. 1B illustrates another prior art mechanism for loading configuration information into an FPGA 100'. In this mechanism, the configuration register is comprised of a number of blocks (e.g., 35, 36, . . . , and 37) and each block is comprised of eight 8-bit shift registers (a-h). With a data frame having N bits, there are N/8 blocks and N 8-bit shift registers required. Configuration information is loaded 8 bits per programming cycle over bus 32. Each of the 8-bit shift registers of a block receives a single configuration bit in a unique bit location for the block. For example, block register 35a receives its bit in position 1, 35b in position 2, 35c in position 3, etc., and block register 35h receives its bit in position 8.
The 8 bits within block 35 are simultaneously loaded into block 35 from bus 32 during a programming cycle. During each programming cycle (e.g., clock cycle), the previous bits of each block are loaded into its downstream block. For example, after block 35 receives the 8 bits, upon the next programming cycle, these bits are shifted into the downstream block 36. The other downstream blocks are analogously configured and operate according to block 35. Each 8-bit register in these blocks (e.g., 36, . . . , 37) contains a corresponding bit position for receiving the shifted bits from their upstream neighbor. For instance, register 36a receives register 35a's bit in position 1, register 36b receives register 35b's bit in position 2, etc., and register 36h receives register 35h's bit in position 8. This staggered bit position approach is implemented so that the data lines 20(1 to n) are aligned parallel to each other. In the implementation of FIG. 1B, the input configuration data is not parallel loaded into consecutive positions of a single shift register.
Each 8-bit register (a-h) of each block (35 . . . 37) of FIG. 1B is coupled to a data line (e.g., lines 20(1 to n)) so there is a separate data line for each of the N bits of a frame. When the receiving register blocks (35-37) are full, the data frame information is loaded into the memory unit 10 in accordance with the system of FIG. 1A.
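For illustration, the block-wise loading of FIG. 1B can be sketched in the same manner. The Python model below is a simplified assumption, not a description of the actual circuit: each block is represented as a list of eight bit positions, the name `load_frame_blockwise` is hypothetical, and the ordering of bits within a block is chosen arbitrarily for the example.

```python
def load_frame_blockwise(frame_bits, width=8):
    """Behavioral model of the parallel load of FIG. 1B (illustrative only).

    frame_bits: list of 0/1 values whose length is a multiple of `width`.
    Returns the block contents and the number of programming cycles used.
    """
    if len(frame_bits) % width != 0:
        raise ValueError("frame length must be a multiple of the bus width")
    n_blocks = len(frame_bits) // width
    # blocks[0] models the block fed by bus 32; later entries are downstream.
    blocks = [[0] * width for _ in range(n_blocks)]
    cycles = 0
    for start in range(0, len(frame_bits), width):
        group = frame_bits[start:start + width]
        # One programming cycle: every block hands its bits to its downstream
        # neighbor (same bit position) while a new group enters the first block.
        blocks = [list(group)] + blocks[:-1]
        cycles += 1
    return blocks, cycles
```

Under these assumptions a 16-bit frame fills two blocks in two programming cycles, with the first group received ending up in the most downstream block.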
It is appreciated that the mechanism of FIG. 1B can also be implemented where each block (e.g., 35 to 37) contains one horizontally positioned 8-bit register which receives all eight bits per programming cycle; however, this implementation requires staggering of the data lines 20(1 to n). It is desired not to stagger the data lines 20(1 to n) in order to facilitate layout and manufacturing of the resultant circuit.
The system of FIG. 1B provides an eightfold increase in configuration data transfer rate over the system of FIG. 1A since eight bits can be transferred through the blocks (35 . . . 37) per programming cycle. However, since the architecture of the system of FIG. 1B does not store the configuration bits serially in a contiguous bit stream, it is not downward compatible with the system of FIG. 1A. Moreover, there is no straightforward circuit modification available for the system of FIG. 1B that would render the circuit downwardly compatible with the system of FIG. 1A without severely enlarging the circuit area.
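The eightfold figure follows directly from the cycle counts, as the short Python calculation below illustrates. The 256-bit frame length used in the discussion is a hypothetical value chosen only for the arithmetic; the 4 MHz clock is the typical programming frequency mentioned earlier, and the function names are assumptions for this example.

```python
def programming_cycles(frame_bits, bits_per_cycle):
    """Programming cycles needed to load one frame (ceiling division)."""
    return -(-frame_bits // bits_per_cycle)


def frame_load_time_s(frame_bits, bits_per_cycle, clock_hz):
    """Time in seconds to load one frame at the given programming frequency."""
    return programming_cycles(frame_bits, bits_per_cycle) / clock_hz
```

For a hypothetical 256-bit frame, a one-bit-per-cycle load as in FIG. 1A takes 256 programming cycles, while an eight-bit-per-cycle load as in FIG. 1B takes 32. At 4 MHz these correspond to roughly 64 microseconds and 8 microseconds per frame, respectively.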
Accordingly, what is needed is a configuration information transfer mechanism allowing parallel loading of configuration data bits so that high speed data transfer rates are obtained. The present invention provides this advantage. What is needed further is a high speed configuration information transfer mechanism that is downward compatible with a system employing the mechanism of FIG. 1A. The present invention also provides this advantage. What is also needed is a system allowing programmable width parallel transfer. The present invention provides this additional feature.