Programmable logic devices (PLDS) are a well-known type of digital integrated circuit that may be programmed by a user to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of configurable logic elements (CLEs) surrounded by a ring of programmable input/output blocks (IOBs). The CLEs and IOBs are interconnected by a programmable interconnect structure. (The programmable interconnect structure between CLEs and IOBs is also referred to as general interconnect). The CLEs, IOBs, and interconnect structure are typically programmed by loading a stream of configuration data (bitstream) into internal configuration memory cells that define how the CLEs, IOBs, and interconnect structure are configured. The configuration data may be read from memory (e.g., an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
One significant task when implementing a user circuit in an FPGA is the assignment of user logic into the various CLEs and IOBs. This process includes “mapping”, where the user circuit is divided into pieces that will fit into a single CLE, IOB, or a portion thereof, and “placement”, where each mapped piece of logic is assigned to a particular CLE or IOB (or portion thereof) in a particular location on the FPGA. The final step in implementing the circuit is called “routing”, where the mapped and placed logic is connected together using the programmable interconnect structure. The mapping, placement, and routing processes are typically performed by computer software, which reads in a description of the user circuit (for example, in the form of a netlist) and provides the bitstream that is used to program the device, as described above.
In practice, each CLE is typically formed from several smaller logic blocks, such as 4-input lookup tables (LUTs). Because each block has a fixed size, and the size of the block is usually fairly small to facilitate the efficient implementation of small logic functions, the implementation of larger user circuits requires the use of several logic blocks. Sometimes these logic blocks can be accommodated within a single CLE, in which case the general interconnect need not be used to connect the blocks. In other cases, the required number of logic blocks is too large for a single CLE. The necessary logic blocks must then be connected using the general interconnect, which is typically slower than connections within a single CLE. Thus, user circuits up to a certain size (i.e., the size that will fit in a single CLE) are typically faster than user circuits of a larger size. Further, user circuits that fit into a single logic block (e.g., a single 4-input LUT) result in the fastest implementations.
Therefore, it is desirable to provide structures and methods for combining two or more logic blocks in such a way as to permit user circuits too large for a single logic block to function at more nearly the same operating speed as user circuits within a single logic block.