1. Field of the Invention
This invention relates generally to complex programmable logic devices and in particular to very high-density programmable logic devices with a high speed, hierarchical, scalable, multi-tiered switch matrix structure, optimized and flexible logic allocation, and a novel logic macrocell.
2. Description of Related Art
Architecture plays a crucial role for high-density, high performance programmable logic devices (PLDs). Over the past few years, essentially two types of PLD architectures have emerged in the high-density programmable logic device arena, i.e., channeled array based field programmable gate arrays (FPGAs) and segmented block-based electrically erasable PLDs, sometimes known as complex PLDs (CPLDs).
Conceptually, both segmented block-based CPLDs and FPGAs are programmable blocks with programmable interconnects. The fundamental difference between the two is in the architecture of the basic programmable block and the programmable interconnect structure. The basic goals are the same for both architectures: higher density, higher performance, a broader application base, low cost, and easy-to-use solutions. However, the means used to achieve these goals, and the results, are different. Each of the basic architectures has its own characteristics, strengths, and weaknesses that emanate from the fundamental philosophy behind the architecture.
Channeled array based FPGA architectures are characterized as an array of a large number of narrow, granular blocks, usually look-up table based or multiplexer based logic, surrounded by a phalanx of uncommitted I/O blocks, and interconnected by distributed programmable interconnect structures. Segmented block-based CPLDs, on the other hand, are characterized by a smaller number of large programmable logic blocks that are usually sum-of-products based and are interconnected by a centralized switch matrix or interconnect array.
Narrow granular blocks along with distributed interconnects inherently tend to have variable, non-uniform, path-dependent and somewhat unpredictable signal delays. Most channeled array based FPGAs typically exhibit such signal delay characteristics. Most segmented block-based CPLDs tend to have fixed, more predictable, uniform and path-independent signal delays.
There are two distinct interconnect approaches for segmented block-based CPLDs, i.e., (i) a full cross-point programmable interconnect array (PIA) and (ii) a multiplexer based, sparse switch matrix structure. The full cross-point PIA has the potential advantage of 100% global connectivity for all signals. All global signals are typically brought into a centralized interconnect array, and the input signals for each programmable logic block in the CPLD are generated from this centralized array. Though the number of input signals for each block is a subset of the total number of global signals, each input signal can essentially be a function of all global signal sources at the same time. This global connectivity somewhat simplifies the routing software: since full global connectivity is always available, the routing software is not required to make any particularly intelligent decisions for routing signals.
The major disadvantages of the full cross-point PIA approach are speed degradation, a larger die size, poor scalability with increased density, and wasted resources. A high-density CPLD incorporating a full cross-point programmable interconnect array tends to be slower and more expensive than a comparable CPLD with an optimized sparsely populated switch matrix. Since the full cross-point PIA receives feedback signals from all the internal logic macrocells of all logic blocks and all I/O pin feedback signals, the total number of input signals to the PIA is directly proportional to the total number and size of the logic blocks, and the number of I/O pins in the CPLD. As the number of logic blocks and I/O pins increases, the total number of signals routed to the PIA tends to increase rapidly.
As the density is increased to larger pin-count and higher-density logic, the PIA overhead becomes quite significant. Very large programmable interconnect arrays are inherently "slow" and have the additional overhead of larger die area. While conceptually the same approach can be used for larger density devices, in reality the approach becomes very difficult to implement. Therefore, scalability to higher densities with the full cross-point PIA is questionable. Considering these limitations, it is not surprising that the PIA has been utilized primarily in smaller density CPLDs.
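The scaling concern described above can be illustrated with a short sketch. It assumes that a full cross-point array needs one programmable connection for every pairing of a global signal with a block input line; the function name and the parameter values (16 macrocells per block, 26 input lines per block held constant) are illustrative assumptions, not figures for any particular PIA device.

```python
def full_crosspoint_count(n_blocks, macrocells_per_block, io_pins, inputs_per_block):
    """Count programmable connections in a hypothetical full cross-point array.

    Assumption: every global signal (macrocell feedback or I/O pin feedback)
    can be connected to every block input line.
    """
    global_signals = n_blocks * macrocells_per_block + io_pins
    input_lines = n_blocks * inputs_per_block
    return global_signals * input_lines


# Illustrative values only: doubling the block count and the I/O pin count.
small = full_crosspoint_count(n_blocks=8, macrocells_per_block=16,
                              io_pins=64, inputs_per_block=26)
large = full_crosspoint_count(n_blocks=16, macrocells_per_block=16,
                              io_pins=128, inputs_per_block=26)
print(small, large, large / small)
```

Under these assumptions, doubling the number of blocks and I/O pins quadruples the number of connection points, which is one way to see why the array overhead grows faster than the logic it serves.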
The other significant drawback of the PIA is its potential waste of significant resources. The number of input signals needed for a block is usually significantly smaller than the device's full capability. As a result, most of the signal paths remain unutilized.
While the PIA approach strives for flexibility, the multiplexer based sparse switch matrix approach focuses on speed, cost, optimized global connectivity, and die size. Like a PIA, the sparse central switch matrix receives input signals from all macrocells and I/O pins. All programmable logic block input signals, in turn, are derived from the sparse central switch matrix. However, each programmable logic block input line is judiciously connected to a subset of the total input lines of the sparse central switch matrix to provide optimized connectivity. Unlike the PIA approach, each programmable logic block input line is not connected to all input lines of the sparse central switch matrix. Rather, all the input lines of a programmable logic block combined have access to all signals on the input lines to the sparse central switch matrix.
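The connectivity property described above, in which no single input line sees every signal but the input lines of a block taken together do, can be sketched as follows. The round-robin assignment, the function names, and the small sizes used (12 signals, 4 input lines, 4:1 multiplexers) are assumptions chosen for illustration; they are not the assignment used in any actual device.

```python
def build_sparse_matrix(n_signals, n_lines, mux_size):
    """Give each block input line a multiplexer over `mux_size` global signals.

    Hypothetical round-robin assignment: multiplexer taps are filled with
    consecutive signal indices, wrapping around when the signals run out.
    """
    if mux_size > n_signals:
        raise ValueError("a multiplexer cannot have more taps than there are signals")
    if n_lines * mux_size < n_signals:
        raise ValueError("not enough multiplexer taps to reach every signal")
    return [
        [(line * mux_size + tap) % n_signals for tap in range(mux_size)]
        for line in range(n_lines)
    ]


def covers_all_signals(matrix, n_signals):
    """True if the input lines, taken together, can reach every global signal."""
    return set().union(*matrix) == set(range(n_signals))


matrix = build_sparse_matrix(n_signals=12, n_lines=4, mux_size=4)
for line, taps in enumerate(matrix):
    print(f"input line {line}: signals {taps}")
print("all signals reachable:", covers_all_signals(matrix, 12))
```

Each input line here reaches only 4 of the 12 signals, yet the four lines combined reach all of them. The spare multiplexer taps give some signals a second route into the block.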
The sparse central switch matrix approach tends to put some restrictions on the global connectivity. This approach provides particular combinations of input signals to a programmable logic block, and so restricts certain combinations of input signals. However, with an intelligently structured sparse central switch matrix, the global connectivity can be significantly enhanced and the routing restrictions can be minimized.
The major benefits of the sparse central switch matrix approach are high speed and predictable delays. Since all signals go through the sparse central switch matrix, the signal time delays are typically fixed, uniform, predictable, and path-independent.
Two of the most critical parameters affecting the routability of a multiplexer based sparse central switch matrix are the number of input lines to each programmable logic block from the switch matrix and the multiplexer size. As the multiplexer size and the number of programmable logic block input lines increase, signal routability of the total device increases. Unfortunately, both a larger number of input lines and a larger multiplexer structure result in slower performance and a bigger die size. Hence, there is a significant challenge for intelligently structuring both the block input size and the multiplexer size.
A major potential drawback of the single-tiered sparse central switch matrix approach is scalability to higher density devices. The number of signals entering the sparse central switch matrix tends to increase linearly with the number of macrocells and I/O pins. Thus, as the density is increased, the capacitive load for driving all signals to a central switch matrix is significantly increased. Also, to provide decent routability, with the goal of providing optimized global connectivity, either the number of programmable logic block input lines or the multiplexer size needs to be increased rapidly, resulting in slower and potentially more expensive devices. As a result, higher-density, block-based CPLDs tend to be comparatively slower than lower-density block-based CPLDs.
In a prior art CPLD with 128 macrocells in eight programmable logic blocks, twenty-six programmable logic block input lines and a 16:1 multiplexer based switch matrix were adequate to achieve optimized routability. Twenty-six input lines and a 16:1 multiplexer provide the ability to select the twenty-six programmable logic block input signals from a maximum of 416 (26 × 16) different signals. For a device with 128 macrocell feedback signals and 64 I/O pin feedback signals, i.e., a total of 192 global feedback signals, this provides a maximum of approximately 2.17 ways (416/192) of routing each signal.
In another prior art CPLD, the centralized switch matrix approach was extended to 256 macrocells with 128 I/Os in 16 programmable logic blocks. Each programmable logic block had 34 input lines and a 36:1 multiplexer based switch matrix for each block. This provided the ability to route a maximum of 1224 (34 × 36) signals. With 256 macrocells, 128 I/O pins, and 14 dedicated input pin feedbacks, i.e., a total of 398 global signals, this provided slightly more than three ways (1224/398) for routing each signal. Routability in this device was significantly greater than in the other prior art devices. A drawback of the larger switch matrix structure is obviously slower speed and increased die size.
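The routability figures quoted for the two prior art devices follow from the same arithmetic: the number of multiplexer taps available to a block (input lines times multiplexer size) divided by the number of global signals. The sketch below reproduces that calculation; the function name is a hypothetical helper, and the ratio is only the rough figure of merit used in the text, not a guarantee that any particular design will route.

```python
def routing_ways(block_input_lines, mux_size, global_signals):
    """Return (selectable signal slots, average routing ways per global signal)."""
    slots = block_input_lines * mux_size
    return slots, slots / global_signals


# First prior art device: 26 input lines, 16:1 multiplexers,
# 128 macrocell feedbacks + 64 I/O pin feedbacks.
slots_a, ways_a = routing_ways(26, 16, 128 + 64)
print(slots_a, f"{ways_a:.2f}")   # 416 2.17

# Second prior art device: 34 input lines, 36:1 multiplexers,
# 256 macrocells + 128 I/O pins + 14 dedicated input pin feedbacks.
slots_b, ways_b = routing_ways(34, 36, 256 + 128 + 14)
print(slots_b, f"{ways_b:.2f}")   # 1224 3.08
```

The second device gains roughly one additional routing way per signal, at the cost of multiplexers more than twice as wide, which is consistent with the speed and die size drawback noted above.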
While simplicity, speed, and smaller die size are the major advantages of the multiplexer based switch matrix structure, the major limitations are the limited number of programmable logic block input lines and the limited multiplexer size. As the pin-count or the logic density is increased, providing optimal global connectivity with only a centralized switch matrix is difficult because the multiplexer size becomes increasingly large to maintain routability, which in turn degrades speed for all signals and complicates layout of the silicon die. Consequently, migration of logic designs to higher density CPLDs is possible only if the penalties in speed and die size are acceptable. An interconnect architecture for very high-density CPLDs with good speed, good signal routability characteristics, and uniform, predictable, fixed, and path-independent time delays is currently unavailable.