The present invention relates to a method and/or architecture for programmable logic device (PLD) logic blocks generally and, more particularly, to a method and/or architecture implementing multiplexers for efficient PLD logic blocks.
Referring to FIG. 1, a block diagram illustrating a product term based complex programmable logic device (CPLD) 10 is shown. The CPLD 10 has a programmable interconnect matrix (PIM) 12 and a number of logic blocks 14. The logic blocks 14 include an AND-array 16, an OR-array 18, and a macrocell 20.
Referring to FIG. 2, a diagram of the AND-array 16 is shown. The AND-array 16 receives a number of input terms (ITs) M1 and generates a number of product terms (PTs) M2. The size of the AND-array 16 is determined by the number of ITs and PTs. Reducing the size of the AND-array 16 in both the input and output directions improves performance and cost.
Product term based CPLD architectures can consume more area and achieve lower speed performance compared to LUT-based FPGA architectures. The primary reason for the difference is the large size of the AND-array, in terms of both: (a) the number of product terms (PTs) generated per macrocell; and (b) the number of input terms (ITs) to the AND-array.
Referring to FIG. 3, a three-dimensional bar graph 30 comparing the Area*Delay² product versus the number of inputs (ITs) and product terms (PTs) for a 6-macrocell logic block is shown. With respect to the number of product terms generated per macrocell, the area and delay performance are optimized when the number of product terms per macrocell is set to as few as 2 (e.g., Bar 32). However, two product terms per macrocell is lower than the typical value of 4 or 5 product terms per macrocell used in current industry-standard CPLDs. A disadvantage of the 2-PT/macrocell architecture is that several basic state machines require slightly more than two product terms per state bit. Such state machines include various classes of counters and shift registers that are commonly used as building blocks in sequential circuits. The counters and shift registers must commonly be implementable in a single logic block of a CPLD. A comparison of the output equation, number of product terms per macrocell, and number of AND-array inputs associated with each type of counter is shown in the following TABLE 1:
A similar comparison for various classes of shift registers is shown in the following TABLE 2:
To accommodate the product term requirements of the state machines of TABLES 1 and 2, the traditional solution has been to use a large AND-array, typically having 4 to 5 unique product terms per macrocell. However, as shown in FIG. 3, having 4 to 5 unique product terms per macrocell can result in large logic area and poor overall speed performance of the CPLD.
With respect to the number of AND-array inputs, conventional logic block architectures require a high number of AND-array inputs in order to simultaneously route (i) sufficient input terms for combinatorial functions, and (ii) sufficient input/load data terms and macrocell feedbacks for sequential functions. TABLES 1 and 2 show that at least (2n+2) unique AND-array inputs must be present in an n-macrocell logic block to implement various counters and shift registers. Conventional architectures actually employ more than (2n+2) AND-array inputs (typically 2.25n to 3n) to guarantee full routability into the logic block.
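The input-count figures above can be checked with a short calculation. The following is a minimal sketch only; the function names are hypothetical, and n = 16 is chosen because the conventional logic blocks discussed in this document have sixteen macrocells.

```python
def min_unique_inputs(n: int) -> int:
    """Lower bound on unique AND-array inputs stated in the text: 2n + 2."""
    return 2 * n + 2


def conventional_input_range(n: int) -> tuple:
    """Typical range stated in the text for conventional architectures: 2.25n to 3n."""
    return (2.25 * n, 3 * n)


n = 16  # assumed logic block size (sixteen macrocells)
minimum = min_unique_inputs(n)
low, high = conventional_input_range(n)
print(minimum, low, high)
```

For n = 16 the lower bound is 34 unique inputs, and the typical conventional range starts at 36, which is already above that bound.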
There are several existing CPLD logic block architectures, all of which fall into two categories: (a) a sum-of-products logic block architecture that has a large AND-array, where input terms and macrocell feedbacks are routed via a global interconnect matrix; and (b) a sum-of-products logic block architecture that has a large AND-array, where input terms are routed via a global interconnect matrix and macrocell outputs are fed back locally or globally.
Referring to FIG. 4, a diagram of a conventional sum-of-products logic block architecture 40 is shown. The sum-of-products logic block architecture 40 has a large product term array 42, where input terms and macrocell feedbacks are routed via a global interconnect matrix. The logic block 40 receives 36 inputs from the programmable interconnect matrix (PIM), with each signal delivered in both true and complement form to the product term array 42 (totaling 72 AND-array inputs). The product term array 42 generates 80 general-purpose product terms that are allocated across sixteen macrocells 44, resulting in an average of five PTs/macrocell. There are an additional seven control product terms allocated for reset, preset, product term clock, and output enable signals. To implement a synchronous load for a state machine, the LOAD signal and data lines are routed through the PIM, the product term array 42, and the product term allocator 46 to form the desired sum-of-products expression for each state machine bit. In addition, macrocell feedbacks can only route back into the logic block via the PIM. The macrocell feedbacks form a subset of the 36 inputs to the logic block.
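The sizing figures quoted for the logic block 40 follow from simple arithmetic, sketched below. The variable names are hypothetical, and the crosspoint count is only an illustrative size proxy that assumes every AND-array input can connect to every product term; the text does not state that figure.

```python
PIM_INPUTS = 36    # inputs received from the PIM
MACROCELLS = 16    # macrocells 44 in the logic block
GENERAL_PTS = 80   # general-purpose product terms
CONTROL_PTS = 7    # reset, preset, product term clock, output enable

# Each PIM input is delivered in true and complement form.
and_array_inputs = 2 * PIM_INPUTS

# Average general-purpose product terms available per macrocell.
pts_per_macrocell = GENERAL_PTS / MACROCELLS

# Assumed size proxy: one programmable crosspoint per (input, product term) pair.
crosspoints = and_array_inputs * (GENERAL_PTS + CONTROL_PTS)

print(and_array_inputs, pts_per_macrocell, crosspoints)
```

Under that assumption the array grows with the product of the input and output dimensions, which is why reducing either dimension shrinks it.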
Referring to FIG. 5, a diagram illustrating another conventional sum-of-products logic block architecture 50 is shown. The sum-of-products logic block architecture 50 has a large AND-array, where input terms are routed via a global interconnect matrix and macrocell outputs are fed back locally or globally. The logic block 50 receives thirty-three input terms from a global routing pool, sixteen expander product terms from the local array, and sixteen local macrocell feedbacks. The input terms and macrocell feedbacks are delivered in both true and complement form to the product term array (totaling 114 AND-array inputs). The AND-array architecture of the logic block 50 is slightly different from the logic block 40 due to the presence of parallel expanders 52 and shareable expanders 54. However, the use of the parallel and shared expanders still results in an average of five product terms allocated per macrocell. As is the case with the logic block 40 architecture of FIG. 4, a synchronous load function requires the load control and data signals to propagate through the global routing pool, the product term array, and the product term select matrix. However, unlike the logic block 40, each macrocell output has a dedicated feedback path to both the global routing pool and the local AND-array.
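The 114-input total for the logic block 50 can be reproduced as follows. This is a sketch under one stated assumption: the sixteen expander product terms enter the array in a single polarity, which is the reading under which the quoted total works out. The function name is hypothetical.

```python
def and_array_input_count(dual_polarity: int, single_polarity: int = 0) -> int:
    """Count AND-array input lines.

    dual_polarity: signals delivered in both true and complement form.
    single_polarity: signals assumed to enter in one polarity only.
    """
    return 2 * dual_polarity + single_polarity


GLOBAL_INPUT_TERMS = 33   # from the global routing pool
LOCAL_FEEDBACKS = 16      # local macrocell feedbacks
EXPANDER_PTS = 16         # expander product terms (assumed single polarity)

block_50_inputs = and_array_input_count(
    GLOBAL_INPUT_TERMS + LOCAL_FEEDBACKS, EXPANDER_PTS
)
print(block_50_inputs)
```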
The primary disadvantage of conventional CPLD logic block architectures 40 and 50 is that the large size of the AND-array, in both the input and output directions, results in high area consumption and poor speed performance of the overall CPLD. None of the existing architectures has a specialized datapath for implementing synchronous load functions for a state machine that would reduce the size requirement of the AND-array. For example, implementing a 16-bit loadable counter or shift register using a single logic block 40 can require as many as sixty-four unique product terms and thirty-four unique PIM inputs (e.g., see TABLES 1 and 2). The AND-array 42 is forced to be large in both the input and output directions (72 inputs, 80 product terms) to accommodate the state machine implementations. As evidenced by FIG. 3, the 5-PT/macrocell architecture can result in poor overall area/delay performance of the CPLD. Furthermore, the full routability of state machine outputs into the same logic block is not guaranteed, since the outputs must first route through a large interconnect matrix which can deliver only a fixed number of inputs to a logic block. Maximizing the routability of macrocell feedbacks into the logic block also forces the AND-array 42 to be large in the input direction. Another disadvantage of the logic block 40 is that the macrocell feedbacks incur a long propagation delay through the PIM. The long propagation delay can degrade the maximum operating frequency of the state machine.
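To make the 16-bit example concrete, the sketch below compares the stated worst-case requirement with the resources of the logic block 40. The variable names are hypothetical, and the per-bit figure is simply derived by dividing the quoted total by sixteen.

```python
BITS = 16                 # width of the loadable counter or shift register

REQUIRED_PTS = 64         # worst-case unique product terms (from the text)
REQUIRED_INPUTS = 34      # worst-case unique PIM inputs (from the text)

AVAILABLE_PTS = 80        # general-purpose product terms in logic block 40
AVAILABLE_INPUTS = 36     # PIM inputs into logic block 40

pts_per_bit = REQUIRED_PTS / BITS                 # derived, not quoted
spare_pts = AVAILABLE_PTS - REQUIRED_PTS          # product terms left over
spare_inputs = AVAILABLE_INPUTS - REQUIRED_INPUTS # PIM inputs left over

print(pts_per_bit, spare_pts, spare_inputs)
```

The state machine fits, but only because the array is provisioned at five product terms per macrocell, and it leaves just two PIM inputs for anything else.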
When the 16-bit loadable counter or shift register is implemented using a single logic block 50 of FIG. 5, the same product term and input requirements listed in TABLES 1 and 2 apply, with the exception that local macrocell feedback paths can be used to route the state machine outputs. The sixteen local feedbacks guarantee full routability of state machine outputs back into the logic block, and eliminate any propagation delay otherwise incurred through the global routing pool. However, because the local feedbacks are dedicated, and because there are sixteen additional expander product terms present as inputs, the AND-array of the logic block architecture 50 is even larger and more inefficient in the input direction (114 inputs) as compared to the logic block architecture 40 (72 inputs). The 5-PT/macrocell configuration also forces the AND-array of the logic block architecture 50 to be large in the output direction, leading to the same pitfalls in CPLD area and delay performance.
It would be desirable to have a logic block architecture that reduces the number of AND-array inputs below (2n+2), making the overall architecture faster and more area-efficient, while maintaining the capacity and physical routability for implementing the basic state machines listed in TABLES 1 and 2.
The present invention concerns a logic section of a programmable logic device comprising a first circuit and a second circuit. The first circuit may be configured to (i) implement user defined programmable logic and (ii) generate an output in response to a first input and a second input. The second circuit may be configured to generate the second input in response to the output, a third input, and a fourth input.
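One way to picture the relationship between the two circuits is the behavioral sketch below. It is illustrative only and rests on assumptions not stated in the summary: the second circuit is modeled as a 2:1 multiplexer in which the fourth input selects between the fed-back output and the third input (for example, load data), and the first circuit is stood in for by a single XOR as one possible user-defined function. All function names are hypothetical.

```python
def second_circuit(output: int, third_input: int, fourth_input: int) -> int:
    """Assumed 2:1 mux: fourth_input selects third_input over the fed-back output."""
    return third_input if fourth_input else output


def first_circuit(first_input: int, second_input: int) -> int:
    """Stand-in for user-defined logic: toggle second_input when first_input is 1."""
    return first_input ^ second_input


def step(output: int, first_input: int, third_input: int, fourth_input: int) -> int:
    """One evaluation: the second circuit feeds the first circuit's second input."""
    second_input = second_circuit(output, third_input, fourth_input)
    return first_circuit(first_input, second_input)


state = 0
toggled = step(state, first_input=1, third_input=0, fourth_input=0)   # feedback path, toggle
loaded = step(toggled, first_input=0, third_input=0, fourth_input=1)  # load path selected
held = step(loaded, first_input=0, third_input=1, fourth_input=0)     # feedback path, hold
print(toggled, loaded, held)
```

In this model the load path is selected outside the user-defined logic, so loading a value does not require an extra term inside the first circuit.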
The objects, features and advantages of the present invention include providing multiplexers for efficient PLD logic blocks that may (i) provide a programmable logic device (PLD) logic block architecture capable of faster, more routable, and more area-efficient implementations of counters, shift registers, and other state machines as compared to conventional sum-of-products PLD architectures, (ii) perform input-term multiplexing at macrocell inputs, (iii) allow loadable unidirectional or bidirectional counters and shift registers to be built without consuming additional product terms in generic logic, (iv) multiplex macrocell feedbacks directly into the product term array, (v) guarantee routability of state machine outputs, (vi) provide fast feedback paths within the logic block, and (vii) significantly reduce the AND-array size in a sum-of-products architecture, resulting in smaller die area and faster operation compared to existing PLD architectures.