Arithmetic functions such as adders, subtractors, and magnitude comparators appear in datapath circuits targeted to programmable logic devices (PLDs). These arithmetic functions are typically the critical delay path of a design. As a result, the carry chain can be a vital part of the PLD logic fabric, and optimizing it can improve the performance of the entire design.
Product-term carry chain architectures have employed a basic ripple-chain structure to propagate the carry term across individual macrocells and logic blocks. In a ripple-carry adder implementation, the worst-case delay is from the carry-in of the least significant bit to the carry-out of the most significant bit. Because the carry must pass through every bit position in turn, the worst-case delay grows linearly with the adder width.
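The linear growth can be illustrated with a bit-level ripple-carry adder. The Python sketch below is illustrative only: `ripple_add` and `worst_case_delay_ns` are hypothetical helpers, and the delay model assumes every stage contributes the same fixed delay `t_stage_ns`, which real devices only approximate. The function also reports the longest run of consecutive propagate stages (A XOR B = 1), since each such run is a serial path the carry must cross one bit at a time.

```python
def ripple_add(a: int, b: int, cin: int, width: int) -> tuple[int, int, int]:
    """Add two width-bit operands one bit at a time, as a ripple-carry chain does.

    Returns (sum, carry_out, longest_propagate_run).
    """
    carry = cin
    total = 0
    run = longest = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi
        total |= (propagate ^ carry) << i
        # Whenever a stage propagates, its carry-out depends on its carry-in,
        # so the stages must resolve strictly in order from LSB to MSB.
        carry = (ai & bi) | (carry & propagate)
        run = run + 1 if propagate else 0
        longest = max(longest, run)
    return total, carry, longest


def worst_case_delay_ns(width: int, t_stage_ns: float) -> float:
    """Hypothetical linear model: LSB carry-in to MSB carry-out crosses every stage."""
    return width * t_stage_ns


# All-ones plus a carry-in: the carry ripples through all 8 stages.
print(ripple_add(0xFF, 0x00, 1, 8))
print(worst_case_delay_ns(16, 0.25), worst_case_delay_ns(32, 0.25))
```

Under this model, doubling the adder width from 16 to 32 bits doubles the modeled worst-case delay, which is the linear scaling described above.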
Referring to FIG. 1, a product-term-based carry chain scheme 10 described in U.S. Pat. No. 6,201,409 is shown. Each segment 12 of the carry chain 10 has two inputs (CPT0, CPT1) that receive product terms from a product-term array, a 2:1 carry chain multiplexer 14 with one inverting input and one non-inverting input, and a carry select input 16. The CPT0 and CPT1 inputs are connected directly to two product terms and do not come from the product-term matrix (PTM, not shown). However, the product terms presented to the inputs CPT0 and CPT1 are also inputs to the PTM and can be used to form sum-of-products logic equations. The carry chain multiplexer 14 acts as a single-bit carry generator, selecting one of the two product terms as the carry-in to the particular segment (macrocell) 12. Each segment 12 generates the sum output via an XOR gate 18.
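The text does not specify which product terms drive CPT0 and CPT1, so the Python sketch below assumes one full-adder mapping for illustration: CPT0 = A′B′ on the inverting input, CPT1 = AB on the non-inverting input, and the incoming carry on the carry select input 16. Under that assumption the multiplexer produces AB when the incoming carry is 0 and NOT(A′B′) = A + B when it is 1, which is the full-adder carry for a bit position with operands A and B. The function names are hypothetical.

```python
from itertools import product


def carry_terms(a: int, b: int) -> tuple[int, int]:
    """Assumed product-term programming: CPT0 = A'B', CPT1 = AB."""
    return (1 - a) & (1 - b), a & b


def carry_mux(cpt0: int, cpt1: int, carry_select: int) -> int:
    """2:1 carry chain multiplexer 14 (assumed input assignment):
    select = 1 picks the inverted CPT0, select = 0 picks CPT1 unchanged."""
    return (1 - cpt0) if carry_select else cpt1


def segment(a: int, b: int, carry: int) -> tuple[int, int]:
    """One segment: sum = (A XOR B) XOR carry via XOR gate 18,
    next carry via multiplexer 14."""
    cpt0, cpt1 = carry_terms(a, b)
    return (a ^ b) ^ carry, carry_mux(cpt0, cpt1, carry)


# Compare against the full-adder truth table for all 8 input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert segment(a, b, c) == ((a + b + c) & 1, (a + b + c) >> 1)
```

The exhaustive loop is cheap here because a single segment has only three inputs.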
The output of each carry chain multiplexer 14 is propagated as the carry-out to the next macrocell in the chain. The carry-out to the next macrocell is ANDed with a configuration bit, allowing each segment of the carry chain to be decoupled from the next. The single-bit carry generation and propagation is repeated until the carry reaches the last macrocell in the current logic block, at which point the carry-out ripples to the carry-in of the first macrocell in the next logic block.
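The propagation and decoupling behavior can be sketched by chaining segments in Python. The product-term mapping (CPT0 = A′B′, CPT1 = AB) and the list `chain_enable` of per-segment configuration bits are assumptions for illustration, and `chain_add` is a hypothetical name. Logic-block boundaries are not modeled separately: functionally, the carry ripples from the last macrocell of one block into the first macrocell of the next just as it does within a block, so the boundary affects only delay.

```python
def chain_add(a, b, cin, width, chain_enable=None):
    """Ripple an addition through `width` macrocell segments.

    chain_enable[i] stands in for the configuration bit ANDed with the
    carry-out of segment i; clearing it decouples segment i + 1 from segment i.
    Returns (sum, carry_out of the last segment).
    """
    if chain_enable is None:
        chain_enable = [1] * width
    carry, total = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        cpt0, cpt1 = (1 - ai) & (1 - bi), ai & bi  # assumed: A'B', AB
        total |= (ai ^ bi ^ carry) << i            # XOR gate 18
        cout = (1 - cpt0) if carry else cpt1       # carry chain multiplexer 14
        carry = cout & chain_enable[i]             # AND with configuration bit
    return total, carry


# One 64-bit adder: the carry ripples from bit 0 all the way to bit 63.
print(chain_add(2**64 - 1, 1, 0, 64))

# Clearing the bit after segment 31 splits the chain into two 32-bit adders.
enable = [1] * 64
enable[31] = 0
print(hex(chain_add((5 << 32) | 0xFFFFFFFF, (3 << 32) | 1, 0, 64, enable)[0]))
```

In the second call, the carry out of the lower 32-bit addition is discarded by the cleared configuration bit, so the upper half computes 5 + 3 independently.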
The carry chain 10 can have a long ripple delay from carry-in to carry-out. The critical path delay increases linearly with the bit width, so a sizeable arithmetic function can considerably slow down an entire design. For instance, in a current programmable logic device, a 64-bit addition mapped to the carry chain of FIG. 1 can have a worst-case Cin-to-Cout delay of 14.755 ns. For the carry signal to propagate in a single clock cycle, the user's design would have to operate at no more than about 67.8 MHz (1/14.755 ns).
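The frequency limit follows from f_max = 1 / t_delay. The Python sketch below reproduces that arithmetic and, as a rough illustration, derives an average per-bit delay by dividing the 64-bit figure by 64. That per-bit value and the linear extrapolation are simplifying assumptions, not device data; a real carry path may also include fixed overheads that do not scale with width.

```python
def max_clock_mhz(path_delay_ns: float) -> float:
    """Highest clock frequency at which a path of this delay fits in one cycle."""
    return 1000.0 / path_delay_ns  # 1 / ns is GHz, so scale by 1000 for MHz


def avg_stage_delay_ns(total_delay_ns: float, width: int) -> float:
    """Average per-bit delay, assuming the delay is purely linear in width."""
    return total_delay_ns / width


t_bit = avg_stage_delay_ns(14.755, 64)
print(f"64-bit: {max_clock_mhz(14.755):.1f} MHz, about {t_bit:.3f} ns per bit")

# Linear extrapolation to a 32-bit adder (illustrative only).
print(f"32-bit estimate: {max_clock_mhz(32 * t_bit):.1f} MHz")
```

Under this linear model, halving the width halves the carry delay and doubles the attainable clock rate, to about 135.5 MHz for a 32-bit adder.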
Each segment of the carry chain 10 consumes 4 unique product terms per macrocell: 2 carry chain product terms (CPT0, CPT1) and 2 product terms from the PTM to form the partial sum (AB′+A′B). The carry chain scheme 10 necessitates a PLD architecture that allocates at least 4 unique product terms per macrocell.
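One way to see why the four product terms cannot be shared is to encode each as a two-input truth table. Under the assumed mapping CPT0 = A′B′ and CPT1 = AB (the text does not state which terms drive the carry inputs), the four terms are the four distinct minterms of A and B, and the partial sum AB′ + A′B equals A XOR B. The helper names in this Python sketch are hypothetical.

```python
from itertools import product


def truth_mask(fn) -> int:
    """Encode a 2-input Boolean function as a 4-bit mask; bit 2*A + B is set
    when fn(A, B) is true."""
    return sum(1 << (2 * a + b) for a, b in product((0, 1), repeat=2) if fn(a, b))


# Assumed per-bit product terms for the FIG. 1 scheme.
TERMS = {
    "CPT0 (A'B')": truth_mask(lambda a, b: not a and not b),
    "CPT1 (AB)": truth_mask(lambda a, b: a and b),
    "PTM (AB')": truth_mask(lambda a, b: a and not b),
    "PTM (A'B)": truth_mask(lambda a, b: not a and b),
}


def unique_terms_per_macrocell() -> int:
    """Number of distinct product terms one adder bit needs."""
    return len(set(TERMS.values()))


def fits_fig1_scheme(pts_per_macrocell: int) -> bool:
    """True if a macrocell's product-term allocation can hold one adder bit."""
    return pts_per_macrocell >= unique_terms_per_macrocell()


partial_sum = TERMS["PTM (AB')"] | TERMS["PTM (A'B)"]
print(partial_sum == truth_mask(lambda a, b: a ^ b))  # AB' + A'B is XOR
print([fits_fig1_scheme(n) for n in (2, 3, 4)])
```

Because the four masks are pairwise distinct, none of the terms can stand in for another, which is why an allocation of 2 or 3 product terms per macrocell is insufficient for this scheme.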
However, the overall area and delay performance of a high-density PLD can be optimized when the logic clusters are small and allocate only 2 to 3 product terms per macrocell.
Referring to FIG. 2, a reduced product-term carry chain 30 is shown. A description of the carry chain 30 may be found in the co-pending application U.S. Ser. No. 09/587,780, which is hereby incorporated by reference in its entirety. The carry chain 30 has a ripple-chain structure across macrocells and logic blocks similar to the chain 10 of FIG. 1. However, logic is added to each macrocell 32 to generate the sum output directly from the product terms CPT0 and CPT1. Instead of consuming 2 additional product terms from the AND-OR plane to generate a partial sum, the product terms CPT0 and CPT1 are combined by a NOR gate 38 to provide the same partial sum. A 2:1 multiplexer 34 controlled by a configuration bit determines whether the partial sum or the regular sum-of-products equation from the AND-OR plane (OR-in) is driven to the XOR gate 36. The carry chain 30 can be fully implemented in a logic block that allocates as few as 2 product terms per macrocell.
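The NOR-based partial sum can be checked exhaustively. The Python sketch below assumes CPT0 = A′B′ and CPT1 = AB, which the text does not state; with that mapping, NOR(A′B′, AB) equals A XOR B, the same partial sum that AB′ + A′B forms in the AND-OR plane. The carry path is assumed to use an inverting/non-inverting carry multiplexer like that of FIG. 1, and the hypothetical parameter `use_partial_sum` stands in for the configuration bit that controls multiplexer 34.

```python
from itertools import product


def reduced_macrocell(a, b, carry_in, use_partial_sum=1, or_in=0):
    """Sketch of one macrocell 32 of the reduced product-term chain.

    Returns (sum_out, carry_out).
    """
    cpt0 = (1 - a) & (1 - b)                             # assumed: A'B'
    cpt1 = a & b                                         # assumed: AB
    partial_sum = 1 - (cpt0 | cpt1)                      # NOR gate 38
    xor_in = partial_sum if use_partial_sum else or_in   # multiplexer 34
    sum_out = xor_in ^ carry_in                          # XOR gate 36
    carry_out = (1 - cpt0) if carry_in else cpt1         # assumed carry multiplexer
    return sum_out, carry_out


# The NOR of the two carry product terms reproduces A XOR B ...
for a, b in product((0, 1), repeat=2):
    assert 1 - (((1 - a) & (1 - b)) | (a & b)) == a ^ b

# ... so, with the partial sum selected, each macrocell is a full adder that
# uses only the 2 product terms CPT0 and CPT1.
for a, b, c in product((0, 1), repeat=3):
    assert reduced_macrocell(a, b, c) == ((a + b + c) & 1, (a + b + c) >> 1)
```

Setting `use_partial_sum=0` instead passes the regular sum-of-products input `or_in` to the XOR gate, mirroring the choice made by the configuration bit.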
The carry chain 30 can still have a long propagation delay associated with the ripple-carry path from block to block, so its critical path performance can be similar to that of the carry chain 10. In addition, because the reduced product-term scheme 30 adds a NOR gate, a multiplexer, and a configuration bit to every macrocell in the device, it increases the complexity of the macrocell and configuration architecture. The multiplexer 34 can also increase the propagation delay through the normal sum-of-products data path.