Programmable logic devices (PLDs) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated random access memory blocks (BRAMs), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay lock loops (DLLs), and the like.
Each programmable tile typically includes both programmable interconnect and programmable logic. The programmable interconnect typically includes a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (PIPs). The programmable logic implements the logic of a user design using programmable elements that can include, for example, function generators, registers, arithmetic logic, and the like.
The programmable interconnect and programmable logic are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
FPGAs typically include arithmetic logic for performing various arithmetic operations (e.g., addition, subtraction, multiplication, and the like). In many high-speed applications, more than two data words must be added together in one clock cycle. For example, summing three data words is a common operation in digital signal processing (DSP) and networking applications. Some currently available FPGAs include “carry chain logic” optimized for adding two data words. With such FPGAs, two separate carry chains must be used to implement a three-word adder.
In particular, FIG. 1 is a block diagram depicting a conventional three-word adder 100 implemented using two separate carry chains 102 and 104. In the present example, the three-word adder 100 sums 4-bit data words X, Y, and Z. As used herein, for a given data word M, the notation M[n] refers to the nth bit of M, with M[1] being the least significant bit. The carry chain 102 comprises adders 106₁ through 106₄ (collectively referred to as adders 106), and the carry chain 104 comprises adders 108₁ through 108₄ (collectively referred to as adders 108). The adder 106₁ computes: (i) the sum of the bits X[1] and Y[1]; and (ii) a carry value. For each adder 106ₖ of the adders 106₂ through 106₄, the adder 106ₖ computes: (i) the sum of the bits X[k], Y[k], and the carry value produced by the adder 106ₖ₋₁; and (ii) a carry value. The adder 108₁ computes: (i) the sum of the output of the adder 106₁ and the bit Z[1]; and (ii) a carry value. For each adder 108ₖ of the adders 108₂ through 108₄, the adder 108ₖ computes: (i) the sum of the output of the adder 106ₖ, the bit Z[k], and the carry value produced by the adder 108ₖ₋₁; and (ii) a carry value.
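As a behavioral cross-check of the structure just described, the following Python sketch models the two cascaded carry chains bit by bit. It is a minimal sketch, not a hardware description: the function names are hypothetical, the words are assumed to be unsigned with bit 1 the least significant, and the carry out of the most significant adder in each chain is assumed to be discarded, so the result is the sum modulo 2**width.

```python
def full_add(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def to_bits(value, width):
    """Bit list, index 0 = least significant bit (M[1] in the text's notation)."""
    return [(value >> i) & 1 for i in range(width)]


def three_word_add_two_chains(x, y, z, width=4):
    """Behavioral model of FIG. 1: chain 102 adds X and Y, chain 104 adds Z.

    Assumption: both chains are `width` bits wide and the carry out of the
    most significant adder in each chain is dropped, so the result is
    (x + y + z) modulo 2**width.
    """
    xb, yb, zb = to_bits(x, width), to_bits(y, width), to_bits(z, width)

    # Carry chain 102 (adders 106): ripple-carry sum of X and Y.
    partial, carry = [], 0  # adder 106_1 has no carry input
    for k in range(width):
        s, carry = full_add(xb[k], yb[k], carry)
        partial.append(s)

    # Carry chain 104 (adders 108): ripple-carry sum of the chain-102 output and Z.
    result, carry = 0, 0  # adder 108_1 has no carry input
    for k in range(width):
        s, carry = full_add(partial[k], zb[k], carry)
        result |= s << k
    return result


print(three_word_add_two_chains(5, 6, 3))  # 14
```

In this model the first loop finishes before the second begins; in hardware the chains operate concurrently, but each adder 108ₖ still depends on the output of adder 106ₖ.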
In the conventional three-word adder 100, the critical path for each bit (for example, the bold dashed path for Y[1] in FIG. 1) traverses two add operations (e.g., the adder 106₁ and the adder 108₁) before entering the carry chain of the final adder (e.g., the carry C[1] produced by the adder 108₁). In other words, for each bit, the critical data path traverses the carry chain of two adders. For each bit, the delay through the two adders, combined with the routing delay between them, makes up a large portion of the total adder delay and imposes a significant performance penalty.
A known “carry-save adder” reduces the delays associated with the conventional three-word adder 100. The carry-save adder reduces the three-word add operation to a two-word add operation. Notably, the carry-save adder first pre-computes a sum bit and a carry bit for each bit location of the three words, and then adds the resulting sum and carry words together using a single two-word add operation.
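The reduction works because three bits of equal weight always sum to a value between 0 and 3, which fits in one sum bit plus one carry bit of twice the weight. A short derivation follows, using s[k] and m[k] as illustrative names (not from the original figures) for the pre-computed sum bit and the majority (carry) bit at location k, assuming n-bit unsigned words and m[0] = 0:

```latex
\begin{aligned}
s[k] &= X[k] \oplus Y[k] \oplus Z[k] \\
m[k] &= X[k]\,Y[k] \lor X[k]\,Z[k] \lor Y[k]\,Z[k] \\
X[k] + Y[k] + Z[k] &= s[k] + 2\,m[k] \\
X + Y + Z &= \sum_{k=1}^{n} 2^{k-1}\bigl(s[k] + 2\,m[k]\bigr)
           = \sum_{k=1}^{n} 2^{k-1} s[k] + \sum_{k=1}^{n+1} 2^{k-1} m[k-1]
\end{aligned}
```

The second sum is the majority word shifted one position toward the most significant bit, so the carry operand at location k is derived from the bits at location k−1.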
FIG. 2 is a block diagram depicting a kth stage 200 of a conventional carry-save adder for the kth bit location of three input words X, Y, and Z. The kth stage 200 includes an exclusive-OR (XOR) gate 202, AND gates 204, 206, and 208, an OR gate 214, an XOR gate 210, and an AND gate 212. The XOR gate 202 produces a pre-computed sum bit from the bits X[k], Y[k], and Z[k]. The AND gates 204, 206, and 208 in combination with the OR gate 214 produce a pre-computed carry bit from the bits X[k−1], Y[k−1], and Z[k−1] (i.e., the carry value from the (k−1)th bit location). The XOR gate 210 and the AND gate 212 comprise a full adder 209. The AND gate 212 produces a carry value for the kth bit location, C[k], from the pre-computed sum bit and the pre-computed carry bit. The XOR gate 210 produces a sum value, S[k], from the pre-computed carry bit, the pre-computed sum bit, and a carry value, C[k−1], from the (k−1)th stage.
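The replicated stage can be modeled behaviorally in Python to confirm that the pre-computed sum and carry words add up to X + Y + Z. This is a bit-level sketch, not a gate-level netlist of FIG. 2: the function names are hypothetical, words are assumed to be unsigned and `width` bits wide with results taken modulo 2**width, and the final two-word addition uses a generic full adder whose carry is the majority of its three inputs rather than reproducing the exact carry logic of the figure.

```python
def full_add(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out); carry is the majority of the inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def carry_save_add(x, y, z, width=4):
    """Behavioral model of replicated carry-save stages (result modulo 2**width)."""
    xb = [(x >> i) & 1 for i in range(width)]
    yb = [(y >> i) & 1 for i in range(width)]
    zb = [(z >> i) & 1 for i in range(width)]

    # Pre-computed sum bit for each location (the role of XOR gate 202).
    ps = [xb[k] ^ yb[k] ^ zb[k] for k in range(width)]
    # Majority of the three bits at each location (the role of AND gates 204, 206, 208 with OR gate 214).
    maj = [(xb[k] & yb[k]) | (xb[k] & zb[k]) | (yb[k] & zb[k]) for k in range(width)]
    # Location k consumes the carry pre-computed at location k-1; the lowest location gets 0.
    pc = [0] + maj[:-1]

    # A single two-word ripple-carry addition of the pre-computed sum and carry words.
    result, carry = 0, 0
    for k in range(width):
        s, carry = full_add(ps[k], pc[k], carry)
        result |= s << k
    return result


print(carry_save_add(5, 6, 3))              # 14
print(carry_save_add(15, 15, 15, width=6))  # 45
```

Because the sum of three n-bit words needs at most n + 2 bits, passing a `width` two bits wider than the operands returns the exact sum rather than the modular one.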
A conventional carry-save adder may be formed by replicating the stage 200 for any number of bit locations. Due to the additional logic required to pre-compute the sum and carry values, the carry-save adder requires more logic resources to implement than the traditional dual carry-chain solution of FIG. 1. In an FPGA, this additional logic not only reduces the device resources available for other logic, but also reduces the performance gain of the carry-save adder by increasing routing congestion and delay between elements.
Accordingly, there exists a need in the art for an improved digital logic circuit for adding three binary words.