The present invention relates generally to electronic circuits and more particularly to adder circuits for use in semiconductor integrated circuits and other electronic devices.
As a result of ever-shrinking very large scale integration (VLSI) process geometries, it has become necessary to reexamine the tradeoffs that have been made in the existing design and implementation of computer arithmetic algorithms. Algorithms utilizing the so-called carry lookahead technique, as described in A. Weinberger and J. L. Smith, “A One-Microsecond Adder Using One-Megacycle Circuitry,” IRE Trans. on Electronic Computers, pp. 65-73, June 1956, speed up the addition process by unrolling a recursive carry equation. Both transistor count and interconnection complexity have typically limited the maximum unrolling to 4 bits. Larger adders have been built as block carry-lookahead adders, where the lookahead operation occurs within small blocks, as described in T.-F. Ngai et al., “Regular, Area-Time Efficient Carry-Lookahead Adders,” Journal of Parallel and Distributed Computing, Vol. 3, pp. 92-105, 1986.
The recursive carry computation can also be reduced to a prefix computation, as described in, e.g., P. M. Kogge and H. S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. on Computers, Vol. C-22, No. 8, pp. 786-793, August 1973. As described in R. P. Brent and H. T. Kung, “A Regular Layout for Parallel Adders,” IEEE Trans. on Computers, Vol. C-31, No. 3, pp. 260-264, March 1982, a prefix tree can be used to compute the carry at the most-significant bit position, and an additional tree superimposed on the prefix tree can be used to compute the intermediate carries. Faster computation of all the carries can be achieved by using a separate prefix tree for each bit position, as described in D. Dozza et al., “A 3.5 NS, 64 Bit, Carry-Lookahead Adder,” in Proc. Intl. Symp. Circuits and Systems, pp. 297-300, 1996.
A problem associated with the above-noted full prefix tree adders, which are also known as Kogge-Stone adders, is the additional delay introduced as a result of exponentially growing interconnection complexity. Existing architecture tradeoffs have emphasized reduction of interconnection complexity at the expense of higher gate fanouts. Interconnection complexity can also be reduced by using hybrid carry lookahead/carry select architectures which eliminate the need to implement a full prefix tree for each bit position. The use of low resistance and low capacitance materials can reduce the negative effects of architectures that depend on large amounts of interconnect, as described in J. Silberman et al., “A 1.0 GHz Single-Issue 64b PowerPC Integer Processor,” IEEE Intl. Solid-State Circuits Conf., pp. 230-231, February 1998. Furthermore, with additional levels of interconnect, the area overhead required to implement such adders is alleviated through the use of extensive “over-the-cell” routing, which removes the routing channels and further minimizes the interconnect capacitance.
The operation of conventional prefix tree adders will be described in greater detail with reference to FIGS. 1 and 2. In a general n-bit prefix tree adder, the addition of two numbers A and B,
$$A = -a_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} a_j 2^j, \qquad B = -b_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} b_j 2^j,$$
represented in two's complement binary form, can be accomplished by computing
$$\left.\begin{aligned} g_j &= a_j b_j \\ p_j &= a_j \oplus b_j \\ c_j &= g_j + p_j c_{j-1} \\ s_j &= p_j \oplus c_{j-1} \end{aligned}\right\} \quad \forall j,\ 0 \le j < n,$$
where $c_{-1}$ is the primary carry input. The signals designated $g_j$, $p_j$ and $c_j$ are referred to herein as generate, propagate and carry signals, respectively. The resulting sum of A and B is
$$S = -s_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} s_j 2^j.$$
An overflow occurs, and the resulting sum is invalid, if
$$c_{n-1} \oplus c_{n-2} = 1.$$
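The equations above can be checked with a bit-serial software model. The following Python sketch is an illustration only, and the function names `ripple_add` and `to_signed` are hypothetical: it evaluates $g_j$, $p_j$, $c_j$ and $s_j$ one bit at a time, as a ripple-carry adder would, and flags overflow with $c_{n-1} \oplus c_{n-2}$. It assumes $n \ge 2$ and that A and B are supplied as n-bit unsigned bit patterns.

```python
def ripple_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Bit-serial model of the g/p/c/s equations (hypothetical helper).

    a and b are n-bit two's complement patterns (0 <= a, b < 2**n), c_in is
    the primary carry input c_{-1}, and n >= 2. Returns (sum_pattern, overflow).
    """
    carries = []          # carries[j] holds c_j
    sum_pattern = 0
    c_prev = c_in         # c_{j-1}; starts as the primary carry input
    for j in range(n):
        a_j = (a >> j) & 1
        b_j = (b >> j) & 1
        g_j = a_j & b_j               # generate
        p_j = a_j ^ b_j               # propagate
        c_j = g_j | (p_j & c_prev)    # c_j = g_j + p_j c_{j-1}
        s_j = p_j ^ c_prev            # s_j = p_j XOR c_{j-1}
        sum_pattern |= s_j << j
        carries.append(c_j)
        c_prev = c_j
    overflow = carries[n - 1] ^ carries[n - 2]   # c_{n-1} XOR c_{n-2}
    return sum_pattern, overflow


def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement integer (the value S)."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


# 7 + 1 in 4-bit two's complement overflows: the pattern 0b1000 reads as -8.
print(ripple_add(0b0111, 0b0001, 4))   # (8, 1)
```

The overflow test works because an overflow happens exactly when the carry into the sign bit ($c_{n-2}$) differs from the carry out of it ($c_{n-1}$).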
The above-cited Dozza et al. reference defines $(G_j^j, P_j^j) = (g_j, p_j)$ and
$$(G_i^j, P_i^j) = (g_j, p_j) \circ (g_{j-1}, p_{j-1}) \circ \cdots \circ (g_i, p_i) \quad \text{if } j > i,$$
where $\circ$ is the fundamental carry operator described in the above-cited Brent and Kung reference and defined as
$$(g_j, p_j) \circ (g_i, p_i) = (g_j + p_j g_i,\ p_j p_i).$$
The fundamental carry operator $\circ$ is both associative and idempotent. At each bit position, the carry is given by
$$c_j = G_0^j + P_0^j c_{-1} \qquad (1)$$
where $c_{-1}$ is the primary carry input. If there is no primary carry input, then $c_j$ is simply $G_0^j$.
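The operator and equation (1) can be exercised directly in software. The Python sketch below is an illustrative model rather than a circuit description; `carry_op`, `group_gp`, `operator_laws_hold` and `carries_eq1` are hypothetical names. It checks associativity and idempotence exhaustively over single-bit (g, p) pairs and evaluates $c_j = G_0^j + P_0^j c_{-1}$ by folding the operator from bit j down to bit 0.

```python
from functools import reduce
from itertools import product


def carry_op(hi, lo):
    """(g_j, p_j) o (g_i, p_i) = (g_j + p_j g_i, p_j p_i), with + as OR."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def group_gp(g, p, i, j):
    """(G_i^j, P_i^j) = (g_j, p_j) o ... o (g_i, p_i) for j >= i."""
    pairs = [(g[k], p[k]) for k in range(j, i - 1, -1)]   # bit j first, bit i last
    return reduce(carry_op, pairs)


def operator_laws_hold():
    """Exhaustively check associativity and idempotence over all (g, p) bit pairs."""
    pairs = list(product((0, 1), repeat=2))
    associative = all(
        carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
        for x, y, z in product(pairs, repeat=3)
    )
    idempotent = all(carry_op(x, x) == x for x in pairs)
    return associative and idempotent


def carries_eq1(a, b, n, c_in=0):
    """Carries c_0 .. c_{n-1} from equation (1): c_j = G_0^j + P_0^j c_{-1}."""
    g = [(a >> k) & (b >> k) & 1 for k in range(n)]
    p = [((a >> k) ^ (b >> k)) & 1 for k in range(n)]
    carries = []
    for j in range(n):
        G, P = group_gp(g, p, 0, j)
        carries.append(G | (P & c_in))
    return carries


print(operator_laws_hold())            # True
print(carries_eq1(0b0111, 0b0001, 4))  # [1, 1, 1, 0]
```

Because the operator is associative, the fold order inside `group_gp` does not affect the result; this is what allows the prefix trees to regroup the same terms into a tree of logarithmic depth.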
FIG. 1 shows a set of superimposed prefix trees 10 for a prefix tree adder in which the computation of $(G_0^j, P_0^j)$ for all j can be accomplished in $\lceil \log_2 n \rceil$ stages. In the set of prefix trees 10, n = 16. A complete n-bit prefix tree adder with a set of prefix trees of the type shown in FIG. 1 can be constructed by implementing the following steps.
Step 1 (1 stage):
Calculate $g_j = a_j b_j$ and $p_j = a_j \oplus b_j$ for all j, $0 \le j < n$.
Step 2 ($\lceil \log_2 n \rceil$ stages):
For $k = 1, \ldots, \lceil \log_2 n \rceil$, calculate
$$(G_0^j, P_0^j) = (G_{j-2^{k-1}+1}^j, P_{j-2^{k-1}+1}^j) \circ (G_0^{j-2^{k-1}}, P_0^{j-2^{k-1}}) \quad \forall j,\ 2^{k-1} \le j < 2^k - 1$$
$$(G_{j-2^k+1}^j, P_{j-2^k+1}^j) = (G_{j-2^{k-1}+1}^j, P_{j-2^{k-1}+1}^j) \circ (G_{j-2^k+1}^{j-2^{k-1}}, P_{j-2^k+1}^{j-2^{k-1}}) \quad \forall j,\ 2^k - 1 \le j < n.$$
Step 3 (1 stage):
Calculate $c_j = G_0^j + P_0^j c_{-1}$ for all j, $0 \le j < n$.
Step 4 (1 stage):
Calculate $s_j = p_j \oplus c_{j-1}$ for all j, $0 \le j < n$.
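Steps 1 through 4 translate almost line for line into a bit-level software model. The following Python sketch (the names `carry_op` and `prefix_tree_add` are hypothetical) keeps, for every bit position j, the group pair ending at j and doubles its span at each Step 2 stage. A single list comprehension covers both Step 2 equations; positions with $j < 2^{k-1}$ simply pass their value through, which corresponds to the buffers in FIG. 1. It models the logic only, not timing or fanout.

```python
def carry_op(hi, lo):
    """Fundamental carry operator on (G, P) pairs, with + as OR."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_tree_add(a, b, n, c_in=0):
    """n-bit prefix tree (Kogge-Stone style) adder model: returns (sum, carries)."""
    # Step 1: generate and propagate for every bit position.
    g = [(a >> j) & (b >> j) & 1 for j in range(n)]
    p = [((a >> j) ^ (b >> j)) & 1 for j in range(n)]

    # Step 2: after stage k, span[j] holds (G_i^j, P_i^j) with i = max(0, j - 2**k + 1).
    span = list(zip(g, p))
    stages = (n - 1).bit_length()          # equals ceil(log2 n) for n >= 1
    for k in range(1, stages + 1):
        d = 1 << (k - 1)                   # 2**(k-1)
        span = [carry_op(span[j], span[j - d]) if j >= d else span[j]  # else: buffer
                for j in range(n)]

    # Step 3: c_j = G_0^j + P_0^j c_{-1}.
    c = [G | (P & c_in) for G, P in span]

    # Step 4: s_j = p_j XOR c_{j-1}, where c_{-1} is the primary carry input.
    s = [p[j] ^ (c[j - 1] if j > 0 else c_in) for j in range(n)]
    return sum(bit << j for j, bit in enumerate(s)), c


print(prefix_tree_add(0x1234, 0x4321, 16)[0] == 0x5555)   # True
```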
In the set of prefix trees 10 of FIG. 1, the open squares at the top compute propagate signals $p_j$ and generate signals $g_j$ for each bit position in accordance with Step 1, the empty circles apply the fundamental carry operator in accordance with Step 2, and the filled circles represent buffers. The last stage, shown as crossed circles in FIG. 1, applies equation (1) to every $(G_0^j, P_0^j)$ in accordance with Step 3. The output of this stage is the carry at each bit position. An additional sum generation stage (not shown) is needed to generate the sum at each bit position from the $p_j$ signal and the carry from the previous bit position in accordance with Step 4. A complete 16-bit adder includes the set of prefix trees 10 plus this sum generation stage for implementing Step 4. The logic depth of an n-bit adder of the type illustrated in FIG. 1 is $3 + \lceil \log_2 n \rceil$. If there is no carry input, then the last stage of the set of prefix trees shown in FIG. 1 is not needed.
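The logic depth quoted above can be tallied stage by stage. The Python helper below, `prefix_tree_cost`, is a hypothetical illustration: it counts one stage for Step 1, $\lceil \log_2 n \rceil$ stages for Step 2, one for Step 3 (dropped when there is no carry input) and one for Step 4. The node counts assume that, at stage k, each position $j \ge 2^{k-1}$ holds a carry-operator node and each position $j < 2^{k-1}$ holds a buffer, as the Step 2 equations imply; they are derived from the equations, not read from the drawing.

```python
def prefix_tree_cost(n, has_carry_in=True):
    """Stage and node counts for the Step 1-4 schedule of an n-bit adder (n >= 1)."""
    stages = (n - 1).bit_length()                         # ceil(log2 n)
    # Assumption: at stage k, positions j >= 2**(k-1) are operators, the rest buffers.
    operators = sum(n - (1 << (k - 1)) for k in range(1, stages + 1))
    buffers = sum(1 << (k - 1) for k in range(1, stages + 1))
    depth = 1 + stages + (1 if has_carry_in else 0) + 1   # Steps 1, 2, 3, 4
    return {"depth": depth, "operators": operators, "buffers": buffers}


print(prefix_tree_cost(16))   # {'depth': 7, 'operators': 49, 'buffers': 15}
print(prefix_tree_cost(16, has_carry_in=False)["depth"])   # 6
```

For n = 16 this reproduces the depth $3 + \lceil \log_2 16 \rceil = 7$, or 6 when Step 3 is omitted.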
FIG. 2 illustrates an alternative set of superimposed prefix trees 20, also for the case n=16. Again, it should be noted that a complete adder of this type would include the set of prefix trees as well as a sum generation stage. In the set of prefix trees 20, the contribution due to the carry input is incorporated by redefining the first generate in the set of prefix trees as
$$g_0 = a_0 b_0 + (a_0 + b_0) c_{-1} \qquad (2)$$
Such a technique is described in the above-cited Dozza et al. reference. With this change,
$$G_0^j = c_j \quad \forall j.$$
The polygon in FIG. 2 implements equation (2). This replaces the hardware required to implement Step 3 above and reduces the fanout on the $c_{-1}$ input from n to 1. However, the logic depth remains $3 + \lceil \log_2 n \rceil$ and the overall delay of the adder is unchanged.
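The effect of equation (2) can be confirmed with the same kind of model. In the Python sketch below (hypothetical names `carry_op` and `folded_carries`), the carry input is absorbed into $g_0$ before the prefix stages run, so the $G_0^j$ outputs are already the carries and no Step 3 is applied. Only $g_0$ changes; $p_0$ keeps its original value because the sum stage still needs it. The OR in equation (2) is safe because $c_0 = g_0 + p_0 c_{-1}$, and $a_0 + b_0$ differs from $a_0 \oplus b_0$ only when $a_0 b_0 = 1$, in which case the first term already makes $g_0 = 1$.

```python
def carry_op(hi, lo):
    """Fundamental carry operator on (G, P) pairs, with + as OR."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def folded_carries(a, b, n, c_in=0):
    """Carries c_0 .. c_{n-1} with c_{-1} folded into g_0 (equation (2))."""
    g = [(a >> j) & (b >> j) & 1 for j in range(n)]
    p = [((a >> j) ^ (b >> j)) & 1 for j in range(n)]
    a0, b0 = a & 1, b & 1
    g[0] = (a0 & b0) | ((a0 | b0) & c_in)       # equation (2); p[0] is unchanged

    span = list(zip(g, p))
    for k in range(1, (n - 1).bit_length() + 1):   # ceil(log2 n) prefix stages
        d = 1 << (k - 1)
        span = [carry_op(span[j], span[j - d]) if j >= d else span[j]
                for j in range(n)]
    return [G for G, _ in span]                  # G_0^j = c_j, so no Step 3


print(folded_carries(0b0110, 0b0001, 4, c_in=1))  # [1, 1, 1, 0]
```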
An additional speedup in the sets of superimposed prefix trees of FIGS. 1 and 2 can be achieved by using transmit signals $t_j$ instead of propagate signals $p_j$ to compute carries for each bit position. The final sum computation still requires the propagate signals $p_j$ to be generated from the primary inputs. The addition operation in this case is defined as
$$\left.\begin{aligned} g_j &= a_j b_j \\ t_j &= a_j + b_j \\ c_j &= g_j + t_j c_{j-1} \\ s_j &= (a_j \oplus b_j) \oplus c_{j-1} \end{aligned}\right\} \quad \forall j,\ 0 \le j < n,$$
where $(G_j^j, T_j^j) = (g_j, t_j)$ and
$$(G_i^j, T_i^j) = (g_j, t_j) \circ (g_{j-1}, t_{j-1}) \circ \cdots \circ (g_i, t_i) \quad \text{if } j > i,$$
where $\circ$ is the fundamental carry operator. The computation of $(G_0^j, T_0^j)$ for all j follows the same methodology as in Step 2 above for $(G_0^j, P_0^j)$. The carry $c_j$ for each bit position is then given by
$$c_j = G_0^j + T_0^j c_{-1}$$
where $c_{-1}$ is the primary carry input. If there is no primary carry input, then $c_j$ is simply $G_0^j$. Substituting $t_j$ for $p_j$ leaves every carry unchanged, because $t_j$ and $p_j$ differ only when $a_j = b_j = 1$, and in that case $g_j = 1$ already forces $c_j = 1$. The $t_j$ signals can be computed faster than the $p_j$ signals since an OR gate is typically faster than an XOR gate. Hence, the carry computation through the prefix trees can start slightly earlier if the transmit signals are used. Since the sum generation step still uses the propagate signals, the load on the transmit signals in this architecture is smaller than the load on the propagate signals in the FIGS. 1 and 2 architectures. However, the load on the input signals is now higher since both transmit and propagate signals need to be generated. For example, in the set of prefix trees 10 of FIG. 1, the open squares at the top would now need to compute the transmit signals in addition to the generate and propagate signals. The remaining circles will then operate on the transmit signals instead of the propagate signals.
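A transmit-based version differs from a propagate-based prefix model only in which signal feeds the trees. In the Python sketch below (hypothetical names `carry_op` and `transmit_prefix_add`), the trees combine $(g_j, t_j)$ pairs with the same operator, while the sum stage still XORs the propagate signal $p_j$ with the incoming carry, so both $t_j$ and $p_j$ are formed from the primary inputs. The model checks logical equivalence only; it says nothing about the OR-versus-XOR speed difference.

```python
def carry_op(hi, lo):
    """Fundamental carry operator on (G, T) pairs, with + as OR."""
    g_hi, t_hi = hi
    g_lo, t_lo = lo
    return (g_hi | (t_hi & g_lo), t_hi & t_lo)


def transmit_prefix_add(a, b, n, c_in=0):
    """n-bit prefix adder whose trees use transmit t_j = a_j + b_j; returns (sum, carries)."""
    g = [(a >> j) & (b >> j) & 1 for j in range(n)]
    t = [((a >> j) | (b >> j)) & 1 for j in range(n)]   # transmit (OR) drives the trees
    p = [((a >> j) ^ (b >> j)) & 1 for j in range(n)]   # propagate (XOR) is kept for the sum

    span = list(zip(g, t))                              # (G_j^j, T_j^j) = (g_j, t_j)
    for k in range(1, (n - 1).bit_length() + 1):        # ceil(log2 n) stages
        d = 1 << (k - 1)
        span = [carry_op(span[j], span[j - d]) if j >= d else span[j]
                for j in range(n)]

    c = [G | (T & c_in) for G, T in span]               # c_j = G_0^j + T_0^j c_{-1}
    s = [p[j] ^ (c[j - 1] if j > 0 else c_in) for j in range(n)]
    return sum(bit << j for j, bit in enumerate(s)), c


print(transmit_prefix_add(0b1011, 0b0110, 4))   # (1, [0, 1, 1, 1])
```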
Although the above-described conventional prefix tree adders can provide acceptable performance in certain applications, further improvements are needed, particularly in terms of parameters such as logic depth, delay and circuit area.
The invention provides a prefix tree adder in which the contribution due to a primary carry input is incorporated into each of at least a subset of prefix trees without introducing any significant additional overall delay. In an illustrative embodiment, the adder includes a set of prefix trees, with a given one of the prefix trees corresponding to each bit position. An n-bit prefix tree adder therefore includes n prefix trees, each associated with a bit position of the adder and including a number of computation stages. The prefix trees are interconnected such that carry signals are computed at least partially in parallel. For example, a carry signal computed in an initial stage of a given prefix tree is used in subsequent stages of the given prefix tree without introducing substantial additional delay in computation of other carry signals in other prefix trees associated with higher bit positions. Carries computed for lower bit positions are thus used to compute carries for higher bit positions, but generate, propagate and/or transmit signals may be generated in an initial stage of each of the prefix trees without utilizing a primary carry input signal in the computation.
The adder architecture of the present invention provides a reduced logic depth, delay and circuit area relative to conventional architectures. The techniques of the invention are applicable to a wide variety of prefix tree adders, including both radix-2 adders and non-radix-2 adders. These and other features and advantages of the present invention will become more apparent from the accompanying drawings and the following detailed description.