Addition of two binary numbers is a fundamental operation used in many electronic circuits. For example, binary addition is used in integer arithmetic-logic units, and all floating-point operations use integer addition in their calculations. Memory accesses require integer addition for address generation, and branches use addition both for forming instruction addresses and for making greater-than or less-than comparisons. Thus, many modern circuits contain several integer adders, many of which may appear on frequency-limiting paths.
In an addition of two numbers, the digit in each column of the first number is added to the digit in the corresponding column of the second number, together with any carry digit resulting from the previous column, in order to obtain the value of the sum in each column. Thus, for two n-bit binary numbers $a = a_{n-1} \ldots a_1 a_0$ and $b = b_{n-1} \ldots b_1 b_0$, their sum is the (n+1)-bit number given by $s = s_n \ldots s_1 s_0$, where:
$$s_n = c_n$$
$$s_i = a_i \oplus b_i \oplus c_i$$
$$c_{i+1} = a_i b_i + c_i (a_i + b_i)$$
where $c_k$ is the carry into position k, + denotes logical OR, juxtaposition denotes logical AND and ⊕ denotes Exclusive OR.
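The column-by-column equations above can be sketched in Python as a bit-serial (ripple) addition. This is only an illustration of the equations, not a circuit from the figures; the function names and the least-significant-bit-first list representation are assumptions made for the example, and the carry into column 0 is assumed to be 0.

```python
def to_bits(x, n):
    """Return the n low-order bits of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_add(a_bits, b_bits):
    """Add two equal-length LSB-first bit lists, returning n+1 sum bits."""
    carry = 0  # assumed carry into column 0
    s = []
    for a, b in zip(a_bits, b_bits):
        s.append(a ^ b ^ carry)              # s_i = a_i XOR b_i XOR c_i
        carry = (a & b) | (carry & (a | b))  # c_{i+1} = a_i b_i + c_i (a_i + b_i)
    s.append(carry)                          # s_n = c_n
    return s
```

For instance, `from_bits(ripple_add(to_bits(5, 4), to_bits(11, 4)))` evaluates to 16, the same as the integer sum 5 + 11.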
The carry bit into any chosen column can be generated from two logical functions called Generate and Propagate. The bit-level generate function $g_i$ indicates whether a carry is generated by a particular column in the addition: $g_i$ is true if a carry is generated at column i. The bit-level propagate function $p_i$ indicates whether a carry into a particular column will be propagated on to the next column: $p_i$ is true if a carry into column i is propagated into column i+1. The bit-level generate and propagate functions can be constructed from the bits in column i of the two numbers to be added, as follows:
$$g_i = a_i b_i$$
$$p_i = a_i + b_i$$
Thus, in the addition of $a = a_{n-1} \ldots a_1 a_0$ and $b = b_{n-1} \ldots b_1 b_0$, the carry into the (j+1)th column is given by:
$$G_{j:0} = c_{j+1} = g_j + p_j g_{j-1} + p_j p_{j-1} g_{j-2} + \ldots + p_j p_{j-1} \ldots p_1 g_0$$
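As a rough check of the sum-of-products expression for the carry, the following Python sketch evaluates it term by term and provides a reference value taken from ordinary integer addition. The function names are hypothetical and the inputs are assumed to be non-negative integers.

```python
def carry_sum_of_products(a, b, j):
    """Carry into column j+1, evaluated as an OR of j+1 AND-terms."""
    g = [(a >> i) & (b >> i) & 1 for i in range(j + 1)]    # g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(j + 1)]  # p_i = a_i OR b_i
    carry = 0
    for k in range(j + 1):
        term = g[k]                    # a carry generated at column k ...
        for m in range(k + 1, j + 1):
            term &= p[m]               # ... propagated through columns k+1..j
        carry |= term
    return carry


def carry_reference(a, b, j):
    """Reference value: carry out of the low j+1 bits of a + b."""
    mask = (1 << (j + 1)) - 1
    return ((a & mask) + (b & mask)) >> (j + 1)
```

The nested loop mirrors the growth in AND-gate width: the term that starts at column 0 involves j propagate signals and one generate signal.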
FIG. 1 shows an implementation of a circuit to generate $G_{j:0}$ based on the above equation. However, the circuit of FIG. 1 is not practical to realize for large values of j. It is an OR of j+1 AND-terms, the largest of the AND gates also having j+1 inputs. Moreover, the fan-out of the propagate signals is very large, $p_j$ having a fan-out of j.
High speed practical implementations realize the carry function in a tree-like structure. A prior art method is known as parallel prefix and will now be illustrated (S. Knowles, “A Family of Adders”, Proc. 14th IEEE Symp. on Computer Arithmetic, pp. 30-44, 1999). The parallel prefix method uses bit-level generate and propagate functions to construct Group Generate and Group Propagate functions:
$$G_{j:k} = g_j + p_j g_{j-1} + p_j p_{j-1} g_{j-2} + \ldots + p_j p_{j-1} \ldots p_{k+1} g_k$$
$$P_{j:k} = p_j p_{j-1} \ldots p_{k+1} p_k$$
The function $G_{j:k}$ is true if the group of bits from k to j generates a carry, and the function $P_{j:k}$ is true if the group of bits from k to j propagates a carry coming into that group into the next group.
The parallel prefix method uses Group Generate and Group Propagate functions of smaller sized groups to construct the Group Generate and Group Propagate functions of a larger group. A large group of bits from i to j is divided into two groups, say from i to k−1 and from k to j. The larger group generates a carry if either the most significant group generates a carry, or the least significant group generates a carry and the most significant group propagates this carry. This is illustrated in FIG. 2. In logical notation this can be expressed as:
$$G_{j:i} = G_{j:k} + P_{j:k} G_{k-1:i}$$
The Group Propagate function of a large group can be constructed from Group Propagate functions of smaller groups:
$$P_{j:i} = P_{j:k} P_{k-1:i}$$
These two constructions allow the Group Generate of a larger group to be formed recursively from smaller groups, which themselves are formed from even smaller groups and so on.
This method allows for the construction of $G_{j:i}$ in $\lceil \log_2(j-i+1) \rceil$ levels, once the bit-level generate and propagate functions have been formed.
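The recursive two-way combination can be sketched in Python as follows. This is a software illustration under the assumption that each group is split as evenly as possible; it says nothing about how a particular prefix network shares intermediate terms. The names `bit_gp` and `group_gp` are made up for this example.

```python
def bit_gp(a, b, n):
    """Bit-level generate and propagate lists for two n-bit integers."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # p_i = a_i OR b_i
    return g, p


def group_gp(g, p, j, i):
    """Return (G_{j:i}, P_{j:i}) by recursively halving the group i..j."""
    if i == j:
        return g[i], p[i]
    k = (i + j + 1) // 2                      # split into i..k-1 and k..j
    g_hi, p_hi = group_gp(g, p, j, k)         # most significant group
    g_lo, p_lo = group_gp(g, p, k - 1, i)     # least significant group
    # G_{j:i} = G_{j:k} + P_{j:k} G_{k-1:i},  P_{j:i} = P_{j:k} P_{k-1:i}
    return g_hi | (p_hi & g_lo), p_hi & p_lo
```

With this even split, the recursion depth equals the ceiling of the base-2 logarithm of the group size, which corresponds to the number of combining levels.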
It is possible to form the Group Generate of a large group in fewer levels still. If the large group i to j is divided into three groups, say i to k′−1, k′ to k″−1, and k″ to j, then:
$$G_{j:i} = G_{j:k''} + P_{j:k''} G_{k''-1:k'} + P_{j:k''} P_{k''-1:k'} G_{k'-1:i}$$
The drawback of this method is that although fewer combining levels are needed, the gates at each combining level are more complex and the fan-out on the Group Generate and Group Propagate functions increases. Both of these impact heavily on the delay of the circuit. This situation is further exacerbated when all the carries for an adder need to be constructed.
The following is an example of the parallel prefix method for a 9-bit addition, using base 3. A circuit diagram for this example is shown in FIG. 3.
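Given two 9-bit numbers $a = a_8 a_7 \ldots a_1 a_0$ and $b = b_8 b_7 \ldots b_1 b_0$, we form 3-bit groups $a_8 a_7 a_6$, $a_5 a_4 a_3$, $a_2 a_1 a_0$ for a and $b_8 b_7 b_6$, $b_5 b_4 b_3$, $b_2 b_1 b_0$ for b.
Then the generate and propagate functions for each group are:
$$G_{8:6} = g_8 + p_8 g_7 + p_8 p_7 g_6, \qquad P_{8:6} = p_8 p_7 p_6$$
$$G_{5:3} = g_5 + p_5 g_4 + p_5 p_4 g_3, \qquad P_{5:3} = p_5 p_4 p_3$$
$$G_{2:0} = g_2 + p_2 g_1 + p_2 p_1 g_0, \qquad P_{2:0} = p_2 p_1 p_0$$
These Group functions are now combined to form:
$$G_{8:0} = G_{8:6} + P_{8:6} G_{5:3} + P_{8:6} P_{5:3} G_{2:0}$$
The other carries could be constructed in the following manner:
$$G_{7:0} = G_{7:6} + P_{7:6} G_{5:3} + P_{7:6} P_{5:3} G_{2:0}$$
$$G_{6:0} = G_{6:6} + P_{6:6} G_{5:3} + P_{6:6} P_{5:3} G_{2:0}$$
$$G_{5:0} = G_{5:3} + P_{5:3} G_{2:0}$$
$$G_{4:0} = G_{4:3} + P_{4:3} G_{2:0}$$
$$G_{3:0} = G_{3:3} + P_{3:3} G_{2:0}$$
$$G_{2:0} = g_2 + p_2 g_1 + p_2 p_1 g_0$$
$$G_{1:0} = g_1 + p_1 g_0$$
$$G_{0:0} = g_0$$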
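The 9-bit, base-3 example can be checked with a short Python sketch. It evaluates only the carry out of the top column, using the three 3-bit group functions and the second-level combination given above; the function names are hypothetical and the sketch does not model the circuit of FIG. 3.

```python
def group3(g, p, hi):
    """(G, P) for the 3-bit group whose most significant column is hi."""
    G = g[hi] | (p[hi] & g[hi - 1]) | (p[hi] & p[hi - 1] & g[hi - 2])
    P = p[hi] & p[hi - 1] & p[hi - 2]
    return G, P


def carry_out_9bit(a, b):
    """Carry out of a 9-bit addition via first-level 3-bit groups."""
    g = [(a >> i) & (b >> i) & 1 for i in range(9)]    # bit-level generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(9)]  # bit-level propagate
    g86, p86 = group3(g, p, 8)
    g53, p53 = group3(g, p, 5)
    g20, _ = group3(g, p, 2)
    # Second level: combine the three group results into the carry out.
    return g86 | (p86 & g53) | (p86 & p53 & g20)
```

Comparing `carry_out_9bit(a, b)` with `(a + b) >> 9` for 9-bit operands is a simple way to confirm that the two-level expression agrees with ordinary addition.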
An improved prior art technique for determining the carry bits is the Ling method (H. Ling, “High Speed Binary Adder”, IBM Journal of Research and Development, Vol. 25, No. 3, pp. 156-166, 1981). Ling observed a variation of the above, which allows for a small speed-up over the parallel prefix method. He observed that if the delay of the carry term $G_{j:i}$ could be reduced by increasing the delay of some other term, the overall delay would be reduced as long as the carry term is still on the critical path. Ling observed that every term in
$$G_{j:i} = g_j + p_j g_{j-1} + p_j p_{j-1} g_{j-2} + \ldots + p_j p_{j-1} \ldots p_{i+1} g_i$$
contains $p_j$ except for the very first term, which is simply $g_j$. However, $G_{j:i}$ can still be simplified by noting that, because $a_k b_k$ being true implies that $a_k + b_k$ is true,
$$g_k = p_k g_k$$
Therefore $p_j$ can be factored out of $G_{j:i}$ to create a pseudocarry $H_{j:i}$, where
$$G_{j:i} = p_j H_{j:i}$$
$$H_{j:i} = g_j + G_{j-1:i}$$
The function $H_{j:i}$ is a little simpler than the function $G_{j:i}$. The fan-in of the OR gate for $H_{j:i}$ and $G_{j:i}$ is the same, but the fan-in of each AND gate is reduced by 1. This is illustrated in FIG. 4. Ling also observed that the pseudocarry $H_{j:i}$ of a large group could be constructed from the pseudocarries $H_{j:k}$ and $H_{k-1:i}$ of smaller groups:
$$\begin{aligned}
H_{j:i} &= g_j + G_{j-1:i} = g_j + G_{j-1:k} + P_{j-1:k} G_{k-1:i} \\
&= [g_j + G_{j-1:k}] + P_{j-1:k}\, p_{k-1} [g_{k-1} + G_{k-2:i}] \\
&= H_{j:k} + P_{j-1:k-1} H_{k-1:i}
\end{aligned}$$
This provides a method for constructing the pseudocarry of a large group in terms of pseudocarries of smaller groups, which can be constructed from the pseudocarries of yet still smaller groups.
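The pseudocarry relations can be checked numerically with a small Python sketch. It assumes the propagate signal is the OR of the two operand bits, since the factoring relies on each generate bit implying the matching propagate bit; all function names are invented for the example.

```python
def bit_gp(a, b, n):
    """Bit-level generate (AND) and propagate (OR) lists for n-bit integers."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    return g, p


def group_generate(g, p, j, i):
    """G_{j:i} via the bit-serial recurrence; 0 for an empty group (j < i)."""
    acc = 0
    for m in range(i, j + 1):
        acc = g[m] | (p[m] & acc)
    return acc


def group_propagate(p, j, i):
    """P_{j:i}: AND of the propagate bits of columns i..j."""
    acc = 1
    for m in range(i, j + 1):
        acc &= p[m]
    return acc


def pseudocarry(g, p, j, i):
    """H_{j:i} = g_j + G_{j-1:i}; reduces to g_i when j == i."""
    return g[j] | group_generate(g, p, j - 1, i)
```

With these helpers, `group_generate(g, p, j, i)` should equal `p[j] & pseudocarry(g, p, j, i)`, and for any split point k with i < k ≤ j, `pseudocarry(g, p, j, i)` should equal `pseudocarry(g, p, j, k) | (group_propagate(p, j - 1, k - 1) & pseudocarry(g, p, k - 1, i))`.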
As in the parallel prefix case, more than two pseudocarries can be combined to form the pseudocarry of a large group.
If the large group i to j is divided into three groups, say i to k′−1, k′ to k″−1, and k″ to j, then:
$$G_{j:i} = p_j H_{j:i}$$
$$H_{j:i} = g_j + G_{j-1:i}$$
$$G_{j-1:i} = G_{j-1:k''} + P_{j-1:k''} G_{k''-1:k'} + P_{j-1:k''} P_{k''-1:k'} G_{k'-1:i}$$
$$G_{k''-1:k'} = p_{k''-1} H_{k''-1:k'}$$
$$G_{k'-1:i} = p_{k'-1} H_{k'-1:i}$$
$$H_{j:i} = H_{j:k''} + P_{j-1:k''-1} H_{k''-1:k'} + P_{j-1:k''-1} P_{k''-2:k'-1} H_{k'-1:i}$$
This method still suffers from the same problems as the parallel prefix method, namely more complex gates and higher fan-out. Note that $H_{j:i}$ has the form $H_2 + P_2 H_1 + P_2 P_1 H_0$, which is exactly the same as that of the Group Generate function $G_2 + P_2 G_1 + P_2 P_1 G_0$ in the parallel prefix method. Ling's method will now be illustrated by way of example.
The following is an example of a 9-bit Ling adder, which is illustrated in FIG. 5a.
$$G_{8:0} = G_{8:6} + P_{8:6} G_{5:3} + P_{8:6} P_{5:3} G_{2:0} = p_8 H_{8:0}$$
$$H_{8:0} = H_{8:6} + P_{7:5} H_{5:3} + P_{7:5} P_{4:2} H_{2:0}$$
The pseudocarry functions are:
$$H_{8:6} = g_8 + g_7 + p_7 g_6$$
$$H_{5:3} = g_5 + g_4 + p_4 g_3$$
$$H_{2:0} = g_2 + g_1 + p_1 g_0$$
Note that at the first level, the highest complexity function for Ling has the form $H_2 + H_1 + P_1 H_0$, whereas for parallel prefix this is $G_2 + P_2 G_1 + P_2 P_1 G_0$.
But the complexity of $H_{8:0}$ is the same as that of $G_{8:0}$, both being of the form A+BC+DEF. One may try to combine $P_{7:5} P_{4:2}$ and thus reduce the complexity of the second level to A+BC+DE, but
$$P_{7:5} P_{4:2} = P_{7:2} = p_7 p_6 p_5 p_4 p_3 p_2$$
which is an AND of 6 terms and generally slower to calculate than
$$H_{2:0} = g_2 + g_1 + p_1 g_0$$
The Ling adder does have the problem that, to produce the actual carry out, the logical AND of $p_j$ and $H_{j:i}$ needs to be formed, which would impact the delay. This extra delay can however be eliminated by noting that the critical path for an n-bit adder is in producing the (n−1)th sum bit, which can be expressed as:
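The 9-bit Ling example can be sketched in Python in the same spirit. The sketch follows the first-level pseudocarries and the second-level combination given in the example, then forms the true carry by ANDing with the top propagate bit; the function name is hypothetical and the code is not a model of FIG. 5a.

```python
def ling_carry_out_9bit(a, b):
    """Carry out of a 9-bit addition computed through Ling pseudocarries."""
    g = [(a >> i) & (b >> i) & 1 for i in range(9)]    # g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(9)]  # p_i = a_i OR b_i

    # First level: 3-bit pseudocarries (each AND term has one input fewer
    # than in the corresponding group generate function).
    h86 = g[8] | g[7] | (p[7] & g[6])
    h53 = g[5] | g[4] | (p[4] & g[3])
    h20 = g[2] | g[1] | (p[1] & g[0])

    # Group propagates are shifted down by one column relative to the groups.
    p75 = p[7] & p[6] & p[5]
    p42 = p[4] & p[3] & p[2]

    # Second level: same A + BC + DEF shape as the parallel prefix case.
    h80 = h86 | (p75 & h53) | (p75 & p42 & h20)

    return p[8] & h80  # G_{8:0} = p_8 H_{8:0}
```

The second-level expression has the same gate shape as in the parallel prefix example, which is the point made in the text: the saving is confined to the first level.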
$$\begin{aligned}
S_{n-1} &= a_{n-1} \oplus b_{n-1} \oplus G_{n-2:0} \\
&= a_{n-1} \oplus b_{n-1} \oplus p_{n-2} H_{n-2:0}
\end{aligned}$$
But $p_{n-2}$ can be computed faster than $H_{n-2:0}$ and so a multiplexer can be used. This is shown in FIG. 5b.
$$S_{n-1} = (a_{n-1} \oplus b_{n-1} \oplus p_{n-2}) H_{n-2:0} + (a_{n-1} \oplus b_{n-1}) H^{c}_{n-2:0}$$
where the superscript c denotes the logical complement.
Although Ling's method is better than the parallel prefix method, it nevertheless has a number of shortcomings. It parallelizes the computation of $G_{j:i}$ as $p_j H_{j:i}$, but one of the functions, $p_j$, is a very simple bit-level propagate while the other function, $H_{j:i}$, is much more complex, and so the parallelization is very limited. This parallelization, $G_{j:i} = p_j H_{j:i}$, cannot be extended to more than two functions; that is, no method is provided to parallelize $G_{j:i}$ as XYZ etc. Ling's method speeds up the first level only (compared to the parallel prefix method), and even this gain is very limited: the fan-in of the AND gates at the first level is reduced by at most 1. It offers no advantage over the parallel prefix method when combining Group functions, in terms of either the complexity of the gates or the fan-out of the Group functions.
The first drawback of Ling's approach is that although the carry function $G_{j:i} = p_j H_{j:i}$ is broken down as a combination of two simpler functions, which can be computed in parallel, one of the functions is the very simple $p_j = a_j + b_j$ while the second is much more complex. Thus the impact on the delay in calculating the carry is very small.
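The multiplexer form of the most significant sum bit can be illustrated with the following Python sketch. The pseudocarry acts as the select signal, and both data inputs depend only on bit-level quantities. The function name is hypothetical and n is assumed to be at least 2.

```python
def msb_sum_mux(a, b, n):
    """Sum bit n-1 of a + b, selected by the pseudocarry H_{n-2:0}."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # p_i = a_i OR b_i

    # H_{n-2:0} = g_{n-2} + G_{n-3:0}, with G built bit-serially here.
    g_low = 0
    for m in range(n - 2):
        g_low = g[m] | (p[m] & g_low)
    h = g[n - 2] | g_low

    x = ((a >> (n - 1)) ^ (b >> (n - 1))) & 1  # a_{n-1} XOR b_{n-1}

    # Both candidates are ready early; the late-arriving H only selects.
    return (x ^ p[n - 2]) if h else x
```

The design point is that the AND with the propagate bit is absorbed into the precomputed multiplexer inputs, so it no longer sits in series with the pseudocarry.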
A further prior art technique for generating carry bits is described in U.S. Pat. No. 5,964,827 (IBM Corporation). The IBM technique involves generating $G_{3:0}$ by factorising $p_3 p_2$ out of the expression for $G_{3:0}$. The result is:
$$G_{3:0} = g_3 + p_3 p_2 [g_2 + g_1 + p_1 g_0] = [g_3 + p_3 p_2][g_3 + g_2 + g_1 + p_1 g_0]$$
The function $G_{15:0}$ is then determined using a similar factorisation involving a group function, giving:
$$G_{15:0} = [G_{15:12} + P_{15:12} P_{11:8}][G_{15:12} + G_{11:8} + G_{7:4} + P_{3:0} G_{3:0}]$$
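The 4-bit factorisation can be verified exhaustively with a short Python sketch. It compares the unfactored sum-of-products form of the carry with the product of the two bracketed terms, assuming the propagate bit is the OR of the operand bits; the function name is made up for the example.

```python
def g30_direct_and_factored(a, b):
    """Return (direct, factored) evaluations of G_{3:0} for 4-bit a and b."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # p_i = a_i OR b_i

    # Unfactored form: widest AND term has four inputs.
    direct = (g[3]
              | (p[3] & g[2])
              | (p[3] & p[2] & g[1])
              | (p[3] & p[2] & p[1] & g[0]))

    # Factored form: every AND has two inputs, at the cost of wider ORs.
    left = g[3] | (p[3] & p[2])
    right = g[3] | g[2] | g[1] | (p[1] & g[0])
    factored = left & right

    return direct, factored
```

Running the function over all 256 pairs of 4-bit operands and checking that both values agree with `(a + b) >> 4` confirms the identity for this bit width.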
The IBM method provides the advantage that the above factorisation reduces all AND gates to only two inputs. This is particularly useful in dynamic logic implementations because AND gates slow down significantly as the number of inputs is increased. Thus, the aim of the IBM idea is to reduce the number of inputs to a minimum for each AND gate. This can be achieved by combining only four bits at each level to produce a group generate function or a carry, and performing the above factorisation, in which each AND gate has only two inputs. In this type of technology, it is not as crucial to limit the number of inputs on an OR gate. However in the IBM method, the generate function is fully calculated at each stage by performing an AND operation between the two terms in brackets. This is unnecessary, and slows down the circuit.