Binary summation (i.e., "addition") is one of the most important arithmetic operations performed by general-purpose and application-specific processor systems (e.g., digital signal processors). This is because arithmetic summing operations are essential not only for addition, but also for subtraction, multiplication and division, since these operations typically include repetitive summation steps. Accordingly, the speed of microprocessors and other general-purpose arithmetic processors is heavily dependent on the speed of the adder circuits contained therein.
Early microprocessor systems made use of classical adder designs, such as the ripple adder of FIG. 1, which is a reproduction of FIG. 2.2 from the textbook by J. Cavanagh, entitled Digital Computer Arithmetic, McGraw Hill, Inc. (1984), the disclosure of which is hereby incorporated herein by reference. Ripple adders are simple in design, require little electrical power and are easy to implement using conventional hardware; however, they are typically slow in operation. This is because ripple adders have relatively long propagation paths extending from the least significant bit position to the most significant bit position of the adder. Thus, a carry signal ("C") is propagated in a time proportional to the size of the adder and hence, the size of the binary operands being summed. As will be understood by those skilled in the art, the sum ("S") of two binary operands B1 and B2 of length N can be obtained using the following well known relationships:

$$S_i = B1_i \oplus B2_i \oplus C_i$$

$$C_{i+1} = B1_i B2_i \lor B1_i C_i \lor B2_i C_i = B1_i B2_i \lor P_i C_i$$

where $C_0 = 0$; $i = \{0, 1, 2, 3, \ldots, N\}$; $P_i = B1_i \oplus B2_i$; $\oplus$ is the XOR function; $\lor$ is the OR function; and adjacent terms are ANDed. Accordingly, if the propagation delay for each full-adder cell i is $\tau$, the amount of time required to add two N-bit operands using a ripple adder is approximately $N\tau$.
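The sum and carry relationships above can be sketched in software as a loop over full-adder cells. This is only an illustrative model, not circuitry from the cited textbook: the function name `ripple_add` and the use of Python integers to hold the operands are assumptions made for the example. The loop visits the cells in order because each cell needs the carry from the cell before it, which is why the delay grows with N.

```python
def ripple_add(b1, b2, n):
    """Add two n-bit non-negative integers one full-adder cell at a time.

    Hypothetical helper for illustration; returns (sum_bits, carry_out).
    """
    carry = 0  # C_0 = 0
    total = 0
    for i in range(n):
        x = (b1 >> i) & 1                # B1_i
        y = (b2 >> i) & 1                # B2_i
        p = x ^ y                        # P_i = B1_i XOR B2_i
        s = p ^ carry                    # S_i = B1_i XOR B2_i XOR C_i
        carry = (x & y) | (p & carry)    # C_{i+1} = B1_i B2_i OR P_i C_i
        total |= s << i
    return total, carry


if __name__ == "__main__":
    # 11 + 6 = 17, which is sum bits 0001 with a carry-out of 1 for n = 4.
    print(ripple_add(0b1011, 0b0110, 4))
```

Each pass through the loop corresponds to one cell delay, so an N-bit addition takes N sequential steps in this model.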
Many attempts have been made to increase the speed of arithmetic operations performed by general-purpose processors, based on a strategy of reducing the delay associated with carry propagation. One such attempt, commonly referred to as "carry-lookahead", is based on the principle that the carry-in signals for one or more higher-order adder stages can be generated directly from the inputs to the preceding lower-order stages without waiting for the carry-in signals to ripple through those stages. Adders designed using this technique are commonly referred to as "carry-lookahead adders" (CLA). An exemplary CLA, including circuitry for generating group-propagate and group-generate signals, is shown in FIG. 2. FIG. 2 is a reproduction of FIG. 2.5 from the aforementioned Cavanagh textbook.
As shown in FIG. 2, a conventional CLA looks at corresponding bit groups of two binary operands and generates a carry-out signal to the next higher-order bit groups while the addition of the corresponding bit groups is performed to derive a sum. Thus, the generation of the carry-out signal occurs in parallel (i.e., simultaneously) with the generation of the sum bits. The lookahead circuitry reduces the need for rippling through every bit position and can reduce processing time to a value substantially below $N\tau$. There is, however, an area penalty caused by the additional lookahead circuitry. As will be understood by those skilled in the art, the group-propagate, group-generate and carry-out signals for a four-bit group can be provided by circuitry which performs the following logic functions:

$$p_{3:0} = p_3 p_2 p_1 p_0$$

$$g_{3:0} = g_3 \lor p_3 g_2 \lor p_3 p_2 g_1 \lor p_3 p_2 p_1 g_0$$

$$C_4 = g_{3:0} \lor p_{3:0} C_{in}$$

where $C_{in}$ is the carry-in to the four-bit group and $\lor$ denotes the OR function.
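The group logic functions above can be checked with a short sketch. The per-bit definitions are assumptions here, since the text does not spell them out for FIG. 2: the propagate bit is taken as the XOR of the operand bits and the generate bit as their AND, which is the conventional choice. The names `group_signals` and `group_carry_out` are hypothetical.

```python
def group_signals(a, b):
    """Return (p_3:0, g_3:0) for two 4-bit operands a and b.

    Assumes p_i = a_i XOR b_i and g_i = a_i AND b_i (conventional definitions).
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # bit propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # bit generate
    group_p = p[3] & p[2] & p[1] & p[0]
    group_g = (
        g[3]
        | (p[3] & g[2])
        | (p[3] & p[2] & g[1])
        | (p[3] & p[2] & p[1] & g[0])
    )
    return group_p, group_g


def group_carry_out(a, b, c_in):
    """C_4 = g_3:0 OR (p_3:0 AND C_in), with no bit-by-bit rippling."""
    group_p, group_g = group_signals(a, b)
    return group_g | (group_p & c_in)


if __name__ == "__main__":
    # 0101 + 1010 propagates at every bit, so C_4 simply follows C_in.
    print(group_carry_out(0b0101, 0b1010, 1))
```

Because the carry-out depends only on the operand bits and the group carry-in, it can be formed without waiting for the sum bits of the group.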
Another known adder design for increasing the speed of binary summation is shown in FIG. 3, which is a reproduction of FIG. 2.10 from the aforementioned Cavanagh textbook. This adder includes pairs of group adder stages, as shown. One of each pair performs summation operations assuming a carry bit from the preceding stage and the other performs summation operations assuming the absence of a carry bit from the preceding stage. Group propagate and group generate signals, not shown, are also generated to derive the group carry bits $GC_0$, $GC_1$, $GC_2$, $GC_3$, as shown. The adder of FIG. 3 is commonly referred to by the acronym CSLA, because it combines features of conventional carry-select and carry-lookahead adders.
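The carry-select idea described above, in which each group precomputes one sum for each possible carry-in and the group carry then picks between them, can be sketched as follows. This is a simplified model under stated assumptions: the function name `carry_select_add`, the 4-bit group width and the use of integer addition inside each group are illustrative choices, and the group carry is taken from the selected candidate rather than from the separate group propagate and group generate logic of FIG. 3.

```python
def carry_select_add(a, b, n=16, group=4):
    """Add two n-bit integers using per-group candidate sums.

    Hypothetical sketch; n is assumed to be a multiple of group.
    Returns (sum_bits, carry_out).
    """
    mask = (1 << group) - 1
    carry = 0
    result = 0
    for low in range(0, n, group):
        x = (a >> low) & mask
        y = (b >> low) & mask
        sum_no_carry = x + y        # candidate assuming no carry from below
        sum_with_carry = x + y + 1  # candidate assuming a carry from below
        chosen = sum_with_carry if carry else sum_no_carry
        result |= (chosen & mask) << low
        carry = chosen >> group     # group carry for the next higher group
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(0x00FF, 0x0001))
```

In hardware both candidates for every group are formed at the same time, so only the selection has to wait for the group carry.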
The carry-lookahead adder of FIG. 4 is disclosed in U.S. Pat. No. 4,737,926, entitled Optimally Partitioned Regenerative Carry Lookahead Adder, to Vo et al. FIG. 4 is a reproduction of FIG. 5 from the Vo et al. patent, which is hereby incorporated herein by reference. FIG. 4 shows a 32-bit full adder 60 arranged in a cascaded ripple fashion with bit-0 adder 50 being the least significant bit (LSB) adder and bit-31 adder 65 being the most significant bit (MSB) adder. Each bit adder 61 includes a circuit (not shown) for generating propagate and generate signals and supplying them to its respective lookahead carry generation block 67. The lookahead blocks 67 are arranged in a cascaded fashion so that each block accepts a carry-in from the previous block and generates a carry-out to the next subsequent block.
The bit adders 61 are arranged in irregular groupings to reduce the time associated with the propagation of the carry from the LSB adder to the MSB adder. The grouping sequence is arranged by length from bit-31 to bit-0 as: {3 4 5 6 5 4 3 2}, with the smallest bit groupings being at the least significant and most significant bit positions. However, because of the cascaded arrangement, the propagation of the carry must still proceed serially through the blocks. As will be understood by those skilled in the art, the worst-case propagation path extends from the second bit position (reference 53) to the last bit position (reference 54). The path includes bit stage 1, lookahead blocks 2 through 7 and bit stages 29 and 30. Accordingly, the adder of FIG. 4 has a worst-case delay of T=2B+6L+1B (i.e., T=3B+6L), where B is the bit stage delay and L is the lookahead block delay. The speed of the Vo et al. 32-bit adder is therefore limited by the serial propagation of the carry through the 6 intermediate blocks.
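The grouping and the worst-case path described above can be tabulated with a short sketch. The block numbering (block 1 at the LSB end) is an assumption chosen to match the stated path through blocks 2 through 7 and bit stages 29 and 30, and the names `block_ranges` and `worst_case_delay` as well as the unit delay values are hypothetical.

```python
# Group lengths as listed in the text, from bit-31 down to bit-0.
GROUPS_MSB_FIRST = [3, 4, 5, 6, 5, 4, 3, 2]


def block_ranges(groups_msb_first):
    """Return (low_bit, high_bit) for each block, block 1 first (LSB end)."""
    ranges = []
    low = 0
    for size in reversed(groups_msb_first):
        ranges.append((low, low + size - 1))
        low += size
    return ranges


def worst_case_delay(bit_delay, lookahead_delay):
    """T = 2B + 6L + 1B = 3B + 6L for the path described in the text."""
    return 3 * bit_delay + 6 * lookahead_delay


RANGES = block_ranges(GROUPS_MSB_FIRST)

if __name__ == "__main__":
    for number, (low, high) in enumerate(RANGES, start=1):
        print(f"block {number}: bits {low}..{high}")
    # Assumed unit delays, purely for illustration.
    print(worst_case_delay(bit_delay=1.0, lookahead_delay=1.0))
```

Under this numbering, blocks 2 through 7 span bits 2 through 28 and the last block begins at bit 29, which is consistent with the carry rippling through bit stages 29 and 30 to reach the last bit position.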
Other attempts to design fast adders include the carry-skip adder disclosed in an article by A. Guyot, B. Hochet and J. Muller, entitled "A Way to Build Efficient Carry-Skip Adders", IEEE Transactions on Computers, Vol. C-36, No. 10, October (1987). These adders comprise simple ripple adders with a plurality of speed-up carry chains (skip chains). The skip chains allow a carry into a block of full-adder cells to be bypassed to the next higher-order block if the two operand bits differ at every bit position in the block (i.e., if $p_i = 1$ for all the cells in the block).
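The skip condition described above can be illustrated with a small software model. It is a sketch under stated assumptions, not the design from the cited article: the name `carry_skip_add`, the fixed 4-bit block width and the returned skip count are illustrative, and software cannot model the timing benefit, only the logic that makes the bypass valid.

```python
def carry_skip_add(a, b, n=16, block=4):
    """Add two n-bit integers, noting blocks whose carry-in can be bypassed.

    Hypothetical sketch; n is assumed to be a multiple of block.
    Returns (sum_bits, carry_out, skipped_blocks).
    """
    mask = (1 << block) - 1
    carry = 0
    result = 0
    skipped = 0
    for low in range(0, n, block):
        x = (a >> low) & mask
        y = (b >> low) & mask
        block_in = carry
        c = block_in
        for i in range(block):          # ripple chain inside the block
            xi = (x >> i) & 1
            yi = (y >> i) & 1
            pi = xi ^ yi
            result |= (pi ^ c) << (low + i)
            c = (xi & yi) | (pi & c)
        if (x ^ y) == mask:
            # p_i = 1 for every cell: the carry-out equals the carry-in,
            # so the skip chain can forward it without waiting for the ripple.
            carry = block_in
            skipped += 1
        else:
            carry = c
    return result, carry, skipped


if __name__ == "__main__":
    print(carry_skip_add(0b0101, 0b1010, n=4, block=4))
```

When every cell in a block propagates, no cell generates a carry of its own, so forwarding the block carry-in gives the same result as letting it ripple through.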
Finally, FIGS. 5A and 5B illustrate a 56-bit adder used in the Advanced Micro Devices Am29050 microprocessor. The adder is described as a redundant cell carry-lookahead adder and is disclosed in an article by T. Lynch and E. Swartzlander, Jr., entitled "A Spanning Tree Carry Lookahead Adder", IEEE Transactions on Computers, Vol. 41, No. 8, August (1992). The adder uses a "tree" of 4-bit Manchester carry-chains ("Mcc"), having intermediate outputs, to generate carry signals into bit positions 8, 16, 24, 32, 40, 48 and 56. FIG. 6 schematically illustrates a 4-bit Mcc having intermediate outputs ($p_{1:0}$, $g_{1:0}$) and ($p_{2:0}$, $g_{2:0}$).
The adder also comprises pairs of 8-bit ripple adders for performing summation of 8-bit groups of the 56-bit binary operands to be summed. To achieve the carry-in signals at 8-bit intervals, the adder uses overlapping groups of carry-propagate and carry-generate signals, generated at the second and third tree levels, hence the term "redundant". These overlapping groups are generated at the intermediate outputs of the carry-chains. As will be understood by those skilled in the art, the use of carry-chains having intermediate outputs causes additional delay to the generation of the carry-in signals by providing additional loading to the higher level chains in the tree. Moreover, by using carry-chains of uniformly 4-bit length, the critical paths associated with the summation of each of the 8-bit groups of the 56-bit operands are of relatively nonuniform length. Thus, the sum bits for each of the consecutive 8-bit groups are not generated in the same amount of time.
Accordingly, notwithstanding the above-mentioned adder designs, there continues to be a need for fast binary adders that are scalable and that have uniform carry-propagation delay times for performing carry-select and for generating groups of sum bits.