1. Field of the Invention
This invention relates to an improved carry lookahead technique, used, for example, in an adder.
2. Description of Related Art
In binary place value addition two operands, represented in the binary place value code, are summed to produce a result, also in binary place value code. Each operand, and the result, is an array of bits. In the binary place value code, the position of a bit in the array determines its power of two weight; hence, value = sum (bit × 2^pos). The bit with the least weight in determining the value of a representation is called the least significant bit, or LSB. Similarly, the bit with the greatest weight is called the most significant bit, or MSB. Other number systems have better properties for addition, but they suffer from other problems, so virtually every computer on the market today uses the binary place value code.
In the addition method done by hand, the sum is formed for the LSB position, a carry is possibly propagated to the next bit position, and the process is repeated for successive bit positions until each bit in the result has been calculated. This method is known as `ripple carry` addition to those skilled in the art, as shown in FIG. 1. Ripple carry addition is inherently slow because of its serial nature.
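The serial nature of ripple carry addition can be seen in a short Python sketch. The function name `ripple_carry_add` and the convention of passing operands as integers with an explicit `width` are assumptions made for this illustration; FIG. 1 itself is not reproduced here.

```python
def ripple_carry_add(a, b, width, carry_in=0):
    """Add two non-negative integers one bit position at a time, LSB first.

    Returns (sum, carry_out), where sum is truncated to `width` bits.
    """
    result = 0
    carry = carry_in
    for pos in range(width):
        a_bit = (a >> pos) & 1
        b_bit = (b >> pos) & 1
        # Sum bit for this position depends on the carry from the previous one.
        sum_bit = a_bit ^ b_bit ^ carry
        # Carry out of this position feeds the next position (the "ripple").
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        result |= sum_bit << pos
    return result, carry


print(ripple_carry_add(6, 3, 4))   # (9, 0)
print(ripple_carry_add(15, 1, 4))  # (0, 1)
```

Each loop iteration needs the carry computed by the previous iteration, so the delay grows linearly with the number of bit positions.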
Traditionally, addition has been sped up by using one of the following techniques, as described in CMOS VLSI Design, Addison-Wesley 1985: Carry Lookahead, Manchester Carry Lookahead, Binary Carry Lookahead trees, and Carry Select Addition. Another technique, as described in the 5th Annual Symposium on Theoretical Aspects of Computer Science, STACS 1988, is Conditional Sum Addition. Still another technique is the Multiple Output Domino Logic adder described in the 1988 Solid-State Circuits Conference Digest of Technical Papers.
Each of these adders, except the Carry Select Adder and the Conditional Sum Adder, is based on the concepts of carry propagation, carry generation, and carry kill (see FIG. 2), which are well understood by people skilled in the art. A carry is said to `propagate` through a bit position if a carry into the summation operation for a given bit position is followed by a carry out of the given bit position, while the summation for the given bit position does not produce a carry out when no carry is input. A carry is said to be `generated` from a given bit position if the summation for the given position produces a carry out independent of carry in. A carry is said to be `killed` in a bit position if a carry does not propagate through the bit. As known by people skilled in the art, and shown in FIG. 2, the propagate and generate signals may be found without actually doing the summation.
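As a minimal sketch of how these signals may be found without doing the summation, the following Python derives them from the two operand bits alone. The names `bit_signals` and `carry_out` are hypothetical. The `kill` signal is computed in the conventional textbook sense (neither operand bit is set), which is an assumption, since FIG. 2 is not reproduced here.

```python
def bit_signals(a_bit, b_bit):
    """Return (propagate, generate, kill) for one bit position."""
    propagate = a_bit ^ b_bit          # carry out follows carry in
    generate = a_bit & b_bit           # carry out is 1 regardless of carry in
    kill = (1 - a_bit) & (1 - b_bit)   # assumed convention: carry out is 0 regardless
    return propagate, generate, kill


def carry_out(a_bit, b_bit, carry_in):
    """Carry out of one position, expressed with propagate and generate only."""
    propagate, generate, _ = bit_signals(a_bit, b_bit)
    return generate | (propagate & carry_in)
```

Under this convention exactly one of the three signals is active for any pair of operand bits.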
A grouping of adjacent sum functions, adjacent propagate functions, or adjacent generate functions, may be called a `block`, an example of which is shown in FIG. 1. The terms propagate, generate, and kill may be applied to blocks. A carry is said to propagate through a given block if a carry into the given block's LSB summation is followed by a carry out of the given block's MSB summation. A block is said to generate a carry if the block's MSB summation produces a carry out, independent of carries into the block's LSB. It is known by anyone skilled in the art that a block can generate only if a given bit position in the block generates and all bits between the given bit and the block's MSB propagate, as done in the Manchester Carry Chain described in CMOS VLSI Design, Addison-Wesley, 1985.
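The block-level definitions above can be sketched in Python as follows. The function name `block_signals` and the `lsb`/`msb` position parameters are assumptions made for this illustration.

```python
def block_signals(a, b, lsb, msb):
    """Return (block_propagate, block_generate) for bit positions lsb..msb."""
    block_p = 1
    block_g = 0
    for pos in range(lsb, msb + 1):
        p = ((a >> pos) ^ (b >> pos)) & 1
        g = (a >> pos) & (b >> pos) & 1
        # The block generates if this bit generates, or if an earlier
        # generate inside the block propagates through this bit.
        block_g = g | (p & block_g)
        # The block propagates only if every bit in it propagates.
        block_p &= p
    return block_p, block_g
```

The carry out of the block is then `block_g | (block_p & carry_in)`, the same form as the single-bit case.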
An overview of carry lookahead adder theory is given in Digital Computer Arithmetic, by J. J. F. Cavanagh, pp. 107-117, McGraw-Hill (1984). The carry lookahead adder (CLA) speeds computation of the carries by using redundant logic. A block is defined for a given bit position such that the LSB of the block corresponds to the LSB of the add, and the MSB of the block corresponds to the given bit position. Hence, there are as many blocks as there are bit positions in the result minus 1, as shown in FIG. 3. An additional block is needed if a carry out is required. Each block has only the input operands and the carry into the add as inputs, and one carry as an output. Theoretically, any single logic function may be performed in two gate delays, so the time to produce a sum in a carry lookahead adder is theoretically a constant four gate delays; however, the finite gain of real gates limits the amount of load that any one gate can drive in a given amount of time, so more gate delays must be added. Also, since the input operands themselves are driven by gates, the number of carry blocks that can be driven, and hence the number of bits the result may have, is also limited. The carry lookahead adder becomes relatively slow for result sizes beyond a few bits. The performance of carry lookahead adders shows that the loading considerations of real gates, not the theoretical number of gate delays, determine add time. A recent adder invented by Ling, presented in the IBM Journal of Research and Development, May 1981, reduces the load from the output of the propagate and generate state by one gate input per bit, per block. Although this method alleviates the loading problem, the CLA adder remains relatively inefficient for adds of more than a few bits.
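To illustrate why loading grows with the size of the add, the following Python sketch computes each carry as a flat OR of AND terms taken directly from the operands and the carry in, one block per bit position. The name `lookahead_carries` and the list-based term construction are assumptions for this example, not a description of any particular circuit.

```python
def lookahead_carries(a, b, width, carry_in=0):
    """Return a list whose entry i is the carry out of bit position i."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    carries = []
    for i in range(width):
        # Block i spans bits 0..i. First term: the carry in survives
        # if every bit in the block propagates.
        terms = [carry_in & int(all(p[:i + 1]))]
        # One more term per bit j: bit j generates and bits j+1..i propagate.
        for j in range(i + 1):
            terms.append(g[j] & int(all(p[j + 1:i + 1])))
        carries.append(int(any(terms)))
    return carries
```

Block i needs i + 2 product terms, and each propagate signal feeds many of them, so both fan-in and fan-out rise with the bit position.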
The Manchester Carry Lookahead adder, FIG. 4, speeds addition by allowing carries to skip blocks that propagate. To apply this method, an add is broken into a series of blocks, such that no carry propagation logic is duplicated. Then, all of the bit propagates for a given block are ANDed together to determine if the given block will propagate. If the block will propagate, then a bypass is turned on which will route any carries into the LSB of the block directly to the output of the block's MSB. This method works well for certain size adders in CMOS; however, its performance is still linearly related to the size of the add. In attempts to alleviate this problem, multiple levels of skip are added, as explained in the IEEE Transactions on Computers Volume 36. For adds greater than 32 bits, this method can only approach the speed of a Binary Lookahead Carry adder, and it will be significantly slower than the adder being presented in this application for both theoretical and practical reasons.
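The bypass behavior described above can be modeled functionally in Python. This is a sketch of the logic only, not of the pass-transistor circuit of FIG. 4; the name `carry_skip_add` and the `block_size` parameter are assumptions.

```python
def carry_skip_add(a, b, width, block_size=4, carry_in=0):
    """Add using blocks whose carry in is bypassed when the whole block propagates."""
    result = 0
    carry = carry_in
    for start in range(0, width, block_size):
        end = min(start + block_size, width)
        block_in = carry
        block_p = 1
        c = block_in
        for pos in range(start, end):
            a_bit = (a >> pos) & 1
            b_bit = (b >> pos) & 1
            p = a_bit ^ b_bit
            result |= (p ^ c) << pos          # sum bit for this position
            c = (a_bit & b_bit) | (p & c)     # carry rippling inside the block
            block_p &= p                      # AND of all bit propagates
        # Bypass: when every bit propagates, the block's carry in is routed
        # directly to its carry out instead of waiting for the internal ripple.
        carry = block_in if block_p else c
    return result, carry
```

In software both paths give the same value; in hardware the bypass path is shorter, yet a carry may still have to cross every block, which is why the delay stays linear in the size of the add.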
A binary tree is a special graph composed of nodes and arcs, similar to a family tree. The one node at the top has no predecessors and is called the `root` node. Each node in the tree has one or two `children`, except for the `leaf` nodes, which have no children (they correspond to the youngest generation). Circuits with special properties may be built with gates as nodes. In 1980 it was shown that an `o` operator could be defined which would allow carries in an adder to be implemented in a binary tree [CMOS VLSI Design]. However, the binary carry lookahead tree could only provide carries for bit positions that are powers of 2 (i.e., 2, 4, 8 . . . ), so an `inverse` tree had to be employed to derive the in-between carries, which adds overhead.
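The text names the `o` operator without spelling it out. The Python sketch below assumes the commonly cited definition, (g, p) o (g', p') = (g OR (p AND g'), p AND p'), and combines the per-bit pairs as a balanced binary tree. The helper names `o`, `tree_reduce`, and `group_carry` are hypothetical, and only the root carry is produced; the `inverse` tree for the in-between carries is not modeled.

```python
def o(high, low):
    """Combine the (generate, propagate) pair of a more significant group
    with that of the adjacent less significant group."""
    g_hi, p_hi = high
    g_lo, p_lo = low
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def tree_reduce(pairs):
    """Combine a list of (g, p) pairs, given LSB first, as a binary tree."""
    if len(pairs) == 1:
        return pairs[0]          # leaf node
    mid = len(pairs) // 2
    # Each internal node applies `o` to its two children.
    return o(tree_reduce(pairs[mid:]), tree_reduce(pairs[:mid]))


def group_carry(a, b, width):
    """Carry out of a `width`-bit add with no carry in, taken from the root."""
    pairs = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1)
             for i in range(width)]
    generate, _ = tree_reduce(pairs)
    return generate
```

Because the operator is associative, the pairs can be grouped in any order, which is what allows a tree of depth log base 2 of the width instead of a serial chain.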
The Binary Lookahead Carry tree, FIG. 5, has gate delays related to the log base 2 of the add length, plus overhead. As is the case with the carry lookahead adder, circuit loading prevents the realization of the log base 2 gate delays for large trees, i.e., trees for adds bigger than about 8 bits. Also, as is known to anyone skilled in the art, the tree becomes large and inefficient to lay out for large adds. The MODL gate adder, described in the 1988 IEEE Solid-State Circuits Conference Digest of Technical Papers, is an attempt to alleviate some of these problems and allows for a log base 2 × linear performance (less than log base 2) for larger adds.
A Carry Select adder, shown in FIG. 6, is based on the principle of partitioning the add into three blocks. The first block adds the bottom half of the two operands. The second block adds the top half of the operands while assuming a carry in from the first block of zero. The third block adds the top half of the operands while assuming a carry in from the first block of one. When the carry from the first block is calculated, it is used to pick the correct top half of the result by selecting the sum out of block two or the sum out of block three via a two-to-one mux. When carry select adders are strung out in series they have linear performance related to the number of sections, plus the add time of the first section. For large adds, this type of adder is relatively slow, but it is small.
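The three-block partition can be sketched in Python as follows. The name `carry_select_add` is hypothetical, and Python's built-in `+` stands in for whatever adder implements each block, which is an assumption of this illustration.

```python
def carry_select_add(a, b, width):
    """Add two `width`-bit integers with one level of carry select."""
    half = width // 2
    high_width = width - half
    low_mask = (1 << half) - 1
    high_mask = (1 << high_width) - 1
    a_hi, b_hi = (a >> half) & high_mask, (b >> half) & high_mask

    # Block 1: bottom half of the operands.
    low_total = (a & low_mask) + (b & low_mask)
    low_sum, low_carry = low_total & low_mask, low_total >> half

    # Blocks 2 and 3: top half, computed for both possible carries in.
    high_if_0 = a_hi + b_hi
    high_if_1 = a_hi + b_hi + 1

    # Two-to-one mux controlled by the carry out of block 1.
    chosen = high_if_1 if low_carry else high_if_0
    high_sum, carry_out = chosen & high_mask, chosen >> high_width
    return (high_sum << half) | low_sum, carry_out
```

Blocks 2 and 3 do not depend on block 1, so in hardware all three can operate at the same time; only the mux waits for the low-half carry.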
The Conditional Sum Adder is a recursively applied Carry Select Adder. An add is broken into many redundant blocks of two, then one set is picked so that only half as many possible sums remain. The process of halving the possible sums is continued in multiple stages until only the result remains. This adder has a log base 2 performance; however, it is even larger than the BLC adder. Since the summation is calculated along with the carries in each stage, the summation logic is unnecessarily reproduced many times. Also, the summation overhead makes this adder slower than the BLC adder.
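One way to picture the recursive halving is the Python sketch below, which keeps two candidate results (carry in of 0 and carry in of 1) for every block and merges neighboring blocks with a mux. The name `conditional_sum` and the top-down recursive formulation are assumptions for illustration; a hardware implementation works stage by stage from single bits upward.

```python
def conditional_sum(a, b, width):
    """Return ((sum0, carry0), (sum1, carry1)): the `width`-bit results
    for a carry in of 0 and for a carry in of 1."""
    if width == 1:
        a_bit = a & 1
        b_bit = b & 1
        # Both conditional results for a single bit position.
        return ((a_bit ^ b_bit, a_bit & b_bit),
                (a_bit ^ b_bit ^ 1, a_bit | b_bit))
    half = width // 2
    low_mask = (1 << half) - 1
    low = conditional_sum(a & low_mask, b & low_mask, half)
    high = conditional_sum(a >> half, b >> half, width - half)
    merged = []
    for low_sum, low_carry in low:
        # Mux: the low half's carry picks one of the two high-half candidates.
        high_sum, high_carry = high[low_carry]
        merged.append(((high_sum << half) | low_sum, high_carry))
    return tuple(merged)
```

Every level carries both candidate sums forward, which mirrors the duplicated summation logic noted above.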
Because adders form the heart of many digital circuits, and they are a major contributor to the required cycle time of RISC microprocessors, there has continued to be a need for faster adders, as provided by the present invention.