1. Field of the Invention
The present invention generally relates to digital logic circuits, and more particularly to high-speed adders used in arithmetic logic units, such as execution units in a microprocessor or address generators of a computer system.
2. Description of the Related Art
Adder circuits are fundamental building blocks in all microprocessor designs. An adder, as suggested by its name, simply adds two binary numbers. Adders are used in a wide variety of arithmetic logic units such as execution units of a microprocessor, including fixed-point (or integer) units. Adders are used not only for addition operations, but also in multipliers which function by performing multiple add and shift operations. Adders are used in other areas of a conventional computer system besides the main processor, for example, in computing physical or logical addresses for memory fetch operations. Furthermore, adders are used in many other special-purpose digital systems, e.g., telecommunications systems, where a general-purpose computer would be superfluous.
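By way of illustration only, the add-and-shift multiplication mentioned above can be sketched in software as follows. The function name `shift_add_multiply` and the `width` parameter are hypothetical and are chosen only for this example; a hardware multiplier would realize the same steps with adder and shifter circuits.

```python
def shift_add_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply two unsigned integers using only shifts and additions.

    `width` (an assumed parameter) is the number of multiplier bits examined.
    """
    product = 0
    for i in range(width):
        # If bit i of the multiplier is set, add the multiplicand shifted by i.
        if (b >> i) & 1:
            product += a << i
    return product
```

Each set bit of the multiplier contributes one addition, which is why the speed of the underlying adder directly affects the speed of such a multiplier.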
Several types of adders are widely known, including ripple carry adders, carry lookahead adders and carry-save adders. Carry lookahead and carry-save adders are fast, but they are larger and consume much more power than ripple adders. They are based on the use of a carry tree that produces carries into the appropriate bit positions without back propagation. In order to obtain valid sum bits as soon as possible, the sum bits are computed by carry-select blocks, which operate in parallel with the carry tree.
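For comparison with the faster designs, the behavior of a ripple carry adder can be modeled bit by bit as in the following sketch. The names `full_adder` and `ripple_carry_add`, and the convention that bit 0 is the least significant bit, are assumptions made for this example only.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, width: int = 8, cin: int = 0) -> tuple[int, int]:
    """Add two width-bit values; the carry ripples from bit 0 upward.

    Bit 0 is taken as the least significant bit (an assumed convention).
    Returns (sum modulo 2**width, carry_out).
    """
    carry = cin
    result = 0
    for i in range(width):
        # Each stage must wait for the carry produced by the previous stage.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

The loop makes the serial dependency explicit: the carry into each bit position is available only after every lower position has been evaluated, so the delay grows with the operand width.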
Carry lookahead schemes are common in the industry for the design of adder circuits that avoid the need to wait for a carry at the first stage to serially propagate to the most significant bit of the sum output. A typical 64-bit carry lookahead adder 10 is illustrated in FIG. 1, and includes carry lookahead (CLA) logic 12 and sum logic 14. Sum logic 14 is partitioned into 16 functional blocks, each of which receives a four-bit slice of each of the two operands and a carry-in bit. The operands to each block determine whether a carry-output is generated within the block, and whether the block is to propagate the carry-input value to the carry-output value. The collection 16 of all outputs from the blocks in sum logic 14 is the result of the addition operation.
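The block-level generate and propagate conditions described above can be illustrated with the following sketch. It is a behavioral model under stated assumptions, not the circuit of FIG. 1: the names `block_generate_propagate` and `block_carries` are hypothetical, bit 0 is taken as the least significant bit, and the block carries are evaluated here in a simple loop, whereas lookahead hardware would evaluate the same recurrence with parallel logic.

```python
BLOCK_BITS = 4    # four bits per block, as in the 64-bit example
NUM_BLOCKS = 16   # 16 blocks x 4 bits = 64 bits


def block_generate_propagate(a_blk: int, b_blk: int) -> tuple[int, int]:
    """Return (G, P) for one 4-bit block.

    G = 1 if the block produces a carry-out on its own.
    P = 1 if the block passes its carry-in through to its carry-out.
    """
    g_group, p_group = 0, 1
    for i in range(BLOCK_BITS):
        a_i = (a_blk >> i) & 1
        b_i = (b_blk >> i) & 1
        g_i = a_i & b_i          # bit i generates a carry
        p_i = a_i | b_i          # bit i propagates an incoming carry
        g_group = g_i | (p_i & g_group)
        p_group &= p_i
    return g_group, p_group


def block_carries(a: int, b: int, cin: int = 0) -> tuple[list[int], int]:
    """Return the carry-in of every block and the final carry-out."""
    mask = (1 << BLOCK_BITS) - 1
    carries = []
    carry = cin
    for k in range(NUM_BLOCKS):
        carries.append(carry)
        shift = BLOCK_BITS * k
        g, p = block_generate_propagate((a >> shift) & mask, (b >> shift) & mask)
        # Carry-out of block k: generated locally, or propagated from carry-in.
        carry = g | (p & carry)
    return carries, carry
```

Because G and P depend only on the operand bits of a block, they can all be computed at once; only the short recurrence over G and P remains on the carry path.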
Each sum logic block can compute both true and complement results as two separate operations in parallel, i.e., one for the case where the carry-in signal is “0” and one for the case where the carry-in signal is “1.” One of these two results (true and complement) is then selected for output based on the real carry signal, once it has developed. This design is further shown in FIG. 2 which depicts the operation of one of the sum logic blocks 14a. 
Block 14a includes first ripple carry logic 18 which computes a half-sum assuming that the carry bit is set to zero, and second ripple carry logic 20 which computes the half-sum assuming that the carry bit is set to one. Each of ripple carry logic 18 and 20 is implemented as a ripple adder and receives four bits from each operand, e.g., A(0:3) and B(0:3). Sum output completion logic 22 and 24 finishes the local sum operations to generate a carry-0 sum (S0) and a carry-1 sum (S1). Those values are passed to a 2:1 multiplexer 26 which is controlled by the true carry signal from CLA block 12a to output the appropriate sum bits to the result bus.
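The selection performed by a block such as 14a can be modeled behaviorally as in the sketch below. The function name `carry_select_block` and its parameters are hypothetical; ordinary integer addition stands in for the two ripple adders and their sum completion logic, and the final conditional expression stands in for the 2:1 multiplexer.

```python
def carry_select_block(a_blk: int, b_blk: int, true_carry: int, bits: int = 4) -> int:
    """Behavioral model of one carry-select sum block.

    Both candidate sums are formed before the true carry is known;
    the true carry then only selects between them.
    """
    mask = (1 << bits) - 1
    s0 = (a_blk + b_blk) & mask        # carry-0 sum (S0): assumes carry-in = 0
    s1 = (a_blk + b_blk + 1) & mask    # carry-1 sum (S1): assumes carry-in = 1
    return s1 if true_carry else s0    # 2:1 multiplexer controlled by the true carry
```

For example, `carry_select_block(0b1001, 0b0011, 0)` yields `0b1100`, while the same operands with a true carry of 1 yield `0b1101`.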
Although the design of FIG. 2 requires twice the complexity for the sum logic, it allows the true and complement sums to be generated in an overall faster manner, which is increasingly important as designers attempt to achieve ever-higher computation speeds. The speed of a carry-lookahead adder is generally bounded by the speed of the carry-generation and propagation process. In the example of FIG. 2, the critical path for generating the final, correct sum bits includes the carry lookahead logic 12a and the multiplexer 26, i.e., this path has the longest delay of any path through the generate/integrate block.
A good adder design will try to balance the delays in the sum logic and in the CLA logic. If the sum logic is faster, it can be detuned to save power or area (by adjusting the types of CMOS devices or their sizes), because the overall delay is still determined by the CLA logic. In addition to achieving this balance, additional functions can be added to improve the cycles per instruction (CPI) of the machine as long as they do not cause a delay penalty over the usual addition time.
Since the CLA logic is the critical path of the adder, these functions can only be located in the sum logic. Traditionally, these functions are gated directly with the carry from the CLA chain, and the sum path (if implemented as a ripple adder) can become slower than the CLA path. For example, it is often desirable to invert the result of the adder, or to force the output of the adder to all 1's. Two control signals can be provided for these features, a force_1 control signal and an invert control signal, and control logic 28 is inserted in the sum logic to implement this functionality. This control logic, however, introduces further delay to the sum path, and to keep the sum delay smaller than the CLA delay, it becomes necessary to use aggressive local CLA logic 30 within the sum logic to contain the delay. This approach can unduly increase the area and power of the adder design (a problem which is only exacerbated as the adder size grows), and can even lead to the sum logic being faster than the CLA logic when this is not desired.
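The intended effect of the force_1 and invert control signals on the adder output can be illustrated as follows. This is only a functional sketch: the function name `apply_controls` is hypothetical, and the priority of force_1 over invert when both are asserted is an assumption made for the example, since the interaction of the two signals is not specified here.

```python
def apply_controls(sum_bits: int, force_1: bool, invert: bool, width: int = 64) -> int:
    """Apply the force_1 and invert controls to a width-bit sum.

    ASSUMPTION: if both controls are asserted, force_1 takes priority.
    """
    mask = (1 << width) - 1
    if force_1:
        return mask                 # force the output to all 1's
    if invert:
        return ~sum_bits & mask     # bitwise complement of the sum
    return sum_bits & mask          # pass the sum through unchanged
```

In hardware, each of these choices adds gating after the sum bits are formed, which is the source of the additional sum-path delay discussed above.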
In light of the foregoing, it would be desirable to devise an improved adder design which could allow for features such as inversion or forcing 1's without introducing a delay in the sum logic. It would be further advantageous if the design could still conserve area and power.