1. Field of the Invention
This invention relates to the field of electronic circuit design, and in particular to the design of a CMOS adder circuit.
2. Description of Related Art
In a conventional ripple-adder, each adder stage propagates a carry to the next stage, thereby propagating a carry from a first stage to the last stage in a serial, and therefore slow, fashion. Because a carry from a first input stage could affect the final state of the most significant bit of the sum output, the sum output is not considered valid until sufficient time is allowed for the possible serial propagation of a carry from the first adder stage.
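The serial carry dependence described above can be illustrated with a minimal behavioral sketch. The function name ripple_add and the bit-list representation (least significant bit first) are assumptions made for illustration only and do not appear in the original text.

```python
def ripple_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists, least significant bit first.

    Each stage must wait for the carry of the previous stage, so the
    loop below is inherently sequential.
    """
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)          # sum bit of this stage
        carry = (a & b) | (carry & (a ^ b))     # carry handed to the next stage
    return sum_bits, carry


# 15 + 1: the carry from the first stage ripples through every stage.
print(ripple_add([1, 1, 1, 1], [1, 0, 0, 0]))
```

In this example the most significant sum bit cannot be known until the carry produced at the first stage has passed through all intermediate stages.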
Carry-lookahead-generation schemes are common in the industry for the design of adder circuits that avoid the need to wait for a carry at the first stage to serially propagate to the most significant bit of the sum output. The adder is partitioned into functional blocks that each receive a pair of sets of input bit-values and a carry-input bit value. The input bit-values to each block determine whether a carry-output is generated within the block from the input bit-values, and/or whether the block is 'sensitized' to propagate the carry-input value to the carry-output value. Consider, for example, a single-bit adder block with inputs a, b, and carry-in. If a and b are both a logic 1, a carry-out is generated (independent of the value of carry-in). Likewise, if either a or b is a logic 1, the value of the carry-in, c, is propagated to the carry-out (if the carry-in is a logic 1, the carry-out will be logic 1). If, on the other hand, both a and b are at logic 0, a carry-out is not generated, and a carry-in is not propagated (carry-out will be logic 0, independent of the value of carry-in).
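The single-bit example above can be written out as a short sketch. The function names generate, propagate, and carry_out are hypothetical labels chosen for illustration; the logic follows the description of the single-bit block given above.

```python
def generate(a, b):
    # A carry-out is generated when both inputs are logic 1.
    return a & b


def propagate(a, b):
    # The block is sensitized to pass the carry-in when either input is logic 1.
    return a | b


def carry_out(a, b, carry_in):
    # Carry-out is either generated locally or propagated from the carry-in.
    return generate(a, b) | (propagate(a, b) & carry_in)


# Print the carry-out for every combination of a, b, and carry-in.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, "->", carry_out(a, b, c))
```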
FIG. 1 illustrates an example block diagram of the carry generation and propagation logic portion of a conventional carry-lookahead adder. In the example of FIG. 1, the adder is partitioned into four-bit functional blocks. A four-bit carry generate/propagate block 110a receives a pair of the lower-order four-bits of two input arguments, A[3:0], B[3:0], and, based on these inputs, determines whether a carry-output (C3) is generated, gC3-0, by the lower-order (3:0) input-bits. This block also determines whether the carry-input (Cin) is propagated, pC3-0, to the carry-output (C3). Similar four-bit generate/propagate blocks 110b-d are configured to determine whether a carry-out is generated, gC7-4, gC11-8, etc., by the corresponding set of inputs {A[7:4], B[7:4]}, {A[11:8], B[11:8]}, etc., and whether each corresponding carry-in is propagated, pC7-4, pC11-8, etc.
The generate-carry and propagate-carry signals from each block are combined, in group carry generate/propagate blocks 120a-b, to determine whether a carry-out signal from each group is generated, gC7-0, gC15-8, from within the group, {[7:4]-[3:0]}, {[15:12]-[11:8]}, and whether the carry-in signal (Cin, C7) to each group is propagated, pC7-0, pC15-8, through the group. In like manner, the group generate and propagate signals are used, in the group generate/propagate block 130, to generate a higher-level group generate gC15-0 and propagate pC15-0 signals. FIG. 2 illustrates the generation of group-generate and group-propagate signals from the generate-Carry and propagate-Carry signals from a pair of stages (upper-order-stage and lower-order-stage) that form the group. A carry-out signal is generated within the group if the carry is generated within the upper stage, or, if the carry is generated within the lower stage, and the upper stage is sensitized to propagate the carry that is generated within the lower stage. The group is sensitized to propagate the carry-in signal that is received by the group if both the lower-stage and the upper-stage are sensitized to propagate the carry input to each of the stages.
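The combination rule described for a pair of stages may be sketched as follows. The names combine and block_signals, and the representation of each stage as a (generate, propagate) pair, are assumptions for illustration; FIG. 2 itself is not reproduced here.

```python
from functools import reduce


def combine(upper, lower):
    """Merge the (generate, propagate) pairs of an upper and a lower stage."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    group_g = g_hi | (p_hi & g_lo)   # generated above, or generated below and passed up
    group_p = p_hi & p_lo            # both stages must be sensitized
    return group_g, group_p


def block_signals(a, b, width=4):
    """Fold per-bit (generate, propagate) pairs from the least significant bit upward."""
    bits = [(((a >> i) & 1) & ((b >> i) & 1), ((a >> i) & 1) | ((b >> i) & 1))
            for i in range(width)]
    return reduce(lambda lower, upper: combine(upper, lower), bits)


# Lower stage generates a carry, upper stage is sensitized: the group generates.
print(combine((0, 1), (1, 0)))
```

Because the same rule applies at every level, it can be reused to merge bits into blocks, blocks into groups, and groups into higher-level groups.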
Note that, with these group generate signals being provided, the higher order carry signals can be easily generated, based only on the value of the carry-in signal, if any, to the adder, and the values of the generate-carry and propagate-carry signals. For example, if gC15-0 (generate carry within the group of bits 0 through 15) is logic 1, then the carry-output C15 of the group [15:0] will be logic 1; or, if pC15-0 (propagate carry-in through the bit 0-15 stages) is logic 1, and the carry-in Cin is logic 1, then the carry-output C15 will be logic 1; otherwise, unless the generate-carry gC15-0 signal is a logic 1, C15 is logic 0. This optimization can be extended to higher order sets of bits [31:0], [63:0], and so on. In like manner, intermediate carry-out values, C11, C19, and so on, can be easily generated as illustrated in FIG. 3.
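The relation between the group signals and the carry-output can be sketched behaviorally. In this sketch the group signals are derived arithmetically from integer operands rather than from gate logic, and the names group_signals and carry_out are hypothetical; the circuit of FIG. 3 is not reproduced.

```python
def group_signals(a, b, width):
    """Return (generate, propagate) for a width-bit group of operands a and b."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = (a + b) >> width         # carry-out when the carry-in is logic 0
    propagate = int((a | b) == mask)    # every bit position has a or b at logic 1
    return generate, propagate


def carry_out(generate, propagate, carry_in):
    # Logic 1 if generated within the group, or if the carry-in is propagated.
    return generate | (propagate & carry_in)


# Group [15:0]: nothing is generated, but a carry-in of 1 is propagated to C15.
g, p = group_signals(0xFFFF, 0x0000, 16)
print(g, p, carry_out(g, p, 1))
```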
A pair of sum-output values is determined based on the inputs to each block in the adder, as illustrated in FIG. 4. A conditional sum determinator 210 determines a first sum S|C=0 as the sum of the inputs A, B to the block, if the carry-in to the block is logic 0, and a second sum, S|C=1 as the sum of the inputs to the block if the carry-in to the block is logic 1. That is, each sum is determined, independent of the actual carry-in to each block. When the carry-in to each block is determined, via the example circuit of FIG. 3, the corresponding sum S|C=0 or S|C=1 is selected, via the selector 220 associated with each block.
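The conditional-sum arrangement can be illustrated with a brief behavioral sketch. The names conditional_sums and select_sum and the four-bit default width are assumptions for illustration; they stand in for the determinator 210 and the selector 220 only loosely.

```python
def conditional_sums(a, b, width=4):
    """Return (S|C=0, S|C=1): the block sum for each possible carry-in."""
    mask = (1 << width) - 1
    return (a + b) & mask, (a + b + 1) & mask


def select_sum(sums, carry_in):
    # Once the actual carry-in is known, pick the matching precomputed sum.
    return sums[1] if carry_in else sums[0]


sums = conditional_sums(0b1010, 0b0101)   # both sums computed before the carry arrives
print(select_sum(sums, 0), select_sum(sums, 1))
```

Both candidate sums are available before the carry-in is resolved, so only the selection remains on the path that depends on the carry.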
The speed of a carry-lookahead adder is generally bound by the speed of the carry-generation and propagation process. FIG. 5A illustrates an example logic diagram for a four bit generate/propagate block, such as might be used for each of the blocks 110a-d of FIG. 1, and FIG. 5B illustrates an example equivalent logic diagram for a four bit generate/propagate block 110′ that is optimized for speed, using DeMorgan's laws of equivalence. The block 110′ of FIG. 5B is formulated from the logic of block 110 of FIG. 5A into sets of AND-AND-NOR gates 310-320-330, and sets of OR-OR-NAND gates 340-350-360, using DeMorgan's laws of inverse functions.
As is known in the art, AND-AND-NOR gates and OR-OR-NAND gates can each be formulated as a single-stage complex gate, as illustrated by the CMOS complex gates of FIGS. 6 and 7, respectively. As illustrated in FIG. 6, if inputs A, B, AND C are logic 1, OR, inputs D AND E are logic 1, the output F will be a logic 0; otherwise, the output F will be a logic 1. As illustrated in FIG. 7, if either A, B, OR C are logic 1, AND, either D OR E are logic 1, the output F will be a logic 0; otherwise, the output F will be a logic 1. The use of a complex, or matrix, gate to effect the AND-AND-NOR (or OR-OR-NAND) function avoids the sequential delay of first determining the results of the AND (or OR) functions and then determining the results of the NOR (or NAND) function.
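The logic functions of the two complex gates described above can be written out as follows. The function names and_and_nor and or_or_nand are hypothetical; the input names A through E follow the description of FIGS. 6 and 7.

```python
from itertools import product


def and_and_nor(A, B, C, D, E):
    # Output is logic 0 when (A AND B AND C) OR (D AND E) is true.
    return 1 - ((A & B & C) | (D & E))


def or_or_nand(A, B, C, D, E):
    # Output is logic 0 when (A OR B OR C) AND (D OR E) is true.
    return 1 - ((A | B | C) & (D | E))


# DeMorgan duality: inverting every input and the output of one gate
# yields the function of the other gate.
for bits in product((0, 1), repeat=5):
    inverted = [1 - x for x in bits]
    assert and_and_nor(*bits) == 1 - or_or_nand(*inverted)
print("duality holds for all 32 input combinations")
```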
The speed of a complex gate is generally determined based on the time required to discharge or charge the output node F to ground or power potentials, respectively. The discharge time is determined by the longest serial path to ground through the N-channel devices of the matrix gate, and the charge time is determined by the longest serial path to power potential through the P-channel devices.
As is known in the art, a P-channel device is inherently slower than an equal sized N-channel device. As also known in the art, an increase in the gate size of a device increases the capacitive load on the device that is driving the gate, thereby increasing the power consumption and further decreasing the speed of the device unless the device that is driving the gate is also increased in size. Therefore, for the same area and power constraints, a series of N-channel devices will be faster than an equivalent series of P-channel devices. Or, alternatively stated: for the same speed constraints, a series of N-channel devices will be smaller and consume less power than an equivalent series of P-channel devices.
In FIG. 7, the series connection of the P-channel gates that are gated by signals A, B, and C forms the longest series path, with a series length, or "stack depth", of three P-channel devices for bringing the state of node F to the power potential. The N-channel stack depth, or maximum series length, for discharging the node F to ground potential is two N-channel devices: one of the three N-channel devices that are gated by signals A, B, and C, and one of the two N-channel devices that are gated by signals D and E. Therefore, the maximum delay of the matrix OR-OR-NAND structure of FIG. 7 is the sum of the delays through the three series P-channel devices.
In FIG. 6, on the other hand, the series connection of the N-channel gates that are gated by signals A, B, and C forms the longest series path, of three N-channel devices, for discharging the state of node F, and the longest series path for charging node F is two P-channel devices. Therefore, the maximum delay time for forming an output at the node F is determined as the maximum delay of three N-channel devices, or two P-channel devices. With the inherent slower speed of P-channel devices compared to N-channel devices, these N and P series delays may be similar, but in either event, the delay of the AND-AND-NOR structure of FIG. 6 is less than the delay of the OR-OR-NAND structure of FIG. 7, for similar area constraints.
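The stack depths cited for FIGS. 6 and 7 can be checked with a small sketch that models each transistor network as a nested series/parallel structure. The tuple representation and the names depth and dual are assumptions for illustration, as is the standard premise that the P-channel network of a static CMOS gate is the series/parallel dual of its N-channel network.

```python
def depth(net):
    """Longest series chain of devices in a series/parallel network."""
    if isinstance(net, str):              # a single device, named by its gate signal
        return 1
    kind, *parts = net
    if kind == "series":
        return sum(depth(p) for p in parts)
    return max(depth(p) for p in parts)   # "parallel"


def dual(net):
    """Swap series and parallel connections to obtain the complementary network."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    other = "parallel" if kind == "series" else "series"
    return (other, *[dual(p) for p in parts])


# N-channel (pull-down) networks of the two complex gates.
aan_n = ("parallel", ("series", "A", "B", "C"), ("series", "D", "E"))      # AND-AND-NOR
oon_n = ("series", ("parallel", "A", "B", "C"), ("parallel", "D", "E"))    # OR-OR-NAND

print("AND-AND-NOR: N depth", depth(aan_n), "P depth", depth(dual(aan_n)))
print("OR-OR-NAND:  N depth", depth(oon_n), "P depth", depth(dual(oon_n)))
```

Under these assumptions the AND-AND-NOR gate has an N-channel depth of three and a P-channel depth of two, while the OR-OR-NAND gate has an N-channel depth of two and a P-channel depth of three, consistent with the description above.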
As demonstrated by the example structures of FIGS. 6 and 7, in a series of devices that form a critical path, an embodiment that reduces the P-channel stack depth, even at the cost of a corresponding increase in the N-channel stack depth, will be more efficient in terms of power, speed, and/or area than an equal-length series of devices with a larger P-channel stack depth.
In the example of FIG. 5B, the critical path for forming the generate-carry signal includes the AND-AND-NOR gate 310-320-330 and the OR-OR-NAND gate 340-350-360, and this path can be shown to be the longest delay path through the generate/propagate block 130 of FIG. 1, because of the three P-channel devices in series in the OR-OR-NAND gate 340-350-360.
It is an object of this invention to improve the speed of a carry-lookahead adder. It is a further object of this invention to reduce the P-channel stack depth within critical paths of a carry-lookahead adder.
These objects and others are achieved by providing a carry-lookahead adder that is configured to generate and propagate a null-carry signal within and through blocks and groups of blocks within the adder. A null-carry signal is a signal that terminates the effects of a carry input to the block or group of blocks beyond the point at which the null-carry signal is generated. By forming rules for generating and propagating null-carry signals through blocks and groups of blocks within the adder, a maximum P-channel stack depth of two can be achieved for a four-bit adder block, thereby substantially improving the speed of the carry-lookahead adder, compared to a conventional carry-lookahead adder that is based on generating and propagating carry signals within the adder.