The present invention pertains in general to digital arithmetic circuits, and particularly to integrated circuits which include adder or multiplier blocks with ripple-mode carry architecture.
Digital arithmetic circuits need the capability for fast addition and multiplication. A key point of high-speed addition or multiplication is the carry operation. For example, one of the critical factors in the speed of almost any Digital Signal Processing (DSP) chip is the carry chain in the multiplier. Addition can also be a speed-limiting factor in simpler integrated circuit portions, e.g. in an ALU (arithmetic logic unit) or even in a program counter. Thus, improvements in the speed of the carry operation may have a significant impact on the overall performance of many numeric computing systems.
Several architectures are known for digital adders. See, e.g., U.S. Pat. Nos. 3,947,671, 4,338,676, and 4,623,981, which are hereby incorporated by reference. In such architectures, much of the arithmetic can be performed in parallel, e.g. by separate circuits which sum corresponding bits of the two numbers which are being added together. However, the carry operations are not as amenable to parallel execution. For example, to determine whether a carry-in bit must be added in at the 15th bit position, it is necessary to look at the carry-out result from the 14th bit position; and that result may depend on the carry-out result from the 13th bit position, and so on. As a concrete case, if 1 is being added to 1023, these numbers would appear, in 16-bit binary notation, as

$$
\begin{array}{rrl}
  & 0000001111111111 & (1023) \\
+ & 0000000000000001 & (1) \\
\hline
  & 0000010000000000 & (1024)
\end{array}
$$

In this operation, the carry-in bit value at bit position 11 (counting from position 1 at the least significant end; this is where the 1 appears in the sum "0000010000000000") is known only after the carry operations have been computed at bit positions 10, 9, 8, 7, 6, 5, 4, 3, 2, and 1. People doing long addition by hand will normally work from the right-most (least significant) digits over to the left-most (most significant), handling carries as they go. However, for computer operations, this would be impossibly slow. Therefore, a primary challenge to any adder architecture is to handle the carry operations rapidly.
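The dependency described above can be traced with a short bit-serial model. The sketch below is illustrative only, and the function name `carry_origins` is hypothetical. It walks the operands from the least significant position and, for every position that receives a carry-in of 1, records the position at which that carry was originally generated.

```python
def carry_origins(a: int, b: int, width: int = 16) -> dict[int, int]:
    """Map each bit position (1-indexed from the LSB) that receives a carry-in
    of 1 to the position where that carry was generated."""
    origins: dict[int, int] = {}
    carry = 0
    origin = 0  # position that generated the currently rippling carry
    for pos in range(1, width + 1):
        if carry:
            origins[pos] = origin
        a_bit = (a >> (pos - 1)) & 1
        b_bit = (b >> (pos - 1)) & 1
        if a_bit and b_bit:          # "1+1": a new carry starts here
            carry, origin = 1, pos
        elif not (a_bit or b_bit):   # "0+0": any incoming carry stops here
            carry, origin = 0, 0
        # "0+1" or "1+0": the carry and its origin pass through unchanged
    return origins


origins = carry_origins(1023, 1)
print(origins[11])      # → 1
print(sorted(origins))  # → [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

For 1023 + 1, every carry-in from position 2 through position 11 traces back to position 1, which is why position 11 cannot be resolved until positions 1 through 10 have been.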
The present invention is particularly applicable to systems which use a ripple-mode carry chain architecture. In this architecture, a simple electrical circuit is used to link the carry inputs and outputs of all the one-bit stages. Since all the one-bit stages are "chained" together, this architecture is referred to as a "carry chain" architecture. The propagation of carry information along the electrical circuit is not clocked, so the carry computations can be resolved as fast as the electrical properties of the circuit permit.
Each of the one-bit stages along the carry chain includes logic to use the carry-in bit, and logic to propagate the appropriate carry-out bit along the chain. That is, at a given bit position:
if the operand bits are "0+0", the carry-out bit must be "0", regardless of the carry-in bit;
if the operand bits are "1+1", the carry-out bit must be "1", regardless of the carry-in bit; and
if the operand bits are "0+1" or "1+0", the carry-out bit must be the same as the carry-in bit, and therefore the carry-in bit can simply be propagated along the chain to the next stage. Note that none of these cases requires any logical operation to be performed on the carry-in bit. Thus, it is possible to perform all of the carry computations asynchronously (ideally, within one clock cycle).
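The three cases listed above can be written as a per-stage rule. In the minimal sketch below, the labels KILL, GENERATE, and PROPAGATE are conventional adder terminology rather than terms from this text, and the names `Stage`, `classify`, and `carry_out` are hypothetical. The point it illustrates is that the carry-in bit is either ignored or passed through unchanged, never logically combined.

```python
from enum import Enum


class Stage(Enum):
    KILL = "0+0"               # carry-out forced to 0
    GENERATE = "1+1"           # carry-out forced to 1
    PROPAGATE = "0+1 or 1+0"   # carry-out equals carry-in


def classify(a_bit: int, b_bit: int) -> Stage:
    """Decide a stage's carry behavior from its two operand bits alone."""
    if a_bit == b_bit:
        return Stage.GENERATE if a_bit else Stage.KILL
    return Stage.PROPAGATE


def carry_out(stage: Stage, carry_in: int) -> int:
    """No logic is applied to carry_in: it is ignored or passed straight on."""
    if stage is Stage.KILL:
        return 0
    if stage is Stage.GENERATE:
        return 1
    return carry_in


for a in (0, 1):
    for b in (0, 1):
        print(a, b, classify(a, b).name)
```

Because `classify` depends only on the operand bits, every stage can be set up in parallel before any carry arrives.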
To perform the full add, it is only necessary to: (1) sum the operand bits at all of the stages simultaneously, to obtain a preliminary result bit and a correct setting for the carry propagation logic; (2) propagate the carry signals, and allow enough time for a carry signal to propagate all the way down the carry chain; and (3) modify the preliminary result bit, in accordance with the carry-in bit, to provide a data output bit. If step 2 (the carry propagation) can be done in one clock phase, then the full addition can be performed in only three phases.
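The three steps above can be modeled at the bit level as follows. This is a behavioral sketch under stated assumptions, not a circuit description: the function name `three_phase_add` is hypothetical, step 2 is modeled as a sequential loop although the hardware resolves it as an unclocked ripple, and the preliminary result bit is assumed to be the XOR of the operand bits (which is also the "propagate" setting for the stage).

```python
def three_phase_add(a: int, b: int, width: int = 16,
                    carry_in: int = 0) -> tuple[int, int]:
    """Return (sum modulo 2**width, final carry-out)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Step 1 (all stages in parallel): preliminary result bits, which also
    # set each stage's pass gate; a AND b marks the stages that originate
    # a carry.
    preliminary = a ^ b
    generate = a & b

    # Step 2: ripple the carry from the least significant stage upward.
    carries = 0          # bit i holds the carry-in to stage i
    c = carry_in
    for i in range(width):
        carries |= c << i
        p = (preliminary >> i) & 1
        g = (generate >> i) & 1
        c = g | (p & c)  # generate, propagate, or kill

    # Step 3 (all stages in parallel): fold each carry-in into its
    # preliminary result bit.
    return preliminary ^ carries, c


print(three_phase_add(1023, 1))    # → (1024, 0)
print(three_phase_add(0xFFFF, 1))  # → (0, 1)
```

Only step 2 is inherently sequential; steps 1 and 3 touch each bit independently, which is why the carry propagation time dominates the cost of the add.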
The carry propagation logic is implemented using pass gates (which are used to control propagation of the carry-in bit along the carry data line) and latches (which are used to capture the carry-in bit data). The latches will also provide an active load to the carry-in line, since, at each stage where the carry-in bit does not match the latch's state, the transistors in these latches will tend to fight the carry-in bit signal until the latch changes state.
The layout of the carry chain often has to be quite strung out, since the pitch of the carry chain will be determined by the pitch of the preceding stages. For example, in a multiplier, the spacing of the stages in the carry-chain logic might be determined by the pitch of the preceding shift and add blocks. Thus, the distributed capacitance of the carry line may be significant.
Electrically, the carry chain behaves almost like a transmission line: it has a significant distributed series resistance (from the pass gates), and also a large distributed load (current loading from the latch feedback, as well as capacitive loading from the distributed capacitance of the line and the input capacitance of transistor gates). The delay of such a circuit depends on three factors: increased series resistance increases delay; increased loading increases delay; and increased effective length (e.g. from an increased number of stages in the carry chain) increases delay.
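A first-order way to see all three factors at once is the Elmore delay of an RC ladder. The sketch below is a simplification that assumes each on pass gate is a linear resistor and each stage's load is a lumped capacitance; it ignores the active current loading from the latch feedback, and the function name `elmore_delay` is hypothetical. Each series resistor is charged with all of the capacitance downstream of it, so for N identical stages the far-end delay is R·C·N(N+1)/2.

```python
def elmore_delay(r_series: list[float], c_shunt: list[float]) -> float:
    """Elmore delay at the far end of an RC ladder.

    r_series[i] is the resistance feeding node i from the previous node
    (or from the driver, for i == 0); c_shunt[i] is the capacitance at node i.
    """
    if len(r_series) != len(c_shunt):
        raise ValueError("need one resistance and one capacitance per stage")
    delay = 0.0
    downstream = sum(c_shunt)  # total capacitance beyond the first resistor
    for r, c in zip(r_series, c_shunt):
        delay += r * downstream  # this resistor charges everything beyond it
        downstream -= c
    return delay


# Normalized units (R = C = 1 per stage).
print(elmore_delay([1.0] * 16, [1.0] * 16))  # → 136.0
print(elmore_delay([1.0] * 32, [1.0] * 32))  # → 528.0
```

Doubling the number of stages roughly quadruples this estimate, while scaling R or C per stage scales it linearly, consistent with the three factors named above.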
As integrated circuits have been scaled, and their operating speeds increased, the electrical delay of the carry chain has become a more significant factor. Note that the delay which must be allowed for in clocking is the worst-case electrical delay, i.e. the electrical delay required to propagate a carry signal all the way through the carry chain. Thus, even if this worst-case delay does not occur very often, it may require adding an additional clock cycle into every operation. Therefore, it would be desirable to increase the speed with which a carry signal propagates along the carry chain.
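How rarely the worst case occurs can be checked exhaustively for a small word. The sketch below assumes an external carry-in of 0, defines a carry's run as the generating stage plus the propagating stages it passes through, and uses the hypothetical helper name `longest_carry_run`. A full-length run requires stage 0 to generate and every other stage to propagate, i.e. 2^(n-1) of the 4^n operand pairs.

```python
from collections import Counter
from itertools import product


def longest_carry_run(a: int, b: int, width: int) -> int:
    """Longest number of stages any single carry spans, counting the stage
    that generates it (0 if no stage generates a carry)."""
    longest = run = 0
    for i in range(width):
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        if a_bit and b_bit:           # generate: a new carry starts here
            run = 1
        elif a_bit != b_bit and run:  # propagate: an existing carry moves on
            run += 1
        else:                         # kill, or nothing to propagate
            run = 0
        longest = max(longest, run)
    return longest


width = 8
hist = Counter(longest_carry_run(a, b, width)
               for a, b in product(range(1 << width), repeat=2))
print(hist[width], "of", 4 ** width)  # → 128 of 65536
```

Even though only about 1 in 512 operand pairs exercises the full 8-stage chain in this model, the clock must still allow for that case.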
The present invention provides faster carry chain operations. The circuit features which accomplish this may be considered, electrically, as adding some distributed positive feedback during the time when carry signals may be propagating along the carry chain. (This positive feedback has the effect of decreasing the net loading on the transmission line. The current provided by this positive feedback is of opposite sign to the current sources which load the line initially.)
Within the general context of digital arithmetic circuits using carry chain architectures as described above, the present invention uses different logic at each stage of the carry chain. To clarify some of the advantages resulting from these innovative differences, the circuit configuration of a conventional carry chain structure will now be described in greater detail.
FIGS. 1 and 2 schematically show a carry chain structure. Each stage has a data latch 16, a carry latch 20, a pass transistor 14, and a precharge transistor 26. Each latch 20 includes two static inverters, coupled back-to-back; the first inverter 24, whose input is connected to the carry line 12, will be referred to here as the "feedforward" inverter, and the second inverter 22, whose input is connected to the output of the first inverter and whose output is connected to the carry line, will be referred to as the "feedback" inverter. While the preceding stages (not shown) are performing arithmetic computations to define a preliminary result bit, the precharge transistors 26 pull up the inputs to each of the latches 20 to the supply voltage. Next, some of the pass transistors 14 are turned on (determined by the preliminary result bit), and the carry signal is propagated down the chain as far as permitted by pass transistors 14. (In FIG. 2 the carry signal is shown as originating in an NMOS transistor 28. In a full adder, each stage must be able to originate a carry signal, but the complete circuitry to do this is not shown in FIG. 2.) After carry propagation has occurred, an XOR circuit (not shown) combines the carry-in bit (from the output of the latches 20) with the preliminary result bit, to define the data-out bit.
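The precharge and evaluate sequence described above can be mimicked with a purely logical model. In the sketch below, which is illustrative and makes no electrical claims, the function name `precharged_chain` is hypothetical; a node value of 1 stands for the precharged (supply) level meaning "no carry", and 0 stands for a carry pulled low. It assumes, as the text notes a full adder requires, that every stage can originate a carry through a pull-down like NMOS transistor 28, and that a stage's pass transistor 14 is on when its preliminary result bit is 1.

```python
def precharged_chain(a: int, b: int, width: int,
                     carry_in: int = 0) -> tuple[list[int], int]:
    """Return (carry-line node levels, sum modulo 2**width)."""
    VDD, GND = 1, 0
    # Precharge phase: every carry node is pulled up (no carry anywhere).
    node = [VDD] * (width + 1)  # node[i] is the carry line at stage i's latch

    # Evaluate phase: a carry is signaled by pulling the line low.
    if carry_in:
        node[0] = GND
    for i in range(width):
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        originates = a_bit & b_bit  # pull-down (like transistor 28) is on
        pass_on = a_bit ^ b_bit     # pass transistor 14 is on
        if originates or (pass_on and node[i] == GND):
            node[i + 1] = GND
        # otherwise node[i + 1] simply keeps its precharged high level

    # Each latch's feedforward inverter turns the active-low line into a
    # carry-in bit, which the XOR combines with the preliminary result bit.
    result = 0
    for i in range(width):
        preliminary = ((a >> i) & 1) ^ ((b >> i) & 1)
        carry_bit = 1 - node[i]
        result |= (preliminary ^ carry_bit) << i
    return node, result


nodes, total = precharged_chain(1023, 1, 16)
print(total)  # → 1024
```

In this model the low level travels only as far as consecutive on pass transistors allow; stages that neither originate nor pass a carry leave their node at the precharged level.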
Two significant sources of contention must be considered here. First, when the precharge cycle occurs, the PMOS precharge transistors 26 will have to overpower the NMOS pull-down transistors in the feedback inverters 22 of at least some of the carry latches 20. To facilitate this (and avoid excessive delay or power consumption during the precharge cycle), the pull-down transistors in the feedback inverters 22 are made relatively small. Second, if the incoming carry signal is low, it will have to overcome the PMOS pull-up transistor in the feedback inverter 22. Therefore, both of the devices in the feedback inverter 22 would conventionally be made fairly small.
In the present invention, the carry latch 20' is not a static latch. The feedback inverter 22' is gated, so that it is only active after the precharge cycle. This inverter can therefore be used to provide positive feedback onto the carry line.
This change in the operation of the latch 20' is combined with several significant changes in device ratios. First, the pull-down (NMOS) transistor in inverter 22' is made relatively large. Second, the feedforward inverter 24' is given a relatively high switching threshold, so that the carry latches 20' switch (and begin to strengthen the carry signal) before the voltage on carry line 12 falls very far. Third, the precharge transistor 26 does not have to be made any larger than the layout conveniently permits, since it does not have to fight any active device during precharge.
Another aspect of the innovative teachings contained herein is that the carry latches 20' preferably have asymmetrical switching characteristics. These latches switch very readily from high to low (i.e. when an incoming carry signal is received), but they do not have to switch readily from low to high.
Therefore, when the carry signal is propagating, each latch 20' loads the carry signal (by sourcing current to the carry line) only until that latch's input node has been pulled down to about 3 Volts. At that point, the latch switches, and begins to sink current from the carry line. This current strengthens the carry signal, and helps to pull down the input node of the following stage (if the pass transistor is on).
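The difference in loading can be pictured as the current each latch's feedback inverter injects into the carry line as the line voltage falls. The sketch below is a crude static model, not device data: it assumes a 5 V supply (the text gives only the roughly 3 V trip point), treats each feedback transistor as a switched linear conductance, and uses hypothetical conductance values and function names. Positive current is sourced into the line (loading a falling carry signal); negative current is sunk from it (reinforcing the carry).

```python
VDD = 5.0  # assumed supply voltage (hypothetical)


def conventional_latch_current(v_line: float, g_up: float = 0.1,
                               g_down: float = 0.1,
                               v_trip: float = 2.5) -> float:
    """Static latch: feedback inverter always active, both devices small."""
    if v_line > v_trip:             # latch still reads "high": pull-up fights
        return g_up * (VDD - v_line)
    return -g_down * v_line         # latch has flipped: weak pull-down helps


def gated_latch_current(v_line: float, enabled: bool, g_up: float = 0.1,
                        g_down: float = 1.0, v_trip: float = 3.0) -> float:
    """Gated latch: off during precharge, high trip point, large pull-down."""
    if not enabled:                 # precharge: nothing for transistor 26 to fight
        return 0.0
    if v_line > v_trip:             # loads the line only down to about 3 V
        return g_up * (VDD - v_line)
    return -g_down * v_line         # then sinks current, strengthening the carry


for v in (4.5, 3.5, 2.8, 2.0, 1.0):
    print(f"{v:.1f} V  conventional {conventional_latch_current(v):+.2f}"
          f"  gated {gated_latch_current(v, enabled=True):+.2f}")
```

Between roughly 3 V and 2.5 V the gated latch is already sinking current while the static latch is still sourcing it, and below the trip point the larger pull-down makes the gated latch's help much stronger.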
This circuit organization is useful in several contexts. It provides a compact and fast full adder circuit. It is also useful in a general-purpose arithmetic/logic unit (ALU). In this context, such a carry chain structure is useful to provide the carry capability for rapid addition or subtraction, and can also be used for analogous linkage of stages in other arithmetic operations. This circuit organization is also useful in an incrementer, where an operand is (selectably) incremented or decremented. Such an incrementer is particularly useful in a program counter. In addition, this circuit organization can also be adapted to other hardware for digital arithmetic, or for other algorithms where low-level sequential branching must be performed.
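As one illustration of the incrementer use, the same ripple carry rule can increment or decrement an operand. The sketch below assumes two's-complement arithmetic and uses the hypothetical function name `inc_dec`: incrementing adds 0 with a carry-in of 1, and decrementing adds the all-ones word (which represents -1) with a carry-in of 0.

```python
def inc_dec(value: int, decrement: bool, width: int = 16) -> int:
    """Increment or decrement `value` modulo 2**width with a ripple carry."""
    mask = (1 << width) - 1
    a = value & mask
    b = mask if decrement else 0  # all ones encodes -1 in two's complement
    c = 0 if decrement else 1     # an increment enters as a carry-in
    result = 0
    for i in range(width):
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        result |= (a_bit ^ b_bit ^ c) << i
        c = (a_bit & b_bit) | ((a_bit ^ b_bit) & c)  # generate or propagate
    return result


print(inc_dec(1023, decrement=False))  # → 1024
print(inc_dec(1024, decrement=True))   # → 1023
```

For an increment, every stage holding a 1 simply propagates the carry, so a value such as 1023 exercises a long ripple, which is the program-counter case the text mentions.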
The present invention provides faster performance in microprocessors (and other integrated circuits), by providing faster adder and multiplier circuits. The present invention provides faster carry chain operation in circuits which use ripple-mode carry.