One kind of function performed in programmable logic devices is arithmetic. A device such as a configurable logic array of Xilinx, Inc., assignee of the present invention, can perform arithmetic as well as a multitude of other logic functions. Such devices are described in U.S. Pat. Nos. 4,870,302 and 4,706,216, and U.S. Pat. No. 5,343,406, which are incorporated herein by reference. Because these devices are intended for general purpose functions, arithmetic is relatively slow and requires a significant amount of silicon area.
Other programmable logic devices, such as the programmable array logic device described in Birkner, U.S. Pat. No. 4,124,899, and user programmable devices described in Elgamal et al., U.S. Pat. No. 4,758,745, can also be programmed to perform arithmetic. These two patents are also incorporated by reference. In these devices the speed of performing arithmetic and other functions which use carry logic is limited by propagation of the carry signal. Also, the amount of general purpose logic used to implement the carry function is significant.
For understanding how logic devices perform arithmetic, and particularly what causes delay, the following discussion of arithmetic functions will focus on adders. However, the discussion can easily be extended to apply to subtractors, incrementers, decrementers, and accumulators, in addition to other circuits which use carry logic.
The following discussion will focus on operation of the middle stages in a multi-bit adder. The least significant bit is a special case because there can be no carry signal to be received from a less significant bit. The most significant bit is a special case because the carry bit can be used for determining an overflow condition. These two special cases will be discussed in detail later.
By reference to FIGS. 1a, 1b and 2, it will be explained how the speed of a single bit ripple carry adder (FIGS. 1a and 1b), and thus a multi-bit ripple carry adder constructed by cascading single bit adders (FIG. 2) is constrained by the speed at which the signal at the carry-in terminal is propagated to the carry-out terminal.
The Boolean logic equations governing the behavior of the single bit adder shown in FIG. 1a are:
$$S_i = (A_i \oplus B_i) \oplus C_i \qquad (1)$$
$$C_{i+1} = A_i \cdot B_i + (A_i \oplus B_i) \cdot C_i \qquad (2)$$
where $\oplus$ represents the exclusive-or (XOR) function, $\cdot$ represents the AND function, and $+$ represents the OR function.
The variable P is called "carry propagate" because when P is high, carry-in is propagated to carry-out. The variable G is called "carry generate" because when G is high, a carry-out is generated by the bits being added.
With some minor algebraic manipulation, Eq. (6) can be used to write new equations where the carry bit at each level is dependent only on the addends at each level and the least significant carry bit. The following equations are implemented in the four bit adder shown in FIG. 3:
$$\begin{aligned}
\text{(a)}\quad C_1 &= A_0 B_0 = G_0 \\
\text{(b)}\quad C_2 &= G_1 + P_1 C_1 \\
\text{(c)}\quad C_3 &= G_2 + P_2 C_2 = G_2 + P_2 (G_1 + P_1 C_1) = G_2 + P_2 G_1 + P_2 P_1 C_1 \\
\text{(d)}\quad C_4 &= G_3 + P_3 C_3 = G_3 + P_3 (G_2 + P_2 G_1 + P_2 P_1 C_1) = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 C_1
\end{aligned} \qquad (7)$$
First, each $G_i$ and $P_i$ is a function only of $A_i$ and $B_i$ and not of previous carry values, as can be seen in Eqs. 3 and 4. Second, note in Eq. 7b that $C_2$ is calculated as a function of $G_1$, $P_1$, and $C_1$, and that in Eq. 7c, $C_3$ is calculated as a function of $G_2$, $P_2$ and $C_2$. But since $C_2$ has been solved in terms of $C_1$, $C_3$ can also be solved in terms of $C_1$. Attention to Eq. 7d, and the more general Eq. 6, will reveal that each $C_{i+1}$ is a function of several $G_i$'s, $P_i$'s, and $C_1$. As can be seen in FIG. 3, the less significant bit is fed into the next significant bit only for the calculation of the sum, not for the calculation of the carry bit. Since each carry bit is a function of several $G_i$'s, $P_i$'s, and $C_1$, each carry bit is not dependent on the carry-out of any but the least significant bit.
Thus the carry propagation delay of the look-ahead carry circuit is independent of the number of bits being added.
The circuit in FIG. 6a implements equation (10). Two conditions are satisfied by this circuit. When A and B are not equal, the signal on the carry-in terminal is passed to the carry-out terminal and when A and B are equal, the signal on A is passed to the carry-out terminal. As shown in FIG. 6a, the two single bits being added, A and B, are applied to the two input terminals of XOR gate 51. If A and B are equal, a low output signal from XOR gate 51 turns on pass transistor T1 and turns off pass transistor T2, allowing passage of the signal from A to the carry-out terminal $C_{out}$. If A and B are not equal, the output of XOR gate 51 is high, which turns on pass transistor T2 and turns off pass transistor T1. This in turn allows passage of the signal on the carry-in terminal $C_{in}$ to the carry-out terminal $C_{out}$.
Eq. (1) shows that the sum is a function of a carry-in from a less significant bit in addition to the single bits $A_i$ and $B_i$ being added. The ripple carry adder algorithm of Eqs. (1) and (2) shows that the sum for a particular bit cannot be calculated until the carry-out from the previous bit is available. The sum $S_i$ is the output of an XOR gate and cannot be generated until each of its inputs, one of which is the carry-in signal $C_i$, is available.
Furthermore, the carry-out $C_{i+1}$ also cannot be generated until the less significant carry bit $C_i$ is available. Referring now to FIG. 2, the propagation of the carry signal through successive stages of a ripple carry adder will be explained. AND gate 67 in the second adder stage Add$_{i+1}$ receives one of its inputs from the output of XOR gate 66 after only 1 gate delay. However, assuming that the carry-in signal $C_i$ is preset (that is, that Add$_i$ is the least significant bit), AND gate 67 could wait an additional 3 gate delays for the effect of $A_i$ and $B_i$ to propagate through gates 61, 62 and 65 before its other input, the carry-out $C_{i+1}$ from the less significant bit, has been generated from the carry-out of the less significant bit $C_i$ and the less significant bits $A_i$ and $B_i$ to be added. Furthermore, the carry-out $C_{i+2}$ of the second bit Add$_{i+1}$ is further delayed through 2 more gates after the carry bit $C_{i+1}$ has been generated. That is, combining the inputs on $A_{i+1}$ and $B_{i+1}$ with the carry-in signal $C_{i+1}$ to generate $C_{i+2}$ requires that $C_{i+1}$ propagate through AND gate 67 and OR gate 70. Thus, there will not be a valid carry-in signal $C_{i+2}$ for input to a third stage until 5 gate delays after the application of the input signals $A_i$ and $B_i$. Thus, the speed of the conventional ripple carry adder is constrained by the speed of propagation of the carry signal. The propagation delay of a conventional ripple carry adder is $2n+1$ gate delays, where $n$ is the number of stages in the multi-bit adder.
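The ripple carry behavior described above can be illustrated with a minimal Python sketch. It models only the logic of Eqs. (1) and (2) and the gate-delay count stated in the text, not the gate-level circuit of FIG. 2; the function names and the least-significant-bit-first list convention are assumptions made for illustration.

```python
def full_adder(a, b, c_in):
    """One adder stage: Eq. (1) gives the sum bit, Eq. (2) the carry-out."""
    s = (a ^ b) ^ c_in
    c_out = (a & b) | ((a ^ b) & c_in)
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (least significant bit first).

    Each stage must wait for the carry produced by the stage before it,
    which is the serial dependency that limits ripple carry speed.
    """
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def ripple_carry_delay(n_stages):
    """Worst-case carry delay in gate delays, using the 2n + 1 figure from the text."""
    return 2 * n_stages + 1
```

For example, under these assumptions `ripple_carry_add([1, 1, 0, 0], [1, 0, 1, 0])` adds 3 and 5 and returns `([0, 0, 0, 1], 0)`, and `ripple_carry_delay(2)` gives the 5 gate delays derived above for two stages.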
Since addition is the foundation of many other important functions and operations, it has been important to the computer industry to devise faster adder circuits by speeding up the carry propagation time. In general, these methods work by trading component density and complexity for carry propagation speed.
One well-known algorithm which achieves a faster carry propagation speed is called look-ahead carry logic. A circuit for implementing look-ahead carry logic is shown in FIG. 3. Understanding this logic requires the introduction of two new variables:
$$P_i = A_i \oplus B_i \qquad (3)$$
$$G_i = A_i \cdot B_i \qquad (4)$$
Eqs. (1) and (2) can be rewritten in terms of these new variables:
$$S_i = P_i \oplus C_i \qquad (5)$$
$$C_{i+1} = G_i + P_i \cdot C_i \qquad (6)$$
Referring still to FIG. 3 and FIG. 1a, the delay from the application of the input signals (A's and B's) to the appearance of a valid signal at the generate outputs ($G_i$'s) and propagate outputs ($P_i$'s) of an adder stage is 1 gate (this can be discerned from FIG. 1a). The delay added in FIG. 3 by the carry restorer portion of the look-ahead carry circuitry is 2 gates, which makes a total of a 3-gate delay from the application of the input signals to the adder until the last carry-out bit is available. This relationship is independent of the number of bits being added. For a multi-bit adder circuit, the delay will be significantly less than the delay of a conventional ripple carry adder circuit. However, as the number of stages is increased, the number of components increases significantly. Look-ahead carry logic requires many more components than the conventional ripple carry adder to implement a stage of a multi-bit adder. This illustrates the idea that faster carry propagation requires higher component density.
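As an illustration of the look-ahead equations, the following Python sketch computes the carries of a four bit adder two ways: by applying Eq. (6) one stage at a time, and by the flattened forms of Eqs. (7b) through (7d), in which each carry depends only on the $G_i$'s, $P_i$'s, and $C_1$. The function names and list indexing are assumptions for illustration; the sketch models the logic only, not the gate structure of FIG. 3.

```python
def propagate_generate(a_bits, b_bits):
    """Eqs. (3) and (4): P_i = A_i XOR B_i, G_i = A_i AND B_i (index 0 is the LSB)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    return p, g


def carries_recursive(p, g, c1):
    """Eq. (6) applied stage by stage from C_1; returns [C_2, C_3, C_4]."""
    carries = []
    c = c1
    for i in (1, 2, 3):
        c = g[i] | (p[i] & c)
        carries.append(c)
    return carries


def carries_lookahead(p, g, c1):
    """Flattened forms of Eqs. (7b)-(7d); no carry depends on another computed carry."""
    c2 = g[1] | (p[1] & c1)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & c1)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & c1))
    return [c2, c3, c4]
```

Because every term in `carries_lookahead` is built directly from the inputs, all three carries can in principle be evaluated in parallel, at the cost of product terms that grow with each stage. This is the trade of component count for speed that the text describes.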
FIG. 4 shows another example of circuit components for implementing an adder. The adder of FIG. 4 is very fast, but, like the adder of FIG. 3, uses many components. Again, a faster carry logic requires a higher component density.
Pages 6-30 through 6-44 of Xilinx, Inc., "The Programmable Gate Array Data Book," copyright 1989, available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124, show a variety of adders and counters which can be implemented in prior art Xilinx programmable logic devices. These pages of the Xilinx data book are incorporated herein by reference. Xilinx, Inc., owner of the copyright, has no objection to copying these pages but otherwise reserves all copyright rights whatsoever. The adder of FIG. 4 is shown on page 6-30 of the Xilinx data book. FIG. 5 shows a counter, also shown on page 6-34 of the Xilinx data book. FIGS. 4 and 5 thus show applications of arithmetic functions performed in early Xilinx devices. In early Xilinx devices, calculating the sum requires one function generator, and calculating the carry function requires another function generator. Typically, two function generators are incorporated in one logic block of a Xilinx prior art configurable logic array.
Thus, in the adder circuits of both FIG. 4 and FIG. 5, and for other Xilinx prior art adder circuits as well, at least two function generators are required for implementing each stage of an adder or counter.
The truth table in FIG. 6c shows the logical relationships between two single bits that are being added, the carry-in bit, and the carry-out bit. A careful analysis of this truth table has revealed a useful pattern. When A and B are equal (lines 1, 2, 7, and 8), the value of the carry-out $C_{out}$ bit is the value of A and of B. When A and B are not equal, on the other hand (lines 3-6), the value of the carry-out $C_{out}$ bit is the value of the carry-in $C_{in}$ bit. Two equivalent Boolean logic equations can represent this pattern:
$$C_{out} = (A \oplus B) \cdot C_{in} + \overline{(A \oplus B)} \cdot A \qquad (10)$$
$$C_{out} = (A \oplus B) \cdot C_{in} + \overline{(A \oplus B)} \cdot B \qquad (11)$$
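The pattern described above can be checked exhaustively with a short Python sketch. It compares the selection rule (pass A when A and B are equal, pass the carry-in otherwise) against the conventional carry of Eq. (2), and against Eqs. (10) and (11) read with the second XOR term complemented, which is what the described pattern requires. The function names are assumptions for illustration.

```python
def carry_out_eq2(a, b, c_in):
    """Conventional carry-out, Eq. (2)."""
    return (a & b) | ((a ^ b) & c_in)


def carry_out_select(a, b, c_in):
    """Selection rule from the truth table: C_in if A != B, otherwise A."""
    return c_in if (a ^ b) else a


def carry_out_eq10(a, b, c_in):
    """Eq. (10), with the second (A XOR B) term taken as complemented."""
    x = a ^ b
    return (x & c_in) | ((1 - x) & a)


def carry_out_eq11(a, b, c_in):
    """Eq. (11): same as Eq. (10) but passing B when A and B are equal."""
    x = a ^ b
    return (x & c_in) | ((1 - x) & b)


def all_forms_agree():
    """Return True if the four forms match on all eight input combinations."""
    for a in (0, 1):
        for b in (0, 1):
            for c_in in (0, 1):
                ref = carry_out_eq2(a, b, c_in)
                forms = (carry_out_select(a, b, c_in),
                         carry_out_eq10(a, b, c_in),
                         carry_out_eq11(a, b, c_in))
                if any(f != ref for f in forms):
                    return False
    return True
```

The selection form needs no AND or OR gate in the carry path: the XOR output acts only as a select signal, which is why a multiplexer suffices.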
FIG. 7a shows a full adder. FIGS. 6b and 7b show alternative representations of the circuits of FIGS. 6a and 7a respectively. The inverter and transistors of FIGS. 6a and 7a are represented as a multiplexer M in the illustrations of FIGS. 6b and 7b.
It will now be shown by comparing FIG. 2 and FIG. 7a that the fast carry logic described above provides faster propagation of the carry signal than a conventional ripple carry adder. FIG. 7a shows one stage of a full adder circuit constructed according to the invention. The carry propagation is controlled as discussed above in connection with FIG. 6a. As discussed above and shown in FIG. 2, the propagation delay of a conventional ripple carry adder is 1 AND gate plus 1 OR gate per pair of bits added plus 1 XOR gate. By contrast, as shown in FIG. 7a, the worst-case delay of a circuit according to the invention occurs when one of the input signals, in this case $B_i$, is propagated to the carry-out signal, that is, when the signal propagates through XOR gate 91 plus inverter 92 to turn on the pass transistor 93. This happens simultaneously for all bits being added. The propagation delay for a carry signal to propagate through a long series of transistors such as transistor 94 adds only minimal time compared to a gate delay for generating the result of an addition. If four full-adder circuits such as shown in FIG. 7a are cascaded, in the worst case the output signal $C_{out}$ is available after an XOR gate delay plus an inverter delay plus the very small propagation delay through four pass transistors.
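A behavioral Python sketch of cascaded stages of this kind follows. It is a logic-level model only and says nothing about transistor timing. It assumes the sum of each stage is $(A_i \oplus B_i) \oplus C_{in}$ as in Eq. (1), and that the stage passes $B_i$ to the carry-out when the inputs are equal, as described for FIG. 7a. The function names and helper conversions are hypothetical.

```python
def fast_carry_stage(a, b, c_in):
    """One stage: the XOR of the inputs selects which signal becomes carry-out."""
    p = a ^ b                    # computed for every stage at once, before any carry arrives
    s = p ^ c_in                 # sum bit, assumed to follow Eq. (1)
    c_out = c_in if p else b     # pass C_in when inputs differ, else pass B
    return s, c_out


def fast_carry_add(a_bits, b_bits, c_in=0):
    """Cascade stages, least significant bit first; returns (sum_bits, carry_out)."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = fast_carry_stage(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """List of bits (least significant first) back to an integer."""
    return sum(bit << k for k, bit in enumerate(bits))
```

In hardware the select signals for all stages settle in parallel, so the carry only has to traverse the already-configured chain of pass transistors. The loop in `fast_carry_add` hides that parallelism, but it does produce the same sums and carries as ordinary addition.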