The present invention generally relates to a method and apparatus for performing a sum-and-compare operation and, more particularly, to a high-speed sum-and-compare circuit which reduces delay caused by carry propagation and which can be implemented in a relatively small amount of die area.
Rapid detection and prediction of arithmetic overflow and underflow exceptions are crucial to the performance of advanced microprocessors. These operations typically require a comparison of a single operand, A, with a constant, k, such as by the comparison operation A=k. These operations more frequently involve comparing the sum of more than one operand (e.g., A+B+C) with one or more constants (e.g., k1, k2), such as by the sum-and-compare operation A+B+C={k1, k2, k3 . . . , kn}. The speed of this sum-and-compare operation depends on the speed of the carry bit propagation through an n-bit addition, where n is the number of bits of each operand.
The traditional approach employs two four-input sum and compare logic blocks to detect the condition A+B+C={k1, k2, k3 . . . , kn}. The two blocks are configured with logic to perform the respective functions A+B+C less than kn and A+B+C greater than k1, where {k1, k2, k3 . . . , Kn} are contiguous ascending integers. Each sum and compare block is implemented with a four-input adder. For example, the operation A+B+C greater than k1 is performed as an overflow operation (A+B+Cxe2x88x92(k1+1)).
In order to enhance performance, it is known to implement the four-input adder with two three-input Carry Save Adders (CSAs) and a single Carry Propagate Adder (CPA). Each CSA is an m-bit full adder circuit with three input addends (i.e., A, B and C) and two outputs, namely, a sum and a carry. The CSAs are cascaded such that the output (i.e., sum and carry) of the first CSA is the input to the second CSA. The second CSA also receives as its input xe2x88x92(k1+1). The CPA is an m-bit full adder circuit, which propagates its carry bits internally. The CPA receives as its input the carry and sum outputs from the second CSA. Only the most significant carry output of the CPA, which represents the overflow bit of the CPA, needs to be evaluated to determine the result of the sum-and-compare operation.
This latter approach has some disadvantages in terms of the amount of die area needed for implementation of the circuit and in terms of performance. Only the first CSA block may be shared by both of the compare functions, A+B+C less than kn and A+B+C greater than k1. Thus, a total of three CSA blocks and two CPA blocks are needed to realize the function A+B+C={k1, k2, k3 . . . , Kn}, which increases the amount of die area needed to implement the logic. The performance of this circuit is limited by the speed of the n-bit carry propagate operations in the CPA block.
An article entitled xe2x80x9cEvaluating xe2x80x98A+B=Kxe2x80x99 Conditions in Constant Time,xe2x80x9d in PROCEEDINGS, IEEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, Vol. 1, 1988, pp. 243-6, by Cortadella and Llaberia, proposes a scheme for evaluating a two-input sum and compare operation, A+B=K, without carry propagation. A proposal was made to simplify this scheme in an article entitled xe2x80x9cEarly Zero Detection,xe2x80x9d in PROCEEDINGS, 1996 INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, 1996. Both of these articles state that the carry output of a single-bit adder may be expressed independent of the carry input, by Lutz and Jayasimha, if the sum output is known as follows:
If S=K, then Cout=NOT(K)P OR G,xe2x80x83xe2x80x83(Equation 1)
where S is the sum of A+B, Cout is the carry of A+B, P is (A XOR B), G is AB, and K is the known constant bit. A simple proof of this observation follows:
If S=K, then K=P XOR Cin,xe2x80x83xe2x80x83(Equation 2)
Cin=P XOR K,xe2x80x83xe2x80x83(Equation 3)
Cout=Pcin OR G,xe2x80x83xe2x80x83(Equation 4)
Cout=P(P XOR K) OR G,xe2x80x83xe2x80x83(Equation 5)
Cout=P(NOT(K)P OR NOT(p)k) OR G,xe2x80x83xe2x80x83(Equation 6)
Cout=NOT(K)P OR G,xe2x80x83xe2x80x83(Equation 7)
The customary sum equation:
S=A XOR B XOR Cin,xe2x80x83xe2x80x83(Equation 8)
is used to detect equality. It will equal zero if A+B=K.
Although the above-noted articles make a very important observation, neither of these articles nor any of the aforementioned techniques teach disclose or propose a solution for simultaneously comparing a sum of operands with multiple constants. Furthermore, none of these articles or any of the aforementioned techniques teach or suggest programming the constant k in a metal mask, which allows a single circuit design to be used for multiple values of k.
Accordingly, a need exists for a sum-and-compare circuit that exploits the observation that the carry output may be expressed independently of the carry input, and which can be implemented in a relatively small amount of die area. A need also exists for a sum-and-compare circuit which is capable of simultaneously comparing a sum with multiple constants, which allows the constants to be programmed in a metal mask, and which exhibits substantial performance improvements over existing sum-and-compare circuits.
A method and apparatus is provided for performing a fast sum-and-compare operation. The apparatus of the present invention utilizes a single carry save adder in conjunction with a zero detect circuit for performing logic operations to determine whether or not the sum of a plurality of operands is equal to one or more constants. The carry save adder generates a sum, M, and carry, L, that are output from the carry save adder to the zero detect circuit. The zero detect circuit produces internal carry signals that are passed between adjacent bit-cells of the zero detect circuit. The zero detect circuit generates outputs Zk1 through Zkn which are true if the condition A+B+C={k1, k2, k3 . . . kn} for all constants k1 through kn.
The carry signals propagate through only one bit of the zero detector circuit, thereby providing the sum-and-compare circuit of the present invention with extremely high speed. The zero detect circuit performs the logic operations represented by Zk=(L XOR M XOR CIN) XNOR k, where Zk is one of the outputs Zk1 through Zkn, k is one of the constants k1 through kn and CTN is the carry input to the zero detector bit cell. The output Zk will be equal to zero if A+B+C=K. Therefore, if any of the zero detector bit cells encounters a mismatch between the sum and the constant, the output Zk for that particular bit cell will indicate that a mismatch has occurred.
Preferably, the logic of the carry sum adder and of the zero detector is implemented in dynamic domino logic which further increases speed of the sum-and-compare circuit. By utilizing a single Carry Save Adder in conjunction with the zero detector circuit of the present invention, the amount of die area required for implementation of the sum-and-compare circuit is considerably less than known sum-and-compare circuits which utilize multiple Carry Save Adders.
These and other features and advantages of the present invention will become apparent from the following description, drawings and claims.