Rapid detection and prediction of arithmetic overflow and underflow exceptions are crucial to the performance of advanced microprocessors. These operations typically require a comparison of a single operand, A, with a constant, K, such as by the comparison operation A >= K. More frequently, they involve comparing the sum of two operands, A and B, with a constant, K, such as by the sum-and-compare operation A + B >= K. The speed of this sum-and-compare operation depends on the speed of the carry bit propagation through an n-bit addition, where n is the number of bits of each operand.
The traditional method for performing a sum-and-compare operation, A + B >= K, employs an adder followed by a subtractor. FIG. 1 is a block diagram functionally illustrating the sum-and-compare circuit for performing this operation. The adder circuit 1 obtains the sum of the operands A and B and outputs the sum to a second adder circuit 2. The adder circuit 2 functions as the subtractor and adds the two's complement, J, of the constant K to the sum output from the adder circuit 1. The most significant carry output bit, Cout, is true if the condition A + B >= K is true.
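The adder-followed-by-subtractor arrangement can be sketched behaviorally as follows. This is a minimal model, not the circuit of FIG. 1: the function name `sum_and_compare`, the unsigned interpretation of the operands, the internal width of n + 1 bits (so that A + B cannot wrap), and forming J as the bitwise inverse of K plus a carry-in of 1 are all assumptions made for illustration.

```python
def sum_and_compare(a: int, b: int, k: int, n: int) -> bool:
    """Behavioral model of A + B >= K for unsigned n-bit A, B and K."""
    w = n + 1                    # assumed internal width: A + B always fits
    mask = (1 << w) - 1

    s = (a + b) & mask           # first adder: S = A + B
    not_k = ~k & mask            # bitwise inverse of K at width w
    total = s + not_k + 1        # second adder: S + J, with J = ~K + 1

    # Bit w of the result is the most significant carry output, Cout.
    return bool(total >> w)


if __name__ == "__main__":
    print(sum_and_compare(5, 6, 11, 4))
    print(sum_and_compare(5, 6, 12, 4))
```

In this model, S + J equals S + 2^w - K, so a carry leaves bit position w exactly when S is at least K.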
The performance of the sum-and-compare circuit shown in FIG. 1 is limited by the propagation or "rippling" of carry bits from the least significant bit of the result to the most significant bit of the result. Several adder architectures have been developed for accelerating carry propagation to reduce the propagation delay, such as carry-lookahead architectures, carry-skip architectures and carry-select architectures. These adder architectures are well known in the art and their characteristics are summarized in the following table in terms of propagation delay and area complexity.
Type              Delay        Area
Ripple            D(n)         A(n)
Carry Lookahead   D(log n)     A(n log n)
Carry Skip        D(sqrt n)    A(n)
Carry Select      D(sqrt n)    A(n)
In the table shown above, the letter D represents the intrinsic delay, the letter A represents the die area required for the logic needed for one bit of the operation, and the letter n is the number of bits of the adder, commonly referred to as the adder width. As the table indicates, with all of these adder architectures, delay and die area increase as the number of bits of the adder increases. The fastest of these architectures, and the most costly in terms of die area, is the carry-lookahead adder architecture.
A traditional carry-lookahead circuit 5 is shown in FIG. 2. The traditional carry-lookahead circuit 5 has the form of a binary tree comprised of "generate" and "propagate" signals and cells 7, which operate on the generate and propagate signals. The term "binary tree" is used to describe this circuit because the number of outputs of each cell 7 is equal to the number of inputs to the cell divided by two. For comparator applications, it is sufficient to compute only the most significant "generate" output. It is unnecessary to provide additional circuitry for low order sum outputs.
The P and G inputs, $P_0$ and $G_0$ through $P_7$ and $G_7$, are the propagate and generate values, respectively, previously calculated from addends A and B in accordance with the following equations:

$$P = A \;\text{OR}\; B \qquad \text{Equation (1)}$$

$$G = A B \qquad \text{Equation (2)}$$
In the interest of brevity, the circuitry for performing these operations is not shown. Each cell 7 in the carry-lookahead circuit 5 executes the operations given by the following equations:

$$G_{out} = G_i \;\text{OR}\; P_i G_{i-1} \qquad \text{Equation (3)}$$

$$P_{out} = P_i P_{i-1} \qquad \text{Equation (4)}$$
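The tree of cells can be illustrated with a short behavioral sketch that applies Equations (1) through (4) to reduce eight propagate/generate pairs to the single most significant generate output. The names `cell` and `carry_out`, the pairwise grouping of adjacent positions at each level, and the restriction to a power-of-two width are assumptions of this sketch; the actual wiring of FIG. 2 may differ.

```python
def cell(hi, lo):
    """One cell: combine the (G, P) pair at position i with the pair at i-1."""
    g_i, p_i = hi
    g_prev, p_prev = lo
    g_out = g_i | (p_i & g_prev)   # Equation (3)
    p_out = p_i & p_prev           # Equation (4)
    return (g_out, p_out)


def carry_out(a: int, b: int, n: int = 8) -> int:
    """Most significant generate output for n-bit addends (n a power of two)."""
    # Per-bit inputs, least significant first: G = A AND B, P = A OR B.
    level = [(((a & b) >> i) & 1, ((a | b) >> i) & 1) for i in range(n)]

    # Each tree level halves the number of (G, P) pairs.
    while len(level) > 1:
        level = [cell(level[i + 1], level[i]) for i in range(0, len(level), 2)]

    return level[0][0]


if __name__ == "__main__":
    print(carry_out(0xF0, 0x10))   # 0xF0 + 0x10 = 0x100, so a carry is generated
    print(carry_out(0x0F, 0x01))   # 0x0F + 0x01 = 0x10, no carry out of bit 7
```

With no carry-in, the final generate value equals the carry out of the n-bit sum A + B, which is the only output a comparator needs.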
Optimal performance in a CMOS implementation of a sum-and-compare circuit requires that the gate-level granularity of the cells be appropriate to the process technology being used to implement the sum-and-compare circuit. If the gates are too complex, then nonlinear delays associated with the series field effect transistors (FETs) comprising the gates will dominate the critical timing paths. Also, increased complexity of the gates increases die area. On the other hand, if the gates are too simple, then intrinsic inverter delays will dominate the critical timing paths. Therefore, in order to maximize performance of the sum-and-compare circuit without increasing the amount of die area needed to implement the circuit, all of these factors should be taken into consideration.
Accordingly, a need exists for a sum-and-compare circuit which implements logic gates with a gate-level granularity appropriate to the process used to design and fabricate the sum-and-compare circuit, and which balances series FET delays with intrinsic inverter delays so that the propagation delay of the sum-and-compare circuit is minimized.