One of the functions basic to the operation of virtually all computer systems is the ability to add two integers. An addition function is essential because addition is used not only to provide numerical sums to users but also to implement numerous logic functions internal to the computer system. Hence, one or more adders are typically found in the arithmetic logic unit of a computer's microprocessor.
In binary addition, adding two bits produces a sum bit for that position plus a carry to the next (i.e., leftward) position. Thus, multi-bit addition can be effectuated by using the carry-out of one bit position as the carry-in to the neighboring position on its left. For example, the binary addition of “11” and “01” is performed by first adding the two least significant bits, “1” and “1.” The result is a sum of “0” with a carry-out of “1.” This carry-out is then used as the carry-in to the addition of the next pair of bits, “1” and “0.” Adding these two bits together with the carry-in again yields a sum of “0” with a carry-out of “1.” The final carry-out becomes the most significant bit of the result, yielding the correct answer “100” (i.e., 3+1=4).
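The bit-by-bit procedure above can be sketched in a few lines of Python. This is a minimal behavioral illustration, not a hardware description; the function names `full_adder` and `ripple_carry_add` are hypothetical, and the operands are assumed to be equal-length binary strings written most significant bit first.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits and a carry-in; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return sum_bit, carry_out


def ripple_carry_add(x: str, y: str) -> str:
    """Add two equal-length binary strings (MSB first) by rippling the
    carry from the rightmost bit to the leftmost bit."""
    if len(x) != len(y):
        raise ValueError("operands must have the same width")
    carry = 0
    result_bits = []
    # Walk from the least significant (rightmost) position leftward; each
    # position's carry-out becomes the next position's carry-in.
    for a_ch, b_ch in zip(reversed(x), reversed(y)):
        s, carry = full_adder(int(a_ch), int(b_ch), carry)
        result_bits.append(str(s))
    # The final carry-out becomes the most significant result bit.
    result_bits.append(str(carry))
    return "".join(reversed(result_bits))


print(ripple_carry_add("11", "01"))  # prints 100
```

The result is always one bit wider than the operands, since the last carry-out is kept as the most significant bit.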
As is known to those skilled in the art, this type of adder is called a ripple carry adder because the carry, which can be either a 1 or a 0, must ripple all the way from the rightmost bit to the leftmost bit. One problem associated with ripple carry and similar adders, however, is that rippling the carry signal takes time. In some implementations, computing the carry-out from a carry-in requires two levels of logic. Hence, if the least significant bit generates a carry that propagates through the entire length of an n-bit adder, the signal passes through 2n levels of logic before the last gate can determine whether there is a carry-out of the most significant place.
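Under the two-levels-per-stage assumption above, the worst-case carry delay can be tabulated with a small model. This is a hypothetical unit-delay model (every gate level costs one unit and wire delay is ignored), and `ripple_carry_levels` is an illustrative name, not a timing analysis of any particular circuit.

```python
def ripple_carry_levels(n_bits: int, levels_per_stage: int = 2) -> list[int]:
    """Return the logic level at which each carry c[0], ..., c[n_bits]
    settles in an n-bit ripple carry adder.

    Assumed unit-delay model: all operand bits and the initial carry c[0]
    are available at level 0, and each stage needs `levels_per_stage` gate
    levels (e.g., an AND level followed by an OR level) to turn its
    carry-in into its carry-out.
    """
    arrival = [0]  # c[0]: carry into the least significant position
    for _ in range(n_bits):
        # A stage cannot resolve its carry-out until its carry-in settles.
        arrival.append(arrival[-1] + levels_per_stage)
    return arrival


print(ripple_carry_levels(2))       # prints [0, 2, 4]
print(ripple_carry_levels(32)[-1])  # prints 64
```

In this model the final carry of an n-bit adder settles at level 2n, so a 32-bit ripple carry adder needs 64 levels of logic in the worst case.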
In general, the time a circuit takes to produce an output is proportional to the maximum number of logic levels through which a signal travels. This propagation delay is especially severe when adding large numbers having many bits. For example, rippling the carry through the entire addition chain of two 32-bit words takes a substantial amount of time (64 levels of logic when each stage requires two levels). Consequently, the time required to ripple the carry lengthens the critical timing path, slowing down the overall speed of the microprocessor and detrimentally impacting the performance of the computer system.
Accordingly, numerous prior art adder designs aim to minimize the time required to perform binary addition. One approach manipulates multiple bits in parallel to speed up the addition process. In a carry look-ahead design, the inputs to a group of stages are examined together so that the proper carry into each of these stages is calculated simultaneously rather than rippled from stage to stage. Each carry is then applied to the adder for the corresponding bit. In a conditional-sum arrangement, two sums are generated for each order: one assuming a carry-in of “0” and one assuming a carry-in of “1.” One of the two sums is then selected based on carry information originating from the lower orders.
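Both techniques can be sketched as behavioral models in Python. The helper names (`to_bits`, `from_bits`, `lookahead_add`, `conditional_sum`) are hypothetical, and bit lists are stored least significant bit first. The look-ahead function expands every carry into a flat sum of products of generate (a AND b) and propagate (a OR b) signals, so no carry waits on the carry below it; the conditional-sum function recursively splits the operands in half and lets the lower half's carry-out select between the upper half's two precomputed sums. These are illustrations of the general techniques, not gate-level designs from this document.

```python
def to_bits(x: int, width: int) -> list[int]:
    """Integer -> list of bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """List of bits (least significant first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def lookahead_add(a: list[int], b: list[int], c0: int = 0) -> list[int]:
    """Carry look-ahead addition. Returns the sum bits plus the carry-out."""
    g = [ai & bi for ai, bi in zip(a, b)]  # position i generates a carry
    p = [ai | bi for ai, bi in zip(a, b)]  # position i propagates a carry
    carries = [c0]
    for i in range(len(a)):
        # c[i+1] = g[i] + p[i]g[i-1] + ... + p[i]...p[1]g[0] + p[i]...p[0]c0,
        # computed directly from g, p and c0 rather than from c[i].
        terms = [g[j] & all(p[k] for k in range(j + 1, i + 1)) for j in range(i + 1)]
        terms.append(c0 & all(p[k] for k in range(i + 1)))
        carries.append(int(any(terms)))
    sums = [ai ^ bi ^ ci for ai, bi, ci in zip(a, b, carries)]
    return sums + [carries[-1]]


def conditional_sum(a: list[int], b: list[int]) -> dict[int, tuple[list[int], int]]:
    """Conditional-sum addition of two equal-length, non-empty bit lists.
    Returns {carry_in: (sum_bits, carry_out)} for carry_in 0 and 1."""
    if len(a) == 1:
        # Single order: form the sum for a carry-in of 0 and of 1.
        return {cin: ([a[0] ^ b[0] ^ cin], (a[0] & b[0]) | ((a[0] ^ b[0]) & cin))
                for cin in (0, 1)}
    mid = len(a) // 2
    low = conditional_sum(a[:mid], b[:mid])   # the two halves do not depend
    high = conditional_sum(a[mid:], b[mid:])  # on each other
    result = {}
    for cin in (0, 1):
        low_sum, low_carry = low[cin]
        # The carry out of the lower orders selects one of the two upper sums.
        high_sum, high_carry = high[low_carry]
        result[cin] = (low_sum + high_sum, high_carry)
    return result


x, y = to_bits(3, 2), to_bits(1, 2)
print(from_bits(lookahead_add(x, y)))     # prints 4
sum_bits, carry_out = conditional_sum(x, y)[0]
print(from_bits(sum_bits + [carry_out]))  # prints 4
```

In the conditional-sum sketch, the recursion depth, and hence the number of successive selection steps, grows with the base-2 logarithm of the operand width.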
Various other hybrid arrangements combining parallel (look-ahead) and propagated (ripple) carries have been proposed to reduce the inherent time delay, with the goal of further shortening the critical timing path. Nevertheless, as the trend is toward ever more powerful and faster computers, there remains a need in the prior art for even faster adders. It would be highly preferable for such an adder to have a regular, hierarchical structure suitable for adding large numbers. Furthermore, it would also be preferable for the delay of such an adder to grow only logarithmically with the number of bits to be added.
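One well-known family of adders with a regular structure and logarithmic delay is the parallel-prefix adder. The sketch below uses the Kogge-Stone prefix network purely as an illustration of that property; it is not presented as the design this document goes on to describe, and the function name `kogge_stone_add` is hypothetical. Bit lists are least significant bit first, and the carry into the lowest position is assumed to be 0.

```python
def kogge_stone_add(a: list[int], b: list[int]) -> tuple[list[int], int, int]:
    """Add two equal-length bit lists with a Kogge-Stone prefix network.
    Returns (sum_bits, carry_out, prefix_levels)."""
    n = len(a)
    g = [ai & bi for ai, bi in zip(a, b)]  # generate
    p = [ai ^ bi for ai, bi in zip(a, b)]  # propagate (also the half-sum)
    G, P = g[:], p[:]
    distance, levels = 1, 0
    while distance < n:
        # Every position combines with the position `distance` below it.
        # Both comprehensions read the previous level's G and P, so all
        # positions in one level are updated "at once", as in hardware.
        G, P = (
            [G[i] | (P[i] & G[i - distance]) if i >= distance else G[i] for i in range(n)],
            [P[i] & P[i - distance] if i >= distance else P[i] for i in range(n)],
        )
        distance *= 2
        levels += 1
    # G[i] now says whether bits 0..i produce a carry out of position i.
    carries = [0] + G[:-1]
    sum_bits = [p[i] ^ carries[i] for i in range(n)]
    return sum_bits, G[-1], levels


print(kogge_stone_add([1, 1], [1, 0]))          # 3 + 1; prints ([0, 0], 1, 1)
print(kogge_stone_add([0] * 32, [0] * 32)[2])  # prints 5
```

The number of prefix levels is the ceiling of log2(n): 5 for 32-bit operands and 6 for 64-bit operands, whereas a ripple carry chain grows linearly with n. The count excludes the initial generate/propagate level and the final XOR level.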
Moreover, floating point operations are among the essential tasks repeatedly performed by many digital data processors. As a result, the floating point unit (FPU) is an essential part of a digital data processor. Much effort has been expended to maximize the speed of FPUs; nevertheless, any further improvement in FPU speed is still desirable. During floating point operations, it is often necessary to determine the amount by which to right-shift the mantissa of a floating point number, namely the mantissa of the operand with the smaller exponent, so that both mantissas are aligned to the same exponent. The amount of right shift is equal to the difference between the exponents of the two floating point input operands.
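The alignment step can be sketched as follows. The sketch assumes a simplified, hypothetical representation in which an operand's value is `mantissa * 2**exponent` with a non-negative integer mantissa; it ignores the sign, any hidden bit, and the extra guard/round/sticky bits a real FPU would keep, so bits shifted out are simply truncated. The name `align_mantissas` is illustrative.

```python
def align_mantissas(exp_a: int, man_a: int, exp_b: int, man_b: int) -> tuple[int, int, int]:
    """Align two (exponent, mantissa) operands before addition.

    The mantissa of the operand with the smaller exponent is shifted right
    by the exponent difference. Returns (common_exponent, man_a, man_b).
    """
    shift = exp_a - exp_b  # the exponent difference is the right-shift amount
    if shift >= 0:
        return exp_a, man_a, man_b >> shift
    return exp_b, man_a >> -shift, man_b


# 0b1010 * 2**5 = 320 and 0b1100 * 2**3 = 96; after alignment both use 2**5.
print(align_mantissas(5, 0b1010, 3, 0b1100))  # prints (5, 10, 3)
```

After alignment the mantissas can be added directly: (10 + 3) * 2**5 = 416 = 320 + 96.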
Thus, it is often necessary to perform a rapid addition/subtraction operation followed by a shifting operation. A conventional FPU typically includes an adder to perform the addition/subtraction operation and a separate barrel shifter to perform the shifting operation. A conventional FPU also typically attempts to maximize its overall performance by maximizing the performance of each individual component. Thus, a high speed parallel adder and any one of a number of high speed barrel shifters would typically be employed.
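A logarithmic barrel shifter, one common way to build such a shifter, can be modeled as a sequence of stages, each steered by one bit of the shift amount. In this sketch the hypothetical `barrel_shift_right` takes `shift` in the role of the exponent difference produced by the adder; the stage order (lowest shift bit first) is an assumption of the sketch. Arranged this way, the first stage cannot act until the low-order bits of the shift amount are available.

```python
def barrel_shift_right(value: int, shift: int, width: int) -> int:
    """Logarithmic barrel shifter model: right-shift a `width`-bit value
    by `shift`, using one stage per bit of the shift amount. Stage k
    shifts by 2**k positions only when bit k of `shift` is 1."""
    if shift < 0:
        raise ValueError("shift amount must be non-negative")
    stages = width.bit_length()  # stages 0..stages-1 cover shifts up to 2**stages - 1
    if shift >> stages:
        return 0  # shift exceeds every stage combination; all bits fall off
    result = value & ((1 << width) - 1)
    for k in range(stages):
        # In hardware each stage is a row of 2:1 multiplexers steered by one
        # bit of the shift amount; here stage 0 consumes the lowest bit.
        if (shift >> k) & 1:
            result >>= 1 << k
    return result


# Align mantissa 0b1100 (exponent 3) to exponent 5: shift by 5 - 3 = 2.
print(barrel_shift_right(0b1100, 5 - 3, 4))  # prints 3
```

Each stage costs one multiplexer level, so a width-bit shifter in this model needs only about log2(width) stages.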
This conventional approach suffers from at least one disadvantage: high speed parallel adders typically achieve their performance improvement by focusing on the critical paths. As a result, the generation speed of the lower-order sum bits is sacrificed in favor of the generation speed of the higher-order sum bits, which lie on the critical paths. Since the shifting operation depends serially on the adder's output sum bits, the conventional approach actually yields less than optimal combined performance for the addition and shifting operations when viewed as a whole.
Therefore, there remains a need to overcome one or more of the limitations in the above-described, existing art.