1. Field of the Invention
This invention relates to floating point arithmetic and more particularly to floating point subtraction operations.
2. Description of the Related Art
Floating point subtraction is utilized in numerous applications. Such floating point numbers are typically in the form of 1.f × 2^E, where 1.f is the significand, f is a fraction and E is an exponent. The floating point numbers discussed herein, by way of example, are assumed to be represented in IEEE 754 single precision format, where the exponent field is eight bits and the fraction field is twenty three bits. The one in the significand is not explicitly represented in the IEEE format. The floating point representation in IEEE 754 format includes a sign bit.
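By way of illustration only, the field layout described above can be inspected with a short Python sketch. The function name decode_single is hypothetical and is not part of any described apparatus; the sketch assumes only the standard single precision layout of one sign bit, eight exponent bits and twenty three fraction bits.

```python
import struct

def decode_single(x):
    """Split a value into IEEE 754 single precision fields.

    Returns (sign, biased exponent, fraction). The leading one of the
    significand is implicit and therefore does not appear in the fraction.
    """
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

# 13.0 is 1.101 (binary) x 2^3, so the unbiased exponent is 3 and the
# fraction field holds the bits 101 followed by twenty zeros.
sign, exponent, fraction = decode_single(13.0)
```

Here exponent - 127 recovers the exponent E, and the top three bits of fraction are the .101 of the significand.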
Subtracting two floating point numbers A-B, where, for example, A=(1.101 × 2^3) and B=(1.100 × 2^1), requires first adjusting B to the form (0.011 × 2^3), so that the exponents of the two numbers are equal, before the fractions are subtracted. That general approach to floating point subtraction is known in the art and is illustrated in FIG. 1.
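The alignment in this example can be checked numerically. The following Python sketch is illustrative only; the variable names are assumptions, and the binary significands are written as integers divided by 8 so that three fraction bits are represented exactly.

```python
# A = 1.101 (binary) x 2^3
a = 0b1101 / 8 * 2**3

# B = 1.100 (binary) x 2^1
b = 0b1100 / 8 * 2**1

# B rewritten with the exponent of A: 0.011 (binary) x 2^3
b_aligned = 0b0011 / 8 * 2**3

# With equal exponents, only the significands need to be subtracted:
# 1.101 - 0.011 = 1.010 (binary), still scaled by 2^3.
difference = (0b1101 - 0b0011) / 8 * 2**3
```

Both b and b_aligned denote the same value, and difference equals a - b.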
Given two floating point numbers A and B, the first step 10 is to compare the exponents of A (expA) and B (expB). If the exponent of A is less than the exponent of B (step 12), then the fractions of A and B are swapped in step 14 so that the exponent of A is always greater than or equal to the exponent of B. The larger exponent is chosen as the exponent of the result. In step 16, with expB less than or equal to expA, the fraction of B is shifted to the right by the absolute value of the difference between expA and expB. In step 18 the fraction of B is subtracted from the fraction of A. In step 20, the result of the fractional subtraction is shifted to the left, if necessary, to normalize the result. The exponent is adjusted according to the amount of normalization required.
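The steps of FIG. 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the circuit of any figure: operands are positive and normalized, each is given as an unbiased exponent and a 24-bit significand that includes the leading one, bits shifted out during alignment are simply truncated, and rounding is omitted. The function name fp_subtract and the sign handling are assumptions added for illustration.

```python
FRACTION_BITS = 23  # IEEE 754 single precision fraction width

def fp_subtract(exp_a, sig_a, exp_b, sig_b):
    """Compute A - B and return (sign, exponent, significand).

    Assumes positive, normalized operands; no rounding is performed.
    """
    sign = 0
    # Steps 10-14: compare the exponents and swap so that exp_a >= exp_b.
    if exp_a < exp_b:
        exp_a, exp_b = exp_b, exp_a
        sig_a, sig_b = sig_b, sig_a
        sign = 1  # assumption: swapping the operands negates the result
    # Step 16: shift the significand of B right by the exponent difference.
    sig_b >>= exp_a - exp_b
    # Step 18: subtract the aligned significands.
    diff = sig_a - sig_b
    if diff < 0:  # only possible when the exponents were equal
        diff = -diff
        sign ^= 1
    if diff == 0:
        return 0, 0, 0  # placeholder encoding for an exact zero result
    # Step 20: shift left until the leading one is restored, decrementing
    # the exponent once per bit shifted.
    exponent = exp_a
    while diff < (1 << FRACTION_BITS):
        diff <<= 1
        exponent -= 1
    return sign, exponent, diff

# A = 1.101 x 2^3 and B = 1.100 x 2^1, as 24-bit significands.
result = fp_subtract(3, 0b1101 << 20, 1, 0b1100 << 20)
```

For these operands the result significand is 1.010 (binary) with exponent 3, and no normalization shift is needed.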
Note that the shifting operation in step 16 requires at most a one bit shift if |ExpA - ExpB| ≤ 1. In that case, the shifting operation in step 20 might require a large shift to normalize the result, since there are potentially a significant number of leading 0's in the result of subtracting the fraction of B (FrcB) from the fraction of A (FrcA). That is due to the operands having exponents which differ by 1 or less. Conversely, if |ExpA - ExpB| > 1, multiple bit shifting may be required in step 16 to equalize the exponents, but there is at most a single bit shift in step 20.
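The relationship between the two shifts can be observed with a small Python sketch. It is offered only as an illustration under the same assumptions as before (positive normalized operands, 24-bit significands including the leading one, truncation instead of rounding); the function name shift_amounts is hypothetical.

```python
FRACTION_BITS = 23  # IEEE 754 single precision fraction width

def shift_amounts(exp_a, sig_a, exp_b, sig_b):
    """Return (alignment right shift, normalization left shift) for A - B."""
    # Order the operands so that exp_a >= exp_b.
    if exp_a < exp_b:
        exp_a, exp_b = exp_b, exp_a
        sig_a, sig_b = sig_b, sig_a
    # Alignment shift of step 16.
    align = exp_a - exp_b
    diff = abs(sig_a - (sig_b >> align))
    # Normalization shift of step 20: count the left shifts needed to
    # bring the leading one back to bit 23 (zero results need no shift).
    normalize = 0
    while diff and diff < (1 << FRACTION_BITS):
        diff <<= 1
        normalize += 1
    return align, normalize

# Exponents differ by 1: a one bit alignment, then a large normalization.
close = shift_amounts(3, 1 << 23, 2, (1 << 24) - 1)

# Exponents differ by 2: the normalization is a single bit.
near = shift_amounts(10, 1 << 23, 8, (1 << 24) - 1)

# Exponents differ by 9: a large alignment and no normalization.
far = shift_amounts(10, 0b1101 << 20, 1, 0b1100 << 20)
```

In each case only one of the two returned shift amounts exceeds one bit.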
A block diagram of a floating point adder which implements the method described in FIG. 1 is shown in FIG. 2. Operands 201 and 202 are being added. Small adder 203 compares the exponents of the operands. Small adder 203 is so named because the exponents are only eight bits whereas the fractions are 23 bits. The exponent difference is placed in register 204. That exponent difference is provided to control logic 211. Selector 212 selects the larger exponent as the preliminary result exponent and provides the result to incrementer/decrementer 210.
The fractions of operands 201 and 202 are supplied through multiplexers 205 and 206 to the right shifter 208. The multiplexers 205 and 206 are under control of control logic 211 to supply the fraction of the operand with the smaller exponent to right shifter 208. The difference between the exponents determines the amount to right shift (if at all). The right shifted fraction from right shifter 208 and the fraction from multiplexer 205 are supplied to big adder 207. Big adder 207 is so named because the fraction operands are typically large. Following the subtraction operation, the result may need to be shifted in shifter 209. If the result from big adder 207 is shifted to normalize, the exponent is decremented accordingly. Finally, rounding hardware 213 performs the appropriate rounding operations defined, e.g., in the IEEE 754 standard and provides the result 215.
As discussed in relation to FIG. 1, the subtract operation requires a large shift, i.e., a shift of more than one bit, in either shifter 208 or shifter 209, but not both. It has been recognized that an adder can exploit this property by eliminating the unnecessary shift, thereby reducing the number of steps required in FIG. 1. It has also been recognized that a pipelined adder can duplicate the shifter and adder to take advantage of the fact that only one major shift operation is necessary. Such proposals, however, still require the completion of step 10, the exponent compare, prior to a subtraction or shift. It would therefore be advantageous to provide a floating point adder that more fully exploits the fact that only one large shift operation is required and thus further speeds up floating point subtraction.