1. Field of the Invention
The present invention relates to a data processing apparatus and method for performing floating point multiplication, and in particular to a data processing apparatus and method for multiplying first and second n-bit significands of first and second floating point operands to produce an n-bit result.
2. Description of the Prior Art
A floating point number can be expressed as follows:
±1.x × 2^y
where:
x = fraction
1.x = significand (also known as the mantissa)
y = exponent
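As an illustration of the notation above, the following minimal sketch splits a value into its sign, exponent and fraction fields and forms the significand 1.x. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 fraction bits), which the text above does not itself specify; the function names are hypothetical and chosen only for this example.

```python
import struct


def decompose_single(value):
    """Split a value into (sign, biased exponent, fraction).

    Assumes the IEEE 754 single-precision layout: 1 sign bit,
    8 exponent bits, 23 fraction bits.
    """
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction


def significand_of_normal(fraction):
    """Prepend the leading 1 to a 23-bit fraction, giving the 24-bit significand 1.x."""
    return (1 << 23) | fraction


# -6.5 = -1.625 * 2^2, i.e. -1.101 (binary) * 2^2
sign, biased_exponent, fraction = decompose_single(-6.5)
y = biased_exponent - 127  # remove the assumed single-precision bias
```

Here `fraction` holds x, `significand_of_normal(fraction)` holds 1.x as an integer scaled by 2^23, and `y` is the unbiased exponent.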
Floating-point multiplication consists of several steps:
1. Evaluating the input operands for special cases (in particular NaNs (Not-a-Number cases), infinities, zeros, and in some implementations subnormals). If a special case is detected, other processing may be required in place of the sequence below.
2. Adding the exponents. The product exponent is the sum of the multiplicand and multiplier exponents. The product exponent is checked for out of range conditions. If the exponent is out of range, a result is forced, and the sequence of steps below is not necessary.
3. The fractions are converted to significands. If the input operand was normal (as opposed to NaN, infinity, zero or subnormal) a leading ‘1’ is prepended to the fraction to make the significand. If the input operand is subnormal, a ‘0’ is prepended instead. Note that in alternative systems the subnormal operand may instead be normalized in an operand space larger than the input precision. For example, single-precision numbers have 8 bits of exponent, but an internal format may use 9 or more bits for the exponent, allowing single-precision subnormal operands to be normalized in such a system.
4. The n-bit significands are multiplied to produce a redundant set of 2n-bit vectors representing the 2n-bit product. This is typically done in an array of small adders and compressors.
5. The two 2n-bit vectors are summed to form a non-redundant final product of 2n-bits in length.
6. This final product is evaluated for rounding. The final result may only be n-bits. The lower bits contribute only to the rounding computation. If the computed product has the most significant bit set it is said to have ‘overflowed’ the significand. In this case, as illustrated in FIG. 1, the upper n-bits representing the product begin with the most significant bit, whilst the lower n-bits are used in the rounding computation. If the most significant bit of the product is not set, the resulting product (represented by bits 2n−2 to n−1) is considered ‘normal’ and the n−1 least significant bits (bits 0 to n−2) contribute to rounding.
7. The n-bits of the final product are selected. If the computed product has overflowed, bits [2n−1:n] are selected, whilst if the computed product is normal, bits [2n−2:n−1] are selected. The rounding bits corresponding to the normal or overflowed product are evaluated and a decision is made as to whether it is necessary to increment the final product.
8. If the final n-bit product is to be incremented, a ‘1’ is added to the final product at the least significant point (i.e. bit 0 of the final product).
9. The rounded final product is evaluated for overflow. This condition occurs when the final product was composed of all ones, and the rounding increment caused the final product to generate a carry into bit n (i.e. a bit position immediately to the left of the most significant bit (bit n−1) of the final product), effectively overflowing the n-bits of the result, and requiring a single bit shift right and an increment of the exponent.
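Steps 4 to 9 above can be sketched in software as follows. This is a behavioural illustration only, not a description of any particular hardware: the redundant vectors of steps 4 and 5 are collapsed into a single integer multiplication, both operands are assumed to be normal (leading bit set), and round-to-nearest-even is assumed as the rounding mode because the steps above do not name one. The function name and its return convention are hypothetical. The exponent adjustment for an overflowed product follows from the product of two significands in [1, 2) lying in [1, 4).

```python
def multiply_significands(a, b, n):
    """Multiply two normal n-bit significands.

    Returns (n-bit rounded result, amount to add to the product exponent).
    Round-to-nearest-even is an assumption of this sketch.
    """
    assert a >> (n - 1) == 1 and b >> (n - 1) == 1, "operands must be normal"

    product = a * b  # steps 4 and 5: the full 2n-bit product
    exponent_adjust = 0

    if product >> (2 * n - 1):
        # Step 6/7, overflowed product: select bits [2n-1:n];
        # the lower n bits feed the rounding decision.
        result = product >> n
        rounding_bits = product & ((1 << n) - 1)
        half = 1 << (n - 1)
        exponent_adjust = 1
    else:
        # Step 6/7, normal product: select bits [2n-2:n-1];
        # the lower n-1 bits feed the rounding decision.
        result = (product >> (n - 1)) & ((1 << n) - 1)
        rounding_bits = product & ((1 << (n - 1)) - 1)
        half = 1 << (n - 2)

    # Rounding decision (assumed mode: round to nearest, ties to even).
    increment = rounding_bits > half or (rounding_bits == half and result & 1 == 1)

    if increment:
        result += 1  # step 8: add 1 at bit 0 of the final product
        if result >> n:
            # Step 9: carry into bit n; shift right and bump the exponent.
            result >>= 1
            exponent_adjust += 1

    return result, exponent_adjust
```

For example, with n = 4, `multiply_significands(0b1100, 0b1100, 4)` exercises the overflowed-product path (1.5 × 1.5 = 2.25), while `multiply_significands(0b1001, 0b1110, 4)` exercises the step 9 path, where the selected bits are all ones and the rounding increment carries into bit n.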
The above series of steps is inherently serial, but can be parallelised at several points. For example, it would be desirable to seek to perform the rounding evaluation and any necessary rounding increment without having to first wait for the final product to be produced.
U.S. Pat. No. 6,366,942-B1 describes a technique for rounding floating point results in a digital processing system. The apparatus accepts two floating point numbers as operands in order to perform addition, and includes a rounding adder circuit which can accept the operands and a rounding increment bit at various bit positions. The circuit uses full adders at required bit positions to accommodate a bit from each operand and the rounding bit. Since the proper position in which the rounding bit should be injected into the addition may be unknown at the start, respective low and high increment bit addition circuits are provided to compute a result for both the low and the high increment rounding bit conditions. The final result is selected based upon the most significant bit of the low increment rounding bit result. The low and high increment bit addition circuits can share a high order bit addition circuit for those high order bits where a rounding increment is not required, with this single high order bit addition circuit including half adders coupled in sequence, with one half adder per high order bit position of the first and second operands.
Hence, it can be seen that U.S. Pat. No. 6,366,942-B1 teaches a technique which enables the rounding process to be performed before the final product is produced, but in order to do this requires the use of full adders (i.e. adders that take three input bits and produce at their output a carry and a sum bit) at any bit positions where a rounding bit is to be injected.
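The difference between the two adder types can be illustrated with a minimal bit-level sketch. The construction of a full adder from two half adders and an OR gate shown here is one common textbook arrangement, used only for illustration; it is not taken from U.S. Pat. No. 6,366,942-B1, and the function names are hypothetical.

```python
def half_adder(a, b):
    """Add two input bits; return (sum, carry)."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Add three input bits; return (sum, carry).

    Built here from two half adders in series plus an OR gate,
    so the sum bit passes through two half-adder stages.
    """
    partial_sum, carry_1 = half_adder(a, b)
    total_sum, carry_2 = half_adder(partial_sum, c)
    return total_sum, carry_1 | carry_2
```

In this arrangement the third input (for example a rounding bit) can only be absorbed by passing the partial sum through a second half-adder stage, which gives an intuition for why a full adder has a longer delay than a half adder.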
Full adders typically take twice as long to generate output carry and sum bits as do half adders. As there is a general desire to perform data processing operations more and more quickly, this tends to lead to a reduction in the clock period (also referred to herein as the cycle time) within the data processing apparatus. As the cycle time reduces, the delays incurred through the use of the full adders described above are likely to become unacceptable.