1. Field of the Invention
The present invention relates to an apparatus and method for performing an addition operation on operands A and B in order to produce a result R, the operands A and B and the result R being floating point values each having a significand and an exponent.
2. Description of the Prior Art
A floating point number can be expressed as follows:
±1.x × 2^y
where:
x = fraction
1.x = significand (also known as the mantissa)
y = exponent
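Purely by way of illustration, the representation above can be sketched in Python as follows. The function name `decompose` is hypothetical and is not part of any described apparatus; the sketch assumes a finite, nonzero input and reports subnormal values in the same normalized 1.x form.

```python
import math

def decompose(value: float):
    """Split a finite, nonzero float into (sign, x, y) such that
    value == sign * (1 + x) * 2**y, i.e. the form +/-1.x * 2^y."""
    if value == 0.0 or math.isinf(value) or math.isnan(value):
        raise ValueError("only finite nonzero values are handled in this sketch")
    sign = -1 if math.copysign(1.0, value) < 0 else 1
    m, e = math.frexp(abs(value))   # abs(value) == m * 2**e with 0.5 <= m < 1
    significand = m * 2.0           # rescale so that 1 <= significand < 2
    return sign, significand - 1.0, e - 1

# 6.5 = 1.625 * 2**2, so the fraction x is 0.625 and the exponent y is 2
sign, x, y = decompose(-6.5)
```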
Floating point addition can take two forms, namely like-signed addition (LSA) or unlike-signed addition (USA). An LSA operation is performed if two floating point operands of the same sign are to be added, or if two floating point operands of different signs are to be subtracted. Similarly, a USA operation is to be performed if two floating point operands of different sign are to be added, or if two floating point operands of the same sign are to be subtracted. When referring in the present application to the addition of floating point operands and the addition of the significands of such operands, this should be taken as collectively referring to LSA or USA computations, and accordingly it will be appreciated that such a term covers both addition and subtraction processes.
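The distinction between LSA and USA operations can be illustrated with the following minimal Python sketch. The function name `effective_operation` and the encoding of signs as 0 (positive) and 1 (negative) are assumptions made for the purpose of the example only.

```python
def effective_operation(sign_a: int, sign_b: int, subtract: bool) -> str:
    """Classify an add/subtract request as 'LSA' or 'USA'.

    sign_a, sign_b: 0 for a positive operand, 1 for a negative operand.
    subtract: True if the requested operation is A - B, False for A + B.
    """
    # A subtraction is treated as an addition with the sign of B inverted.
    effective_sign_b = sign_b ^ int(subtract)
    return "LSA" if sign_a == effective_sign_b else "USA"
```

For example, subtracting a negative operand from a positive operand is classified as an LSA operation, whereas adding them is classified as a USA operation.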
Typically a processor will have a number of pipelined units for performing different data processing operations. One such pipelined unit is an adder unit which comprises a number of pipeline stages for performing addition operations. Floating point addition consists of a number of stages, namely (1) exponent analysis and difference computation, (2) operand alignment, (3) addition (which may include rounding injection), and (4) normalization. For many years the state of the art in adder pipelines has involved the provision of two separate paths for at least part of the addition process, one being referred to as the near path and the other being referred to as the far path. In particular such a near/far path split can save a clock cycle in the addition process, based on the observation that nontrivial alignment and nontrivial normalization are mutually exclusive.
Hence, the near path is used for USA operations involving operands whose exponents are equal or differ by one, these operations having the potential to cause some cancellation of leading bits of the significand. Such differences do not require rounding, but they do require normalization after the addition. The far path is then used for all other USA operations and for all LSA operations, and requires circuitry for performing alignment and rounding, but only requires trivial (1-bit) normalization.
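The path selection rule described above, and the cancellation that makes normalization necessary in the near path, can be illustrated with the following Python sketch. The names `select_path` and `normalization_shift` are hypothetical, significands are modelled as p-bit integers with the leading bit set, and only the equal-exponent case is modelled for the normalization shift.

```python
def select_path(operation: str, exp_a: int, exp_b: int) -> str:
    """Near path only for USA operations whose exponents are equal or differ by one."""
    if operation == "USA" and abs(exp_a - exp_b) <= 1:
        return "near"
    return "far"


def normalization_shift(sig_a: int, sig_b: int, p: int) -> int:
    """Left shift needed to renormalize |sig_a - sig_b| when the exponents are equal.

    sig_a and sig_b are p-bit integer significands with the leading bit set.
    """
    difference = abs(sig_a - sig_b)
    if difference == 0:
        return 0  # exact zero result; nothing to normalize in this sketch
    return p - difference.bit_length()


# Two 8-bit significands that agree in their six leading bits: the
# difference is 0b10, so six leading bits cancel and must be shifted out.
shift = normalization_shift(0b10110011, 0b10110001, 8)
```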
Such a split adder pipeline was first published in the PhD thesis “On the Design of High Performance Digital Arithmetic Units” by P Farmwald, University of California Livermore, 1981, and has been refined in several subsequent designs, see for example the paper entitled “1-GHz HAL SPARC64 Dual Floating Point Unit with RAS Features” by A Naini et al, Proceedings of the 15th IEEE Symposium on Computer Arithmetic, 2001, and also commonly-owned U.S. Pat. No. 7,437,400, the entire contents of which are hereby incorporated by reference.
One common form of operation involving an addition is a multiply-accumulate operation, taking the form A+L*M, where the multiplication result of the operands L and M forms the second operand B for the addition. With the publication of the IEEE 754-2008 Standard, fused multiply accumulate (FMA) operations (also referred to herein as fused multiply add operations) have become a requirement for floating point units, an FMA operation requiring the unrounded multiplication result to be added to the operand A, with rounding then performed in association with the output of the addition. Dedicated FMA pipelined units have been developed, but the provision of such a dedicated unit is costly. Typically such an FMA unit is also used to perform standard addition operations and standard multiplication operations, thereby avoiding the need for a separate adder unit and a separate multiplier unit. However, due to the complexity of the FMA unit, an FMA unit will typically take longer to perform a standard addition operation than a dedicated adder unit, and will also typically take longer to perform a standard multiplication operation than a separate multiplier unit. Since most operations are not actually FMA operations, but instead involve standard additions and multiplications, this can have a significant performance impact on the floating point unit.
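The effect of adding the unrounded multiplication result, rather than a rounded one, can be illustrated with the following Python sketch. It is a software model only: exact rational arithmetic from the standard `fractions` module stands in for the unrounded product, and the function names `fused_multiply_add` and `unfused_multiply_add` are hypothetical. Operands are assumed to be finite double precision values.

```python
from fractions import Fraction

def fused_multiply_add(a: float, l: float, m: float) -> float:
    """Model A + L*M with a single rounding at the end (finite inputs only)."""
    exact = Fraction(a) + Fraction(l) * Fraction(m)  # no intermediate rounding
    return float(exact)                              # one rounding to double

def unfused_multiply_add(a: float, l: float, m: float) -> float:
    """Model A + L*M with the product rounded before the addition."""
    return a + l * m

# L*M is exactly 1 + 2**-26 + 2**-54; rounding the product to double
# precision discards the 2**-54 term before A is added.
l = m = 1.0 + 2.0**-27
a = -(1.0 + 2.0**-26)

fused = fused_multiply_add(a, l, m)
unfused = unfused_multiply_add(a, l, m)
```

With these operands the unfused computation yields zero, whereas the fused computation retains the low-order term of the product.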
Another mechanism for performing an FMA operation is to use a separate multiplication unit to perform the multiplication of the operands L and M, and then forward the result unrounded to a separate adder unit to form the second operand to be added to the operand A, as for example described in commonly owned co-pending U.S. patent application Ser. No. 12/585,668, the entire contents of which are hereby incorporated by reference. However, when using the above-described near/far path architecture for the adder unit, this causes problems in the operation of the near path. In particular, the problem is that the unrounded multiplication result used as one of the operands is twice the length of the result R, and hence can require rounding even if there is cancellation in the near path. As mentioned above, a near path does not typically provide rounding circuitry, and the output of the addition needs to be normalized before rounding is performed. As a result it would be necessary to add another pipeline stage to the near path to allow rounding to be performed, and this would significantly impact the performance of the adder unit.
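The near path problem described above can be illustrated numerically with the following Python sketch. It assumes a toy significand width of 8 bits (far narrower than any real format), models significands as integers with the leading bit set, and assumes that operand A and the product L*M have equal exponents. All names are hypothetical.

```python
P = 8  # toy significand width in bits (an assumption for illustration)

def significant_bits(n: int) -> int:
    """Number of bits from the leading 1 to the trailing 1 of |n|."""
    if n == 0:
        return 0
    n = abs(n)
    trailing_zeros = (n & -n).bit_length() - 1
    return n.bit_length() - trailing_zeros

def near_path_difference(sig_a: int, sig_l: int, sig_m: int, p: int = P) -> int:
    """|A - L*M| for equal exponents, keeping the product unrounded."""
    product = sig_l * sig_m        # up to 2p bits, with 2*(p-1) fraction bits
    aligned_a = sig_a << (p - 1)   # bring A to the same 2*(p-1) fraction-bit scale
    return abs(aligned_a - product)

# A = 1.1111100b, L = M = 1.0110101b; the product is just below 2.0,
# so it has the same exponent as A in this example.
difference = near_path_difference(0b11111100, 0b10110101, 0b10110101)
bits_needed = significant_bits(difference)
needs_rounding = bits_needed > P
```

In this example six leading bits cancel, yet the difference still occupies nine significant bits, one more than the 8-bit result can hold, so rounding would be needed after normalization.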
Accordingly, it would be desirable to provide an improved floating point adder unit for a data processing apparatus.