Processors for performing arithmetic operations on binary floating point numbers are known. In floating point representation, numbers are represented using a mantissa 1.F, an exponent E and a sign bit S. The sign bit S represents whether the floating point number is positive or negative, the mantissa 1.F represents the significant digits of the floating point number, and the exponent E represents the position of the radix point (also known as a binary point) relative to the mantissa. By varying the value of the exponent, the radix point can “float” left and right within the mantissa. This means that for a predetermined number of bits, a floating point representation can represent a wider range of numbers than a fixed point representation (in which the radix point has a fixed location within the mantissa). However, the extra range is achieved at the expense of reduced precision since some of the bits are used to store the exponent. Sometimes, a floating point arithmetic operation generates a result with more significant bits than the number of bits used for the mantissa. If this happens then the result is rounded to a value that can be represented using the available number of significant bits.
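As a rough illustration of the rounding described above, the Python sketch below rounds a double-precision value to the nearest single-precision value, whose mantissa has only 24 significant bits. The helper name `to_single` is hypothetical. It relies on the standard `struct` module, which on IEEE 754 platforms performs the conversion with round-to-nearest, ties-to-even.

```python
import struct

def to_single(x: float) -> float:
    """Hypothetical helper: round a Python float (a double) to single precision.

    Packing with the "<f" format converts to a 32-bit float, which keeps only a
    24-bit mantissa; unpacking widens the rounded value back to a double exactly.
    Raises OverflowError if x is too large for single precision.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 2**24 + 1 needs 25 significant bits, so it cannot be held in a 24-bit mantissa.
exact = 2.0**24 + 1            # 16777217.0, exact in double precision
rounded = to_single(exact)     # tie between 2**24 and 2**24 + 2; the even mantissa wins
print(exact, rounded)          # 16777217.0 16777216.0
```

The same function leaves values that already fit in 24 significant bits, such as 0.5, unchanged.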
FIG. 1 of the accompanying drawings shows how floating point numbers are stored within a register or memory. In a single precision representation, 32 bits are used to store the floating point number. One bit is used as the sign bit S, eight bits are used to store the exponent E, and 23 bits are used to store the fractional portion F of the mantissa 1.F. The 23 bits of the fractional portion F, together with an implied bit having a value of one, make up a 24-bit mantissa 1.F. The radix point is initially assumed to be placed between the implied bit and the 23 stored bits of the mantissa. The stored exponent E is biased by a fixed value of 127, such that if E−127 is negative, the radix point of the represented floating point number is shifted left from its initial position by a number of places equal to the absolute value of E−127 (e.g. if E−127=−2, a mantissa of 1.01 represents 0.0101), and if E−127 is positive, the radix point is shifted right from its initial position by E−127 places (e.g. if E−127=2, a mantissa of 1.01 represents 101). The bias simplifies comparison of the exponents of two floating point values, because both negative and positive shifts of the radix point are then represented by a non-negative value of the stored exponent E. As shown in FIG. 1, the stored representation S[31], E[30:23], F[22:0] represents a number with the value $(-1)^S \times 1.F[22:0] \times 2^{E-127}$. A single-precision floating point number in this form is considered to be “normal”. If a calculated floating point value is not normal (for example, because it was generated with a mantissa having one or more leading zeros), it can be normalized, for example by shifting the mantissa left and adjusting the exponent accordingly until the number is of the form $(-1)^S \times 1.F[22:0] \times 2^{E-127}$. (Floating point numbers can also be “denormalized” in order to place them in a “denormal” form $(-1)^S \times 0.F[22:0] \times 2^{-126}$.)
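The field layout, the value formula and the normalization step can be checked with a short Python sketch. The function names `single_fields`, `normal_value` and `normalize` are hypothetical. `normal_value` handles normal numbers only, and `normalize` treats the 24-bit mantissa as an integer whose bit 23 holds the implied one; the sketch is not meant to model any particular hardware.

```python
import math

def single_fields(bits32: int) -> tuple[int, int, int]:
    """Split a 32-bit pattern into S[31], E[30:23] and F[22:0]."""
    s = (bits32 >> 31) & 0x1
    e = (bits32 >> 23) & 0xFF
    f = bits32 & 0x7FFFFF
    return s, e, f

def normal_value(bits32: int) -> float:
    """Evaluate (-1)**S * 1.F * 2**(E - 127); valid only for normal numbers (1 <= E <= 254)."""
    s, e, f = single_fields(bits32)
    mantissa = 1 + f / 2**23          # implied leading one plus the 23 stored bits
    return (-1.0 if s else 1.0) * math.ldexp(mantissa, e - 127)

def normalize(mantissa: int, exponent: int) -> tuple[int, int]:
    """Shift a nonzero 24-bit mantissa left until its top bit (bit 23) is set,
    decrementing the exponent once per shift. Sketch: ignores exponent underflow."""
    assert 0 < mantissa < (1 << 24)
    while not mantissa & (1 << 23):
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent

# 0xC0200000: S=1, E=128, F=0x200000, i.e. -1.01 (binary) * 2**1 = -2.5
print(single_fields(0xC0200000), normal_value(0xC0200000))  # (1, 128, 2097152) -2.5
print(normalize(0x200000, 130))  # two leading zeros -> (8388608, 128), i.e. 0x800000
```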
Exception handling routines and/or hardware can be provided to handle numbers that cannot be represented as a normal floating point value, such as infinity and not-a-number (NaN) values. A special value of the exponent field E (0xFF) can be reserved for such values.
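A minimal classification of single-precision bit patterns can be sketched in Python as follows. The function name `classify_single` is hypothetical. The all-ones exponent 0xFF follows the reservation described above; treating an exponent field of zero as encoding zero or a denormal value is the IEEE 754 convention and is an assumption beyond the text.

```python
def classify_single(bits32: int) -> str:
    """Classify a 32-bit single-precision pattern by its exponent and fraction fields.

    E == 0xFF is the reserved value for infinity and NaN; treating E == 0 as zero
    or denormal is the IEEE 754 convention, assumed here.
    """
    e = (bits32 >> 23) & 0xFF
    f = bits32 & 0x7FFFFF
    if e == 0xFF:
        return "infinity" if f == 0 else "nan"   # the sign bit selects +/- infinity
    if e == 0:
        return "zero" if f == 0 else "denormal"
    return "normal"

for pattern in (0x3F800000, 0x7F800000, 0xFF800000, 0x7FC00000, 0x00000001, 0x80000000):
    print(f"{pattern:#010x} {classify_single(pattern)}")
```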
A double precision format is also provided in which the sign, exponent and mantissa are represented using 64 stored bits. The 64 stored bits include one sign bit, an 11-bit exponent and the 52-bit fractional portion F of a 53-bit mantissa 1.F. In double precision format the exponent E is biased by a value of 1023. Thus, in the double precision format a stored representation S[63], E[62:52], F[51:0] represents a floating point value $(-1)^S \times 1.F[51:0] \times 2^{E-1023}$.
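Because a Python float is an IEEE 754 double on mainstream platforms, the double precision fields and formula can be checked directly. This is a sketch with hypothetical function names, valid for normal numbers only.

```python
import math
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Return S[63], E[62:52], F[51:0] of a Python float (an IEEE 754 double)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def double_value(s: int, e: int, f: int) -> float:
    """Evaluate (-1)**S * 1.F * 2**(E - 1023); valid only for normal numbers (1 <= E <= 2046)."""
    return (-1.0 if s else 1.0) * math.ldexp(1 + f / 2**52, e - 1023)

s, e, f = double_fields(-2.5)    # -1.01 (binary) * 2**1
print(s, e, hex(f))              # 1 1024 0x4000000000000
print(double_value(s, e, f))     # -2.5
```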
Hereafter the present invention shall be explained with reference to the single precision floating point format. However, it will be appreciated that the invention could also be applied to the double precision format (or any other floating point format) and that the numbers of bits shown in subsequent figures could be replaced by values appropriate to the floating point format being used.
A commonly used floating point operation is a multiply add operation A+B*C, whereby two operands B and C are multiplied together and the product B*C is added to a third operand A. The multiply add operation can also be referred to as a multiply accumulate operation, since often the result A+B*C is written to the register that contains the operand A. Thus, a series of multiply add operations can be used to accumulate a sum of various products to a destination register.
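The accumulation pattern can be sketched in Python. The names `multiply_add` and `dot` are hypothetical, and the accumulator variable stands in for the destination register holding A. Because Python evaluates `a + b * c` as two separately rounded double-precision operations, this sketch corresponds to an unfused multiply add.

```python
def multiply_add(a: float, b: float, c: float) -> float:
    """A + B*C; Python rounds the product, then rounds the sum (unfused)."""
    return a + b * c

def dot(bs: list[float], cs: list[float]) -> float:
    """Accumulate a sum of products, writing each result back to the accumulator."""
    acc = 0.0                      # stands in for the register holding operand A
    for b, c in zip(bs, cs):
        acc = multiply_add(acc, b, c)
    return acc

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0
```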
In floating point processors complying with the IEEE 754-1985 floating point standard, a multiply add operation can be implemented as separate multiply and add operations, with the intermediate product being rounded. The 1985 standard has now been superseded by the IEEE 754-2008 standard, which, as well as supporting the separate multiply and add operations, also provides for a fused multiply add operation in which the intermediate result is not rounded. The fused multiply add operation is therefore performed at higher precision, with a single rounding applied only to the final result.
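The effect of the intermediate rounding can be shown by emulating both variants in single precision. The sketch below is an illustration rather than a model of any particular processor: it computes exactly with Python's `fractions.Fraction` and rounds to the nearest single-precision value with ties to even. The helper names are hypothetical, and overflow, underflow and denormal results are not handled.

```python
from fractions import Fraction

def round_to_single(x: Fraction) -> float:
    """Round an exact rational to the nearest single-precision value (ties to even).

    Sketch only: assumes the result is zero or a normal single-precision number.
    """
    if x == 0:
        return 0.0
    sign = -1 if x < 0 else 1
    ax = abs(x)
    # Find e with 2**e <= ax < 2**(e + 1).
    e = ax.numerator.bit_length() - ax.denominator.bit_length()
    if ax < Fraction(2) ** e:
        e -= 1
    # Scale so the 24-bit mantissa becomes an integer in [2**23, 2**24].
    scale = Fraction(2) ** (e - 23)
    q = round(ax / scale)          # Fraction.__round__ rounds half to even
    return float(sign * q * scale)

def unfused_madd(a: float, b: float, c: float) -> float:
    product = round_to_single(Fraction(b) * Fraction(c))   # intermediate rounding
    return round_to_single(Fraction(a) + Fraction(product))

def fused_madd(a: float, b: float, c: float) -> float:
    return round_to_single(Fraction(a) + Fraction(b) * Fraction(c))  # one rounding

b, c = 1 + 2**-12, 1 + 2**-13      # both exactly representable in single precision
a = -(1 + 2**-12 + 2**-13)
print(unfused_madd(a, b, c), fused_madd(a, b, c))   # 0.0 and 2**-25
```

Here the exact product is 1+2^-12+2^-13+2^-25. The 2^-25 term is less than half a unit in the last place of a 24-bit mantissa near 1 (2^-24), so the unfused variant rounds it away and the sum cancels to zero, while the fused variant retains it.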
In single precision floating point, the mantissa resulting from the multiplication part of a fused multiply add operation is 48 bits wide, since it is the product of two 24-bit mantissas. In order to align this product with a 24-bit addend A, a shifter around 48 bits wide is required, and summing the aligned operands requires an adder of at least similar width. This is in contrast to floating point units performing only simple floating point addition, in which the adders and shifters are only around 24 bits wide for single precision floating point. The wide adders and shifters that are typically used for the fused multiply add represent a significant area overhead, particularly for smaller processors. The present invention seeks to reduce the area overhead required to perform a fused multiply add operation.
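The 48-bit width follows from multiplying two 24-bit mantissas. The Python sketch below uses hypothetical helper names and assumes normal single-precision operands. It computes the width of the exact mantissa product and the exponent difference by which the addend would be shifted to line up with the product. It illustrates the arithmetic only and does not describe the datapath of the present invention.

```python
def significand(bits32: int) -> int:
    """24-bit mantissa 1.F of a normal single-precision pattern, as an integer."""
    return (1 << 23) | (bits32 & 0x7FFFFF)

def biased_exponent(bits32: int) -> int:
    return (bits32 >> 23) & 0xFF

def product_width(b_bits: int, c_bits: int) -> int:
    """Number of bits in the exact product of the two 24-bit mantissas (47 or 48)."""
    return (significand(b_bits) * significand(c_bits)).bit_length()

def alignment_shift(a_bits: int, b_bits: int, c_bits: int) -> int:
    """Exponent of B*C minus exponent of A.

    The product's biased exponent is taken as Eb + Ec - 127, with the product
    mantissa (in the range [1, 4)) not yet normalized. A positive result means
    the addend A must be shifted right relative to the product.
    """
    return (biased_exponent(b_bits) + biased_exponent(c_bits) - 127) - biased_exponent(a_bits)

ONE, ALMOST_TWO, TWO_POW_10 = 0x3F800000, 0x3FFFFFFF, 0x44800000
print(product_width(ONE, ONE), product_width(ALMOST_TWO, ALMOST_TWO))  # 47 48
print(alignment_shift(ONE, TWO_POW_10, ONE))                           # 10
```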