1. Technical Field
The present invention relates to data processing systems in general, and, in particular, to a fast fused-multiply-add pipeline within a data processing system.
2. Description of Related Art
The “IEEE-754 Standard for Binary Floating-Point Arithmetic” specifies a floating-point data architecture that is commonly implemented in computer hardware, such as floating-point processors having multipliers. The format consists of a sign, an unsigned biased exponent, and a significand. The sign is a single bit and is represented by an “S.” The unsigned biased exponent, represented by an “e,” is 8 bits long for single precision and 11 bits long for double precision. The significand is 24 bits long for single precision and 53 bits long for double precision. As defined by the IEEE-754 standard, the most significant bit of the significand, the implicit bit, is not stored explicitly but is decoded from the exponent bits: it is 1 for normalized numbers and 0 when the biased exponent field is zero (zeros and denormalized numbers).
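The field layout described above can be illustrated with a minimal Python sketch; it is an illustration only, not a hardware description. It relies on the standard `struct` module to reinterpret a value's bits, while the helper name `decode_ieee754` and the `FORMATS` table are hypothetical. The returned significand includes the implicit bit, so it is 24 bits wide for single precision and 53 bits wide for double precision.

```python
import struct

# Hypothetical table: (float pack code, unsigned-int code, exponent bits, stored fraction bits)
FORMATS = {
    "single": (">f", ">I", 8, 23),
    "double": (">d", ">Q", 11, 52),
}


def decode_ieee754(x: float, precision: str = "double") -> tuple[int, int, int]:
    """Return (sign S, biased exponent e, significand including the implicit bit)."""
    float_code, int_code, exp_bits, frac_bits = FORMATS[precision]
    bits = struct.unpack(int_code, struct.pack(float_code, x))[0]
    sign = bits >> (exp_bits + frac_bits)
    biased_exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    # The implicit bit is not stored; it is decoded from the exponent field:
    # 0 for zeros and denormals (biased exponent 0), otherwise 1.
    # Infinities and NaNs (all-ones exponent) are not special-cased in this sketch.
    implicit = 0 if biased_exp == 0 else 1
    significand = (implicit << frac_bits) | fraction  # 24 or 53 bits wide
    return sign, biased_exp, significand


print(decode_ieee754(1.0, "single"))  # (0, 127, 8388608), i.e. significand 1 << 23
print(decode_ieee754(-2.0))           # (1, 1024, 4503599627370496), i.e. 1 << 52
```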
To improve floating-point arithmetic processing, many modern processors use the fused-multiply-add (FMA) operation, which combines a floating-point multiplication and a floating-point addition for execution as a single instruction, e.g., (A×C)+B, where A, B and C are operands. By performing two operations in a single instruction, the FMA reduces overall execution time and hardware cost. The FMA also provides improved precision because rounding is performed only once, after both the multiplication and the addition have been carried out at full precision (i.e., there is only one rounding error instead of two).
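The precision benefit of a single rounding can be seen in a minimal Python sketch. It assumes finite operands and models the fused operation by computing (A×C)+B exactly with `fractions.Fraction`, rounding only once when converting back to `float`. The function names are hypothetical, and the operands were chosen only to expose the difference between one and two roundings.

```python
from fractions import Fraction


def fused_multiply_add(a: float, c: float, b: float) -> float:
    """(A x C) + B evaluated exactly, then rounded once (finite operands only)."""
    exact = Fraction(a) * Fraction(c) + Fraction(b)  # no rounding here
    return float(exact)  # the single rounding, to the nearest double


def multiply_then_add(a: float, c: float, b: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    product = a * c      # first rounding
    return product + b   # second rounding


a = c = 1.0 + 2.0**-52   # exactly representable
b = -(1.0 + 2.0**-51)
# Exact (A x C) + B equals 2**-104; rounding the product first discards that term.
print(multiply_then_add(a, c, b))                # 0.0
print(fused_multiply_add(a, c, b) == 2.0**-104)  # True
```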
In floating-point processors, one central area is the multiplier array, which performs the multiplication of two numbers. Radix-4 Booth encoding, a commonly used fast multiplication algorithm, is usually employed. This reduces the number of product terms that must be summed to n/2+1, where n is the number of bits per operand. The summation is done using carry-save-adder circuitry, which processes all bit positions in parallel (as opposed to normal addition, where the carry-out of each lower bit position is chained to the next higher position, as is usually done by carry-propagate-adder circuitry). The circuitry that performs this summation is called the reduction tree. At the end of the reduction tree, two terms remain, the sum term and the carry term, which represent the sum part and the carry part of the information, respectively. These two terms are then added to the aligned addend, again using carry-save addition. Finally, only two terms remain, again a sum term and a carry term, and these must be added using the carry-propagate adder to generate the final result.
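The multiplier datapath described above can be modelled at the integer level. The following Python sketch is a simplified, hypothetical model rather than a circuit description: the operands are unsigned integer significands of even width n (a 53-bit significand would be zero-extended to 54 bits), negative Booth multiples are kept in two's complement within an assumed width of 2n+2 bits, the addend is assumed to be already aligned, and the result is exact only when the product plus the addend fits in that width. The reduction tree is modelled level by level with 3:2 carry-save adders; practical designs may arrange their compressors differently.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an unsigned n-bit multiplier y (n even, n >= 2).

    Returns n//2 + 1 digits d_j in {-2, -1, 0, 1, 2} with y == sum(d_j * 4**j).
    """
    assert n >= 2 and n % 2 == 0 and 0 <= y < (1 << n)

    def bit(k: int) -> int:
        return (y >> k) & 1 if k >= 0 else 0  # y_{-1} = 0; bits at k >= n are 0

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2 + 1)]


def carry_save_add(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """3:2 compressor: each bit position is handled independently, with no carry chain."""
    sum_term = (a ^ b ^ c) & mask
    carry_term = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_term, carry_term


def reduction_tree(terms: list[int], mask: int) -> tuple[int, int]:
    """Compress the product terms level by level until a sum and a carry term remain."""
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3
        next_level = []
        for i in range(0, full, 3):  # in hardware, the CSAs of one level work in parallel
            next_level.extend(carry_save_add(terms[i], terms[i + 1], terms[i + 2], mask))
        terms = next_level + terms[full:]  # leftover terms pass through to the next level
    return terms[0], terms[1]


def multiply_add(x: int, y: int, aligned_addend: int, n: int) -> int:
    """x*y + aligned_addend for unsigned n-bit x and y, modelled modulo 2**(2n+2)."""
    mask = (1 << (2 * n + 2)) - 1  # assumed datapath width; negatives are two's complement
    partials = [((d * x) << (2 * j)) & mask
                for j, d in enumerate(booth_radix4_digits(y, n))]
    s, c = reduction_tree(partials, mask)                     # reduction tree output
    s, c = carry_save_add(s, c, aligned_addend & mask, mask)  # add addend, still carry-save
    return (s + c) & mask                                     # final carry-propagate add


print(len(booth_radix4_digits(0xFF, 8)))  # 5 product terms, i.e. n/2 + 1
print(multiply_add(200, 255, 1000, 8))    # 52000
```

The five product terms for n = 8 match the n/2+1 count given above, and only the last line of `multiply_add` performs a carry-propagate addition; every earlier step stays in carry-save form.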
United States patent application US 2011/0231460 describes a method for processing an FMA operation involving an addend, a first multiplicand, and a second multiplicand. The method focuses on calculating an alignment shift count for the addend input and aligning the addend input based on that shift count before adding it to the product of the first and second multiplicand inputs. Finally, the resulting sum is normalized, rounded, and complement-adjusted to deliver the final result of the FMA operation.
The present disclosure provides an improved method and apparatus for operating a fast fused-multiply-add pipeline.