Floating point numbers can represent a much larger dynamic range of values than fixed point numbers. Accordingly, floating point arithmetic has found use in modern processors. The IEEE 754 standard provides a standardized format for representing binary floating point numbers. Implementations of floating point arithmetic in conformance with the standard involve certain recognized exceptional and problematic corner cases. Conventionally, software traps are implemented to handle these corner cases. However, handling exceptions and implementing traps in software is time-consuming and taxing on processor resources.
In the case of division using the Newton-Raphson approach, such problematic cases include underflows, wherein the final quotient value is too small to be represented in the IEEE 754 standard using the assigned number of bits; overflows, wherein the final quotient value is too large to be represented in the IEEE 754 standard using the assigned number of bits; insufficient precision due to situations like underflows and overflows of intermediate results; and significand values which do not lend themselves well to reciprocal refinement. Other problematic cases involve division by zero, operand values (numerator/denominator) that are infinity or not-a-number (NaN), etc. Problems of a similar nature arise in square root computations as well.
The above-referenced co-pending application describes techniques for efficiently handling such problematic corner cases. As described therein, floating point numbers which may generate exceptional conditions and problematic corner cases are recognized early on, and specialized instructions are defined for fixing up the computations performed on such floating point numbers. By fixing up the computations in this manner, the floating point operations are guaranteed to generate results which are free of problems. For example, by applying these fixes to computations using floating point numbers which are recognized to be present in a region of the number space that will give rise to one or more of the above problematic cases, the computations can be guaranteed to be problem-free. One common computation in Newton-Raphson floating point division/square root is a multiply-accumulate (MAC) or fused multiply-accumulate (FMA) computation, wherein an addend operand is added to/subtracted from the product of multiplier and multiplicand operands. A specialized instruction, fused multiply-accumulate with scaling (or “FMASc”), is defined in the co-pending application for fixing up FMA computations which may result in overflows/underflows, etc. Essentially, the FMASc instruction can be mathematically represented as [(Rs*Rt)±Rx]*2^N, where Rs, Rt, and Rx are floating point numbers on which the FMA is performed and N can be a positive or negative fixed point number forming the scaling factor.
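The mathematical form of FMASc given above can be illustrated with a small software reference model. The following Python sketch is not the co-pending application's implementation; it assumes binary64 operands, finite inputs, an integer N, and round-to-nearest-even. It evaluates [(Rs*Rt)±Rx]*2^N exactly and rounds only once at the end. The function name `fmasc_reference` and its parameters are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
import math


def fmasc_reference(rs: float, rt: float, rx: float, n: int,
                    subtract: bool = False) -> float:
    """Model [(rs*rt) +/- rx] * 2**n with a single final rounding.

    Assumptions (not from the source): operands are finite binary64
    values, n is an integer, and rounding is round-to-nearest-even.
    """
    # Fraction converts each float exactly, so no intermediate rounding occurs.
    product = Fraction(rs) * Fraction(rt)
    addend = Fraction(rx)
    exact = product - addend if subtract else product + addend

    # Apply the scaling factor 2**n exactly (n may be negative).
    scaled = exact * Fraction(2) ** n

    try:
        # float() performs the one and only rounding step.
        return float(scaled)
    except OverflowError:
        # The scaled value is too large for binary64: model as infinity.
        return math.inf if scaled > 0 else -math.inf


if __name__ == "__main__":
    # (1.5 * 2.0 + 1.0) * 2**3
    print(fmasc_reference(1.5, 2.0, 1.0, 3))
    # The unscaled product 2**1200 is out of range, the scaled result is not.
    print(fmasc_reference(2.0**600, 2.0**600, 0.0, -300) == 2.0**900)
```

Because the rounding happens after scaling, an intermediate product that lies outside the binary64 range does not by itself corrupt the result in this model.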
Implementing the FMASc instruction in hardware using a conventional floating point processor would entail first performing the FMA computation and then applying the scaling factor to it. However, as already described, the FMA itself may overflow/underflow or result in a subnormal result, and therefore, staging the FMA and scaling operations as such may not achieve the objective of handling the problematic corner cases. Even if the FMA itself does not overflow or underflow, the subsequent scaling in a staged implementation may lead to undesired results.
Some of the drawbacks associated with staging the FMA and scaling operations are illustrated in FIG. 1. As shown, the FMA operation is computed at block 102, and the result is checked for overflow in block 104. If an overflow occurs, the scaling factor is for scaling up (i.e., N is positive), and the final result still overflows at block 132, then the scaling has not itself introduced the overflow; however, the overflow will still need to be handled. Similarly, if an overflow occurs, the scaling factor is for scaling down (i.e., N is negative) at block 124, and the final result still overflows at block 130, then the scaling does not introduce additional problems, as shown by block 134. However, if the final result after scaling down is normal (block 126), then bits have been lost, and the final result is not accurate (block 128).
On the other hand, if there was no overflow in block 104, and the result of the FMA was subnormal (i.e., too small in magnitude to be represented in the normalized IEEE 754 form) in block 106, then scaling down in block 108 may not be problematic (block 112), while scaling up in block 110 will cause loss of bits and accuracy (block 114). If the result of the FMA is not subnormal in block 106, and upon scaling down in block 118 the final result is normal, subnormal, or zero in block 120, then no additional problems are introduced in block 122. Similarly, if the result of the FMA is not subnormal in block 106, and scaling up causes the final result to overflow or be normal in block 116, then no additional problems are introduced in block 122 either.
To summarize, it can be seen that there are at least the two conditions in blocks 114 and 128 wherein the scaling factor may itself introduce additional problems, when the FMASc instruction is executed conventionally as a sequentially staged FMA operation followed by scaling.
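The two problematic conditions can be reproduced numerically. The following Python sketch is an illustration only, assuming binary64 operands and round-to-nearest-even. The helper names `staged` and `fused` are hypothetical, and an overflowed intermediate FMA result is modeled as infinity, which a particular hardware design may handle differently.

```python
from fractions import Fraction
import math


def staged(rs, rt, rx, n):
    """FMA rounded to binary64 first, then scaled by 2**n as a second step."""
    exact = Fraction(rs) * Fraction(rt) + Fraction(rx)
    try:
        fma = float(exact)          # first rounding: the FMA result
    except OverflowError:
        fma = math.inf if exact > 0 else -math.inf
    return math.ldexp(fma, n)       # scaling applied to the rounded value


def fused(rs, rt, rx, n):
    """Scaling folded into the computation, with one rounding at the end."""
    exact = (Fraction(rs) * Fraction(rt) + Fraction(rx)) * Fraction(2) ** n
    return float(exact)             # raises OverflowError if out of range


# Condition of block 114: the FMA result is subnormal, then scaled up.
# The exact product is (1 + 2**-52) * 2**-1060; its low bit lies below
# the subnormal spacing of 2**-1074 and is rounded away before scaling.
a = math.ldexp(1.0 + 2.0**-52, -500)
b = 2.0**-560
print(staged(a, b, 0.0, 100) == fused(a, b, 0.0, 100))

# Condition of block 128: the FMA result overflows, then is scaled down
# to a value that would otherwise have been normal (2**900).
c = 2.0**600
print(staged(c, c, 0.0, -300), fused(c, c, 0.0, -300) == 2.0**900)
```

In both cases the staged sequence rounds or saturates before the scaling factor is applied, so the scaling cannot recover the lost information, whereas the single-rounding model returns the representable result.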
Accordingly, there is a need in the art for hardware configured to avoid the aforementioned and additional drawbacks associated with implementing the FMASc instruction.