A fused multiply-add (FMA) is a floating-point multiply-add operation performed in one step, with a single rounding. That is, where an unfused multiply-add would compute the product a*b, round it to N significant bits, add the result to c, and round the sum to N significant bits, an FMA computes the entire expression a*b+c to its full precision before rounding the final result to N significant bits. A fast FMA can both speed up and improve the accuracy of many computations that involve the accumulation of products. Because the intermediate product is not rounded, the FMA result is usually more accurate than the unfused result. A further benefit of providing an FMA instruction is that it allows efficient software implementations of division (see division algorithm) and square root (see methods of computing square roots) operations, thus eliminating the need for dedicated hardware for those operations.
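The difference between one rounding and two can be illustrated with a minimal Python sketch. It is an illustration only, not the design described here: the function names `unfused_multiply_add` and `emulated_fma` are hypothetical, N is taken to be the 53 significant bits of an IEEE 754 double, and the single rounding is emulated in software with exact rational arithmetic rather than performed by hardware. The emulation assumes finite inputs and does not model signed zeros, infinities or NaNs.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Two roundings: a*b is rounded to a double, then the sum is rounded again."""
    return a * b + c


def emulated_fma(a: float, b: float, c: float) -> float:
    """One rounding: evaluate a*b+c exactly, then round once to a double.

    Fraction represents each finite float exactly, so the rational
    arithmetic below introduces no error; only float() rounds.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-54, is not a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = unfused_multiply_add(a, b, c)  # product rounds to 1.0, so the sum is 0.0
fused = emulated_fma(a, b, c)            # exact result -2**-54 survives

print("unfused:", unfused)
print("fused:  ", fused)
```

With these inputs the unfused form loses the result entirely (0.0), while the single-rounding form returns -2^-54 exactly. Software division and square root routines typically depend on this property to compute a small residual, such as 1 - b*y for an approximate reciprocal y, without the product being rounded first.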
For many applications, such as radar, conventional FMA designs cannot meet low-latency or low-power requirements. Accordingly, what is needed is an FMA that meets low-latency and low-power requirements. The FMA must be low cost, easy to implement, and adaptable to existing environments. The present invention addresses such a need.