Floating-point operations are widely used in advanced applications such as 3D graphics, signal processing and scientific computation. These computations require a wide dynamic range of values. Fixed-point arithmetic cannot represent such a range, whereas floating-point arithmetic, such as that specified in the IEEE-754 standard, can represent numbers from very small fractions to very large magnitudes, so that overflow and underflow are far less likely. However, floating-point operations require complex processing steps, such as alignment, normalization and rounding, which significantly increase area, power consumption and latency. One solution is to merge or “fuse” several operations in a single floating-point unit, reducing area, power and latency by sharing the logic common to those operations. Several fused units have been introduced to improve floating-point units in this way: fused multiply-add, fused add-subtract, and fused dot product.
Unfortunately, despite these improvements, including the fused dot product unit, current floating-point dot product units remain expensive in terms of silicon area, power consumption and latency.
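The dynamic-range limitation of fixed-point arithmetic noted above can be illustrated with a minimal sketch. The signed Q16.16 format (32 bits, 16 of them fractional) and the helper names `to_q16_16` and `from_q16_16` are assumptions chosen only for illustration; they do not come from the text. The sketch also assumes that Python's `float` is an IEEE-754 double-precision value, which holds on common platforms.

```python
import sys

# Hypothetical signed Q16.16 fixed-point format, chosen only for illustration:
# 32 bits in total, 16 of which hold the fractional part.
FRAC_BITS = 16
TOTAL_BITS = 32
RAW_MIN = -(1 << (TOTAL_BITS - 1))
RAW_MAX = (1 << (TOTAL_BITS - 1)) - 1


def to_q16_16(x):
    """Quantize x to a Q16.16 raw integer, saturating at the format limits."""
    raw = round(x * (1 << FRAC_BITS))
    return max(RAW_MIN, min(RAW_MAX, raw))


def from_q16_16(raw):
    """Convert a Q16.16 raw integer back to a Python float."""
    return raw / (1 << FRAC_BITS)


# A small value is below half the fixed-point step (2**-16) and becomes zero.
tiny_fixed = from_q16_16(to_q16_16(1e-6))

# A large value exceeds the fixed-point range and saturates just under 32768.
big_fixed = from_q16_16(to_q16_16(1e6))

# The same two values are both representable (to within rounding) as floats.
tiny_float = float(1e-6)
big_float = float(1e6)

print("fixed-point:", tiny_fixed, big_fixed)
print("floating-point:", tiny_float, big_float)
print("largest finite float:", sys.float_info.max)
```

In this sketch the fixed-point format loses the small value entirely and clips the large one, while the floating-point format keeps both, at the cost of the alignment, normalization and rounding logic described above.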