Dot-product calculations are frequently used to compute the sum of the products of two sets of operands in digital signal processing applications, such as multiplication of complex numbers, which is used in, for example, Fast Fourier Transform (FFT) and discrete cosine transform (DCT) butterfly operations. A dot-product calculation involves multiplying two pairs of operands and summing the products to produce a single-precision dot-product value. When multiplying complex data, the difference of two products is also useful. Conventional floating-point hardware can perform a dot-product using two floating-point multiplication operations and one floating-point addition or subtraction operation, which operations may be performed serially or in parallel. However, serial execution of the dot-product operation may limit throughput, which may be undesirable in implementations that require rapid calculations. In contrast, while parallel execution using two independent floating-point multipliers followed by a floating-point adder may be fast, the additional multiplier unit is expensive in terms of both silicon area and power consumption.
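By way of illustration only, the following minimal sketch shows how a complex multiplication may be decomposed into one sum of two products and one difference of two products, each costing two multiplications and one addition or subtraction. The function names `dot2`, `diff2`, and `complex_multiply` are hypothetical and are not taken from this disclosure. The sketch models only the arithmetic and does not represent any particular hardware arrangement or rounding behavior.

```python
def dot2(a, b, c, d):
    """Sum of two products: a*b + c*d (two multiplies, one add)."""
    return a * b + c * d


def diff2(a, b, c, d):
    """Difference of two products: a*b - c*d (two multiplies, one subtract)."""
    return a * b - c * d


def complex_multiply(xr, xi, yr, yi):
    """Multiply (xr + j*xi) by (yr + j*yi).

    The real part is a difference of two products and the imaginary
    part is a sum of two products, so the whole operation uses four
    multiplications, one subtraction, and one addition.
    """
    real = diff2(xr, yr, xi, yi)   # xr*yr - xi*yi
    imag = dot2(xr, yi, xi, yr)    # xr*yi + xi*yr
    return real, imag
```

For example, `complex_multiply(1.0, 2.0, 3.0, 4.0)` computes the product of 1 + 2j and 3 + 4j, which is -5 + 10j.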
Embodiments disclosed herein can provide solutions to these and other problems, and offer other advantages over the prior art.