Digital signal processing (DSP) algorithms are often defined using floating point numbers. The principal element of most DSP algorithms is the multiply-accumulate operation, in which two numbers are multiplied and the product is added to an accumulating result. A common DSP algorithm is the FIR (finite impulse response) filter, in which almost all operations are multiply-accumulate operations. These algorithms can be implemented using floating point numbers (numbers expressed as a mantissa and an exponent) or fixed point numbers (numbers represented in integer or fractional notation).
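By way of illustration only, the multiply-accumulate structure of an FIR filter can be sketched in Python as follows. The function name `fir_filter` and the treatment of samples before the start of the input as zero are assumptions made for this sketch and are not taken from the description above.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over i of coeffs[i] * samples[n - i].

    Samples before the start of the input are treated as zero
    (an assumption of this sketch).
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0  # accumulating result for output sample n
        for i, a in enumerate(coeffs):
            if n - i >= 0:
                acc += a * samples[n - i]  # one multiply-accumulate
        out.append(acc)
    return out


if __name__ == "__main__":
    # Two-tap moving average applied to a short ramp.
    print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

Every arithmetic step in the inner loop is a multiply followed by an add into `acc`, which is why the throughput of the multiply-accumulate unit dominates the cost of the filter.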
Although floating point numbers are generally considered easier to work with, because they simplify program coding and give more accurate results than fixed point numbers, DSP algorithms are often implemented using fixed point numbers in fixed point hardware, because floating point multiply-accumulator hardware is usually slower than fixed point hardware. FIG. 1 illustrates a block diagram of a typical, prior art fixed point multiply-accumulator (MAC) unit. As shown, the MAC unit includes a multiplier 10 that multiplies sample data values 12 (Xo) and coefficient values 14 (Ai). An adder 16 sums an accumulated value from an accumulator register 18 with the output of the multiplier 10 to provide the result.
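The data path of a fixed point MAC unit of this kind can be modeled, purely as a sketch, with integer arithmetic. The Q15 fractional format, the helper names `to_q15`, `fixed_mac`, and `acc_to_float`, and the unbounded accumulator width are all assumptions chosen for illustration; the description above does not specify a number format.

```python
Q = 15  # assumed Q15 format: real value = integer / 2**15


def to_q15(x):
    """Convert a real number in [-1, 1) to a Q15 integer (hypothetical helper)."""
    return int(round(x * (1 << Q)))


def fixed_mac(acc, x, a):
    """One MAC step: multiply sample x by coefficient a, add to accumulator.

    The product of two Q15 integers is a Q30 integer. No exponent compare
    or alignment shift is needed before the add.
    """
    return acc + x * a


def acc_to_float(acc):
    """Interpret the Q30 accumulator as a real number."""
    return acc / (1 << (2 * Q))


if __name__ == "__main__":
    samples = [to_q15(v) for v in (0.5, 0.25)]
    coeffs = [to_q15(v) for v in (0.5, 0.5)]
    acc = 0
    for x, a in zip(samples, coeffs):
        acc = fixed_mac(acc, x, a)
    print(acc_to_float(acc))
```

Because `fixed_mac` is a plain integer multiply followed by a plain integer add, nothing in the step depends on a decision about the operands, which is the property that lets the hardware equivalent complete one multiply-accumulate per clock.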
Floating point DSP hardware is complicated and slow because of the floating point add operation performed on the output of the multiplier and the contents of the accumulator. Commonly, the two floating point numbers to be added are compared, and one of the two numbers is shifted to align the radix points before the add. In hardware, the comparison is performed by subtracting the exponents of the two numbers. The result of this subtraction determines which number will be right shifted and by how many positions. Since this decision must be made just before the add occurs, the add operation cannot be pipelined, which eliminates the ability to perform a multiply-accumulate on each clock. In contrast, in fixed point MACs, no shift decision is involved, thus allowing a multiply-accumulate to occur on each clock. This difference results in the floating point MAC running at about half the speed (or less) of the fixed point MAC.
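The compare-and-shift step described above can be sketched as follows. The representation of a number as an integer pair `(mantissa, exponent)` with value `mantissa * 2**exponent`, the function name `fp_add`, the restriction to non-negative mantissas, and the omission of rounding and normalization are all simplifying assumptions of this sketch rather than features of any particular hardware.

```python
def fp_add(a, b):
    """Add two (mantissa, exponent) pairs, value = mantissa * 2**exponent.

    Assumes non-negative integer mantissas. Bits shifted out on the right
    are discarded, and the result is not normalized.
    """
    (ma, ea), (mb, eb) = a, b
    diff = ea - eb  # exponent subtraction: the comparison step
    if diff >= 0:
        mb >>= diff   # b has the smaller exponent: shift it right
        return (ma + mb, ea)
    ma >>= -diff      # a has the smaller exponent: shift it right
    return (ma + mb, eb)


if __name__ == "__main__":
    # 12 * 2**3 + 8 * 2**1 = 96 + 16 = 112 = 14 * 2**3
    print(fp_add((12, 3), (8, 1)))
```

The sign and magnitude of `diff` select both the operand to shift and the shift distance, so the add cannot begin until the subtraction has finished. This data dependency is the decision that the paragraph above identifies as the obstacle to completing a multiply-accumulate on every clock.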
Accordingly, what is needed is a system and method for a floating point MAC that can perform a multiply-accumulate on each clock. The present invention addresses such a need.