The present invention relates generally to floating point operations, and more specifically to floating point multiply accumulators.
Fast floating point mathematical operations have become an important feature in modern electronics. Floating point units are useful in applications such as three-dimensional graphics computations and digital signal processing (DSP). Examples of three-dimensional graphics computation include geometry transformations and perspective transformations. These transformations are performed when the motion of objects is determined by calculating physical equations in response to interactive events instead of replaying prerecorded data.
Many DSP operations, such as finite impulse response (FIR) filters, compute Σ(a_i b_i), where i = 0 to n−1, and a_i and b_i are both single precision floating point numbers. This type of computation typically employs floating point multiply accumulate (FMAC) units, which perform many multiplication operations and add the resulting products to give the final result. In these types of applications, fast FMAC units typically execute multiplies and additions in parallel without pipeline bubbles. One example FMAC unit is described in: Nobuhiro et al., “2.44-GFLOPS 300-MHz Floating-Point Vector Processing Unit for High-Performance 3-D Graphics Computing,” IEEE Journal of Solid State Circuits, Vol. 35, No. 7, July 2000.
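The sum of products described above can be sketched in software as a loop that performs one multiply-accumulate per term. The following is a minimal illustration only, not the circuit of any figure: the names `to_single` and `multiply_accumulate_sum` are hypothetical, and single precision is approximated by rounding Python's double precision values through the `struct` module after each multiply and each add.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_accumulate_sum(a, b):
    """Compute the sum of a[i] * b[i] for i = 0 to n-1, one multiply-accumulate per step."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    acc = 0.0
    for a_i, b_i in zip(a, b):
        # Multiply step: the product is rounded to single precision (assumption).
        product = to_single(to_single(a_i) * to_single(b_i))
        # Accumulate step: the running sum is rounded to single precision (assumption).
        acc = to_single(acc + product)
    return acc
```

For example, `multiply_accumulate_sum([1.0, 2.0, 3.0], [0.5, 0.25, 2.0])` accumulates 0.5, then 0.5, then 6.0. Each step depends on the running sum produced by the previous step, which is why the speed of the accumulate path matters for this class of computation.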
The Institute of Electrical and Electronics Engineers (IEEE) has published an industry standard for floating point operations in the ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, IEEE, New York, 1985, hereinafter referred to as the “IEEE standard.” A typical implementation for a floating point FMAC compliant with the IEEE standard is shown in FIG. 1. FMAC 100 implements a single precision floating point multiply and accumulate instruction “D=(A×B)+C” as an indivisible operation. As can be seen from FIG. 1, fast floating point multipliers and fast floating point adders are both important ingredients of a fast FMAC.
Multiplicands A and B are received by multiplier 110, and the product is normalized in post-normalization block 120. Multiplicands A and B are typically in an IEEE standard floating point format, and post-normalization block 120 typically operates on (normalizes) the output of multiplier 110 to make the product conform to the same format. For example, when multiplicands A and B are IEEE standard single precision floating point numbers, post-normalization block 120 operates on the output from multiplier 110 so that adder 130 receives the product as an IEEE standard single precision floating point number.
Adder 130 adds the normalized product from post-normalization block 120 to the output from multiplexer 140. Multiplexer 140 can choose between the number C and the previous sum on node 152. When the previous sum is used, FMAC 100 is performing a multiply-accumulate function. The output of adder 130 is normalized in post-normalization block 150 so that the sum on node 152 is in the standard format discussed above.
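The data path just described (multiplier, post-normalization, multiplexer, adder, post-normalization) can be modeled behaviorally as follows. This is a hedged sketch rather than a description of the hardware: the class name `FmacModel`, its `step` method, and the `accumulate` flag are hypothetical, and each post-normalization block is assumed to amount to rounding the result to single precision.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


class FmacModel:
    """Behavioral model of the multiply, select, and add path described in the text."""

    def __init__(self):
        # Models the previous sum held on node 152 (initial value is an assumption).
        self.previous_sum = 0.0

    def step(self, a: float, b: float, c: float = 0.0, accumulate: bool = False) -> float:
        # Multiplier 110 followed by post-normalization block 120:
        # the product is delivered in single precision format.
        product = to_single(to_single(a) * to_single(b))
        # Multiplexer 140: choose between the number C and the previous sum.
        addend = self.previous_sum if accumulate else to_single(c)
        # Adder 130 followed by post-normalization block 150.
        self.previous_sum = to_single(product + addend)
        return self.previous_sum
```

With `accumulate=False` the model computes D=(A×B)+C; with `accumulate=True` it adds the new product to the previous sum, which corresponds to the multiply-accumulate function. For instance, `step(2.0, 3.0, c=1.0)` followed by `step(0.5, 4.0, accumulate=True)` yields 7.0 and then 9.0.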
Adder 130 and post-normalization block 150 can be “non-pipelined,” which means that an accumulation can be performed in a single clock cycle. When non-pipelined, adder 130 and post-normalization block 150 typically include enough logic to limit the frequency at which FMAC 100 can operate, in part because floating point adders typically include circuits for alignment, mantissa addition, rounding, and other complex operations. To increase the frequency of operation, adder 130 and post-normalization block 150 can be “pipelined,” which means registers can be included in the data path to store intermediate results. One disadvantage of pipelining is the introduction of pipeline stalls or bubbles, which decrease the effective data rate through FMAC 100.
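The trade-off between the two arrangements can be illustrated with a simplified cycle count. The sketch below assumes that every accumulation depends on the previous sum, so that an adder path divided into `adder_stages` pipeline stages can begin a new dependent addition only once every `adder_stages` clock cycles. The function names and this timing model are assumptions made for illustration and are not taken from FIG. 1.

```python
def accumulation_cycles(num_products: int, adder_stages: int) -> dict:
    """Count clock cycles and bubbles for a chain of dependent accumulations.

    adder_stages = 1 models the non-pipelined case (one accumulation per cycle).
    """
    if num_products < 0 or adder_stages < 1:
        raise ValueError("num_products must be >= 0 and adder_stages must be >= 1")
    # Each dependent accumulation waits for the previous sum to leave the pipeline.
    cycles = num_products * adder_stages
    # Cycles in which no new accumulation can be started.
    bubbles = num_products * (adder_stages - 1)
    return {"cycles": cycles, "bubbles": bubbles}


def effective_rate(clock_hz: float, adder_stages: int) -> float:
    """Dependent accumulations completed per second under the same assumption."""
    if adder_stages < 1:
        raise ValueError("adder_stages must be >= 1")
    return clock_hz / adder_stages
```

Under this model, eight accumulations take 8 cycles with no bubbles when non-pipelined, and 24 cycles with 16 bubbles when the adder path has three stages. Pipelining therefore improves the effective data rate only if the clock frequency rises by more than the number of stages added.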
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for fast floating point multiply and accumulate circuits.