In general-purpose microprocessors and in DSP applications that use FIR filters, the summation $\sum_{i=0}^{n-1} a_i b_i$ must often be calculated, where $a_i$ and $b_i$ are both single-precision floating-point numbers. Such calculations often demand fast floating-point multiply-accumulate (FMAC) units. An FMAC unit multiplies two numbers and accumulates the products to produce the final result. In designing FMACs, designers have attempted to improve the performance of the two main components, the multiplier and the accumulator, by increasing the speed at which these units operate and by reducing the cost of implementing them.

Prior FMAC implementations have typically required expensive variable shifters to be placed in the circuit path, or circuit “loop”. The circuit loop implements the sequence of actions (such as shifting mantissa bits) needed to perform mantissa addition. Prior single-precision FMAC implementations have also typically required 8-bit subtractors in the exponent path, which is responsible for computing the result exponent. Floating-point accumulation involves addition and several other steps, including exponent alignment, addition of the two mantissas, normalization (shifting) of the resulting sum, and rounding of the sum.

There exists a need for an FMAC architecture and algorithm that achieve faster floating-point accumulation than previous implementations at a given precision level. There also exists a need for FMAC implementations that use lower-cost components than previous implementations. The present invention includes a new architecture and algorithm that enable much faster floating-point accumulation operations than are possible in prior implementations.
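The four accumulation steps named above (exponent alignment, mantissa addition, normalization, and rounding) can be illustrated with a minimal Python sketch of conventional single-precision addition. It is not the architecture of the present invention. It assumes positive, normal IEEE 754 binary32 operands, round-to-nearest-even, and three extra guard/round/sticky bits; the helper names `f32_bits`, `bits_f32`, and `fp32_add_positive` are hypothetical. The alignment step shows why prior designs need an 8-bit exponent subtraction and a variable-distance shift of the smaller mantissa.

```python
import struct

GUARD_BITS = 3  # guard, round and sticky bits kept below the 24-bit significand


def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x (x is rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Return the float value encoded by a 32-bit binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fp32_add_positive(x: float, y: float) -> float:
    """Add two positive, normal float32 values (hypothetical helper).

    Assumes round-to-nearest-even; zeros, subnormals, negatives,
    infinities and NaNs are rejected to keep the sketch short.
    """
    bx, by = f32_bits(x), f32_bits(y)
    ex, ey = (bx >> 23) & 0xFF, (by >> 23) & 0xFF
    if (bx | by) >> 31 or not (0 < ex < 0xFF and 0 < ey < 0xFF):
        raise ValueError("sketch handles positive normal operands only")

    # Restore the hidden leading 1 and reserve room for guard/round/sticky.
    mx = ((bx & 0x7FFFFF) | 0x800000) << GUARD_BITS
    my = ((by & 0x7FFFFF) | 0x800000) << GUARD_BITS

    # Step 1: exponent alignment. The 8-bit exponent difference selects a
    # variable-distance right shift of the smaller operand's mantissa.
    if ex < ey:
        ex, ey, mx, my = ey, ex, my, mx
    shift = ex - ey
    lost = my & ((1 << shift) - 1)   # bits that fall off the right end
    my >>= shift
    if lost:
        my |= 1                      # fold them into the sticky bit

    # Step 2: mantissa addition.
    m = mx + my

    # Step 3: normalization. The aligned sum lies in [1, 4), so at most
    # a one-bit right shift is needed for same-sign operands.
    if m >> (24 + GUARD_BITS):
        m = (m >> 1) | (m & 1)       # keep the sticky information
        ex += 1

    # Step 4: rounding to nearest, ties to even.
    low = m & ((1 << GUARD_BITS) - 1)
    m >>= GUARD_BITS
    half = 1 << (GUARD_BITS - 1)
    if low > half or (low == half and m & 1):
        m += 1
        if m >> 24:                  # rounding carried out of the significand
            m >>= 1
            ex += 1

    if ex >= 0xFF:
        return float("inf")          # exponent overflow
    return bits_f32((ex << 23) | (m & 0x7FFFFF))
```

For example, `fp32_add_positive(16777216.0, 3.0)` returns `16777220.0`: the exact sum $2^{24} + 3$ lies halfway between two representable values and is rounded to the one with an even significand.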