One common execution unit in a processor is a fused multiply-add (FMA) unit. In general, an FMA unit operates on three incoming operands, first multiplying two of the operands and then accumulating the product with the third operand. More specifically, an FMA arithmetic logic unit (ALU) is designed to compute A*B+C, where A, B and C are arbitrary values. Typically, A is called the multiplier input, B is called the multiplicand input, and C is called the addend input. Most current FMA ALU designs power up and operate at the same power level regardless of the data inputs presented to the FMA ALU. This can cause excessive power consumption, particularly because the multiplication unit of the FMA ALU is a high power consumer.
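As a rough illustration of the A*B+C operation described above, the following Python sketch models an FMA in software. The function name `fma_model` is hypothetical, and the single-rounding behavior it models (exact product and sum, rounded once) is an assumption based on how fused multiply-add is commonly defined for floating point; the text above states only that the product is accumulated with the third operand. The model handles finite inputs only and says nothing about hardware structure or power.

```python
from fractions import Fraction


def fma_model(a: float, b: float, c: float) -> float:
    """Hypothetical software model of A*B+C (finite inputs only).

    The product and the sum are formed exactly with rational
    arithmetic, and the result is rounded once when converted
    back to a float. This is an assumed model, not a description
    of any particular FMA ALU.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0**-30  # multiplier input A
b = 1.0 - 2.0**-30  # multiplicand input B
c = -1.0            # addend input C

fused = fma_model(a, b, c)  # one rounding at the end
unfused = a * b + c         # product rounded, then sum rounded

print(fused, unfused)
```

With these inputs the exact product is 1 - 2^-60, which a separate multiply rounds to 1.0 before the add, so the unfused result is 0.0 while the modeled fused result retains -2^-60.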
Some processors use an FMA unit to perform simpler mathematical operations such as additions, subtractions and multiplications by appropriate selection of the third operand or by routing of operands and results via selection circuitry. Accordingly, in many processors an FMA unit may form the backbone of the execution units and may be a key circuit in determining the frequency, power and area of the processor.
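One way to picture how simpler operations can be overlaid onto A*B+C is by fixing one operand to a constant. The sketch below is illustrative only: the helper names (`fma_model`, `add_via_fma`, `sub_via_fma`, `mul_via_fma`) and the particular constant choices are assumptions, and real designs may instead route operands and results through selection circuitry as noted above.

```python
from fractions import Fraction


def fma_model(a: float, b: float, c: float) -> float:
    """Hypothetical model of A*B+C for finite inputs, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def add_via_fma(x: float, y: float) -> float:
    # x + y expressed as x*1 + y (multiplicand fixed to 1)
    return fma_model(x, 1.0, y)


def sub_via_fma(x: float, y: float) -> float:
    # x - y expressed as y*(-1) + x (multiplicand fixed to -1)
    return fma_model(y, -1.0, x)


def mul_via_fma(x: float, y: float) -> float:
    # x * y expressed as x*y + 0 (addend fixed to 0)
    # Note: this simple model ignores signed-zero handling.
    return fma_model(x, y, 0.0)


print(add_via_fma(1.5, 2.25))  # 3.75
print(sub_via_fma(5.0, 2.0))   # 3.0
print(mul_via_fma(3.0, 4.0))   # 12.0
```

In each case the full multiply-add path is exercised even though only part of it contributes to the result, which is consistent with the observation that such a unit tends to dominate execution-unit power.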
Previous solutions to reduce an FMA unit's average power typically focus on reducing power for the simpler operations overlaid onto the FMA ALU, often by moving these overlaid operations into a separate floating point ALU that is independent of the FMA ALU. This allows the FMA ALU to power down for these simpler operations, reducing power consumption in these cases. However, the additional ALU is expensive in terms of area and leakage power, and hence this is not an ideal solution. In addition, this solution cannot save power during execution of FMA instructions themselves.