Modern processors include various circuitry for performing operations on data. Typically, a processor is designed according to a given instruction set architecture (ISA). Many processors have a pipelined design that can be implemented as an in-order or out-of-order processor.
In either case, instructions are obtained via front end units, which process the instructions and convert them into a form recognized by later stages of the pipeline. Typically, so-called macro-instructions are broken up into one or more micro-instructions, or uops. These uops may then be executed in different execution units of the processor. To that end, many processors include multiple execution units, including arithmetic logic units, address generation units, floating-point units and so forth.
One common execution unit is a multiply-add unit, which may be in the form of a fused multiply-add (FMA) unit. In general, an FMA unit operates on three incoming operands, first multiplying two of the operands and then accumulating the product with the third operand. Some processors use such a unit to perform simpler mathematical operations such as additions, subtractions and multiplications, either by appropriate selection of the third operand or by routing operands and results via selection circuitry. Accordingly, in many processors an FMA unit may form the backbone of the execution units and may be a key circuit in determining the frequency, power and area of the processor. In addition, FMA units can be heavily used in certain applications such as graphics and many scientific and engineering applications. Thus, these units should be made as efficient in area, power consumption, and processing speed as possible.
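The behavior described above can be illustrated in software. The following is a minimal sketch, not a description of any particular hardware design: it models a fused multiply-add as an exact product and sum followed by a single rounding, and shows how addition, subtraction and multiplication could be obtained by fixing one operand. The function names (fma_sketch, add_via_fma, sub_via_fma, mul_via_fma) are hypothetical and chosen only for illustration. The model assumes finite inputs and does not reproduce signed-zero, infinity or NaN handling.

```python
from fractions import Fraction


def fma_sketch(a: float, b: float, c: float) -> float:
    """Model a fused multiply-add: compute a*b + c exactly, then round once.

    Assumption: inputs are finite floats. Fraction arithmetic is exact, and
    converting the final Fraction to float performs the single rounding step.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def add_via_fma(a: float, b: float) -> float:
    # Addition: multiply the first operand by 1, accumulate the second.
    return fma_sketch(a, 1.0, b)


def sub_via_fma(a: float, b: float) -> float:
    # Subtraction: same as addition with the accumulated operand negated.
    return fma_sketch(a, 1.0, -b)


def mul_via_fma(a: float, b: float) -> float:
    # Multiplication: accumulate a zero addend.
    return fma_sketch(a, b, 0.0)


if __name__ == "__main__":
    print(fma_sketch(2.0, 3.0, 4.0))   # 10.0
    print(add_via_fma(1.5, 2.25))      # 3.75
    print(sub_via_fma(5.0, 1.5))       # 3.5
    print(mul_via_fma(1.5, 4.0))       # 6.0

    # Fused versus separate rounding: the exact product here is 1 - 2**-60,
    # which rounds to 1.0 if the multiply is rounded before the add.
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    print(a * b + c)            # 0.0 (two roundings)
    print(fma_sketch(a, b, c))  # -2**-60 (one rounding)
```

The last two lines show why the operation is called fused: rounding only once preserves a small result that is lost entirely when the product is rounded before the addition.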