Modern processors include various circuitry for performing operations on data. Typically, a processor is designed according to a given instruction set architecture (ISA). Many processors have a pipelined design that can be implemented as an in-order or out-of-order processor.
In either case, instructions are obtained via front end units, which process the instructions and convert them into a form recognized by further components of the pipeline. Typically, so-called macro-instructions are broken up into one or more micro-instructions, or uops. These uops may then be executed in different execution units of a processor. That is, many processors include multiple execution units, including arithmetic logic units, address generation units, floating-point units and so forth.
One common execution unit is a multiply-accumulate (MAC) unit, which may be in the form of a fused floating-point multiply-accumulate (FPMAC) unit. In general, a MAC unit operates on three incoming operands, first multiplying two of the operands and then accumulating the product with the third operand. Some processors use such a unit to perform simpler mathematical operations such as additions, subtractions and multiplications by appropriate selection of the third operand. Accordingly, in many processors a MAC unit may form the backbone of the execution units and may be a key circuit in determining the frequency, power and area of the processor. In addition, MAC units can be heavily used in certain applications, such as graphics and many scientific and engineering applications. Thus, these units should be as efficient as possible in area, power consumption, and processing speed.
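As an illustration only, the three-operand behavior described above can be modeled in software. The following is a minimal sketch, not a description of any particular hardware: the function names (fused_mac, mac_add, mac_sub, mac_mul) are hypothetical, and the specific operand values used to obtain the simpler operations (a multiplier of 1 for addition and subtraction, an addend of 0 for multiplication) are assumptions rather than details taken from the text. The sketch models the "fused" property by computing the exact result with rational arithmetic and rounding once; it handles finite inputs only and ignores signed zeros, infinities and NaNs.

```python
from fractions import Fraction


def fused_mac(a: float, b: float, c: float) -> float:
    """Model a fused multiply-accumulate: a * b + c with a single rounding.

    Assumption: finite inputs only. The exact rational result is formed
    first and then converted to a float once, which stands in for the
    single rounding step of a fused unit.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def mac_add(a: float, c: float) -> float:
    # Addition as a * 1 + c (assumed operand selection).
    return fused_mac(a, 1.0, c)


def mac_sub(a: float, c: float) -> float:
    # Subtraction as a * 1 + (-c) (assumed operand selection).
    return fused_mac(a, 1.0, -c)


def mac_mul(a: float, b: float) -> float:
    # Multiplication as a * b + 0 (assumed operand selection).
    return fused_mac(a, b, 0.0)


if __name__ == "__main__":
    print(fused_mac(2.0, 3.0, 4.0))
    print(mac_add(1.5, 2.25), mac_sub(5.0, 3.0), mac_mul(1.5, 4.0))

    # The fused result can differ from a separate multiply followed by an add,
    # because the separate multiply rounds the product before the addition.
    a = 1.0 + 2.0 ** -27
    b = 1.0 - 2.0 ** -27
    print(fused_mac(a, b, -1.0), a * b + -1.0)
```

The last comparison shows why a fused unit is not simply a multiplier followed by an adder: for these inputs the separately rounded product becomes exactly 1.0, so the unfused sum is zero, while the single-rounding model retains the small nonzero result.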