1. Field of the Invention
This disclosure relates generally to computer processors, and in particular to a system and method for executing a multiply-add operation in a multiply-add pipeline utilizing an unrounded result from a prior operation.
2. Description of the Related Art
Processors may include one or more specialized multiply-add execution pipelines to perform multiply and add instructions. A common metric used to measure the performance of a multiply-add pipeline is the latency required to complete the execution of a multiply-add instruction. Because many instructions may be executed in succession, with the result of one operation fed back as an input of the next operation, the latency of the pipeline may have a major impact on the time required to complete a large sequence of operations.
One way to increase the performance of a multiply-add pipeline is to reduce the latency of the pipeline. A typical multiply-add pipeline executes an instruction and, after the preliminary, unrounded result has been calculated, determines whether rounding is required and, if so, performs the rounding. The rounded result may then be routed back to the input operands of the pipeline. The rounding stage may therefore add one or more extra stages of delay to the pipeline. One technique that may be used to reduce the latency is to bypass the unrounded result of an operation to the input operands for use by the next instruction, so that the rounding delay no longer lies on the path between dependent instructions.
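The effect of the rounding stage on a chain of dependent operations can be illustrated with a simple latency model. The following is a minimal sketch only: the function name, its parameters, and the stage counts used in the example are hypothetical and are not taken from any particular pipeline. It assumes each operation must wait for the result of the previous one, and that the final result of the chain must still be rounded.

```python
def chain_latency(num_ops, fma_stages, round_stages, bypass_unrounded):
    """Cycles needed to finish a chain of dependent multiply-add operations.

    Illustrative model only; all parameters are hypothetical.
    """
    if num_ops <= 0:
        return 0
    # Latency seen by a dependent operation waiting on its predecessor.
    if bypass_unrounded:
        per_op = fma_stages                 # unrounded result is fed back
    else:
        per_op = fma_stages + round_stages  # rounded result is fed back
    # The last operation in the chain always pays for rounding, since its
    # result leaves the pipeline in rounded form.
    return per_op * (num_ops - 1) + fma_stages + round_stages


# Example: 100 dependent operations, 4 multiply-add stages, 1 rounding stage.
without_bypass = chain_latency(100, 4, 1, bypass_unrounded=False)
with_bypass = chain_latency(100, 4, 1, bypass_unrounded=True)
```

Under these assumed stage counts, bypassing the unrounded result removes one cycle from every dependent step except the last, which is where the latency saving for long sequences comes from.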
Therefore, what is needed is a way to bypass the unrounded, intermediate result to the input operands of the pipeline and to compensate for the lack of rounding if, during a subsequent operation, it is determined that rounding of the intermediate result was required. In addition, it would be preferable to utilize the existing resources and architecture of the multiply-add pipeline as much as possible while implementing the rounding compensation technique.
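To make the notion of compensation concrete, the arithmetic identity behind one possible approach can be sketched as follows. This is an assumption made for illustration and is not presented as the technique of this disclosure: it treats significands as plain integers, ignores exponents, alignment, and normalization, and assumes the bypassed value is used as the multiplier operand. The function and parameter names are hypothetical. If rounding would have incremented the unrounded value m by one unit in the last place, then (m + 1) * b + c equals m * b + b + c, so the missing increment can be made up by adding one extra copy of the multiplicand b.

```python
def multiply_add_with_compensation(m_unrounded, needs_round_up, b, c):
    """Return (rounded m) * b + c using only the unrounded m.

    Hypothetical integer model: m_unrounded, b, and c stand in for
    significands; exponent handling and normalization are omitted.
    """
    result = m_unrounded * b + c
    if needs_round_up:
        # (m + 1) * b + c == m * b + b + c, so add the multiplicand once more.
        result += b
    return result


# Same answer as rounding first and then performing the multiply-add.
compensated = multiply_add_with_compensation(10, True, 3, 4)
rounded_first = (10 + 1) * 3 + 4
```

In this simplified model, the correction is an additional addend in the subsequent operation rather than a separate rounding step on the fed-back value.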
In view of the above, improved methods and apparatus for executing a multiply-add operation on a bypassed, unrounded result in a multiply-add pipeline are desired.