1. Field of the Invention
Embodiments of the present invention relate generally to graphics processing and, more specifically, to fused floating-point multiply-add operations using a multi-step approach to data shifting.
2. Description of the Related Art
In computer systems in general, and in graphics processing units (GPUs) in particular, 32-bit floating-point arithmetic operations are performed frequently. A floating-point number is one in which the radix point (the binary counterpart of the decimal point) can occur at any location in the string of digits. A fused floating-point multiply-add (FFMA) operation is one that accepts three inputs, A, B, and C, as operands; the A and B operands are multiplied together, and the resulting product is added to the C operand. Because the AB multiplication yields a result that is much wider than the original operands, the exponents must be examined and the C operand typically shifted right or left to align the radix points. The AB product is held static while the C operand is shifted in the appropriate direction to perform the final addition that completes the operation. The shifter that performs this alignment must be sized to accommodate the case where the relative exponent values dictate that the C operand be shifted fully to the left of the AB product, as well as the case where the relative exponent values dictate that the C operand be shifted fully to the right. Thus, the resulting shifter must be more than four times the width of the operands.
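The alignment step described above can be illustrated with a minimal sketch. It assumes IEEE 754 single-precision encoding and normal (non-zero, non-denormal) inputs; the helper names fields, product_significand, and alignment_shift are hypothetical and chosen only for illustration, and the sketch models the arithmetic rather than any particular hardware design.

```python
import struct

BIAS = 127  # exponent bias of the IEEE 754 single-precision format (assumed)


def fields(x):
    """Split x, encoded as a 32-bit float, into (sign, biased exponent, significand).

    Assumes a normal number, so the implicit leading 1 is restored to give
    a 24-bit significand.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction | 0x800000


def product_significand(a, b):
    """Integer product of the two 24-bit significands (up to 48 bits wide)."""
    _, _, ma = fields(a)
    _, _, mb = fields(b)
    return ma * mb


def alignment_shift(a, b, c):
    """Exponent distance between C and the unnormalized AB product.

    A positive value means C lies to the left of the product (C is larger
    in magnitude scale); a negative value means C lies to the right.
    """
    _, ea, _ = fields(a)
    _, eb, _ = fields(b)
    _, ec, _ = fields(c)
    product_exponent = ea + eb - BIAS  # biased exponent of A * B
    return ec - product_exponent
```

For example, alignment_shift(2.0, 4.0, 1.0) evaluates to -3, meaning the C operand must move three bit positions to the right relative to the product, while product_significand(1.5, 1.5) occupies 48 bits, twice the width of either input significand. The range of possible shift distances in both directions is what drives the shifter width discussed above.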
One drawback to the above approach is that, when the FFMA element is used for a multiply-only operation, the C operand must be set to zero. Setting the C operand to zero necessitates overwriting all of the registers of the wide shifter described above in order to fully flush out any prior value, and each such overwrite consumes power. This drawback is particularly disadvantageous when full FFMA operations alternate with multiply-only operations, because considerable power is consumed simply to ensure that a zero is added in the multiply-only mode.
Accordingly, what is needed in the art is a more efficient technique for performing FFMA and multiply-only operations that alternate with one another.