1. Field of the Invention
The present invention relates to computer processors, and, more particularly, to the processing of fused mathematical functions.
2. Description of the Related Art
One of the more common applications of floating-point units is in performing matrix operations. In digital signal processing applications (audio processing, graphics, simulation, and the like), a frequent matrix operation is multiplying a matrix by another matrix (or vector), which is fundamentally the computation of inner products of the form $x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$. As can be seen, computing these inner products requires a series of multiply-add combinations.
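As a minimal sketch (in Python, chosen here only for illustration), the inner product can be written as a chain of multiply-add steps, one per term. A matrix-vector product is then one such inner product per matrix row. The function names `inner_product` and `mat_vec` are hypothetical.

```python
from typing import List, Sequence

def inner_product(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Compute x1*y1 + x2*y2 + ... + xn*yn as a chain of multiply-add steps."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = x * y + acc  # one multiply-add combination per term
    return acc

def mat_vec(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Multiply a matrix by a vector: one inner product per matrix row."""
    return [inner_product(row, vector) for row in matrix]

print(mat_vec([[1.0, 2.0], [3.0, 4.0]], [5.0, 6.0]))
```

Each loop iteration evaluates `x * y + acc`. On a processor that provides a fused multiply-add instruction, that expression could map to a single instruction.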
Motivated by this need, a single instruction that computes (A*B)+C may be devised. This instruction is known as the fused multiply-add, and it has a counterpart, the fused multiply-subtract, which, as might be expected, computes (A*B)-C (this may alternatively be viewed as adding a negative number to the result of the multiplication). For the sake of simplicity, both instructions are referred to herein as a fused multiply-add instruction. Although executing such an instruction requires the ability to read three operands at a time, such an instruction has the potential to improve the performance of computations involving inner products.
The fusing of the multiply instruction with the add (or subtract) instruction provides two main advantages. First, by combining the multiply and add (or subtract) instructions, the result can be computed more quickly. This results from a shorter instruction datapath, for example, because one instruction is used instead of two. Second, only one rounding operation need be performed: the fused multiply-add instruction computes (A*B)+C exactly and rounds only after all the calculations have been completed, rather than rounding once after the multiplication and again after the addition. This reduction in rounding increases the accuracy of inner products.
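The accuracy advantage of a single rounding can be illustrated with a small Python sketch. Because a hardware fused multiply-add is not assumed to be available (`math.fma` exists only in Python 3.13 and later), the fused operation is emulated here by computing A*B+C exactly with `fractions.Fraction` and rounding once on conversion to `float`. The operand values are chosen only to expose the difference.

```python
from fractions import Fraction

def multiply_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded to double, then the sum is rounded again."""
    product = a * b     # first rounding
    return product + c  # second rounding

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c is computed exactly, then rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # rational arithmetic, no rounding
    return float(exact)                              # single rounding to the nearest double

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
print(multiply_then_add(a, b, c), fused_multiply_add(a, b, c))
```

With these operands the exact product is 1 - 2**-60, which rounds to 1.0 when rounded on its own, so the separate multiply-then-add returns 0.0, whereas the single-rounding version returns exactly -2**-60.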
In one embodiment of the present invention, a method of executing a fused instruction is disclosed. The method begins with several actions, which may be performed serially or in parallel: performing a floating point multiplication of a first floating point number by a second floating point number, which generates a first result, and normalizing a third floating point number, which generates a second result. The first result is then added to the second result, generating an unnormalized result. A determination is also made as to whether a large exponent difference exists between the first result and the second result. If so, a large exponent difference normalization is performed on the unnormalized result; otherwise, a small exponent difference normalization is performed on the unnormalized result.
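The overall flow of the method can be modeled in a toy Python sketch that operates on positive operands held as integer (mantissa, exponent) pairs. The fraction width `F`, the cut-off `THRESHOLD` separating large from small exponent differences, and all function names are hypothetical. For brevity, the sketch truncates instead of rounding and applies one generic normalization shift for both paths, merely reporting which path would be taken.

```python
F = 8          # hypothetical fraction width of each operand mantissa
THRESHOLD = 2  # hypothetical cut-off between "small" and "large" exponent differences

def value(m: int, e: int, fbits: int = F) -> float:
    """Real value of a (mantissa, exponent) pair: m * 2**(e - fbits)."""
    return m * 2.0 ** (e - fbits)

def fused_multiply_add(a, b, c):
    """Toy A*B + C for positive operands given as (mantissa, exponent) pairs,
    each with 2**F <= mantissa < 2**(F+1). Returns (mantissa, exponent, path)."""
    (ma, ea), (mb, eb), (mc, ec) = a, b, c
    # First result: the product, with 2F fraction bits and exponent ea + eb.
    mp, ep = ma * mb, ea + eb
    # Second result: C re-expressed at the product's exponent (also 2F fraction bits).
    d = ec - ep
    mc_wide = mc << F
    mc_aligned = mc_wide << d if d >= 0 else mc_wide >> -d  # right shift drops bits
    # Add the two results: an unnormalized sum at exponent ep.
    s = mp + mc_aligned
    # Choose the normalization path from the exponent difference.
    path = "large" if abs(d) > THRESHOLD else "small"
    # Either path must bring s into [2**(2F), 2**(2F+1)); this toy model uses one
    # generic shift for both and only reports which path would be taken.
    k = s.bit_length() - 1 - 2 * F
    m = s >> k if k >= 0 else s << -k
    # Truncate back to F fraction bits (a full implementation would round here, once).
    return m >> F, ep + k, path

m, e, path = fused_multiply_add((256, 0), (384, 0), (256, 5))  # 1.0 * 1.5 + 32.0
print(value(m, e), path)
```

In this model both paths yield the same normalized result; in hardware, splitting the normalization into two paths typically lets each path use a simpler shifter.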
In one aspect of the embodiment, the normalizing includes determining a first exponent and a second exponent, and normalizing the third floating point number such that the exponent of the second result is equal to the sum of the first exponent and the second exponent. In this aspect, the first exponent is the exponent of the first floating point number and the second exponent is the exponent of the second floating point number.
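The alignment condition can be written out as a short derivation, using A, B and C for the first, second and third floating point numbers, with mantissas $m_A, m_B, m_C$ and exponents $e_A, e_B, e_C$ (notation assumed here for illustration):

```latex
\begin{aligned}
A \cdot B &= (m_A m_B)\, 2^{\,e_A + e_B} \\
C &= m_C\, 2^{\,e_C} = \left(m_C\, 2^{\,e_C - (e_A + e_B)}\right) 2^{\,e_A + e_B} \\
A \cdot B + C &= \left(m_A m_B + m_C\, 2^{\,e_C - (e_A + e_B)}\right) 2^{\,e_A + e_B}
\end{aligned}
```

Once the third number is expressed with exponent $e_A + e_B$, its mantissa $m_C\, 2^{e_C - (e_A + e_B)}$ (a left shift by $e_C - (e_A + e_B)$ bit positions when that difference is positive, a right shift otherwise) can be added directly to the product mantissa $m_A m_B$.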
In another aspect of the embodiment, the large exponent difference normalization includes shifting the unnormalized result and adjusting an exponent of the second result. Specifically, the unnormalized result's mantissa is right-shifted, generating a right-shifted mantissa. This right-shifted mantissa is then left-shifted by one bit position if it is less than 01b, or right-shifted by one bit position if it is not less than 10b. The exponent of the second result is then adjusted to account for the right-shifting of the mantissa and for the one-bit left shift or one-bit right shift of the right-shifted mantissa, if either was performed.
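A minimal Python sketch of the large exponent difference normalization follows. It assumes the mantissa is held as a non-negative integer with `fbits` fraction bits, so that 01b corresponds to `1 << fbits` and 10b to `2 << fbits`. The coarse right-shift amount `shift` is a hypothetical input standing in for whatever the control logic supplies, and the sketch assumes the result is then at most one bit position away from normalized.

```python
def large_difference_normalize(m: int, e: int, shift: int, fbits: int) -> tuple[int, int]:
    """Normalize m (value m / 2**fbits) into [01b, 10b) after a coarse right shift.

    Returns the new (mantissa, exponent); the exponent tracks every shift applied.
    """
    one = 1 << fbits  # 01b in this fixed-point format
    two = 2 << fbits  # 10b
    m >>= shift       # coarse right shift (low-order bits are dropped in this sketch)
    e += shift
    if m < one:       # less than 01b: one-bit left shift
        m <<= 1
        e -= 1
    elif m >= two:    # not less than 10b: one-bit right shift
        m >>= 1
        e += 1
    if not one <= m < two:
        raise ValueError("result is more than one bit away from normalized")
    return m, e
```

With `fbits = 4`, the mantissa 44 (10.1100b) is shifted right one position to 22 (1.0110b) and its exponent increases by one.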
In another aspect of the embodiment, the small exponent difference normalization includes performing a normalization left shift operation, shifting the mantissa of the unnormalized result, and adjusting an exponent of the unnormalized result. Specifically, a normalization left shift operation is performed on the unnormalized result's mantissa if the mantissa is less than 0001b. The mantissa is shifted right one bit position if it is not less than 0010b and is less than 0100b; it is shifted right two bit positions if it is not less than 0100b and is less than 1000b; and it is shifted right three bit positions if it is not less than 1000b. A mantissa that is not less than 0001b and is less than 0010b is already normalized and is not shifted. Finally, the unnormalized result's exponent is adjusted to account for any shifting performed.
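The small exponent difference normalization can be sketched the same way, assuming a mantissa with four integer bits (xxxx.fff...) held as an integer with `fbits` fraction bits. The function name and the error handling for zero and out-of-range values are assumptions of this sketch. Bits shifted out on the right are simply dropped, whereas a rounding stage would typically need them.

```python
def small_difference_normalize(m: int, e: int, fbits: int) -> tuple[int, int]:
    """Normalize m (value m / 2**fbits, format xxxx.fff...) into [0001b, 0010b).

    Returns the new (mantissa, exponent); the exponent is adjusted by the shift applied.
    """
    one = 1 << fbits  # 0001b in this fixed-point format
    if m <= 0:
        raise ValueError("zero or negative mantissas are not handled in this sketch")
    if m >= 16 * one:
        raise ValueError("more than four integer bits is outside the assumed format")
    if m < one:
        # Normalization left shift: bring the leading 1 up to the units position.
        k = (fbits + 1) - m.bit_length()
        return m << k, e - k
    if m < 2 * one:    # [0001b, 0010b): already normalized
        return m, e
    if m < 4 * one:    # [0010b, 0100b): shift right one bit position
        k = 1
    elif m < 8 * one:  # [0100b, 1000b): shift right two bit positions
        k = 2
    else:              # [1000b, 10000b): shift right three bit positions
        k = 3
    return m >> k, e + k
```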
In another embodiment of the present invention, a fused instruction datapath is disclosed. The fused instruction datapath includes a normalization unit; a floating point multiplier, coupled to provide an unnormalized result to the normalization unit; and a mantissa alignment unit, coupled to provide an aligned mantissa to the floating point multiplier. Such a datapath is intended to be used as part of a processor architecture. The mantissa alignment unit may include, for example, a mantissa alignment shifter and a mantissa alignment control circuit coupled to the mantissa alignment shifter. Assuming that the floating point multiplier is configured to multiply a first input number by a second input number, the mantissa alignment control circuit can be configured to cause the mantissa alignment shifter to shift the mantissa of a third input number by a number of bit positions equal to the difference between the exponent of the third input number and the sum of the exponents of the first and second input numbers. The floating point multiplier may include, for example, a number of adders, each coupled to another of the adders, and a final adder coupled to the mantissa alignment unit and to at least one of the adders. The adders can be configured in any number of ways to implement a multiplier, such as an array multiplier or a tree multiplier. The adders may be, for example, carry-save adders, and the final adder may be, for example, a carry-propagate adder.
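One way to picture a multiplier built from carry-save adders and a final carry-propagate adder is the Python sketch below. It forms the partial products of two unsigned integer mantissas, reduces them three rows at a time with 3:2 carry-save adders until two rows remain (loosely resembling a tree reduction), and finishes with an ordinary addition standing in for the carry-propagate adder. As an assumption of this sketch, the aligned addend enters as an extra row of the carry-save reduction; the datapath described above couples it to the final adder, which this model does not reproduce exactly. The names `csa` and `multiply_add` are hypothetical.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save adder on non-negative integers: a + b + c == sum + carry."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, moved up one position
    return s, carry

def multiply_add(x: int, y: int, addend: int = 0) -> int:
    """Compute x * y + addend with carry-save reduction and one final carry-propagate add."""
    if x < 0 or y < 0 or addend < 0:
        raise ValueError("this sketch handles unsigned mantissas only")
    # One partial product (x shifted left by i) for every set bit i of y.
    rows = [x << i for i in range(y.bit_length()) if (y >> i) & 1]
    rows.append(addend)                            # aligned addend as an extra row
    while len(rows) > 2:
        next_rows = []
        while len(rows) >= 3:                      # each 3:2 adder turns three rows into two
            next_rows.extend(csa(rows.pop(), rows.pop(), rows.pop()))
        next_rows.extend(rows)                     # pass leftover rows to the next level
        rows = next_rows
    return sum(rows)                               # final carry-propagate addition
```

For example, `multiply_add(13, 11, 7)` reduces the partial products 13, 26 and 104 together with the addend 7 and returns 150.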
In one aspect of this embodiment, the normalization unit includes only a bi-directional shifter (coupled to the floating point multiplier to receive the unnormalized result) and a normalization control unit coupled to control the bi-directional shifter. In such a case, the normalization control unit is preferably configured to cause the bi-directional shifter to shift the unnormalized result into a normalized format. Alternatively, a right shifter, a left shifter and a normalization control unit may be provided to support such functionality. In such a case, the right shifter and left shifter are coupled to the floating point multiplier to receive the unnormalized result. The normalization control unit is coupled to control the left and right shifters, such that the right and left shifters shift the unnormalized result into a normalized format. This aspect may be extended by the addition of a one-bit right shifter, a two-bit right shifter and a three-bit right shifter, all coupled to the right shifter. A multiplexer coupled to the shifters is then used to select the output having the proper form (i.e., the output that is properly normalized).
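The single bi-directional shifter variant can be sketched as a normalization control function that computes one signed shift amount (positive meaning right, negative meaning left) from the position of the leading 1, feeding a shifter that moves in either direction. The fixed-point format (`fbits` fraction bits, normalized range [01b, 10b)) and all names are assumptions for illustration.

```python
def normalization_control(m: int, fbits: int) -> int:
    """Signed shift amount for the bi-directional shifter: > 0 means right, < 0 means left."""
    if m <= 0:
        raise ValueError("zero or negative mantissas are not handled in this sketch")
    # Distance of the leading 1 from the units bit (bit position fbits).
    return (m.bit_length() - 1) - fbits

def bidirectional_shift(m: int, amount: int) -> int:
    """Shift right for non-negative amounts, left for negative amounts."""
    return m >> amount if amount >= 0 else m << -amount

def normalize(m: int, e: int, fbits: int) -> tuple[int, int]:
    """Shift m into [01b, 10b) and adjust the exponent by the same amount."""
    d = normalization_control(m, fbits)
    return bidirectional_shift(m, d), e + d
```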
In another aspect of this embodiment, the normalization unit includes a right shifter, a left shifter and a multiplexer, as well as a first one-bit right shifter, a two-bit right shifter and a three-bit right shifter, all coupled to the right shifter, and a second one-bit right shifter and a one-bit left shifter, both coupled to the left shifter. The multiplexer is coupled to select the output of one of the right shifter, the first one-bit right shifter, the two-bit right shifter, the three-bit right shifter, the second one-bit right shifter, the left shifter and the one-bit left shifter. The normalization unit also includes a normalization control unit coupled to control the right shifter, the left shifter and the multiplexer. By properly controlling these elements, the normalization control unit is able to normalize the multiplier's (unnormalized) output.
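The multiplexer-based variant can be sketched as computing all seven shifter outputs and letting the multiplexer pick the one that lies in the normalized range [01b, 10b). The coarse amounts `right_amount` and `left_amount` are hypothetical inputs; reading `left_amount` as an estimate that may be off by one position, which the one-bit shifters then correct, is an interpretation assumed here and not stated in the text. The first-match selection order and the function name are likewise assumptions.

```python
def mux_normalize(m: int, e: int, fbits: int,
                  right_amount: int, left_amount: int) -> tuple[int, int]:
    """Select whichever of the seven shifter outputs is normalized; return (mantissa, exponent)."""
    one, two = 1 << fbits, 2 << fbits  # 01b and 10b in this fixed-point format
    r = m >> right_amount              # right shifter output
    l = m << left_amount               # left shifter output
    # (candidate mantissa, exponent adjustment) for each multiplexer input.
    candidates = [
        (r, right_amount),             # right shifter
        (r >> 1, right_amount + 1),    # first one-bit right shifter
        (r >> 2, right_amount + 2),    # two-bit right shifter
        (r >> 3, right_amount + 3),    # three-bit right shifter
        (l, -left_amount),             # left shifter
        (l >> 1, -left_amount + 1),    # second one-bit right shifter
        (l << 1, -left_amount - 1),    # one-bit left shifter
    ]
    for mantissa, adjust in candidates:
        if one <= mantissa < two:      # the normalized output is the one selected
            return mantissa, e + adjust
    raise ValueError("no shifter output is normalized for these shift amounts")
```

Each total shift amount maps a different input range onto [01b, 10b), so any candidates that qualify share the same total shift and produce the same mantissa; the selection order therefore does not change the result.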
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. As will also be apparent to one of skill in the art, the operations disclosed herein may be implemented in a number of ways, and such changes and modifications may be made without departing from this invention and its broader aspects. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.