1. Field of the Invention
The present invention relates to computer processors, and, more particularly, to the processing of fused mathematical functions.
2. Description of the Related Art
One of the more common applications of floating-point units is in performing matrix operations. In digital signal processing applications (audio processing, graphics, simulation, and the like), a frequent matrix operation is multiplying a matrix by another matrix (or vector), which is fundamentally the computation of an inner product, x1 y1 + x2 y2 + . . . + xn yn. As can be seen, computing these inner products requires a series of multiply-add combinations.
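The series of multiply-add combinations can be sketched as follows. This is only an illustration in Python; the function name inner_product is hypothetical and does not appear in the disclosure.

```python
def inner_product(xs, ys):
    """Accumulate x1*y1 + x2*y2 + ... + xn*yn one multiply-add at a time."""
    acc = 0.0
    for x, y in zip(xs, ys):
        # Each iteration is one multiply followed by one add, which is the
        # pattern a fused multiply-add instruction would perform in one step.
        acc = x * y + acc
    return acc
```

Each pass through the loop corresponds to one (A*B)+C computation, with the running total playing the role of C.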
Motivated by this need, a single instruction that computes (A*B)+C may be devised. This instruction is known as the fused multiply-add, and has a counterpart, the fused multiply-subtract (which, as might be expected, computes (A*B)−C; this may be viewed in the alternative as adding a negative number to the multiplication's result). For the sake of simplicity, these instructions are referred to herein as a fused multiply-add instruction. Although executing such an instruction requires the ability to read three operands at a time, such an instruction has the potential for improving the performance of computations involving inner products.
The fusing of the multiply instruction with the add (or subtract) instruction provides two main advantages. First, by combining the multiply and add (or subtract) instructions, the result can be computed more quickly. This results from a shorter instruction datapath, for example, because one instruction is used instead of two. Second, only one rounding operation need be performed: the fused multiply-add instruction computes (A*B)+C exactly, rounding only after all the calculations have been completed. This reduction in rounding increases the accuracy of inner products.
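The effect of rounding once rather than twice can be illustrated with a small software model. The sketch below is not the disclosed hardware. It assumes Python floats are IEEE double-precision values, emulates a single rounding by computing (A*B)+C in exact rational arithmetic before converting back to a float, and uses hypothetical function names.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Model of a fused multiply-add: exact (a*b)+c, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step in this model


def separate_multiply_add(a, b, c):
    """Multiply, round, then add and round again (two rounding steps)."""
    product = a * b      # first rounding
    return product + c   # second rounding
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product 1 - 2^-60 rounds to 1.0 when the multiplication is rounded on its own, so the separate sequence returns 0.0, while the single-rounding model returns -2^-60.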
Embodiments of the present invention support such functionality, while allowing such designs to occupy less area in an integrated circuit design and still provide accurate results. The adder array of the multiplier of the fused instruction's datapath can be minimized through the use of various techniques. By using techniques such as Booth recoding, the number of adders in the adder array can be reduced, thereby reducing the size of the multiplier and speeding its operation. However, the inventors determined that this could create anomalous results when the terms from the additions were combined with the value of C. Embodiments of the present invention detect such anomalous conditions and compensate therefor.
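How Booth recoding reduces the number of terms to be added can be sketched in software. The example below assumes radix-4 (modified) Booth recoding, which the text above does not specify, and a two's complement multiplier of even bit width; all names are hypothetical.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a width-bit two's complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; digit i carries weight 4**i, so only
    width/2 terms are produced instead of width.
    """
    assert width % 2 == 0, "this sketch assumes an even bit width"
    bits = multiplier & ((1 << width) - 1)
    digits = []
    prev = 0  # implicit zero bit to the right of bit 0
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, width):
    """Sum the Booth terms (digit * multiplicand, weighted by 4**i)."""
    digits = booth_radix4_digits(multiplier, width)
    return sum((d * multiplicand) << (2 * i) for i, d in enumerate(digits))
```

Because some digits are negative, some of the terms are negative as well, which is relevant when the terms are later combined in fixed-width hardware.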
In one embodiment of the present invention, a method of executing a fused instruction is disclosed. The method includes calculating a number of terms from a first input term and a second input term, detecting an inherent carry in the terms, compensating for the inherent carry if the inherent carry exists in the number of terms, resulting in a compensated term, and determining a fused instruction result by combining the compensated term with a third input term. The calculations performed can include, for example, calculating a number of Booth terms using a Booth recoding technique and calculating the number of terms by adding the number of Booth terms, and can result in a sum term and a carry term as the terms calculated.
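The reduction of several terms to a sum term and a carry term can be sketched with carry-save addition. This is an assumption made for illustration: the summary states only that adding the Booth terms can yield a sum term and a carry term, and does not describe the adder array's structure. The function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compression: three inputs in, a sum word and a carry word out."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def reduce_terms(terms, width):
    """Reduce terms to a (sum, carry) pair held in width-bit words.

    Negative terms are represented in two's complement. Bits above the word
    width are discarded, so only (sum + carry) modulo 2**width is meaningful.
    """
    mask = (1 << width) - 1
    sum_term, carry_term = 0, 0
    for term in terms:
        sum_term, carry_term = carry_save_add(sum_term, carry_term, term & mask)
        sum_term &= mask
        carry_term &= mask
    return sum_term, carry_term
```

In this model the pair is correct only modulo 2**width, so adding the two words in a wider adder can produce a spurious carry out of the original word width.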
In one aspect of the embodiment, detection of the inherent carry includes calculating a result by combining the sum and the carry terms, and indicating the result of the combination. If a carry out of the result has a value of one, the sum and the carry terms contain the inherent carry, and such is indicated. Otherwise, if a carry out of the result has a value of zero, the sum and the carry terms do not contain the inherent carry, and such is indicated. In such a scenario, the compensation performed includes extending the sum term with ones if existence of the inherent carry in the number of terms is indicated and extending the sum term with zeros if existence of the inherent carry in the number of terms is not indicated.
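The carry-out detection and sum-term extension described above can be modeled as follows. The sketch assumes the sum and carry terms are n-bit unsigned words whose total is correct modulo 2**n, and that the third input term has already been aligned to a wider word of wide bits; the function names and bit widths are hypothetical.

```python
def has_inherent_carry(sum_term, carry_term, n):
    """True if adding the n-bit sum and carry terms carries out of bit n-1."""
    return ((sum_term + carry_term) >> n) & 1 == 1


def extend_sum(sum_term, n, wide, inherent_carry):
    """Extend the n-bit sum term to wide bits with ones or with zeros."""
    if inherent_carry:
        ones = ((1 << (wide - n)) - 1) << n
        return sum_term | ones
    return sum_term  # upper bits are already zero


def fused_result(sum_term, carry_term, addend, n, wide):
    """Combine the compensated sum term, the carry term and the addend."""
    flag = has_inherent_carry(sum_term, carry_term, n)
    extended = extend_sum(sum_term, n, wide, flag)
    return (extended + carry_term + addend) & ((1 << wide) - 1)
```

In this model, extending with ones lets the unwanted carry ripple through the upper bits and out of the wide word, where it is discarded. Extending with zeros in the same situation would leave the result too large by 2**n.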
In another aspect of the embodiment, detection of the inherent carry includes examining a most significant bit of the carry term and indicating the result of the examination. If the most significant bit of the carry term has a value of one, the carry term contains an inherent carry, and such is indicated. Otherwise, if the most significant bit of the carry term has a value of zero, the carry term does not contain an inherent carry, and such is indicated. In such a scenario, the compensation performed includes extending the sum term with ones if existence of an inherent carry in the carry term is indicated and extending the sum term with zeros if existence of the inherent carry in the carry term is not indicated.
In another embodiment of the present invention, a fused instruction datapath is disclosed. Such a fused instruction datapath includes a normalization unit, a floating point multiplier and a mantissa alignment unit. The floating point multiplier is coupled to the normalization unit, and includes a term generation unit and a compensation unit coupled to the term generation unit. The mantissa alignment unit is coupled to provide an aligned mantissa to the floating point multiplier. It will be noted that a processor can be designed with such a fused instruction datapath. The mantissa alignment unit can include, for example, a mantissa alignment shifter and a mantissa alignment control circuit coupled to the mantissa alignment shifter. The floating point multiplier multiplies a first input number and a second input number, and the mantissa alignment control circuit can be designed to cause the mantissa alignment shifter to shift a mantissa of a third input number by a number of bit positions equal to a difference between an exponent of the third input number and a sum of an exponent of the first input number and an exponent of the second input number. The floating point multiplier can further include a final adder, to which the mantissa alignment unit and the compensation unit are coupled. In one aspect of this embodiment, the term generation unit includes a term generator and an adder array. The adder array is coupled to the term generator and is designed to generate a sum term and a carry term.
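The shift amount computed by the mantissa alignment control circuit can be illustrated with integer mantissas. The sketch below assumes unbiased integer exponents, values of the form mantissa * 2**exponent, and a shift direction chosen for the illustration; none of these details is given in the summary, and the function name is hypothetical.

```python
def align_addend(mant_c, exp_a, exp_b, exp_c):
    """Shift the third operand's mantissa to the product's exponent.

    The shift count is the difference between the third operand's exponent
    and the sum of the first and second operands' exponents.
    """
    shift = exp_c - (exp_a + exp_b)
    if shift >= 0:
        return mant_c << shift
    # Simplified model: a right shift drops low-order bits, which a real
    # datapath would have to retain for correct rounding.
    return mant_c >> -shift
```

For example, with A = 3 * 2**2, B = 5 * 2**1 and C = 7 * 2**5, the product mantissa is 15 at exponent 3, the shift is 5 - 3 = 2, and the aligned mantissa of C is 28, so the sum is 43 * 2**3.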
In one aspect of this embodiment, the compensation unit includes a word extender unit, coupled to receive a sum term from the term generation unit, and an extension control unit, coupled to receive a carry term from the term generation unit and to provide an extension control signal to the word extender unit.
In a further aspect of this embodiment, the extension control unit is designed to examine a most significant bit of the carry term, indicate the carry term contains the inherent carry via the extension control signal, if the most significant bit of the carry term has a value of one, and indicate the carry term does not contain the inherent carry via the extension control signal, if the most significant bit of the carry term has a value of zero. In such an aspect, the word extender unit is designed to extend the sum term with ones if existence of the inherent carry in the number of terms is indicated by the extension control signal, and extend the sum term with zeros if existence of the inherent carry in the number of terms is not indicated by the extension control signal.
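The division of work between the extension control unit and the word extender unit can be modeled as two small functions. This is a behavioral sketch only, with hypothetical names and widths. Whether the most significant bit of the carry term reliably signals an inherent carry depends on how the adder array forms the carry term, which the summary does not detail, so the sketch does not establish that property.

```python
def extension_control(carry_term, n):
    """Extension control signal: the most significant bit of the n-bit carry term."""
    return (carry_term >> (n - 1)) & 1


def word_extender(sum_term, n, wide, extension_signal):
    """Extend the n-bit sum term to wide bits with ones (signal 1) or zeros (signal 0)."""
    if extension_signal:
        return sum_term | (((1 << (wide - n)) - 1) << n)
    return sum_term
```

As a usage example under these assumptions, with an 8-bit sum term of 0x25 and a carry term of 0xF0, the control signal is one, the extended 16-bit sum term is 0xFF25, and adding the carry term modulo 2**16 gives 0x15, the same value the two 8-bit terms represent modulo 2**8.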
In a still further aspect of this embodiment, the extension control unit is further coupled to receive the sum term from the term generation unit and designed to calculate a result by combining the sum and the carry terms. In such an aspect, the extension control unit is designed to indicate the sum and the carry terms contain the inherent carry via the extension control signal, if a carry out of the result has a value of one, and indicate the sum and the carry terms do not contain the inherent carry via the extension control signal, if the carry out of the result has a value of zero. Also in such an aspect, the word extender unit is designed to extend the sum term with ones if existence of the inherent carry in the number of terms is indicated by the extension control signal, and extend the sum term with zeros if existence of the inherent carry in the number of terms is not indicated by the extension control signal.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. As will also be apparent to one of skill in the art, the operations disclosed herein may be implemented in a number of ways, and such changes and modifications may be made without departing from this invention and its broader aspects. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.