The present invention relates to floating-point processors, and more particularly to floating-point processors having improved intermediate result handling capability for multiply-add operations.
In digital processing systems, numerical data is typically expressed using integer or floating-point representation. Floating-point representation is preferred in many applications because of its ability to express a wide range of values and its ease of manipulation for some specified operations. A floating-point representation includes a mantissa (or significand), an exponent, and a sign component. The mantissa represents the integer portion before the binary (or decimal) point as well as the fractional portion after the binary point. The mantissa typically ranges from “1” to less than the value of the “base”, which is two for binary but ten for decimal (i.e., 1.0 ≤ mantissa < 2.0 for binary numbers). A special representation is typically used to denote 0.0. The exponent represents a scaling factor that is multiplied with the mantissa to arrive at the number being represented. For binary numbers the scaling factor is a power of two, and the exponent gives that power. Finally, the sign component expresses the sign of the number, i.e., whether the number is positive or negative. Floating-point representations are also defined by the “IEEE Standard for Binary Floating-Point Arithmetic,” which is referred to herein as the IEEE-754 standard (or simply the IEEE standard) and incorporated herein by reference in its entirety for all purposes.
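As a concrete illustration of the three components, the minimal Python sketch below unpacks an IEEE-754 double into its sign, unbiased exponent, and mantissa, with the hidden leading 1 restored. The function name `decompose` is hypothetical, and only normal numbers are handled; zero, subnormals, infinities, and NaNs are rejected.

```python
import math
import struct

def decompose(x: float) -> tuple[int, int, float]:
    """Split a normal IEEE-754 double into (sign, exponent, mantissa).

    The mantissa is returned in [1.0, 2.0) with the hidden leading 1
    restored, so that x == (-1)**sign * mantissa * 2**exponent.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                       # 1 sign bit
    biased_exp = (bits >> 52) & 0x7FF       # 11 exponent bits, bias 1023
    fraction = bits & ((1 << 52) - 1)       # 52 bits after the binary point
    if biased_exp in (0, 0x7FF):
        raise ValueError("zero, subnormal, infinity and NaN are not handled")
    mantissa = 1.0 + fraction / 2**52       # exact: fits in 53 bits
    return sign, biased_exp - 1023, mantissa

sign, exponent, mantissa = decompose(-6.5)  # -6.5 = -1 * 1.625 * 2**2
assert (sign, exponent, mantissa) == (1, 2, 1.625)
assert (-1) ** sign * math.ldexp(mantissa, exponent) == -6.5
```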
Many operations can be performed on floating-point numbers, including arithmetic operations such as addition, subtraction, and multiplication. For arithmetic operations, the IEEE standard provides guidelines to be followed to generate a unique answer for each floating-point operation. In particular, the IEEE standard describes the processing to be performed on the result from a particular operation (e.g., multiply, add), the precision of the resultant output, and the data format to be used. For example, the IEEE standard defines several rounding modes available for the results from add and multiply operations, and the bit position at which the rounding is to be performed. These requirements ensure that different implementations of IEEE-compliant floating-point processors produce identical results.
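The effect of the rounding mode can be sketched on an integer significand that carries extra low-order bits beyond the target precision. The Python function below is illustrative only: the mode names are hypothetical labels for the four IEEE-754 rounding directions (round to nearest with ties to even, and rounding toward zero, toward positive infinity, and toward negative infinity), and the integer is taken to be the magnitude, with the sign passed separately.

```python
def round_bits(value: int, drop: int, mode: str, negative: bool = False) -> int:
    """Remove the low `drop` bits of a magnitude `value`, rounding per `mode`.

    `negative` is the sign of the number being rounded; it matters only for
    the two directed modes toward positive and negative infinity.
    """
    if drop == 0:
        return value
    kept = value >> drop
    remainder = value & ((1 << drop) - 1)   # the bits being discarded
    half = 1 << (drop - 1)                  # weight of the first discarded bit
    if remainder == 0:
        return kept                         # exact: every mode agrees
    if mode == "nearest_even":
        if remainder > half or (remainder == half and kept & 1):
            kept += 1
    elif mode == "toward_zero":
        pass                                # truncate the magnitude
    elif mode == "toward_positive":
        kept += 0 if negative else 1        # grow positive magnitudes only
    elif mode == "toward_negative":
        kept += 1 if negative else 0        # grow negative magnitudes only
    else:
        raise ValueError(f"unknown rounding mode: {mode!r}")
    return kept

# 0b101110 rounded to four bits: the discarded 0b10 is exactly half and the
# kept 0b1011 is odd, so round-to-nearest-even rounds up to 0b1100.
assert round_bits(0b101110, 2, "nearest_even") == 0b1100
```

Rounding 0b111111 to four bits under round-to-nearest gives 0b10000, one bit wider than the kept field; this carry-out is why a re-normalization step can follow rounding.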
In most floating-point processors, and as defined by the IEEE standard, multiplication is performed on two “normalized” operands. A normalized floating-point number is represented by a mantissa having a “1” value in the most significant bit (MSB) location and a format of 1.xxx--xx, where each “x” represents one bit that is either a one or a zero. As defined by the IEEE standard, the fractional portion “xxx--xx” represents 23 bits after the binary point for normalized single precision numbers and 52 bits for normalized double precision numbers. For a normalized number, the mantissa ranges from one to two (1.0 ≤ mantissa < 2.0). Multiplication of two normalized operands produces a resultant mantissa that ranges from one to four (1.0 ≤ mantissa < 4.0) and has a format of 01.xxx--xxxx or 1x.xxx--xxxx, where the fractional portion “xxx--xxxx” represents more than 23 bits (or 52 bits) for the unrounded multiplier result with single (or double) precision numbers. Post-processing is then performed on the result (i.e., the resultant mantissa), which includes, as necessary, normalization, rounding, and possible re-normalization. Floating-point multiplication is typically performed by a specially designed unit that implements a multiplication algorithm (such as the Booth or modified Booth algorithm).
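The post-processing sequence for single precision can be modeled with Python integers by treating each 24-bit mantissa m (hidden 1 included) as the value m / 2**23. This is a sketch under stated assumptions: only round-to-nearest-even is modeled, signs and exponents are left out, and `multiply_mantissas` and `float32_round` are hypothetical helper names. The second helper uses the host's double-to-single conversion through `struct` purely as a cross-check.

```python
import struct

FRAC_BITS = 23  # fraction bits of an IEEE single-precision mantissa

def multiply_mantissas(m1: int, m2: int) -> tuple[int, int]:
    """Multiply two normalized 24-bit mantissas and post-process the product.

    Returns (mantissa, exponent_adjust): a normalized 24-bit mantissa rounded
    to nearest-even, and the amount to add to the sum of the input exponents.
    """
    for m in (m1, m2):
        if not (1 << FRAC_BITS) <= m < (1 << (FRAC_BITS + 1)):
            raise ValueError("mantissa is not normalized")
    product = m1 * m2  # value product / 2**46, in [1.0, 4.0)
    # Normalization: a 1x.xxx product (value >= 2) is shifted one extra place.
    exp_adjust = product >> (2 * FRAC_BITS + 1)    # 0 for 01.xxx, 1 for 1x.xxx
    drop = FRAC_BITS + exp_adjust
    kept = product >> drop
    remainder = product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    # Rounding: round to nearest, ties to even.
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    # Re-normalization: rounding may carry out to 10.000...0.
    if kept >> (FRAC_BITS + 1):
        kept >>= 1
        exp_adjust += 1
    return kept, exp_adjust

def float32_round(x: float) -> float:
    """Round a double to single precision via the host's conversion."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 1.5 * 1.5 = 2.25 has the form 1x.xxx, so it is normalized to 1.125 * 2**1.
m = 3 << 22                       # 1.5 * 2**23
assert multiply_mantissas(m, m) == (9437184, 1)
```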
Many applications perform multiplication on two operands and addition (or subtraction) of the resultant product with a third operand. This multiply-add (or Madd) operation is common, for example, in digital signal processing. Madd operations are used for computing filter functions, convolution, correlation, matrix transformations, and other functions. Madd operations are also commonly used in geometric computation for 3-D graphics applications.
Conventionally, a Madd operation can be achieved by sequentially performing a multiply (MUL) operation followed by an add (ADD) operation. Performing the operations sequentially results in a long processing delay. Improved performance can often be obtained by performing the Madd operation using a specially designed unit that also supports conventional floating-point multiplication and addition.
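For reference, the MUL-followed-by-ADD semantics can be sketched in Python, whose float arithmetic rounds every operation to IEEE double precision. The helper names are hypothetical. The example operands are chosen so that rounding the intermediate product visibly changes the Madd result relative to the exact value of a*b + c, and the FIR-style loop shows how such Madd operations are chained in the filter computations mentioned above.

```python
from fractions import Fraction

def madd_sequential(a: float, b: float, c: float) -> float:
    """Madd as a MUL followed by an ADD: the product is rounded to double
    precision before the addition, and the sum is rounded again."""
    product = a * b        # first IEEE rounding
    return product + c     # second IEEE rounding

def madd_exact(a: float, b: float, c: float) -> Fraction:
    """Exact value of a*b + c, with no intermediate rounding."""
    return Fraction(a) * Fraction(b) + Fraction(c)

def fir_output(coeffs: list[float], samples: list[float]) -> float:
    """One FIR filter output y = sum(h[k] * x[k]) built from chained Madds."""
    acc = 0.0
    for h, x in zip(coeffs, samples):
        acc = madd_sequential(h, x, acc)
    return acc

a = b = 1.0 + 2.0**-30
print(madd_sequential(a, b, -1.0) == 2.0**-29)                           # True
print(madd_exact(a, b, -1.0) == Fraction(1, 2**29) + Fraction(1, 2**60))  # True
```

With a = b = 1 + 2^-30, the exact product 1 + 2^-29 + 2^-60 loses its 2^-60 term when rounded to 53 bits, so the sequential Madd returns exactly 2^-29.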
As noted above, for multiply and Madd operations, post-processing is typically performed on the result from the multiply operation. The post-processing increases the processing time of these floating-point operations. The increased processing time is compounded for the Madd operation, which is a concatenation of a multiply and an add operation. Accordingly, for Madd operations, techniques that simplify the post-processing of the intermediate result from the multiply operation and reduce the overall processing time are highly desirable. It is also desirable that these techniques generate Madd output that fulfills the IEEE rounding requirements, as if the Madd operation were achieved by a MUL operation followed by an ADD operation.
The invention provides floating-point processors capable of performing multiply-add (Madd) operations and incorporating improved intermediate result handling capability. The floating-point processor includes a multiplier unit coupled to an adder unit. The intermediate result from the multiplier unit is processed (i.e., rounded) into a representation that is more easily managed in the adder unit. However, some of the processing (i.e., normalization) needed to generate an IEEE-compliant representation is deferred to the adder unit. By deferring the normalization of the intermediate result, the corresponding exponent adjustment is also removed from the multiplier unit (and is performed later, when the normalization is performed). By combining and deferring some of the processing steps for the intermediate result, circuit complexity is reduced and operational performance is improved.
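The deferral can be sketched with integer mantissas in single precision. In this hypothetical Python model (the function names and the fixed two-integer-bit output format are assumptions for illustration, not the circuit described here), the multiplier rounds the product at the bit position that will become the least significant bit after normalization, but leaves the rounded value in its 01.xxx or 1x.xxx form and leaves the exponent untouched. Normalization, together with its exponent increment, is a separate later step.

```python
FRAC_BITS = 23  # single-precision fraction bits (assumed for this sketch)

def round_unnormalized(m1: int, m2: int) -> int:
    """Multiplier-side rounding without normalization (hypothetical model).

    m1, m2 are normalized 24-bit mantissas (value m / 2**23 in [1, 2)).
    Returns a rounded mantissa r in a fixed two-integer-bit format,
    value r / 2**23 in [1, 4). No exponent adjustment is made here.
    """
    product = m1 * m2                                 # value product / 2**46
    # Round at the bit that will be the LSB after normalization:
    # one position higher when the product has the form 1x.xxx.
    drop = FRAC_BITS + (product >> (2 * FRAC_BITS + 1))
    kept = product >> drop
    remainder = product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1                                     # round to nearest even
    # Re-align to the fixed 01.xxx / 1x.xxx binary point instead of normalizing.
    return kept << (drop - FRAC_BITS)

def normalize(r: int, exponent: int) -> tuple[int, int]:
    """Deferred normalization, e.g. performed later in the adder unit."""
    if r >> (FRAC_BITS + 1):          # value >= 2: shift right, bump exponent
        return r >> 1, exponent + 1
    return r, exponent

r = round_unnormalized(3 << 22, 3 << 22)   # 1.5 * 1.5 = 2.25, form 1x.xxx
assert normalize(r, 0) == (9437184, 1)      # 1.125 * 2**1
```

Because the rounding position already accounts for the 1x.xxx case, the later one-bit shift discards only a zero bit, so rounding first and normalizing afterwards yields the same mantissa and exponent as normalizing before rounding.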
An embodiment of the invention provides a floating-point unit (FPU) configurable to perform Madd operations. The FPU includes a multiplier unit coupled to an adder unit. The multiplier unit is configured to receive and multiply mantissas for two operands to generate a multiplier output mantissa. The multiplier output mantissa is rounded and has a range greater than a normalized mantissa. The adder unit is configured to receive and combine the multiplier output mantissa and a mantissa for a third operand to generate an FPU output mantissa. The multiplier output mantissa can have a format of 01.xxx--xxxx or 1x.xxx--xxxx, and is rounded in accordance with the IEEE standard. The FPU typically also includes additional units to process the exponents for the operands. The FPU can be incorporated within a processor or other hardware structure, and can also be implemented using hardware description languages (e.g., Verilog).
Another embodiment of the invention provides a floating-point processor configurable to perform Madd operations. The floating-point processor includes a multiplier unit coupled to an adder unit. The multiplier unit includes a multiplier array operatively coupled to a first rounding unit. The multiplier array is configured to receive and multiply mantissas for two operands. The first rounding unit is configured to round an output from the multiplier array. The adder unit includes a carry propagation adder (CPA), a second rounding unit, and a normalization unit. The CPA is configured to receive and combine a rounded mantissa from the multiplier unit and a mantissa for a third operand. The second rounding unit couples to the CPA and is configured to receive and round the mantissa from the CPA. The normalization unit couples to the second rounding unit and is configured to receive and normalize the rounded mantissa. Within the multiplier unit, another CPA can be coupled between the multiplier array and the first rounding unit and configured to receive and combine a sum output and a carry output from the multiplier array. Again, the floating-point processor typically includes additional units to process the exponents for the operands.
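The split between a multiplier array that leaves its result as separate sum and carry outputs and a CPA that resolves them can be sketched at the bit-vector level. This Python model is a simplification: it accumulates plain (non-Booth) partial products through 3:2 carry-save adders in a linear chain, operands are assumed non-negative, and the helper names are hypothetical. Only the final CPA loop propagates carries across bit positions.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on bit vectors: x + y + z == s + c."""
    s = x ^ y ^ z                                   # per-bit sum, no carries
    c = ((x & y) | (x & z) | (y & z)) << 1          # per-bit carries, shifted
    return s, c

def multiplier_array(a: int, b: int) -> tuple[int, int]:
    """Add the partial products a * 2**i (for each set bit i of b) in
    carry-save form; return the redundant (sum, carry) pair."""
    s, c = 0, 0
    i = 0
    while b >> i:
        if (b >> i) & 1:
            s, c = csa(s, c, a << i)
        i += 1
    return s, c

def cpa(s: int, c: int) -> int:
    """Carry propagation adder: ripple carries until none remain."""
    while c:
        s, c = s ^ c, (s & c) << 1
    return s

s, c = multiplier_array(0xC00000, 0xC00000)   # 1.5 * 1.5 as 24-bit mantissas
assert cpa(s, c) == 0xC00000 * 0xC00000
```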
Yet another embodiment of the invention provides a method for performing a floating-point Madd operation. In accordance with the method, the mantissas for two operands are multiplied to generate a third mantissa, which is then rounded to generate a fourth mantissa. The fourth mantissa has a range greater than a normalized mantissa. The fourth mantissa is combined with a mantissa for a third operand to generate an output mantissa. The output mantissa can further be rounded and normalized to generate a representation that conforms to the IEEE standard.
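The four steps of the method can be followed end to end in a Python sketch for double precision. The assumptions are: round-to-nearest-even only, normal nonzero multiplicands, results that stay in the normal range, and hypothetical helper names `split` and `madd`. The rounded fourth mantissa stays in the unnormalized range [1, 4) with its exponent unadjusted, and the combination with the third operand is modeled with exact rational arithmetic instead of an aligner and CPA. The final round-and-normalize step relies on CPython converting a `Fraction` to `float` with correct rounding, so the result is expected to match Python's `a * b + c`, i.e., a MUL followed by an ADD.

```python
import math
from fractions import Fraction

P = 53  # double-precision mantissa bits, hidden 1 included

def split(x: float) -> tuple[int, int, int]:
    """Return (sign, mantissa, exponent) with |x| == mantissa * 2**(exponent - (P - 1))
    and mantissa a P-bit integer (value mantissa / 2**(P - 1) in [1, 2))."""
    frac, exp = math.frexp(abs(x))          # |x| == frac * 2**exp, 0.5 <= frac < 1
    return int(x < 0), int(frac * 2**P), exp - 1

def madd(a: float, b: float, c: float) -> float:
    sa, ma, ea = split(a)
    sb, mb, eb = split(b)
    # Step 1: multiply the mantissas -> "third mantissa" m3, value in [1, 4).
    m3 = ma * mb                                      # value m3 / 2**(2P - 2)
    # Step 2: round to the "fourth mantissa" m4 at the bit that will be the LSB
    # once normalized (one place higher for 1x.xxx), without normalizing it.
    drop = (P - 1) + (m3 >> (2 * P - 1))
    kept, rem = m3 >> drop, m3 & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1                                     # round to nearest even
    m4 = kept << (drop - (P - 1))                     # value m4 / 2**(P-1), in [1, 4)
    # Step 3: combine with the third operand (exact rationals stand in for
    # the aligner and CPA of a hardware adder unit).
    product = Fraction(m4, 2 ** (P - 1)) * Fraction(2) ** (ea + eb)
    total = (-product if sa ^ sb else product) + Fraction(c)
    # Step 4: round and normalize the output mantissa.
    return float(total)

print(madd(1.0 + 2.0**-30, 1.0 + 2.0**-30, -1.0) == 2.0**-29)   # True
```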
The invention also provides computer program products that implement the embodiments described above.
The foregoing, together with other aspects of this invention, will become more apparent when referring to the following specification, claims, and accompanying drawings.