In digital processing systems, numerical data is typically expressed using integer or floating-point representation. Floating-point representation is preferred in many applications because of its ability to express a wide range of values and its ease of manipulation for certain operations. A floating-point representation includes a mantissa (or significand), an exponent, and a sign component. The mantissa represents the integer portion before the binary (or decimal) point, as well as the fractional portion after the binary (or decimal) point. In normalized form, the mantissa ranges from “1” to less than the value of the “base”, which is two for binary and ten for decimal (i.e., 1.0≦mantissa<2.0 for normalized binary numbers). A special representation is typically used to denote 0.0. The exponent specifies a scaling factor, expressed as a power of the “base” (two for binary numbers), by which the mantissa is multiplied to arrive at the number being represented. Finally, the sign component expresses the sign of the number, i.e., whether the number is positive or negative.
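By way of illustration only, the decomposition described above can be sketched in Python. The function name `decompose` is hypothetical and chosen for this example; the sketch assumes a binary base and handles only nonzero finite values, since zero uses a special representation.

```python
import math

def decompose(value):
    """Split a nonzero finite float into (sign, mantissa, exponent) such that
    1.0 <= mantissa < 2.0 and value == (-1)**sign * mantissa * 2**exponent."""
    if value == 0.0 or math.isinf(value) or math.isnan(value):
        raise ValueError("only nonzero finite values have a normalized form")
    sign = 1 if value < 0 else 0
    # math.frexp returns m in [0.5, 1.0) and e with abs(value) == m * 2**e,
    # so doubling m and decrementing e gives the 1.xxx normalized form.
    m, e = math.frexp(abs(value))
    return sign, m * 2.0, e - 1

sign, mantissa, exponent = decompose(-6.5)
print(sign, mantissa, exponent)  # 1 1.625 2, i.e. -6.5 == -(1.625 * 2**2)
```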
The Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic (IEEE 754) defines specific formats for representing floating-point numbers. According to the IEEE standard, a floating-point number includes a sign bit, an exponent, and a fraction. The IEEE standard has been adopted in virtually all modern microprocessor designs. The standard defines two basic formats, single precision (32 bits) and double precision (64 bits), and also provides extended formats.
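As a minimal sketch of the two basic formats, the following Python code extracts the sign bit, the exponent field, and the fraction field from the 32-bit and 64-bit encodings. The helper names `fields_single` and `fields_double` are hypothetical. The field widths used (8 exponent bits and 23 fraction bits for single precision, 11 and 52 for double precision) are those of the IEEE basic formats, and the exponent field is returned in its stored, biased form.

```python
import struct

def fields_single(value):
    """Return (sign, biased_exponent, fraction) of the 32-bit encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit exponent field, bias 127
    fraction = bits & ((1 << 23) - 1)   # 23-bit fraction field
    return sign, exponent, fraction

def fields_double(value):
    """Return (sign, biased_exponent, fraction) of the 64-bit encoding."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF     # 11-bit exponent field, bias 1023
    fraction = bits & ((1 << 52) - 1)   # 52-bit fraction field
    return sign, exponent, fraction

print(fields_single(-6.5))  # (1, 129, 5242880)
```

For -6.5, which equals -(1.625 × 2²), the stored single-precision exponent is 2 + 127 = 129 and the fraction field holds 0.625 × 2²³.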
Modern computer processors typically include a floating-point unit to perform mathematical operations on floating-point numbers according to the IEEE standard. Most floating-point processors support fused multiply-add type operations as defined by the IEEE standard, in which two operands are multiplied, a third operand is added to the full precision product, and the sum is then rounded once, incurring only a single rounding error. For example, multiplication can be performed on two “normalized” operands. A normalized floating-point number is represented by a mantissa having a “1” value in the most significant bit (MSB) location and a format of 1.xxx--xx, where each “x” represents one bit that is either a one or a zero. As defined by the IEEE standard, the fractional portion “xxx--xx” represents 23 bits after the binary point for normalized single precision numbers and 52 bits for normalized double precision numbers. For a normalized number, the mantissa ranges from one to less than two (1.0≦mantissa<2.0). Multiplication of two normalized operands produces a resultant mantissa that ranges from one to less than four (1.0≦mantissa<4.0) and has a format of 01.xxx--xxxx or 1x.xxx--xxxx, where the fractional portion “xxx--xxxx” represents more than 23 bits (or 52 bits) for the unrounded multiplier result with single (or double) precision numbers. After optionally adding a properly aligned mantissa from a third operand, in the case of a fused multiply-add type instruction, post-processing is then performed on the result (i.e., the resultant mantissa), which includes, as necessary, normalization, rounding, and possible re-normalization. Floating-point multiplication is typically performed by a specially designed unit that implements a multiplication algorithm (such as the Booth or modified Booth algorithm).
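The post-processing of a multiplier result can be illustrated with a small integer model. This is a sketch under stated assumptions, not a description of any particular hardware: the name `multiply_mantissas` is hypothetical, single precision mantissas are modeled as 24-bit integers with the leading one at bit 23, a plain integer multiplication stands in for the Booth-type multiplier, the addend of a fused multiply-add is omitted, and round-to-nearest-even is assumed as the rounding mode.

```python
FRAC_BITS = 23  # fraction bits of a normalized single precision mantissa

def multiply_mantissas(a, b):
    """Multiply two normalized mantissas given as integers 1.xxx * 2**23.

    Returns (mantissa, exponent_adjust), where mantissa is again normalized
    (leading one at bit 23) and exponent_adjust is the amount to add to the
    sum of the operand exponents.
    """
    assert a >> FRAC_BITS == 1 and b >> FRAC_BITS == 1, "operands must be normalized"
    product = a * b                  # 2 integer bits + 46 fraction bits
    shift = FRAC_BITS                # bits to discard for format 01.xxx
    exponent_adjust = 0
    if product >> (2 * FRAC_BITS + 1):
        # Format 1x.xxx (value >= 2.0): normalize with one extra right shift.
        shift += 1
        exponent_adjust = 1
    kept = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Assumed rounding mode: round to nearest, ties to even.
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    if kept >> (FRAC_BITS + 1):
        # Rounding carried out to 10.000: re-normalize.
        kept >>= 1
        exponent_adjust += 1
    return kept, exponent_adjust

one_and_a_half = 3 << 22             # 1.5 as 1.xxx * 2**23
mantissa, adjust = multiply_mantissas(one_and_a_half, one_and_a_half)
print(mantissa / 2**FRAC_BITS, adjust)  # 1.125 1, i.e. 2.25 == 1.125 * 2**1
```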
Floating-point units have been constructed for performing arithmetic operations on single-precision floating-point data only, on double-precision floating-point data only, or on either single-precision or double-precision floating-point data. Such floating-point units contain registers for storing floating-point data being processed, logic for processing the sign and exponent parts of floating-point data, mantissa arithmetic units for processing the mantissa, and logic for providing status signals to the processor controlling the floating-point unit.
In order to reduce costly circuit area, modern floating-point units need to handle data in both scalar and vector modes. For example, one 64-bit double precision data path must be able to process, with the same hardware, either two 32-bit single precision values in vector mode or one 64-bit double precision value in scalar mode.
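The sharing of one 64-bit path between the two modes can be pictured, purely as an illustration, by packing operands into a single 64-bit word. The function names below are hypothetical, and placing the first vector element in the upper 32 bits is an assumption of this sketch rather than something the text specifies.

```python
import struct

def pack_scalar(value):
    """Scalar mode: one double precision value occupies all 64 bits."""
    (word,) = struct.unpack(">Q", struct.pack(">d", value))
    return word

def pack_vector(first, second):
    """Vector mode: two single precision values share the same 64 bits.
    Assumption: `first` occupies the upper half, `second` the lower half."""
    (word,) = struct.unpack(">Q", struct.pack(">ff", first, second))
    return word

def unpack_vector(word):
    """Recover the two single precision elements from a 64-bit word."""
    return struct.unpack(">ff", struct.pack(">Q", word))

print(hex(pack_scalar(1.0)))        # 0x3ff0000000000000
print(hex(pack_vector(1.0, -2.0)))  # 0x3f800000c0000000
```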
In the early processing steps of a floating-point unit (e.g., aligner, multiplier, and adder), it is possible to split the data path to guarantee the integrity of vector element data. In a floating-point unit normalizer, the number of fraction bits is reduced, and the leading zeros of all vector data elements must be shifted out. Vector data elements are then brought close together before being rounded independently. Therefore, in the normalizer, a simple split of the shifter is not possible, since each part of the input must be shiftable over the complete width of the normalizer in scalar mode.
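The constraint on the normalizer can be seen in a small behavioral model. This is only a sketch: the 16-bit width, the names `normalize_scalar` and `normalize_vector`, and the two-element split are assumptions chosen for readability, and the model covers only the shifting out of leading zeros, not the reduction of fraction bits or the subsequent rounding.

```python
WIDTH = 16           # hypothetical normalizer width, kept small for readability
HALF = WIDTH // 2    # width of one vector element in this sketch

def normalize(value, width):
    """Shift left until the MSB is one. Returns (shifted_value, shift_amount).
    A zero input has no leading one and is returned unchanged."""
    if value == 0:
        return 0, 0
    shift = width - value.bit_length()   # number of leading zeros
    return (value << shift) & ((1 << width) - 1), shift

def normalize_scalar(word):
    """Scalar mode: leading zeros are counted over the complete width."""
    return normalize(word, WIDTH)

def normalize_vector(word):
    """Vector mode: each half is normalized independently of the other."""
    hi, hi_shift = normalize(word >> HALF, HALF)
    lo, lo_shift = normalize(word & ((1 << HALF) - 1), HALF)
    return (hi << HALF) | lo, (hi_shift, lo_shift)

result, shift = normalize_scalar(0x002C)
print(hex(result), shift)            # 0xb000 10

result, shifts = normalize_vector(0x052C)
print(hex(result), shifts)           # 0xa0b0 (5, 2)
```

In the scalar case the shift amount of 10 exceeds the element width of 8, so bits that start in the lower half end up in the upper half. A shifter built as two fully separate halves could not perform that move, which is the difficulty the paragraph above describes.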