1. Field of the Invention
The present invention relates to the field of floating point units in microprocessors. Specifically, the present invention relates to hardware floating point multipliers and the support of subnormal operands in floating point multiplications.
2. Discussion of the Related Art
Multiplier units are commonly found in digital signal processors and, more recently, in RISC-based processors. Double precision floating point multiplication involves the inherently slow operation of summing 53 partial products together to produce the product. IEEE compliant multiplication also involves the correct rounding of the product, adjustment to the exponent, and generation of correct exception flags. Multiplier units embedded in modern RISC-based processors must also be pipelined, small and fast. Judicious functional and physical partitioning are needed to meet all these requirements. FIG. 1 shows the standard IEEE floating point data formats. The floating point value definitions according to IEEE standards corresponding to the floating point data formats shown in FIG. 1 are given below.
s = sign
e = biased exponent
f = fraction
E = number of bits in exponent (8 for single, 11 for double)
F = number of bits in fraction (23 for single, 52 for double)
B = exponent bias (127 for single, 1023 for double)
Normalized Value ($0 < e < 2^{E}-1$): $(-1)^{s} \times 2^{e-B} \times 1.f$
Subnormal Value ($e = 0$): $(-1)^{s} \times 2^{1-B} \times 0.f$
Zero: $(-1)^{s} \times 0$
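The value definitions above can be checked against a real double precision number. The following is a minimal sketch, assuming Python's `float` is an IEEE double; the helper names `decode` and `value` are hypothetical and are introduced only for illustration.

```python
import math
import struct

# Double precision parameters taken from the definitions above.
E, F, B = 11, 52, 1023


def decode(x: float):
    """Split a double into its sign, biased exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> (E + F)
    e = (bits >> F) & ((1 << E) - 1)
    f = bits & ((1 << F) - 1)
    return s, e, f


def value(s: int, e: int, f: int) -> float:
    """Rebuild the numeric value from the fields using the definitions above."""
    if e == (1 << E) - 1:
        raise ValueError("infinity and NaN are outside the definitions above")
    if e == 0:
        # Subnormal (or zero when f == 0): (-1)^s * 2^(1-B) * 0.f
        mantissa, exponent = f, 1 - B - F
    else:
        # Normalized: (-1)^s * 2^(e-B) * 1.f
        mantissa, exponent = (1 << F) | f, e - B - F
    return (-1.0) ** s * math.ldexp(mantissa, exponent)


if __name__ == "__main__":
    for x in (1.0, -2.5, 5e-324):
        print(x, decode(x), value(*decode(x)))
```

The mantissa is held as an integer scaled by 2^F, so the implicit leading bit appears as bit F for normalized values and is absent for subnormal values.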
A floating point number is analogous to a number in scientific notation. The primary difference is that in the IEEE standard the number is represented in base 2, whereas in standard scientific notation numbers are represented in base 10. Thus, a floating point number has an exponent which represents the binary order of magnitude by which its mantissa must be multiplied. In a normalized floating point number, the implicit leading 1 as well as all F fraction bits are significant. Therefore, the mantissa of a normalized floating point number always has F+1 significant binary digits.
The IEEE standard for floating point numbers includes two different precisions. As depicted in FIG. 1, single precision floating point numbers have 23 fraction bits (FRACTION[22:0]) and 8 exponent bits (EXP[7:0]), whereas double precision floating point numbers have 52 fraction bits (FRACTION[51:0]) and 11 exponent bits (EXP[10:0]).
Because the exponent is represented by a fixed number of bits according to the IEEE standard, upper and lower limits exist to the absolute values of numbers which can be represented in normalized form. The minimum value for the exponent field of a normalized number is defined to be one. However, supporting subnormal numbers allows the expansion of the lower absolute value range of representable numbers in floating point format. Because the leading one in a normalized number and the leading zero in a subnormal number are implied rather than explicitly included in the data stored, this leading bit cannot be used to distinguish normalized from subnormal numbers. Instead, one exponent value, zero, is reserved to indicate that the corresponding mantissa is from a subnormal number and that the leading bit is therefore a zero. Every subnormal number thus has the minimum possible representable exponent field, zero, and has a mantissa with fewer significant digits than a normalized number. Although the exponent field for a subnormal number is zero, this is simply a code that distinguishes subnormal numbers from normalized numbers. The value of the exponent for a subnormal number is interpreted as being one, as indicated in equation 2.
Because it is generally desirable to maintain the full precision of the mantissa, subnormal numbers having fewer significant digits are only allowed when the exponent has reached its minimum value, such that it is necessary to sacrifice precision in order to expand the range of representable numbers. Thus, the absolute value of any subnormal number is less than the absolute value of any normalized number. By permitting subnormal numbers to be represented, the range of representable numbers is increased by F binary orders of magnitude. Thus, between zero and the smallest positive normalized number, there are F binary orders of magnitude of representable subnormal numbers.
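The range extension of F binary orders of magnitude can be confirmed numerically for double precision. The following is a minimal sketch, assuming Python's `float` is an IEEE double; the variable names are illustrative only.

```python
import math

F, B = 52, 1023  # double precision fraction width and exponent bias

# Smallest positive normalized number: e = 1, f = 0.
smallest_normal = math.ldexp(1.0, 1 - B)

# Smallest positive subnormal number: e = 0, only the least significant fraction bit set.
smallest_subnormal = math.ldexp(1.0, 1 - B - F)

# Largest subnormal number: e = 0, all F fraction bits set.
largest_subnormal = math.ldexp(float((1 << F) - 1), 1 - B - F)

# Number of binary orders of magnitude gained by allowing subnormal numbers.
orders = math.log2(smallest_normal / smallest_subnormal)

if __name__ == "__main__":
    print(smallest_normal, smallest_subnormal, largest_subnormal, orders)
```

Every subnormal value lies strictly below `smallest_normal`, which matches the statement that any subnormal number is smaller in absolute value than any normalized number.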
Binary multiplication of two floating point numbers M1 and M2 is analogous to the multiplication of two base 10 numbers in scientific notation. The mantissas of the operands M1 and M2 are multiplied by each other to produce the resultant mantissa, while the exponents e1 and e2 of the operands M1 and M2 are added to produce the resulting exponent er. Typically, a hardware multiplier unit supports all operand combinations whose product can fall within the representable range. Therefore, a typical hardware multiplier will support the multiplication of two normalized operands and the multiplication of one normalized operand and a second subnormal operand. Typical hardware multipliers will not support the multiplication of two subnormal operands, because such a multiplication will always produce a result which is smaller than the smallest subnormal number.
Each multiplication takes two operands, M1 and M2. If both M1 and M2 are normalized floating point numbers, then the mantissa for M1 is 1.f1, and the mantissa for M2 is 1.f2. Because each normalized number has an implicit leading 1, every normalized number has F+1 significant bits. The result output by the mantissa multiplier 200 of FIG. 2 takes the form 1.fr, where the number of bits in the fraction part fr is either 2F or 2F+1, depending upon the magnitudes of f1 and f2. When the number of bits in the fraction part is 2F, this is described as "non-overflow." When the number of bits in the fraction part is 2F+1, this is described as "overflow."
The overflow/non-overflow distinction is easily understood with a base 10 analogy. Consider $11 \times 11 = 121$. In scientific notation, this is $1.1 \times 10^{1} \times 1.1 \times 10^{1} = 1.21 \times 10^{2}$. Here the number of fraction digits in the output is twice the number of fraction digits in the inputs; this is non-overflow. Now consider $99 \times 99 = 9801$. In scientific notation, this is $9.9 \times 10^{1} \times 9.9 \times 10^{1} = 98.01 \times 10^{2} = 9.801 \times 10^{3}$. Here the number of fraction digits in the output is one plus twice the number of fraction digits in the inputs; this is overflow. In the overflow situation, the exponent must be incremented by one. Usually, it is desirable that the result R produced by a multiplication will have the same precision (the same number of significant bits) as each of the operands M1 and M2.
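The same distinction can be illustrated in binary for two normalized double precision operands. The following is a minimal sketch, not the circuit of any figure: the function name `multiply_normalized` is hypothetical, rounding and exponent range checks are omitted, and the subtraction of the bias B is an assumption that follows from adding two exponents that are each stored in biased form.

```python
F, B = 52, 1023  # double precision fraction width and exponent bias


def multiply_normalized(e1: int, f1: int, e2: int, f2: int):
    """Multiply two normalized operands given as (biased exponent, fraction).

    Returns (er, product, overflow), where product is the exact integer
    product of the two (F+1)-bit mantissas, scaled by 2^(2F).
    """
    m1 = (1 << F) | f1          # 1.f1 with the implicit leading one made explicit
    m2 = (1 << F) | f2          # 1.f2
    product = m1 * m2           # represents a value in [1, 4)

    # Overflow: the product is 2 or more, so bit 2F+1 is set and the
    # renormalized result 1.fr carries 2F+1 fraction bits instead of 2F.
    overflow = (product >> (2 * F + 1)) == 1

    # Both exponents carry the bias, so one copy of B is removed; the
    # exponent is incremented by one in the overflow case.
    er = e1 + e2 - B + (1 if overflow else 0)
    return er, product, overflow


if __name__ == "__main__":
    # 1.5 * 1.5 = 2.25 overflows; 1.25 * 1.25 = 1.5625 does not.
    print(multiply_normalized(1023, 1 << 51, 1023, 1 << 51)[0::2])
    print(multiply_normalized(1023, 1 << 50, 1023, 1 << 50)[0::2])
```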
If the mantissa multiplier array 200 produces 2F significant mantissa fraction bits, while the result R can include only F significant mantissa fraction bits, the least significant F mantissa bits output by the multiplier are used only to produce the three rounding bits (the guard, round, and sticky bits) for rounding purposes. The least significant F mantissa bits output from the mantissa array are then discarded.
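The reduction of the discarded bits to three rounding bits can be sketched as follows for the non-overflow case. The bit assignment used here is an assumption, since the text does not define it: guard is the most significant discarded bit, round is the next bit, and sticky is the logical OR of all remaining discarded bits. The function name `rounding_bits` is hypothetical.

```python
def rounding_bits(product: int, F: int):
    """Split a product with 2F fraction bits into kept bits and rounding bits.

    Assumes F >= 2 and the non-overflow case. Returns (kept, guard, rnd, sticky),
    where kept holds the leading one and the upper F fraction bits.
    """
    low = product & ((1 << F) - 1)        # the F bits that will be discarded
    guard = (low >> (F - 1)) & 1          # most significant discarded bit
    rnd = (low >> (F - 2)) & 1            # next discarded bit
    sticky = int((low & ((1 << (F - 2)) - 1)) != 0)  # OR of the remaining bits
    kept = product >> F
    return kept, guard, rnd, sticky


if __name__ == "__main__":
    # Toy width F = 4: product 1.01101011 keeps 1.0110 and discards 1011.
    print(rounding_bits(0b101101011, 4))
```

A small fraction width is used in the example so the bit patterns can be read directly; the same function applies with F = 23 or F = 52.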
A subnormal number takes the form 0.f. Since the most significant bit of any subnormal number is an implicit zero, a subnormal number has at most F significant bits. However, a subnormal number can have as few as 1 significant bit, if the only non-zero fraction bit is the least significant bit of the fraction. The positive subnormal number having only 1 significant bit is the smallest non-zero positive representable number.
FIG. 2 illustrates the logical structure of a typical floating point multiplier unit. With high modern clock frequencies, a typical modern floating point unit will include pipeline registers at various points. The illustration of these registers is omitted for simplicity. Multiplier units that handle subnormal operands and results often require the determination of leading zeros, adjustments to the input and output mantissas (shifting), adjustment to the output exponent, and rounding. The calculation of the resultant exponent er is complicated by the supporting of subnormal numbers. The exponent e in a floating point number is biased by an exponent bias B. In a subnormal number, there is an implicit leading zero and there may be up to F-1 additional leading zeros. For purposes of multiplication, when one or more of the operands is subnormal, that subnormal operand is converted into a normalized mantissa before being input into the mantissa multiplier array 200.
FIG. 2 illustrates the hardware necessary to support all the possible input combinations. In FIG. 2, a leading zero detector 201 determines if one of the operands M1 is subnormal, and if so, how many leading zeros are in the mantissa. The number of leading zeros is referred to as z. FIG. 3 illustrates one way a leading zero detector can be implemented. The latency of the leading zero detector in FIG. 3 is proportional to $\log_{2} N$, where N is the number of inputs (I0 through I15 in FIG. 3). The silicon area required to implement the hardware in FIG. 3 is proportional to $N \log_{2} N$. The leading zero detector 201 in FIG. 3 produces an encoded left shift output (z3, z2, z1, and z0). The encoded output 202 (in FIG. 2) is necessary to properly adjust the output exponent er according to Equation 2. Inverter 206 converts the encoded output 202 of the leading zero detector 201 into its one's complement. By tying the carry input Cin of the carry-save adder 207 to a one, the negative of z in two's complement form is supplied to the carry-save adder 207. The encoded output 202 is also necessary to drive the left shifter 203. FIG. 4 illustrates one way to implement a left shifter. The left shifter shown in FIG. 4, a barrel shifter, is constructed of an array of 2-to-1 multiplexors. The left shifter 203 converts a subnormal mantissa into a normalized mantissa, so that the most significant one in the subnormal mantissa is shifted into the most significant output 400 (in FIG. 4) of the left shifter 203. The latency of the left shifter in FIG. 4 is proportional to $\log_{2} N$. The silicon area required to implement the hardware in FIG. 4 is proportional to $N \log_{2} N$.
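The behavior of the leading zero detector, the left shifter, and the inverter with carry-in can be modeled in software. The following is a minimal behavioral sketch, not a description of the circuits in FIG. 3 or FIG. 4; all function names are hypothetical, and the leading zero count is obtained with Python's `int.bit_length` rather than a logarithmic-depth network.

```python
def leading_zeros(value: int, width: int) -> int:
    """Count the zeros above the most significant one in a width-bit word."""
    if value == 0:
        raise ValueError("no leading one to find")
    return width - value.bit_length()


def barrel_shift_left(value: int, z: int, width: int) -> int:
    """Left shift built from about log2(width) stages.

    Stage k shifts by 2^k when bit k of the encoded amount z is set,
    which mirrors one row of 2-to-1 multiplexors per stage.
    """
    mask = (1 << width) - 1
    stage = 0
    while (1 << stage) < width:
        if (z >> stage) & 1:
            value = (value << (1 << stage)) & mask
        stage += 1
    return value


def negate_twos_complement(z: int, width: int) -> int:
    """Form -z as the one's complement of z plus a carry-in of one."""
    mask = (1 << width) - 1
    return ((~z & mask) + 1) & mask


def normalize_subnormal(f: int, F: int = 52):
    """Normalize the mantissa 0.f of a subnormal operand.

    Returns (z, mantissa) with the most significant one moved to bit F.
    z counts the implicit leading zero as well, so 1 <= z <= F.
    """
    width = F + 1                      # implicit leading bit plus F fraction bits
    z = leading_zeros(f, width)
    return z, barrel_shift_left(f, z, width)


if __name__ == "__main__":
    print(normalize_subnormal(3))          # two significant bits
    print(negate_twos_complement(5, 8))    # -5 in eight bits
```

Shifting the mantissa left by z multiplies it by 2^z, so the represented value is unchanged only if the exponent is reduced by z, which is why the negative of z is added into the exponent path.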
Because the result R may be subnormal, the output 204 of the mantissa multiplier 200 is input into a right shifter 205. The right shifter 205 is implemented similarly to the left shifter 203. The quantity rshift is input to the right shifter 205, and is calculated according to Equations 2 and 3 by the rshift logic block 208.
When multiplication is fully supported in hardware for the case in which one of the operands is subnormal, the hardware requirements for the resultant mantissa and exponent calculations are substantial. In FIG. 2, the carry-save adder 207 exists only to add in the two's complement of z. Additionally, the rshift logic 208 is necessary to control the right shifter 205. If the multiplier unit is implemented as shown in FIG. 2, the rshift logic 208 is of non-trivial complexity. The leading zero detector 201, the left shifter 203, the right shifter 205, the rshift logic 208, the inverter 206, and the carry-save adder 207 are all necessary to support multiplication where one of the operands is subnormal.
As is apparent from the above discussion, it is desirable to avoid overly complicating the multiplier hardware. There is a need for an efficient hardware multiplier that reduces size and latency. As long as the fraction of multiplication operand permutations that are not supported in hardware is statistically insignificant, a performance gain could be achieved.