Conventional computer systems may include one or more floating-point units that may be used to perform operations on floating-point numbers (computer encodings that represent real numbers). Exemplary floating-point units may perform mathematical operations such as addition, subtraction, multiplication, division, and square root. A floating-point unit may also be known as a math coprocessor. A variety of floating-point numbers may be used (e.g., single precision and double precision floating-point numbers). In one exemplary embodiment, a single precision floating-point number is a 32-bit binary number comprising a sign bit, an 8-bit exponent, and a 24-bit significand (also referred to as a mantissa or significant digits). In one optimized single precision embodiment, only the lower 23 bits of the mantissa are stored, while the 24th bit, or most significant bit (msb), is assumed to be 1 in normal cases. As discussed herein, such floating-point numbers require normalization, so that the most significant bit of the mantissa is 1.
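The single precision layout described above may be illustrated with a minimal sketch. The function name decode_single and the returned tuple layout are assumptions made for illustration only; the field widths (1 sign bit, 8 exponent bits, 23 stored mantissa bits with an implied 24th bit) follow the description above.

```python
import struct

def decode_single(x):
    """Split a value, rounded to single precision, into its encoded fields.

    Hypothetical helper for illustration; returns
    (sign, biased_exponent, stored_fraction, significand).
    """
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    stored_fraction = bits & 0x7FFFFF  # the 23 bits that are actually stored
    if 0 < biased_exponent < 255:
        # Normal case: the 24th (most significant) bit is assumed to be 1.
        significand = (1 << 23) | stored_fraction
    else:
        # Zero, denormal, infinity, or NaN: no implied leading 1.
        significand = stored_fraction
    return sign, biased_exponent, stored_fraction, significand

# -6.0 is -1.1 (binary) x 2^2, so the biased exponent is 2 + 127 = 129.
print(decode_single(-6.0))
```

In this sketch, the 24-bit significand of a normal number always has its most significant bit set, which is the normalized form referred to above.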
In one embodiment, a floating-point unit comprises a fused multiply add (FMA) unit. An exemplary FMA unit may perform a combined multiplication and addition operation, such as: (A*B)±C. In one embodiment, the FMA may operate on single or double precision floating-point numbers. Further, combining a multiplication and an addition operation into a single step may allow the two operations to be performed with only a single rounding (rather than a rounding after the multiplication and a rounding after the addition).
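The effect of a single rounding may be illustrated with a minimal software sketch. It does not model FMA hardware: the fused result is emulated here by computing (A*B)+C exactly with rational arithmetic and rounding once, and the function names and operand values are assumptions chosen for illustration.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """Emulate (a*b)+c with a single rounding (illustrative, not a hardware model)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding so far
    return float(exact)                              # the only rounding step

def separate_multiply_add(a, b, c):
    """Multiply, round, then add, round: two roundings in double precision."""
    return a * b + c

# Operands chosen so that the exact product 1 - 2^-60 rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(fused_multiply_add(a, b, c))     # keeps the small residual -2^-60
print(separate_multiply_add(a, b, c))  # residual lost when the product is rounded
```

With these operands, the separately rounded product becomes exactly 1.0, so the subsequent addition yields zero, whereas the singly rounded result retains the residual.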
In one embodiment, a normalization unit may be used to normalize the result of the multiply-add operation before it is rounded. Except for special case results (e.g., infinity, NaN, and zero), an exemplary final mantissa (i.e., the significant digits of a final resulting floating-point number) of an FMA must be either normalized (i.e., no leading zeros with a biased exponent greater than zero) or denormal (i.e., a biased exponent of zero). In one exemplary embodiment, a normalization unit converts a floating-point number into a normalized form. For example, the binary floating-point numbers 1100, 1100×2^−2, and 0.1100×2^+3, when converted to their normalized forms, equal 1.100×2^+3, 1.100×2^+1, and 1.100×2^+2, respectively. After a binary floating-point number is normalized, the most significant bit of its mantissa is 1. In exemplary embodiments, as discussed herein, when normalizing the intermediate result (e.g., (A*B)+C) from an FMA, the extent of a radix point shift and a corresponding exponent adjustment depend upon the number of leading zeros in the intermediate result's mantissa. The FMA intermediate result is the FMA result after multiplication and addition, but before normalization and rounding. For example, an 8-bit mantissa of 00001010 may be normalized to 10100000 by a left shift of 4, with a corresponding adjustment to the exponent.
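The 8-bit normalization example above may be sketched as follows. The function names, and the convention that the exponent decreases by one for each bit the mantissa is shifted left, are assumptions made for illustration; special cases other than a zero mantissa are not handled.

```python
def count_leading_zeros(mantissa, width):
    """Number of zero bits above the most significant set bit in a width-bit field."""
    return width - mantissa.bit_length()

def normalize(mantissa, exponent, width):
    """Shift the mantissa left until its most significant bit is 1.

    Returns (normalized_mantissa, adjusted_exponent, shift_amount).
    A zero mantissa is returned unchanged, since it has no set bit to align.
    """
    if mantissa == 0:
        return mantissa, exponent, 0
    shift = count_leading_zeros(mantissa, width)
    # Each left shift doubles the mantissa, so the exponent drops by the same amount.
    return mantissa << shift, exponent - shift, shift

mantissa, exponent, shift = normalize(0b00001010, 0, 8)
print(format(mantissa, "08b"), exponent, shift)
```

Here the shift amount equals the leading zero count of the intermediate mantissa, which is the quantity the hardware must determine before normalization.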
An amount by which the mantissa may be normalized is influenced by the exponents of the floating-point number inputs and the number of leading zeroes of the FMA's exemplary intermediate result. Exemplary conventional hardware optimizations may be introduced to reduce FMA unit latencies, but such optimizations may introduce errors into the normalization amount.
A first exemplary source of error may come from a leading zero anticipator (LZA) module which performs a fast estimation of the quantity of leading zeros in parallel with the FMA's completion adder to determine the normalization amount. As discussed herein, an LZA may rapidly estimate a quantity of leading zeros before a completion adder, rather than waiting for the completion adder to produce a sum with the actual number of leading zeroes. An exemplary normalization shift amount as determined by an LZA may differ by 1 from a true quantity of leading zeros. In one embodiment, the LZA may introduce errors of −1, while in another embodiment, the LZA may introduce errors of +1. This error may be introduced in multiply dominated scenarios. A multiply dominated scenario may occur when the exponents of the inputs (A, B, C) are such that the product (A*B) exponent is greater than the addend (C) exponent.
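The consequence of an anticipated count that differs by 1 may be sketched as follows. This sketch does not implement a leading zero anticipator; it assumes, for illustration only, an embodiment in which the anticipated count is either exact or one too small, and it models only the conditional 1-bit correction shift that such an estimate requires. The function name and parameters are hypothetical.

```python
def normalize_with_anticipated_count(mantissa, width, anticipated_zeros):
    """Normalize using an anticipated leading zero count, then correct if needed.

    Assumption for illustration: anticipated_zeros equals the true leading
    zero count or is one less than it. Returns (normalized, applied_shift).
    """
    field_mask = (1 << width) - 1
    msb_mask = 1 << (width - 1)
    shifted = (mantissa << anticipated_zeros) & field_mask
    applied_shift = anticipated_zeros
    # If the most significant bit is still 0, the estimate was one too small,
    # so one extra left shift is applied.
    if mantissa != 0 and not (shifted & msb_mask):
        shifted = (shifted << 1) & field_mask
        applied_shift += 1
    return shifted, applied_shift

# The true leading zero count of 00001010 is 4.
print(normalize_with_anticipated_count(0b00001010, 8, 4))  # exact estimate
print(normalize_with_anticipated_count(0b00001010, 8, 3))  # estimate one too small
```

Both calls arrive at the same normalized mantissa, but the second requires the additional correction shift, which is the kind of extra step that adds latency.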
A second source of error may come from further hardware optimizations of the FMA, where a completion adder width and an LZA width are limited to the lower bits of the intermediate result, roughly the width of the product mantissa (which may be twice the width of input operands), and the upper bits use an incrementer.
In one exemplary implementation, the width of the completion adder and of the LZA is equal to the width of the product mantissa plus two bits. This exemplary error may be introduced in an addend dominated scenario. An addend dominated scenario may occur when the exponents of the inputs (A, B, C) are such that the addend (C) exponent is greater than the product (A*B) exponent. The exemplary error may occur when a carry-out from the completion adder causes the upper bits of the sum's mantissa to be incremented such that the upper bits of the addend propagate the carry-out past the most significant set bit (e.g., the non-incremented string of addend upper mantissa bits is non-decreasing from left to right, i.e., zero or more 0s followed by zero or more 1s). In this exemplary case, one approach would cause the normalizer unit to overshift by 1, requiring the normalizer to be 1 bit wider as well as requiring an additional right shift to correct the sum's mantissa before it is sent to the rounder. Also, in addend dominated scenarios, a borrow from the most significant bit during effective-subtraction may cause the shift amount derived from the exponent difference to be too small by 1. This exemplary case may be corrected by performing an additional 1-bit left shift.
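The carry-out case described above may be illustrated with a minimal sketch. The function names and the 4-bit upper field are assumptions made for illustration; the sketch only shows how incrementing upper bits of the form "zero or more 0s followed by zero or more 1s" moves the most significant set bit by one position, and it does not model the completion adder or the incrementer hardware.

```python
def leading_zeros(bits, width):
    """Leading zero count of a value held in a width-bit field."""
    return width - bits.bit_length()

def is_zeros_then_ones(bits):
    """True when bits has the form 0...01...1 (including all zeros).

    Adding 1 to such a value clears every trailing 1, so bits & (bits + 1) == 0.
    """
    return bits & (bits + 1) == 0

def leading_zero_change_on_carry(upper_bits, width):
    """How many leading zeros disappear when a carry-in increments upper_bits."""
    return leading_zeros(upper_bits, width) - leading_zeros(upper_bits + 1, width)

# 0111 + carry -> 1000: the carry propagates through every 1 and the most
# significant set bit moves up, so a precomputed shift would be off by 1.
print(is_zeros_then_ones(0b0111), leading_zero_change_on_carry(0b0111, 4))

# 0101 + carry -> 0110: the carry is absorbed below the most significant set bit.
print(is_zeros_then_ones(0b0101), leading_zero_change_on_carry(0b0101, 4))
```

An all-ones upper field (e.g., 1111) would overflow the field when incremented; that case is outside the scope of this sketch.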
Floating-point numbers that have a sign bit can be positive or negative. The signs of the operands affect the effective operation that is performed. For example, an addition operation on two positive numbers or two negative numbers is referred to as effective-addition, in which case the magnitude of the result is larger than the magnitude of the inputs. On the other hand, an addition operation on a positive number and a negative number is referred to as effective-subtraction. Correspondingly, a subtraction operation on two positive numbers or two negative numbers is referred to as effective-subtraction, and a subtraction operation on oppositely signed numbers is referred to as effective-addition.
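The four sign and operation combinations described above reduce to a single exclusive-or, which may be sketched as follows. The function name, the use of 0 and 1 for sign bits, and the "add" and "sub" strings are assumptions made for illustration.

```python
def effective_operation(sign_a, sign_b, operation):
    """Classify an add or subtract of two signed operands.

    sign_a and sign_b are sign bits (0 for positive, 1 for negative);
    operation is "add" or "sub". Hypothetical helper for illustration.
    """
    if operation not in ("add", "sub"):
        raise ValueError("operation must be 'add' or 'sub'")
    signs_differ = sign_a != sign_b
    is_subtract = operation == "sub"
    # Exactly one of "signs differ" and "subtract requested" means the
    # magnitudes are subtracted; otherwise they are added.
    if signs_differ != is_subtract:
        return "effective-subtraction"
    return "effective-addition"

for op in ("add", "sub"):
    for sign_a, sign_b in ((0, 0), (1, 1), (0, 1), (1, 0)):
        print(op, sign_a, sign_b, effective_operation(sign_a, sign_b, op))
```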
These exemplary sources of error cannot occur simultaneously, since addend dominated scenarios and multiply dominated scenarios are mutually exclusive. Furthermore, effective-addition and effective-subtraction are mutually exclusive. In one exemplary embodiment, these errors may be corrected by using an additional right shift and an additional left shift. Doing so, however, would cost additional latency, area, and power on the chip.