Floating point calculation is an important part of numerical computation within processing systems. Floating point calculation can be generally defined as computation on numbers that potentially have meaningful digits to the right of the radix point. There are a number of ways to represent floating point numbers and to compute with them.
In the Institute of Electrical and Electronics Engineers (IEEE) 754 Binary Floating Point Standard, a floating point number is represented as a sign, an exponent and a fraction. The exponent is represented as a biased binary value. In other words, the exponent value “e” is the binary value of the stored exponent field “E” minus a pre-defined bias. This can be represented mathematically as e = binary_value(E) − bias. For n-bit exponents, the bias value is 2^(n−1) − 1. For instance, for the 8-bit exponent of a single precision number, the bias is 127.
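The bias relationship can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the single precision field is read with the standard `struct` module rather than with any FPU-specific mechanism.

```python
import struct


def exponent_bias(n_bits: int) -> int:
    """Bias for an n-bit exponent field: 2^(n-1) - 1."""
    return (1 << (n_bits - 1)) - 1


def stored_exponent_single(x: float) -> int:
    """Return the 8-bit biased exponent field E of x encoded as single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def exponent_value(stored: int, n_bits: int) -> int:
    """Exponent value e = binary_value(E) - bias."""
    return stored - exponent_bias(n_bits)
```

For example, 8.0 equals 1.0 times 2^3; its stored single precision exponent field is 130, and 130 − 127 gives the exponent value 3.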
A floating point unit (FPU) design is split into an exponent data path and a fraction data path. The input into a fraction adder of the fraction data path is in the form of A times B plus C. The exponent data path conveys the exponents Ea, Eb and Ec of these operands into an exponent logic.
In conventional FPUs, the exponent logic generates an exponent value. Depending on the exponent difference between the addend and the product, and on sign information calculated within the fraction adder and conveyed over a select product line, this exponent is either the exponent of the addend (Ec), the exponent of the product (Ea+Eb−bias), or the exponent of the product plus an offset (Ea+Eb+delta).
Thus, based on the exponent difference, multiplexers select three values Ex, Ey, Ez, which, when added together, give the appropriate exponent. These three values are input into a 3:2 compressor, thereby generating a carry and a sum. The carry and the sum are then added together in a 2:1 adder. The resulting summed value, the value “e,” is the exponent corresponding to the unrounded fraction provided by the adder in the fraction data path. This value is conveyed to an Exponent Adjust and Rounding logic (EAD). The EAD logic adjusts the exponent based on the normalization shift amount and performs the exponent rounding.
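The 3:2 compressor followed by a 2:1 adder can be modeled in software as a carry-save step and an ordinary addition. The following is a minimal behavioral sketch, not a description of any particular circuit; the 13-bit internal width and the function names are assumptions chosen for illustration, and negative terms such as −bias are represented in two's complement.

```python
WIDTH = 13                      # assumed internal exponent width (hypothetical)
MASK = (1 << WIDTH) - 1


def compress_3_2(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save step: reduce three operands to a (carry, sum) pair."""
    s = (x ^ y ^ z) & MASK                               # bitwise sum without carries
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK      # majority bits, shifted left
    return c, s


def exponent_sum(ex: int, ey: int, ez: int) -> int:
    """2:1 adder applied to the compressor outputs, modulo 2^WIDTH."""
    c, s = compress_3_2(ex & MASK, ey & MASK, ez & MASK)
    return (c + s) & MASK
```

With Ex = 130, Ey = 129 and Ez = −127 (the product case with a single precision bias), `exponent_sum(130, 129, -127)` yields 132, the same as adding the three values directly.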
The FPU also includes a “leading zero anticipator” (LZA). The LZA generates an estimate of the number of leading zeroes in the result of the fraction adder. In other words, while the exponent logic adds the received exponent values Ea, Eb and Ec, the LZA predicts the number of leading zeros that will occur in the result of the addition performed by the fraction adder. However, this is only a prediction, and the prediction of the LZA can be one more than it should be. In either case, the output of the LZA is subtracted from the output of the 2:1 adder of the exponent logic (the value “e”) and a first possible value, “e2”, is generated.
As discussed previously, due to the nature of the LZA estimation, the exponent “e2” computed from the LZA estimate can be one count lower than it would be if the LZA count accurately reflected the number of leading zeros. Therefore, the EAD calculates the exponent for both possible values of the actual number of leading zeros, that is, e2=e−lza and e2=e−lza+1. Meanwhile, both the output of the LZA and the output of the fraction adder are input into an LZA correction circuit. The LZA correction circuit then sends a signal, lza_corr, to the EAD that signifies whether to use the higher or the lower exponent value. The EAD uses lza_corr to select one of the two possible e2 values, and the selected value becomes the final e2 value.
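The two-candidate scheme can be sketched as follows. This is a behavioral illustration only: the function names are hypothetical, the exact leading zero count stands in for the LZA correction circuit, and the polarity of lza_corr (true when the estimate was one too high) is an assumption, since the text does not fix it.

```python
def count_leading_zeros(value: int, width: int) -> int:
    """Exact number of leading zeros of value in a field of the given width."""
    return width - value.bit_length()


def lza_correction(fraction_sum: int, lza: int, width: int) -> bool:
    """Assumed polarity: True when the LZA estimate is one more than the actual count."""
    return lza == count_leading_zeros(fraction_sum, width) + 1


def select_e2(e: int, lza: int, lza_corr: bool) -> int:
    """Compute both candidates, then let lza_corr pick one of them."""
    e2_low = e - lza          # candidate if the estimate was exact
    e2_high = e - lza + 1     # candidate if the estimate was one too high
    return e2_high if lza_corr else e2_low
```

Computing both candidates before lza_corr arrives means that only a final two-way selection depends on the correction signal.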
Meanwhile, and substantially in parallel, a normalizer-rounder circuit receives as input the output of the fraction adder and the output of the LZA. The normalizer takes the received calculated value, which has some internal width such as 128 bits, and “normalizes” it by shifting out the leading zeros. The rounder rounds the normalized fraction to a standard format of “x” number of bits, such as 23 bits for single precision. The rounding of the exponent is done in the EAD.
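Normalization can be illustrated with a short sketch. It is a simplification: the shift amount is taken from the exact leading zero count instead of an LZA estimate, the function name is hypothetical, and a zero input is simply passed through because it has no leading one to align.

```python
def normalize(value: int, width: int) -> tuple[int, int]:
    """Shift out leading zeros so the leading 1 sits in the top bit of the field.

    Returns (normalized_value, shift_amount).
    """
    if value == 0:
        return 0, 0                       # zero is treated as a special case
    shift = width - value.bit_length()    # number of leading zeros
    return (value << shift) & ((1 << width) - 1), shift
```

The returned shift amount is the quantity by which the exponent has to be reduced, which is why the exponent path needs the leading zero count as well.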
Furthermore, within the EAD, after the correct e2 value has been selected (which occurs after the lza_corr signal is received from the LZA correction circuit), the FPU tests for overflow, underflow and special values, such as NAN and Infinity (this is part of the exponent rounding). Typically, e2 is compared to both an “emax” value and an “emin” value (these values are constants), and overflow and underflow signaling values are generated from the comparisons. These signaling values are incorporated into a result select signal generated by the EAD. The result select signal signifies whether e2 (and the normalized rounded fraction) is a valid value or, alternatively, whether an underflow or overflow has occurred or whether a special result (NAN, Infinity, Zero) is to be chosen. The result select signal and the e2 value are input into the result MUX, which selects between the regular rounded result and the special values based on the result select signal.
The result select signal sent from the EAD to the result MUX carries one of four different values. If the signal indicates overflow, underflow or a special value, the e2 value is not used. If none of these conditions applies, the result MUX uses the e2 value and combines it with the output of the normalizer/rounder to create the final floating point result in the standardized floating point format.
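The range check and the four-valued result select signal might be modeled as below. The enumeration, the function names and the priority order of the checks are assumptions made for illustration; emin and emax are passed in as parameters because the text only says that they are constants.

```python
from enum import Enum


class ResultSelect(Enum):
    REGULAR = 0
    OVERFLOW = 1
    UNDERFLOW = 2
    SPECIAL = 3


def result_select(e2: int, emin: int, emax: int, is_special: bool = False) -> ResultSelect:
    """Compare e2 against the constant bounds and encode the outcome."""
    if is_special:                 # NAN, Infinity or Zero chosen from the inputs
        return ResultSelect.SPECIAL
    if e2 > emax:
        return ResultSelect.OVERFLOW
    if e2 < emin:
        return ResultSelect.UNDERFLOW
    return ResultSelect.REGULAR


def result_mux(select: ResultSelect, regular_result, special_result):
    """Pass the regular result only when the select signal says e2 is valid."""
    return regular_result if select is ResultSelect.REGULAR else special_result
```

With the biased single precision bounds 1 and 254 as example values, an e2 of 128 selects the regular result, while 255 signals overflow and 0 signals underflow.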
There are different kinds of rounding that can be performed by the rounder. A fully IEEE compliant FPU design supports four rounding modes: rounding to the closest representable value (up or down), always rounding towards zero (for both positive and negative numbers), always rounding towards plus infinity (that is, to the higher value for both positive and negative values), and always rounding towards negative infinity (that is, to the smaller value for both positive and negative values). During the rounding step, the rounder and EAD together check for exception conditions, such as Overflow, Underflow and Inexact result. (There are two more IEEE exceptions, illegal operation and divide by zero, but they are not detected by the rounder and EAD; they can be detected from the inputs within the first couple of cycles, very early in the pipeline.) In case of denormal results (which have a 0 in front of the binary point and come only with the smallest possible exponent), modifications of the normalization and rounding are required. Depending on the design, this is either done on the fly while passing the data through the LZA, normalizer and rounder, or extra cycles are added in order to adjust the result.
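The four rounding modes can be compared on a sign-magnitude value with a small sketch. The function name and mode strings are hypothetical, and ties in the round-to-nearest mode are resolved to the even value, which is the IEEE 754 default behavior but is not spelled out in the text above.

```python
def round_magnitude(magnitude: int, drop_bits: int, mode: str, negative: bool = False) -> int:
    """Drop the low drop_bits bits of a magnitude according to a rounding mode."""
    kept = magnitude >> drop_bits
    remainder = magnitude & ((1 << drop_bits) - 1)
    if remainder == 0 or mode == "toward_zero":
        return kept                               # exact result, or plain truncation
    half = 1 << (drop_bits - 1)
    if mode == "nearest_even":
        round_up = remainder > half or (remainder == half and kept & 1 == 1)
    elif mode == "toward_positive":
        round_up = not negative                   # larger value: grow positive magnitudes only
    elif mode == "toward_negative":
        round_up = negative                       # smaller value: grow negative magnitudes only
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return kept + 1 if round_up else kept         # kept + 1 is the incrementer step
```

Only the truncation path returns without ever reaching the increment, which is the property that makes a truncation-only rounder simpler than one supporting the other three modes.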
The FPU either operates in IEEE mode, which means the result is fully IEEE compliant, or supports only parts of the IEEE standard in order to improve performance. For example, some designs implement only one rounding mode and force denormal results to zero. High-performance real-time graphics applications are tuned to use the simplest of the IEEE rounding modes: round towards zero, also known as truncation. Such a fast FPU mode with truncation rounding is very appealing because the fraction rounding is reduced to truncating the fraction, whereas the other three IEEE rounding modes require an incrementer in the rounder which increments the fraction. Thus, a fast mode with truncation speeds up the rounding step.
However, there is a problem with prior art fast mode calculations that use truncation rounding. Significant processing time can be spent calculating the exponent “e2” candidates from the exponent “e” and the LZA output, performing the LZA correction to determine the final value of “e2,” and checking for overflow and underflow conditions. When all four IEEE rounding modes are supported, the time taken by the EAD for the e2 calculations, the lza_corr selection and the overflow/underflow check may not be an issue, because the normalizer and rounder take time to perform their own intensive calculations. In fast mode, however, no rounder is used on the fraction path, just the normalizer. Under this condition, the processing time of the EAD can be a bottleneck.
Therefore, there is a need for an FPU system designed for operation in fast mode that addresses at least some of the disadvantages associated with conventional FPU systems designed to operate in fast mode.