The present invention relates to floating point units (FPUs), and more specifically, to the reuse of the normalizer and rounder of the FPU for conversion from a fixed point format to a floating point data format. This reuse enables the execution of a log estimate in a single loop through a fused-multiply-add (FMA) data path and thereby improves the latency of log estimate instructions executed on the FPU.
FIG. 1 illustrates a data flow of a conventional floating point unit 10. The FPU 10 includes an aligner 15, a multiplier 20, an adder 25, a normalizer 30 and a rounder 35. As shown in FIG. 1, the data flow is designed to loop twice through the FPU data path in order to execute log estimate instructions. The first loop, indicated by reference numeral 2, computes the estimate, and the second loop, indicated by reference numeral 4, converts the estimate into a floating point data format. Specifically, an input operand is normalized to a number x = 2^(e′) * 1.f′ at the estimate normalize block 12. Then, an estimate is computed for fL = log(1.f′) using a table lookup 14, for example. The table lookup 14 returns the log estimate as a two's complement fixed point number e′.fL for conversion to a floating point number.
A conventional algorithm for converting from the fixed point format to the floating point format is shown in FIG. 2. As shown in FIG. 2, the two's complement input number n, having m total bits and k bits before the binary point, is read at operation 100. From operation 100, the process moves to operation 105, where, in the aligner 15 (depicted in FIG. 1), it is determined whether the input number n is less than zero. If it is determined at operation 105 that the input number n is greater than or equal to zero (i.e., a non-negative number), then the process moves to operation 110, where the sign bit is set to zero. If it is determined that the input number n is less than zero (i.e., a negative number), then the input number is inverted at operation 115. From operation 115, the process moves to operation 120, where the sign bit is set to one and the input number is incremented by adding ‘1’ to the least significant bit (LSB) via the adder 25 (depicted in FIG. 1). From operations 110 and 120, the process moves to operation 125, where leading zeros (“lz”) are counted via the adder 25, which includes a leading zero anticipator (LZA).
At operation 130, the output of the adder 25 is padded with zeros. Then, at operation 135, in the normalizer 30 (depicted in FIG. 1), the result is shifted to the left by the number of leading zeros counted. From operation 135, the process moves to operation 140, where, in the rounder 35 (depicted in FIG. 1), the exponent and fraction are calculated, and the floating point number including the sign bit, the exponent, and the fraction is returned at operation 145.
The conventional data flow 10 has several disadvantages. For example, the implementation of log estimate instructions in the conventional data flow 10 reuses the entire data flow for converting from a fixed point format to a floating point format. This results in a 10-cycle latency, as shown in FIG. 1.
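For illustration only, the conventional conversion of FIG. 2 (operations 100 through 145) may be modeled in software roughly as follows. This is a minimal sketch under stated assumptions and not the hardware implementation: the function names fixed_to_float and to_value are hypothetical, the fraction is kept at the full m-bit width with no rounding to a target precision, the zero-padding of operation 130 is modeled only implicitly by the fixed m-bit field, and the handling of a zero input is an added assumption, since the described flow does not address it.

```python
import math


def fixed_to_float(n: int, m: int, k: int) -> tuple[int, int, int]:
    """Model of the FIG. 2 flow for an m-bit two's complement pattern n
    with k bits before the binary point.

    Returns (sign, exponent, fraction), where fraction is an m-bit integer
    whose most significant bit is the leading one (read as 1.xxx).
    """
    mask = (1 << m) - 1
    n &= mask                        # operation 100: read the m-bit input

    if n >> (m - 1):                 # operation 105: MSB set means n < 0
        mag = (~n & mask) + 1        # operations 115/120: invert, then add 1
        sign = 1                     # operation 120: sign bit set to one
    else:
        mag = n
        sign = 0                     # operation 110: sign bit set to zero

    if mag == 0:
        # Assumption: zero is returned as all zeros (not part of FIG. 2).
        return 0, 0, 0

    lz = m - mag.bit_length()        # operation 125: count leading zeros
    fraction = (mag << lz) & mask    # operation 135: normalize (shift left)
    exponent = (k - 1) - lz          # operation 140: weight of the leading one
    return sign, exponent, fraction  # operation 145


def to_value(sign: int, exponent: int, fraction: int, m: int) -> float:
    """Hypothetical helper: interpret the triple as a real number."""
    return (-1.0) ** sign * math.ldexp(fraction, exponent - (m - 1))
```

As a usage example, with m = 8 and k = 4 the value -3.25 is encoded as the pattern 0xCC; fixed_to_float(0xCC, 8, 4) yields a sign of 1, an exponent of 1, and a fraction of 0xD0 (1.625), which to_value maps back to -3.25. Each step in this model corresponds to a separate stage of the data path (aligner, adder, normalizer, rounder), which is why the conventional flow requires a full second loop.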