The present invention relates generally to floating point processors, and, more particularly, to a normalizer shift prediction for log estimate instructions executed by a floating point processor.
In a floating point processor unit (“BFU”), the computation of a logarithm (“log”) estimate instruction differs from the standard multiply and add instructions used in almost all modern floating point units. Nevertheless, to save space and power, it is desirable to reuse as much of the existing data paths and logic as possible, especially with respect to relatively large circuits such as a normalization shifter within the normalizer circuit or portion of the BFU. Reusing hardware is generally not a problem for floating point processor designs in which only one instruction is in execution at a given time. In such designs, the instruction can freely choose which part of the hardware it uses at any time during its execution.
However, if the floating point processor unit is a pipelined design, the execution of an instruction is bound to using a predefined part of the hardware in each execution cycle. This prohibits the use of the normalizer circuit for a straightforward implementation of the log estimate instruction, because the amount of work required to compute the shift amount that is fed to the normalizer circuit is greater than for standard multiply and add instructions. The normalizer shift amount for the log estimate instruction is equal to the number of leading zeroes of the instruction result's intermediate significand. A conceptually simple solution may be a leading zero counter circuit spanning the complete width of the result's intermediate significand. The problem with this implementation, however, is that such a circuit is relatively complex and thus not fast enough to fit into the pipelined dataflow. Another possible solution is to switch the floating point unit into a multi-cycle mode. In this mode, an instruction is allowed to use the pipeline multiple times, which permits the insertion of additional cycles by later jumping back to the start of the pipeline. The disadvantage of this solution is that it severely limits the throughput of instructions by the floating point processor unit.
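By way of illustration only, the relationship between the leading zero count and the normalizer shift amount described above may be modeled in software as follows. This is a minimal behavioral sketch and not a description of any circuit: the function names `leading_zero_count` and `normalize`, and the 8-bit width used in the usage lines, are assumptions chosen for the example and do not appear in the description above.

```python
def leading_zero_count(significand: int, width: int) -> int:
    """Count the leading zero bits of an unsigned significand of the given bit width.

    Hypothetical helper for illustration; models a full-width leading zero counter.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not 0 <= significand < (1 << width):
        raise ValueError("significand does not fit in the given width")
    # int.bit_length() is the position of the highest set bit (0 for a zero value),
    # so the remaining high-order bits are the leading zeroes.
    return width - significand.bit_length()


def normalize(significand: int, width: int) -> tuple[int, int]:
    """Shift the significand left until its most significant bit is set.

    Returns (normalized_significand, shift_amount). The shift amount equals the
    leading zero count, as stated in the text. An all-zero significand yields a
    shift amount equal to the full width and a normalized value of zero.
    """
    shift = leading_zero_count(significand, width)
    mask = (1 << width) - 1
    return (significand << shift) & mask, shift


# Usage with an assumed 8-bit intermediate significand:
value, shift = normalize(0b00010110, 8)
print(f"{value:08b}", shift)
```

In this model the shift amount is available only after the full-width count has completed, which corresponds to the serial dependency that makes the straightforward implementation too slow for a pipelined dataflow.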