Floating point operations have applications in many areas including real-time 3D graphics, linear algebra, partial differential equations, and Fourier transforms. Therefore, modern floating point units (FPUs) are increasingly designed to perform fast operations in both single and double precision. Accordingly, in many floating point execution unit designs, the FPU pipeline is optimized for the common case of normalized numbers. Exceptions, such as denormals arising during computation, are often handled in software. However, denormals are important because they facilitate gradual underflow. Therefore, for better performance, hardware execution units that handle input and/or output denormal values efficiently are desirable. As used herein, the terms “floating point operations” and “floating point arithmetic operations” refer generally to arithmetic operations involving floating point numbers, including addition/subtraction, multiplication, division, multiply-add, square root, reciprocals, reciprocal square roots, transcendental function computation, etc.
Further, FPUs that perform multi-precision floating point operations often must round their results. Therefore, circuits that perform efficient multi-precision rounding are increasingly useful.
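By way of illustration only, the following Python sketch shows what a denormal looks like in the double precision format and why it supports gradual underflow. The helper names `decode_double` and `is_denormal` are hypothetical and are not part of any disclosed embodiment; the sketch assumes a host that does not flush denormals to zero.

```python
import struct
import sys


def decode_double(x):
    """Split a double into its (sign, biased exponent, fraction) bit fields."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


def is_denormal(x):
    """A denormal has an all-zero exponent field and a non-zero fraction."""
    _, exponent, fraction = decode_double(x)
    return exponent == 0 and fraction != 0


smallest_normal = sys.float_info.min   # 2**-1022, smallest normalized double
a = 1.5 * smallest_normal              # still a normalized number
difference = a - smallest_normal       # 2**-1023, representable only as a denormal
```

Here `difference` is non-zero even though it lies below the smallest normalized value, so two distinct numbers never subtract to zero. Without denormals the result would underflow abruptly to zero.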
The IEEE Standard describes four rounding modes: (i) round to zero (RTZ), where all numbers are rounded toward zero; (ii) round to infinity (RI), where negative numbers are rounded toward zero and positive numbers are rounded away from zero; (iii) round to negative infinity (RNI), where negative numbers are rounded away from zero and positive numbers are rounded toward zero; and (iv) round to nearest, where numbers are rounded to the nearest representable value, with ties rounded to the even value. Typically, an FPU “rounding mode” setting determines which of the IEEE conventions is used.
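By way of illustration only, the four conventions above can be sketched on a sign-and-magnitude integer whose low-order bits are to be discarded. The function name `round_magnitude`, the mode label `"RN"` for round to nearest, and the tie-to-even behavior assumed for that mode are choices made for this sketch and do not describe any particular circuit.

```python
def round_magnitude(negative, magnitude, drop_bits, mode):
    """Discard the low `drop_bits` bits of `magnitude` under a rounding mode.

    `negative` is the sign of the value; `magnitude` is its absolute value
    as a non-negative integer. Returns the rounded magnitude.
    """
    kept = magnitude >> drop_bits
    discarded = magnitude & ((1 << drop_bits) - 1)
    if discarded == 0:
        return kept                      # exact, nothing to round
    half = 1 << (drop_bits - 1)
    if mode == "RTZ":                    # toward zero: always truncate
        round_up = False
    elif mode == "RI":                   # toward +infinity
        round_up = not negative
    elif mode == "RNI":                  # toward -infinity
        round_up = negative
    elif mode == "RN":                   # nearest, ties to even (assumed)
        round_up = discarded > half or (discarded == half and kept & 1 == 1)
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return kept + 1 if round_up else kept
```

For example, with the magnitude `0b10110` and two bits discarded, RTZ yields `0b101`, while RI yields `0b110` for a positive value and `0b101` for a negative value.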
In many modern FPUs, injection rounding techniques are used to reduce the rounding modes to a single mode, for example, RTZ. The term “injection rounding” refers to the injection of a value before the carry look-ahead addition, where the injected value (to correctly effect the rounding) is determined based on the actual rounding mode being applied. Many processor designers favor injection rounding for speed and efficiency reasons because: (i) execution latency is usually not increased by insertion of the injection values; and (ii) adjustments after carry look-ahead addition to obtain rounded values may proceed more quickly than with conventional non-injection rounding. However, issues arise when injection rounding is used with denormal inputs or results because it is not known where the injection value is to be inserted until after a normalizing shift of the denormal number has been completed.
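By way of illustration only, injection rounding can be sketched in software as adding a mode-dependent constant and then truncating, so that every mode reduces to RTZ. The function names, the mode labels (`"RN"` standing for round to nearest), and the tie-to-even correction below are assumptions of this sketch; the integer addition merely stands in for the carry look-ahead addition, and no particular hardware implementation is implied.

```python
def injection_value(mode, negative, drop_bits):
    """Constant to add below the kept bits so that truncation rounds correctly."""
    all_ones = (1 << drop_bits) - 1
    if mode == "RTZ":
        return 0
    if mode == "RI":                     # toward +infinity
        return 0 if negative else all_ones
    if mode == "RNI":                    # toward -infinity
        return all_ones if negative else 0
    if mode == "RN":                     # nearest: inject one half unit
        return 1 << (drop_bits - 1)
    raise ValueError("unknown rounding mode: " + mode)


def round_by_injection(negative, magnitude, drop_bits, mode):
    """Round `magnitude` by injecting a constant, then truncating (RTZ)."""
    if drop_bits < 1:
        raise ValueError("drop_bits must be at least 1")
    total = magnitude + injection_value(mode, negative, drop_bits)
    result = total >> drop_bits          # plain truncation after the add
    # Post-add adjustment assumed for ties-to-even: if the discarded bits of
    # the sum are all zero, the input was an exact tie, so force an even result.
    if mode == "RN" and (total & ((1 << drop_bits) - 1)) == 0:
        result &= ~1
    return result
```

Note that `injection_value` cannot be formed until `drop_bits` is known. For a denormal, that position depends on the normalizing shift, which is the difficulty described above.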
Further, conventional FPUs that use non-injection rounding often wait for the value of the most significant bit (msb) of the mantissa (e.g., the 52nd bit position for double precision floating point) to be known before starting the rounding process that determines the correctly rounded result for the rounding mode. However, latency arises in conventional circuits using non-injection rounding because the value of the msb, for example, in a multiply operation, is generally known late in the computation. Thus, conventional non-injection rounding techniques often experience considerable latency.
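By way of illustration only, the following sketch shows why the msb of a product is not known in advance: the leading one of the product of two normalized significands can fall in either of two adjacent bit positions, depending on the operand values. The function name `product_msb_position` and the default width of 53 bits (a double precision significand including its implicit leading one) are assumptions of this sketch.

```python
def product_msb_position(sig_a, sig_b, width=53):
    """Bit index of the leading one in the product of two normalized significands.

    Each significand is a `width`-bit integer with its top bit set,
    representing a value in the range [1, 2).
    """
    top = 1 << (width - 1)
    if not (top <= sig_a < 2 * top and top <= sig_b < 2 * top):
        raise ValueError("significands must be normalized")
    product = sig_a * sig_b              # value in [1, 4) scaled by 2**(2*width - 2)
    return product.bit_length() - 1


one_and_quarter = 5 << 50                # 1.25 as a 53-bit significand
one_and_half = 3 << 51                   # 1.5 as a 53-bit significand
low = product_msb_position(one_and_quarter, one_and_quarter)   # 1.5625 < 2
high = product_msb_position(one_and_half, one_and_half)        # 2.25 >= 2
```

In this sketch `low` is 104 and `high` is 105, so the position at which rounding applies differs by one bit between the two products and is settled only once the product is available.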
Therefore, some disclosed embodiments present an efficient low latency structure for floating point execution units with non-injection rounding, while providing for denormal inputs and outputs.