1. Field of the Invention
This invention is related to floating point execution units in processors and, more particularly, to approximations in floating point calculations.
2. Description of the Related Art
In floating point units of processors, approximation hardware is often used to approximate certain functions, such as division and square root. Various approximation algorithms exist, of which the Newton-Raphson and Goldschmidt algorithms are popular examples.
Like the hardware designed to perform other operations directly (e.g. adders and multipliers for addition, subtraction, or multiplication), the approximation hardware is designed to meet the requirements of the widely accepted floating point standards. One such standard is the Institute of Electrical and Electronics Engineers (IEEE) standard 754 (and related standards and updates). The IEEE 754 standard specifies floating point arithmetic for several precisions, including single precision and double precision. Single precision numbers are represented by a 32 bit quantity which includes one bit of sign, 8 bits of exponent, and 23 bits of significand. An implied one to the left of the binary point brings the precision of the significand to 24 bits. Double precision numbers are represented by a 64 bit quantity which includes one bit of sign, 11 bits of exponent, and 52 bits of significand. Again, the implied one brings the precision of the significand to 53 bits.
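The field layout described above can be made concrete with a short sketch that splits a value into its sign, exponent, and stored significand bits. The helper names are hypothetical; only the standard `struct` module is used, and the implied-one helper applies to normal numbers only (not zeros, denormals, infinities, or NaNs).

```python
import struct


def decode_single(value):
    """Split a single precision value into (sign, exponent, fraction) fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                 # 1 bit of sign
    exponent = (bits >> 23) & 0xFF    # 8 bits of (biased) exponent
    fraction = bits & 0x7FFFFF        # 23 stored significand bits
    return sign, exponent, fraction


def decode_double(value):
    """Split a double precision value into (sign, exponent, fraction) fields."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63                     # 1 bit of sign
    exponent = (bits >> 52) & 0x7FF       # 11 bits of (biased) exponent
    fraction = bits & ((1 << 52) - 1)     # 52 stored significand bits
    return sign, exponent, fraction


def significand_with_implied_one(fraction, fraction_bits):
    """Prepend the implied one (normal numbers only), giving 24 or 53 bits."""
    return (1 << fraction_bits) | fraction
```

For example, `decode_single(-1.5)` yields a sign of 1, a biased exponent of 127, and a stored fraction with only its top bit set. Prepending the implied one gives the full 24 bit significand.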
The IEEE 754 specification requires that any floating point result be accurate (as compared to the exact mathematical result) to within ½ of the least significant bit of the result; that is, the error must lie between 0 and ½ unit in the last place. Results computed using approximation algorithms must meet this level of accuracy as well.
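The accuracy requirement can be illustrated by measuring an error in units in the last place. The sketch below is an assumption-laden software check, not part of any standard: the function name is hypothetical, exact rational arithmetic stands in for the "exact mathematical result", and `math.ulp` and `math.nextafter` require Python 3.9 or later.

```python
import math
from fractions import Fraction


def error_in_ulps(computed, exact):
    """Return |computed - exact| in units in the last place of `computed`.

    computed : a double precision result
    exact    : the exact mathematical result as a Fraction
    """
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))


exact_third = Fraction(1, 3)

# A correctly rounded quotient is within half a unit in the last place.
good = 1.0 / 3.0
good_error = error_in_ulps(good, exact_third)

# The neighbouring representable value is off by more than half a unit,
# so it would not satisfy the requirement.
bad = math.nextafter(good, 1.0)
bad_error = error_in_ulps(bad, exact_third)
```

Here `good_error` is no greater than ½ while `bad_error` exceeds ½, which shows how little margin an approximation algorithm has: an error of a single unit in the last place is already too large.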
The approximation algorithms often include subtractions, which may themselves be approximated by inverting the value being subtracted (a right or left shift may also be necessary). This approximation avoids use of the adder in the floating point unit. Additionally, in at least some cases, the subtraction may be performed at the end of the multiplication which produces one of its operands, thus reducing the number of iterations through the floating point hardware needed to perform the approximation. However, the subtraction approximation results in a value which is less than the actual result of the subtraction (the difference) by one unit in the last place. This error may propagate and, in some cases, may prevent the result from meeting the accuracy requirement of the IEEE 754 specification (an error of 0 to ½ of the least significant bit).
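The one unit in the last place shortfall can be seen in a small fixed point sketch. It assumes, purely for illustration, that the subtraction is of the form 2 - p with p in the range [1, 2) held as an unsigned integer with one integer bit and `FRACTION_BITS` fractional bits. The width and function names are hypothetical, and no shift is modeled.

```python
FRACTION_BITS = 8                 # hypothetical width, chosen for illustration
WIDTH = FRACTION_BITS + 1         # one integer bit plus the fractional bits
MASK = (1 << WIDTH) - 1


def exact_two_minus(p):
    """Exact 2 - p in fixed point (requires an adder)."""
    return (2 << FRACTION_BITS) - p


def approx_two_minus(p):
    """Approximate 2 - p by inverting every bit of p (no adder needed)."""
    return ~p & MASK


# For every p in [1, 2) the inverted value is exactly one unit in the
# last place below the true difference.
shortfalls = {
    exact_two_minus(p) - approx_two_minus(p)
    for p in range(1 << FRACTION_BITS, 1 << WIDTH)
}
```

The set `shortfalls` contains the single value 1, because inverting a `WIDTH`-bit value p gives 2^WIDTH - 1 - p rather than 2^WIDTH - p. It is this constant shortfall that may propagate through later steps of the approximation.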