1. Technical Field
The present invention relates to floating-point conversion and more particularly to a system and method for efficient and correct rounding for floating-point conversions.
2. Description of the Related Art
There are several ways to represent real numbers on computers. Fixed-point representation places a radix point at a fixed position among the digits, and is equivalent to using integers that represent portions of some unit. For example, if four decimal digits are available and the radix point is placed in the middle, numbers such as 10.82 or 00.01 can be represented. Another approach is to use rationals, representing every number as the ratio of two integers.
Floating-point representation is the most common solution and essentially represents real numbers in scientific notation. Scientific notation represents a number as a significand multiplied by a base raised to an exponent. For example, 123.456 could be represented as 1.23456×10^2. In hexadecimal, the number 123.abc might be represented as 1.23abc×16^2.
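The two examples above can be reproduced with a short Python sketch. The helper name decimal_scientific is hypothetical and introduced only for illustration; the hexadecimal case relies on the standard float.fromhex and float.hex methods, which report the exponent as a power of two, so 16^2 appears as p+8.

```python
from decimal import Decimal


def decimal_scientific(text):
    """Split a nonzero decimal string into (significand, exponent), base 10.

    Hypothetical helper for illustration only.
    """
    value = Decimal(text)
    exponent = value.adjusted()             # position of the leading digit
    significand = value.scaleb(-exponent)   # shift the radix point, exactly
    return significand, exponent


# 123.456 = 1.23456 x 10^2
significand, exponent = decimal_scientific("123.456")
print(significand, exponent)

# Hexadecimal 123.abc = 1.23abc x 16^2. float.hex() normalizes to a
# leading 1 and prints a binary exponent, and 16^2 == 2^8.
hex_form = float.fromhex("0x123.abc").hex()
print(hex_form)
```

In both cases the digits of the number are unchanged; only the position of the radix point is moved into the exponent.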
Floating-point solves a number of representation problems. Fixed-point has a fixed window of representation, which prevents it from representing very large or very small numbers. Fixed-point is also prone to a loss of precision during multiplication or division.
Floating-point, on the other hand, employs a sort of “sliding window” of precision appropriate to the scale of the number. This easily permits the representation of both very large and very small numbers.
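The contrast between the two representations can be sketched as follows, assuming the four-decimal-digit fixed-point format with two fractional digits used in the earlier example, and a four-significant-digit decimal floating-point format emulated with Python's decimal module. All function names are hypothetical and serve only as illustration.

```python
from decimal import Context, Decimal

SCALE = 100      # assumed format: two digits after the radix point
MAX_RAW = 9999   # four decimal digits in total, so at most 99.99


def to_fixed(text):
    """Store a decimal string as an integer count of hundredths."""
    raw = int(Decimal(text) * SCALE)
    if not 0 <= raw <= MAX_RAW:
        raise OverflowError("value does not fit in four digits")
    return raw


def fixed_mul(a, b):
    """Multiply two fixed-point values; digits below 0.01 are dropped."""
    raw = (a * b) // SCALE
    if raw > MAX_RAW:
        raise OverflowError("product does not fit in four digits")
    return raw


def fixed_str(raw):
    return f"{raw // SCALE:02d}.{raw % SCALE:02d}"


def float_mul(a, b):
    """Multiply with four significant decimal digits and a free exponent."""
    return Context(prec=4).multiply(Decimal(a), Decimal(b))


# Small operands: the fixed-point product underflows to zero.
small = fixed_mul(to_fixed("00.01"), to_fixed("00.01"))
print(fixed_str(small), float_mul("0.01", "0.01"))

# Large operands: the fixed-point product overflows the four-digit window.
try:
    fixed_mul(to_fixed("99.99"), to_fixed("99.99"))
except OverflowError as err:
    print("fixed-point:", err)
print(float_mul("99.99", "99.99"))
```

The floating-point format keeps four significant digits in both cases because its exponent moves with the magnitude of the result.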
Correctly rounded floating-point conversion from one radix to another needs an intermediate precision that is typically more than double the desired target precision. Until about ten years ago, most programming environments guaranteed only a reasonably bounded conversion error, e.g., one unit in the last place (one “ulp”), which can be achieved by a few simple multiplications in the target precision.
Correct rounding would be achieved by resorting to multiple-precision arithmetic, sometimes unconditionally, sometimes only for “difficult” numbers when an error analysis showed that the result was dangerously close to a rounding threshold. The cost of this extra mechanism would be several times that of a simple conversion, and would often require large amounts of scratch storage to hold extended-precision intermediate results.
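As a rough illustration of the two conventional approaches described above, the following Python sketch converts a decimal value of the form digits × 10^exp10 to binary double precision in two ways. It is a sketch under stated assumptions, not the method of the present invention: convert_fast uses only multiplications in the target precision and so guarantees only a bounded error, while convert_exact uses exact rational arithmetic as a stand-in for unconditional multiple-precision arithmetic. Both function names are hypothetical.

```python
from fractions import Fraction


def pow10_float(n):
    """10**n by repeated multiplication in double precision.

    The result is exact only while the power fits in 53 bits of
    significand (n <= 22); beyond that each step may round.
    """
    p = 1.0
    for _ in range(n):
        p *= 10.0
    return p


def convert_fast(digits, exp10):
    """Bounded-error conversion using only target-precision arithmetic."""
    scale = pow10_float(abs(exp10))
    return digits * scale if exp10 >= 0 else digits / scale


def convert_exact(digits, exp10):
    """Correctly rounded conversion via exact rational arithmetic.

    The Fraction holds the decimal value exactly, at the cost of
    arbitrarily large integers as scratch storage; the final float()
    performs a single rounding.
    """
    return float(Fraction(digits) * Fraction(10) ** exp10)


for digits, exp10 in [(1, 23), (123456789, 25), (5, -24), (17, -30)]:
    fast = convert_fast(digits, exp10)
    exact = convert_exact(digits, exp10)
    print(digits, exp10, fast.hex(), exact.hex(), fast == exact)
```

Where the two results differ, they differ only in the last few bits of the significand. The exact version pays for its guarantee with big-integer intermediates whose size grows with the magnitude of the decimal exponent.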