1. Field of the Invention
This invention relates to computer systems, and more particularly, to finding an efficient method to achieve correct rounding for computer arithmetic.
2. Description of the Relevant Art
A computer system may comprise multiple processor cores, wherein each core may have a floating-point unit to perform floating-point arithmetic operations. The arithmetic operations may include addition, subtraction, multiplication, division and square root. The rounded result of an operation is represented by the computer system with a maximum limit of significance. Each processor core uses a finite number of bits to represent a floating-point numeric value. The finite number of bits used by a processor core is referred to as the processor core's precision. In addition, the accuracy of a floating-point value refers to how close the processor core's representation of a numeric value is to an infinitely precise representation. It is desired to have the processor representation of the rounded result be as accurate as possible. Furthermore, a processor core may be configured to perform the floating-point arithmetic operations in more than one precision (e.g., single-precision, double-precision, or extended-precision).
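The distinction between precision and accuracy can be illustrated with a minimal Python sketch. It assumes the IEEE 754 single and double formats exposed by the standard `struct` module; the helper name `representation_error` is hypothetical and is introduced here only for illustration.

```python
import struct
from fractions import Fraction

def representation_error(value: Fraction, fmt: str) -> Fraction:
    """Exact distance between value and the number actually stored in format fmt.

    fmt is a struct format code: "f" (single precision) or "d" (double precision).
    """
    # Pack and unpack to force the value through the chosen storage format.
    stored = struct.unpack(fmt, struct.pack(fmt, float(value)))[0]
    # Fraction(stored) is the exact rational value of the stored bits.
    return abs(Fraction(stored) - value)

tenth = Fraction(1, 10)  # one tenth has no finite binary expansion
single_error = representation_error(tenth, "f")
double_error = representation_error(tenth, "d")
```

Under these assumptions, the higher-precision format stores a value closer to one tenth, but neither format stores it exactly.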
A floating-point number is represented in a base number system that defines the radix of the system. For example, the decimal system with base 10 is a common base system. Modern computers use a binary system with base 2. Each base number system has a maximum number of digits that may be used to represent a number. For example, the decimal system uses ten digits, 0-9, and the hexadecimal system uses sixteen digits, 0-9 and a-f. As used herein, for simplicity's sake, digits may refer to the digits of any base number system, although digits for a binary system are referred to as bits and digits for a hexadecimal system are referred to as hexadecimal digits, and so forth. Besides the base, three other entities are used to represent a floating-point number. First, the sign is a string used to represent the plus or minus sign. Second, a mantissa is a string of digits used to represent the number. The mantissa is a signed entity, meaning it represents a positive or a negative number. Third, an exponent is used to record the position of the most significant digits, or the first non-zero digits, of the mantissa. The value of the floating-point number is found by multiplying the sign and mantissa by the base raised to a power set by the exponent. The floating-point number is referred to as normalized if its mantissa is zero for zero values, or, for non-zero values, its mantissa has a non-zero value in the left-most significant digit of the mantissa. For non-zero values, a non-normalized floating-point number may be normalized by, first, shifting the radix point until the left-most significant digit of the mantissa is non-zero, and, second, adjusting the exponent so that the value represented by the above combination of mantissa, base and exponent remains constant.
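The representation and normalization described above can be sketched in Python as follows. The function names `decompose` and `normalize` are hypothetical, and the sketch assumes one common convention in which a normalized non-zero mantissa has exactly one non-zero digit to the left of the radix point.

```python
import math
from fractions import Fraction

def decompose(x: float):
    """Split x into (sign, mantissa, exponent) with x == sign * mantissa * 2**exponent.

    For non-zero x the mantissa lies in [1, 2), so its leading binary digit is 1.
    """
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    if x == 0:
        return sign, 0.0, 0
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e with 0.5 <= m < 1
    return sign, m * 2.0, e - 1    # rescale so that 1 <= mantissa < 2

def normalize(mantissa: Fraction, exponent: int, base: int = 10):
    """Shift the radix point until 1 <= mantissa < base, keeping the value constant."""
    if mantissa == 0:
        return Fraction(0), 0
    while mantissa >= base:        # radix point moves left, exponent grows
        mantissa /= base
        exponent += 1
    while mantissa < 1:            # radix point moves right, exponent shrinks
        mantissa *= base
        exponent -= 1
    return mantissa, exponent
```

For example, `decompose(-6.5)` yields a sign of -1, a mantissa of 1.625 and an exponent of 2, and `normalize(Fraction(25, 1000), 3)` turns 0.025 x 10^3 into 2.5 x 10^1.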
A floating-point number represented in a processor does not have an infinite number of digits in the mantissa. A register may hold the value of the normalized mantissa, and the register is limited to a certain number of memory storage locations, or bits. The number of bits, p, explicitly or implicitly used by a processor to represent the mantissa is referred to as the precision. The results of arithmetic operations may require more than p bits for the representation of their respective mantissas. Therefore, it is required to find an accurate representation of such mantissas with only p bits.
Older processors truncated the extra bits beyond the most significant p bits. Modern processors perform rounding to obtain a more accurate representation. For example, when rounding to the nearest machine representable number is desired, a value of one may be added to the least significant digit of the p digits of a mantissa if the digits following the p most significant digits contain a value greater than one-half of the least significant digit of the p digits. When the value is less than one-half, the digits following the p most significant digits are simply truncated. When the value is equal to one-half, the action taken depends on the rounding technique being used. A common standard used for both floating-point number representation and rounding is the IEEE Standard 754 for Binary Floating-Point Arithmetic. Also, a computing system has a limit to the smallest increment or decrement of a floating-point number representation, which is referred to as the unit in the last place (ulp).
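The round-to-nearest rule described above can be sketched as follows for a binary mantissa held as an integer. The function name `round_to_nearest_even` is hypothetical, and resolving the one-half case by rounding to the even neighbor is an assumption made for this sketch; it is one of several possible tie-breaking choices.

```python
def round_to_nearest_even(mantissa: int, extra_bits: int) -> int:
    """Round a non-negative integer mantissa that carries extra_bits low-order
    bits beyond the p bits to be kept, and return the kept bits.

    A carry out of the kept bits is possible; renormalization is left to the caller.
    """
    if extra_bits <= 0:
        return mantissa
    kept = mantissa >> extra_bits                # the p most significant bits
    rest = mantissa & ((1 << extra_bits) - 1)    # the bits that follow them
    half = 1 << (extra_bits - 1)                 # one-half of the last kept bit
    if rest > half:
        kept += 1                                # more than one-half: round up
    elif rest == half:
        kept += kept & 1                         # exactly one-half: go to even
    return kept                                  # less than one-half: truncate
```

For instance, with two extra bits, `0b10110_11` rounds up to `0b10111`, `0b10110_01` is truncated to `0b10110`, and the tie `0b10111_10` rounds to the even value `0b11000`.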
Rounding methods, which may include one of a variety of algorithms, are used after the arithmetic operation is completed. Table-lookup(s) may be used to aid or complete the operation. One variable for an algorithm used in a rounding method may be the size of a table-lookup. As the size of a table-lookup increases, the accuracy of the result computed at intermediate steps increases, the number of subsequent computations decreases, but also, the die-area requirement for the table-lookup increases. An uncompressed table-lookup with a precision of 13 bits may require only half the area of a 14-bit table. However, more subsequent computations may be required due to the lower accuracy of the 13-bit table-lookup. The rounding method may have conditions for the previous operations to complete prior to the use of the rounding method. For example, for division of two operands a and b, prior conditions may include the number of quotients to find (e.g., 1/b, a/b, or both) and the precision or accuracy of the quotients. Afterwards, a number of steps need to be taken to round the result of the calculation, and the number of steps may differ depending on the rounding method chosen.
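The trade-off between table size and subsequent computation can be illustrated with a minimal sketch. The text above does not name a refinement scheme, so Newton-Raphson iteration on the reciprocal 1/b is assumed here purely as an example; the function names, the midpoint-based table entries and the 53-bit target are likewise assumptions. Exact rational arithmetic is used so that the iteration counts are not disturbed by rounding inside the sketch itself.

```python
from fractions import Fraction

def build_reciprocal_table(index_bits: int) -> list:
    """Initial estimates of 1/b for b in [1, 2), one entry per interval.

    Each added index bit doubles the number of entries, and hence the storage.
    """
    n = 1 << index_bits
    # Entry i holds the reciprocal of the midpoint of [1 + i/n, 1 + (i+1)/n).
    return [Fraction(2 * n, 2 * n + 2 * i + 1) for i in range(n)]

def reciprocal(b: Fraction, index_bits: int, target_bits: int = 53):
    """Return (estimate, iterations) with |1 - b * estimate| <= 2**-target_bits."""
    if not 1 <= b < 2:
        raise ValueError("b must lie in [1, 2)")
    table = build_reciprocal_table(index_bits)
    x = table[int((b - 1) * (1 << index_bits))]   # index by the leading bits of b
    tolerance = Fraction(1, 1 << target_bits)
    iterations = 0
    while abs(1 - b * x) > tolerance:
        x = x * (2 - b * x)   # Newton-Raphson step: squares the relative error
        iterations += 1
    return x, iterations
```

With b = 3/2, a table indexed by 4 bits needs four refinement steps to reach the 53-bit target, while a table indexed by 8 bits needs three, at the cost of sixteen times as many entries.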
Two examples of current rounding methods include the FMAC-based method, see P. Markstein, IA-64 and Elementary Functions: Speed and Precision, Hewlett-Packard®/Prentice-Hall, 2000, and the rounding used in floating-point units of AMD's K-8 microprocessors, see S. Oberman, Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor, Proceedings of the 14th IEEE Symposium on Computer Arithmetic, April 1999, pp. 106-115. The method described by Markstein requires the calculation of two quotients in parallel followed by a floating-point multiply accumulate unit (FMAC) operation for the remainder. Although this method's hardware requirements are an FMAC unit and a state machine, it requires two FMAC latencies to determine the rounded result. The K-8 method uses extra precision bits for internal calculations that are not visible to the user. The extra precision bits allow the internal calculations to have smaller bounded errors, and only one remainder calculation needs to be performed. However, much extra hardware may be required for the extra precision bits, such as a larger table-lookup or more die area for the multiplier circuitry.
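The role that a remainder plays in deciding the rounding direction of a quotient can be sketched generically as follows. This is not the method of either cited reference; it is an illustrative sketch on positive integer operands, in which exact integer arithmetic stands in for the remainder that a hardware multiply-accumulate operation would supply. The function name `round_quotient`, the `frac_bits` parameter and the ties-to-even choice are assumptions.

```python
def round_quotient(a: int, b: int, frac_bits: int) -> int:
    """Return a / b scaled by 2**frac_bits, rounded to nearest (ties to even).

    Both a and b must be positive integers.
    """
    if a <= 0 or b <= 0:
        raise ValueError("operands must be positive")
    scaled = a << frac_bits
    q = scaled // b          # truncated quotient estimate
    r = scaled - q * b       # exact remainder of the estimate, 0 <= r < b
    # Comparing 2*r with b tells whether the discarded part is above,
    # below, or exactly at one-half of the last kept digit.
    if 2 * r > b or (2 * r == b and q & 1):
        q += 1
    return q
```

For example, `round_quotient(2, 3, 4)` starts from the truncated estimate 10 (that is, 10/16) and, because the remainder exceeds half the divisor, returns 11.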
In view of the above, an efficient method for floating-point rounding is desired.