This invention relates generally to arithmetic operations in digital computing equipment and in particular to an apparatus and method of performing precision rounding in digital floating point arithmetic.
As is known in the art, a floating point number comprises an exponent portion and a mantissa portion. The exponent portion gives the power to which a base number is raised, and the value of the number is the mantissa portion multiplied by the base raised to that power. Thus any particular digital number may be expressed as (M,E), where M is an n-digit signed mantissa and E is an m-digit signed integer exponent, or as $M \times B^{E}$, where B is the base of the number system. In many computer systems B=2 (binary); in others B=10 (decimal) or B=16 (hexadecimal).
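As a minimal sketch of the (M,E) notation, the function below computes the exact value $M \times B^{E}$ with Python's `fractions.Fraction`, so no precision is lost in the illustration. It assumes, for simplicity, that the mantissa is held as a signed integer rather than as a fraction; the name `fp_value` is hypothetical.

```python
from fractions import Fraction


def fp_value(m: int, e: int, base: int = 2) -> Fraction:
    """Exact value of the pair (M, E), interpreted as M x B^E.

    Assumption for illustration: M is a signed integer mantissa.
    """
    return m * Fraction(base) ** e


print(fp_value(0b1011, -2))          # 11/4   (binary: 11 x 2^-2)
print(fp_value(0x1A, 1, base=16))    # 416    (hexadecimal: 26 x 16^1)
print(fp_value(-314, -2, base=10))   # -157/50 (decimal: -314 x 10^-2 = -3.14)
```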
A nonzero floating point number is said to be normalized if its mantissa contains the maximum possible amount of significance; in other words, the leftmost (most significant) digit of its mantissa is nonzero. Normalizing a floating point number consists of shifting the mantissa until its leading digit is nonzero, thereby moving the floating point to its proper position, and adjusting the exponent by the number of digit positions shifted so that the value of the combined mantissa and exponent remains constant.
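The normalization step can be sketched as follows, again assuming an integer mantissa of at most n digits in base B with value $M \times B^{E}$. The helper `normalize` is hypothetical: each left shift of one digit multiplies the mantissa by B, so the exponent is decremented by one to keep the value unchanged.

```python
def normalize(m: int, e: int, n: int, base: int = 2) -> tuple[int, int]:
    """Normalize a nonzero n-digit integer mantissa m x base^e.

    Shifts the mantissa left until its leading (n-th) digit is nonzero,
    lowering the exponent by one per digit so the value is unchanged.
    """
    if m == 0:
        raise ValueError("zero has no normalized form")
    sign = -1 if m < 0 else 1
    m = abs(m)
    if m >= base ** n:
        raise ValueError("mantissa is wider than n digits")
    while m < base ** (n - 1):   # leading digit is still zero
        m *= base                # shift one digit position to the left
        e -= 1                   # compensate in the exponent
    return sign * m, e


print(normalize(0b0011, 0, n=4))         # (12, -2): 0011 x 2^0 == 1100 x 2^-2
print(normalize(42, 5, n=4, base=10))    # (4200, 3): 0042 x 10^5 == 4200 x 10^3
```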
Floating point numbers may be added, subtracted, multiplied or divided. Before normalized floating point numbers with different exponents can be added or subtracted, they must be aligned: the exponents are changed until they are equal, and the mantissas are shifted with respect to each other by a corresponding number of digit positions so that the values of the two numbers are preserved. Typically the mantissa of the operand with the smaller exponent is shifted right. The aligned mantissas are then added, or subtracted if desired. The resulting sum or difference, combined with the common exponent, is the desired result provided none of the significant digits of the original mantissas is lost.
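A minimal sketch of alignment followed by addition, assuming (mantissa, exponent) pairs with integer mantissas. To keep every digit, the hypothetical `add_exact` aligns both operands to the smaller exponent by shifting the other mantissa left. This is equivalent to shifting the smaller-exponent mantissa right into a register wide enough that no digit is lost, which is the idealized case in which the result is exact.

```python
def add_exact(a: tuple[int, int], b: tuple[int, int], base: int = 2) -> tuple[int, int]:
    """Add two (mantissa, exponent) pairs exactly, with no limit on width.

    Both mantissas are rescaled to the smaller exponent, so the sum of the
    aligned mantissas, combined with that exponent, is the exact result.
    """
    (ma, ea), (mb, eb) = a, b
    e = min(ea, eb)               # common exponent after alignment
    ma *= base ** (ea - e)        # shift left by the exponent difference
    mb *= base ** (eb - e)        # (a no-op for the operand already at e)
    return ma + mb, e


print(add_exact((0b1101, 3), (0b1011, 0)))   # (115, 0): 1101000 + 0001011 = 1110011
print(add_exact((13, 3), (-11, 0)))          # (93, 0): subtraction via a signed mantissa
```

With 4-bit operands 1101 × 2^3 and 1011 × 2^0, the exact sum 1110011 × 2^0 is seven bits wide, which is the situation the following paragraph addresses.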
In a practical computer, however, arithmetic is complicated by the fact that the mantissa does not have infinite precision: it is normally processed in a register holding a fixed number of digits, referred to as "n" digits. Although the two operands of an addition may be considered exact, their sum often has more than n significant digits. The problem then is to squeeze an accurate representation of the sum into n digits by the processes of normalization and rounding.
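The growth in width can be made concrete with a short sketch, assuming integer mantissas; `digit_count` and `split_at_n_digits` are hypothetical helpers. Splitting an exact sum at n digits shows which digits fit in the register and which must be accounted for by rounding.

```python
def digit_count(m: int, base: int = 2) -> int:
    """Number of base-`base` digits in |m|, ignoring leading zeros (0 has none)."""
    m, count = abs(m), 0
    while m:
        m //= base
        count += 1
    return count


def split_at_n_digits(m: int, n: int, base: int = 2) -> tuple[int, int, int]:
    """Split a nonnegative mantissa into its n leading digits and the rest.

    Returns (kept digits, leftover low-order digits, number of leftover digits).
    """
    extra = max(digit_count(m, base) - n, 0)
    kept, leftover = divmod(m, base ** extra)
    return kept, leftover, extra


# Two exact 4-bit operands, 1101 x 2^3 and 1011 x 2^0, have the exact
# sum 1110011 x 2^0, which needs 7 bits.
n = 4
total = 0b1101 * 2**3 + 0b1011
print(digit_count(total))              # 7
print(split_at_n_digits(total, n))     # (14, 3, 3): 1110 | 011
```

Here the exact sum 115 lies between the two representable 4-bit values 1110 × 2^3 = 112 and 1111 × 2^3 = 120; choosing between them is the rounding problem.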
If more than n digits remain after normalization, older computers often simply discard the excess digits, either after the addition or by truncating them as they are shifted out during alignment before the addition takes place. In more recent computers, precision rounding requires that the result be computed as if all digits of the sum were retained and only then reduced to n digits. Under round-to-nearest, for example, if the digits beyond the n'th digit represent less than one half of a unit in the n'th digit, they are discarded; if they represent more than one half, one is added to the least significant of the n retained digits. If they represent exactly one half, the rounding may go either way. "Balanced rounding", as proposed in IEEE Standard 754 for Binary Floating-Point Arithmetic, attempts to round up just half the time in this case by choosing whichever result has an even least significant digit.
Precision rounding has been implemented with a barrel shifter and a wide OR gate that detects whether any nonzero digit is shifted out, but this requires considerable hardware. Alternatively, machines implementing floating point with serial shift registers have used a flag bit known as a "sticky bit", which is set when a "one" is shifted into it and then remains set. This is a simple means of detecting that a nonzero digit was lost, but because the operand is shifted one position at a time it is a slow operation.
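A minimal sketch of precision rounding for binary mantissas, assuming round-to-nearest with ties to even (the IEEE 754 default). The hypothetical `shift_right_serial` models the serial-shift-register approach: each discarded bit passes through a guard position and is then ORed into a sticky bit, so after the shift the guard and sticky bits together tell whether the discarded part was below, exactly at, or above one half of a unit in the last retained place. `round_nearest_even` is likewise a hypothetical name, not the apparatus described in this specification.

```python
def shift_right_serial(m: int, count: int) -> tuple[int, int, int]:
    """Shift nonnegative m right one bit at a time, as a serial shift register would.

    Returns (shifted value, guard bit, sticky bit). The guard bit is the last
    bit shifted out; the sticky bit is set once any earlier 1 is shifted past
    the guard position, and stays set.
    """
    guard = sticky = 0
    for _ in range(count):
        sticky |= guard      # previous guard bit falls into the sticky position
        guard = m & 1        # bit leaving the register
        m >>= 1
    return m, guard, sticky


def round_nearest_even(m: int, e: int, n: int) -> tuple[int, int]:
    """Round the exact value m x 2^e (m >= 0) to at most n significant bits."""
    if m < 0:
        raise ValueError("sketch handles nonnegative mantissas only")
    excess = m.bit_length() - n
    if excess <= 0:
        return m, e                      # already fits in n bits
    kept, guard, sticky = shift_right_serial(m, excess)
    e += excess
    # guard=0:            less than half a unit -> discard
    # guard=1, sticky=1:  more than half a unit -> round up
    # guard=1, sticky=0:  exactly half a unit   -> round up only if kept is odd
    if guard and (sticky or kept & 1):
        kept += 1
        if kept.bit_length() > n:        # e.g. 1111 + 1 = 10000: renormalize
            kept >>= 1
            e += 1
    return kept, e


print(round_nearest_even(115, 0, 4))   # (14, 3): 1110011 rounds down to 112
print(round_nearest_even(116, 0, 4))   # (14, 3): tie, 1110 is already even
print(round_nearest_even(117, 0, 4))   # (15, 3): rounds up to 120
print(round_nearest_even(124, 0, 4))   # (8, 4): tie, 1111 -> 10000 -> 1000 x 2^4 = 128
```

A wide-OR implementation would compute the same sticky bit in one step, for example as `(m & ((1 << (excess - 1)) - 1)) != 0`, instead of looping once per shifted position; truncation, by contrast, would simply keep `kept` and ignore the guard and sticky bits.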