Computing systems can perform a large number of calculations. Floating point hardware, such as a floating point unit (FPU), is the part of a computer system that is specifically designed to carry out operations on floating point numbers. Floating point refers to a representation of a number in which the radix point (a decimal point or, more commonly in computers, a binary point) can “float”; that is, it can be placed anywhere relative to the significant bits of the number. The position of the radix point is indicated separately in the internal representation, so floating point representation can be thought of as a computer realization of scientific notation. Typical operations performed by floating point hardware are addition, subtraction, multiplication, division, and square root. In most general purpose computer architectures, one or more FPUs are integrated with the processor.
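As an illustration of the scientific-notation analogy, the following minimal sketch splits a value into its sign, exponent, and significand. It assumes the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits), which the text above does not name but which Python floats use on common platforms. The function name `describe` is hypothetical and chosen only for this example.

```python
import struct

def describe(x: float):
    """Return (sign, exponent, significand) for a normal binary64 value."""
    # Reinterpret the 8 bytes of the float as a 64-bit unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                    # 1 sign bit
    biased_exp = (bits >> 52) & 0x7FF    # 11 exponent bits, stored with a bias
    fraction = bits & ((1 << 52) - 1)    # 52 fraction bits
    if biased_exp in (0, 0x7FF):
        # Assumption of this sketch: only normal numbers are handled.
        raise ValueError("zero, subnormal, infinity, and NaN are not handled")
    exponent = biased_exp - 1023         # remove the binary64 bias
    significand = 1 + fraction / 2**52   # implicit integer bit plus fraction
    return sign, exponent, significand

print(describe(6.0))        # (0, 2, 1.5)    since 6.0 = 1.5 * 2**2
print(describe(-0.15625))   # (1, -3, 1.25)  since 0.15625 = 1.25 * 2**-3
```

The exponent field plays the role of the power of ten in scientific notation, which is how the binary point "floats" relative to the significant bits.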
When an operation is performed on floating point numbers, the result can be zero, infinity, Not-A-Number (NaN), or a finite nonzero number. The destination of the operation, such as a register or a memory location that stores the result, can have a limited number of bits. When the result is a finite nonzero number, however, representing it exactly can require more bits than the destination provides; for example, ⅓ is finite but takes an infinite number of binary digits to describe. To address this case, the floating point unit can perform a rounding operation on the finite nonzero number so that the rounded value fits in the destination.
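The ⅓ case can be made concrete with exact rational arithmetic. The sketch below rounds a positive value to a given number of significant bits. It assumes round-to-nearest with ties to even, which is one common rounding mode and is not specified in the text above. The function name `round_to_bits` and its parameters are hypothetical.

```python
from fractions import Fraction

def round_to_bits(value: Fraction, precision: int) -> Fraction:
    """Round a positive rational to `precision` significant bits (ties to even)."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find the exponent e such that 2**e <= value < 2**(e + 1).
    e = 0
    while Fraction(2) ** (e + 1) <= value:
        e += 1
    while Fraction(2) ** e > value:
        e -= 1
    # Spacing between neighboring representable values at this exponent.
    ulp = Fraction(2) ** (e - (precision - 1))
    # Python's round() on a Fraction rounds halfway cases to the even integer.
    return round(value / ulp) * ulp

print(round_to_bits(Fraction(1, 3), 4))   # 11/32, i.e. binary 1.011 * 2**-2
```

With 24 significant bits the same call yields 11184811/33554432, which is not equal to ⅓ but is the closest value that fits in that many bits.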
The rounding process can calculate a rounded value of the finite nonzero number, along with status flags based on the rounded value. The rounded value can be represented as a floating point value composed of a mantissa, an exponent (which can be positive or negative), and a sign (positive or negative). The mantissa (also known as the significand) is the part of the rounded value that holds its significant bits, which include at least one integer bit and a fraction part. The status flags can include a precision flag, an underflow flag, and an overflow flag. The precision flag can indicate whether the rounded value is inexact, that is, smaller or larger than the finite nonzero number. The overflow flag can indicate whether the exponent of the rounded value is too large to be represented in the number of bits available for the exponent. The underflow flag can indicate whether the exponent of the rounded value is too small to be represented in the number of bits available for the exponent.
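The relationship between the rounded value and the three flags can be sketched as follows. This is an illustrative model and not a description of any particular hardware: the toy format parameters (`precision`, `emin`, `emax`), the `Rounded` record, and the choice of round-to-nearest with ties to even are all assumptions made for the example.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class Rounded:
    sign: int              # 0 for positive, 1 for negative
    exponent: int          # unbounded exponent of the rounded value
    mantissa: Fraction     # significand in the range [1, 2)
    precision_flag: bool   # rounded value differs from the exact value
    overflow_flag: bool    # exponent is above the largest encodable exponent
    underflow_flag: bool   # exponent is below the smallest encodable exponent

def round_with_flags(value: Fraction, precision: int, emin: int, emax: int) -> Rounded:
    if value == 0:
        raise ValueError("this sketch handles finite nonzero values only")
    sign = 1 if value < 0 else 0
    mag = abs(value)
    # Find the exponent e such that 2**e <= mag < 2**(e + 1).
    e = 0
    while Fraction(2) ** (e + 1) <= mag:
        e += 1
    while Fraction(2) ** e > mag:
        e -= 1
    # Round the magnitude to `precision` significant bits (ties to even).
    ulp = Fraction(2) ** (e - (precision - 1))
    rounded = round(mag / ulp) * ulp
    # Rounding up can carry into the next power of two; renormalize if so.
    if rounded >= Fraction(2) ** (e + 1):
        e += 1
    return Rounded(
        sign=sign,
        exponent=e,
        mantissa=rounded / Fraction(2) ** e,
        precision_flag=(rounded != mag),
        overflow_flag=(e > emax),
        underflow_flag=(e < emin),
    )

r = round_with_flags(Fraction(1, 3), precision=4, emin=-6, emax=7)
print(r.mantissa, r.exponent, r.precision_flag)   # 11/8 -2 True
```

In this model the flags are derived from the exponent of the value after rounding, matching the description above, so a carry out of the mantissa during rounding can by itself push a result into overflow.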
Several approaches have been used to perform the rounding process, that is, to generate a rounded value of the finite nonzero number and the status flags associated with the rounded value. One approach calculates a rounded value by performing a first rounding of the finite nonzero number, generates the status flags based on the rounded value, and escapes to a microcode program or a user-level program when any of the status flags is asserted (e.g., set to the value 1). Because this approach requires input from the microcode program or user-level program to complete the rounding process, it can slow down the computing system.
Another approach uses a two-pass rounder. The first pass calculates a first rounded value of the finite nonzero number and generates the status flags based on the first rounded value. If the status flags indicate that the first rounding was subject to overflow (rounded value too big) or underflow (rounded value too small), a second pass can calculate a second rounded value of the finite nonzero number. The two-pass rounder can be inefficient, and can add delay to the system, because two rounding operations may be necessary to produce a single result.