Floating point units (FPUs) in processors represent a number by a sign bit indicating whether the number is positive or negative, an exponent (or characteristic), and a mantissa, which is the fractional portion of the number. Within a floating point unit the mantissa may be represented in a variety of precision formats defined by the Institute of Electrical and Electronics Engineers (IEEE) Standard 754, including single precision (23 bits), double precision (52 bits), extended real precision (64 bits), or in an internal precision format.
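By way of illustration only, the three fields described above can be extracted from a double precision value as sketched below. The function name `decompose_double` is hypothetical and is not part of any described apparatus. The sketch assumes the standard IEEE 754 double precision layout of 1 sign bit, 11 exponent bits and 52 stored mantissa bits.

```python
import struct


def decompose_double(x):
    """Return (sign, biased_exponent, fraction) for a double precision value.

    Hypothetical helper for illustration; assumes the IEEE 754 binary64
    layout: 1 sign bit, 11 exponent bits, 52 stored mantissa bits.
    """
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                      # 1 means negative
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)      # the 52 stored mantissa bits
    return sign, biased_exponent, fraction


if __name__ == "__main__":
    print(decompose_double(-1.5))
```

For -1.5 the sign bit is set, the biased exponent equals the bias of 1023, and only the most significant stored mantissa bit is set.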
Rounding is necessary during and/or after performing floating point mathematical operations for several reasons. One reason is that certain mathematical operations produce results which exceed the processor's word length. For example, in an n-bit processor, multiplication of two n-bit numbers may result in a 2n-bit product. Rounding within the floating point unit (internal rounding) is used to drop the excess bits with a minimal loss of accuracy to intermediate results. Another reason for rounding is that memory is often assigned a less precise format than the floating point unit in order to conserve space. When storing the data to memory, the data must be converted from the more precise format of the floating point unit to the less precise format of memory. This conversion requires rounding (store rounding or external rounding).
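The word-length problem noted above can be seen in a small integer sketch. The function name `multiply_and_truncate` and the choice of n = 8 are assumptions made purely for illustration. The sketch simply truncates the excess bits, which shows what information rounding has to account for.

```python
def multiply_and_truncate(a, b, n):
    """Multiply two n-bit unsigned values; return (product, kept, dropped).

    Hypothetical illustration: `kept` is the upper n bits of the 2n-bit
    product and `dropped` is the lower n bits that no longer fit.
    """
    mask = (1 << n) - 1
    if a & ~mask or b & ~mask:
        raise ValueError("operands must fit in n bits")
    product = a * b              # may need up to 2n bits
    kept = product >> n          # what an n-bit result can hold
    dropped = product & mask     # excess bits that rounding must account for
    return product, kept, dropped


if __name__ == "__main__":
    product, kept, dropped = multiply_and_truncate(0xFF, 0xFF, 8)
    print(product.bit_length(), kept, dropped)
```

With two 8-bit operands of 255 the product needs the full 16 bits, and the non-zero dropped bits show that plain truncation loses information.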
Rounding may be performed according to several known modes established by IEEE Standard 754. Rounding action dictated by each of the modes, "Round to Nearest", "Round to +∞", "Round to -∞" and "Round to Zero", depends upon the values of the least significant bit (LSB) and the three bits immediately below it designated as the guard bit (G), the round bit (R) and the sticky bit (S). The guard bit is the bit immediately below the least significant bit. The round bit is the bit immediately below the guard bit. The sticky bit is the logical OR of all bits below the round bit. For the "Round to +∞" and "Round to -∞" rounding modes the sign bit is also considered as further explained below. FIG. 1 details, according to the IEEE standard, the rounding action taken for each rounding mode.
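The definitions of the guard, round and sticky bits given above, together with the four rounding modes, can be sketched in software as follows. FIG. 1 is not reproduced here, so the decision rules below follow the commonly understood IEEE 754 behavior (with ties in "Round to Nearest" going to the even value) rather than any particular circuit. All function names and mode strings are hypothetical, and the mantissa is treated as a sign-magnitude integer.

```python
def guard_round_sticky(value, drop):
    """Split `value` into (kept, lsb, g, r, s) when `drop` low bits are discarded."""
    if drop < 2:
        raise ValueError("need at least a guard and a round bit")
    kept = value >> drop
    lsb = kept & 1                       # least significant retained bit
    g = (value >> (drop - 1)) & 1        # bit immediately below the LSB
    r = (value >> (drop - 2)) & 1        # bit immediately below the guard bit
    s = 1 if value & ((1 << (drop - 2)) - 1) else 0  # OR of all remaining bits
    return kept, lsb, g, r, s


def round_increment(mode, sign, lsb, g, r, s):
    """Return 1 if the retained magnitude should be incremented, else 0."""
    inexact = g | r | s
    if mode == "nearest":
        # Above half, or exactly half with an odd LSB (ties to even).
        return g & (r | s | lsb)
    if mode == "toward_zero":
        return 0                         # always truncate the magnitude
    if mode == "toward_pos_inf":
        return inexact & (1 - sign)      # only positive values grow
    if mode == "toward_neg_inf":
        return inexact & sign            # only negative magnitudes grow
    raise ValueError("unknown rounding mode")


def round_mantissa(value, drop, mode, sign=0):
    """Round a magnitude by discarding `drop` low bits (overflow not handled)."""
    kept, lsb, g, r, s = guard_round_sticky(value, drop)
    return kept + round_increment(mode, sign, lsb, g, r, s)


if __name__ == "__main__":
    for mode in ("nearest", "toward_zero", "toward_pos_inf", "toward_neg_inf"):
        print(mode, round_mantissa(0b10101001, 4, mode, sign=0))
```

The sketch does not handle the case in which the increment carries out of the retained mantissa, which in a real floating point unit would require an exponent adjustment.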
Conventional methods of rounding require complicated random logic. This approach can be slow because of the number of delay stages required by the logic. Various prior art methods have been proposed to avoid or minimize the delays imposed by the decision-making circuitry. These methods generally involve skipping rounding or combining it with other steps. For example, U.S. Pat. No. 4,839,846, to Hirose, entitled "Apparatus for Performing Floating Point Arithmetic Operations and Rounding the Result Thereof", shows representative conventional random logic and teaches a floating point unit which combines the steps of rounding, normalization and overflow processing due to rounding. U.S. Pat. No. 4,562,553 to Mattedi, entitled "Floating Point Arithmetic System and Method with Rounding Anticipation", discloses a floating point system that includes a rounding circuit responsive to a carry circuit which anticipates whether rounding will be necessary; the rounding and arithmetic operation occur simultaneously. U.S. Pat. No. 4,941,120 to Brown, entitled "Floating Point Normalization and Rounding Prediction Circuit", discloses a rounding circuit that predicts when postnormalization and rounding can be skipped in order to enhance the efficiency of the floating point operations. Each of these prior art references relates primarily to internal rounding. In order to maintain the accuracy of the FPU, these prior art methods must maintain the integrity of the guard, round and sticky bits for further arithmetic operations.
U.S. Pat. No. 5,235,533 to Sweedler, entitled "Store Rounding in Floating Point Unit", which is incorporated by reference, discloses a normalization apparatus for converting to single precision or double precision an extended precision number comprised of a sign field, an exponent field and a mantissa field. The apparatus makes general reference to rounding logic, but does not disclose the operation of the rounding logic.
In pipelined processors each instruction is executed in part at each of a succession of stages. After the instruction has been processed at each of the stages, the execution is complete. This pipelining scheme permits multiple instructions to be performed in parallel, increasing the overall performance of the processor. Consistent with this scheme, it is desirable to perform floating point arithmetic operations simultaneously with stores to memory. More particularly, it is desirable to perform internal rounding within the floating point unit simultaneously with store rounding to memory outside the floating point unit. When storing to memory, speed is more important than maintaining the integrity of the guard, round and sticky bits, since these bits are usually not saved in memory. What is needed is a fast and efficient rounding circuit for store rounding.