The present invention relates to a floating point arithmetic operation unit, and more particularly to a floating point arithmetic operation unit suitable for performing a rounding operation at a high speed.
A method for rounding the result of a floating point arithmetic operation is disclosed in U.S. Pat. No. 4,468,748. In order to round a normalized mantissa operation result (64 bits) to single precision (24-bit mantissa) or double precision (56-bit mantissa), the guard bits (8 bits used to determine the round processing) following the least significant bit (LSB) of the single or double precision mantissa are checked to determine whether to round up (add one to the LSB) or round off (leave the LSB unchanged). If round-off is determined, the single or double precision mantissa portion of the 64-bit mantissa operation result is added to zero, and an AND function with zero is performed on the remaining lower portion. If round-up is determined, a carry input applied to the LSB of the 64-bit mantissa operation unit is propagated through the lower portion, in which the addition with zero (in effect, an AND function) is carried out, and is added to the LSB of the desired mantissa.
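The round-up or round-off handling described above can be sketched as follows. This is only an illustrative model, not the circuit of the cited patent: the function name `round_mantissa` is hypothetical, and the decision rule (round up when the most significant guard bit is set) is an assumption, since the text does not state how the guard bits are evaluated.

```python
WIDTH = 64                     # width of the mantissa operation result
GUARD_BITS = 8                 # guard bits following the mantissa LSB
MASK64 = (1 << WIDTH) - 1

def round_mantissa(result, precision):
    """Round a 64-bit mantissa result to `precision` bits (24 or 56).

    Returns (rounded, carry_out), with the rounded mantissa left-aligned
    in 64 bits and every bit below the mantissa LSB cleared.
    """
    low = WIDTH - precision                      # bits below the mantissa LSB
    guard = (result >> (low - GUARD_BITS)) & 0xFF
    # Round-off: keep the mantissa portion, AND the lower portion with zero.
    kept = result & ~((1 << low) - 1) & MASK64
    # Assumed rule (not taken from the source): round up when the guard
    # bits represent at least half of the mantissa LSB.
    if guard >= 0x80:
        kept += 1 << low                         # carry added at the mantissa LSB
    return kept & MASK64, kept >> WIDTH
```

For a single precision result the lower portion is 40 bits wide (8 guard bits plus 32 further bits), while for double precision it consists of the 8 guard bits only. A carry out of bit 63 is reported separately because it indicates that the mantissa overflowed during rounding.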
In this method, the determination and the execution of the rounding are performed simultaneously (in practice, the carry input to the LSB is controlled depending on round-up or round-off) by utilizing the fact that the carry is propagated when the AND function is performed in a 4-bit sliced operation unit. However, it is necessary to control the 64-bit operation unit by dividing it into three portions: the high order 24 bits, the middle order 32 bits and the low order 8 bits. Further, this method cannot readily perform the rounding processing in the IEEE standard double precision floating point format shown in FIG. 1b, because the operation handles 53-bit mantissa data including one hidden bit of the integer portion, so that the LSB falls on the MSB position of a slice in the 4-bit sliced operation unit. As can be seen in FIG. 1b, the IEEE standard double precision floating point format includes a fraction (f) in bits 0-51, an exponent (e) in bits 52-62 and a sign (s) in bit 63. Leading bit (1) is in bit 0. If the round-up operation is performed on a mantissa which has its maximum value before the rounding process, the rounding processing causes an overflow of the mantissa.
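The field layout just described (fraction in bits 0-51, exponent in bits 52-62, sign in bit 63) can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes a normalized operand, so the hidden integer bit is always 1.

```python
import struct

def decode_double(x):
    """Split a double into (sign, exponent, fraction, mantissa).

    Assumes a normalized number; the 53-bit mantissa is the 52-bit
    fraction with the hidden integer bit prepended.
    """
    bits = int.from_bytes(struct.pack(">d", x), "big")
    sign = bits >> 63                      # bit 63
    exponent = (bits >> 52) & 0x7FF        # bits 52-62 (biased)
    fraction = bits & ((1 << 52) - 1)      # bits 0-51
    mantissa = (1 << 52) | fraction        # hidden bit made explicit
    return sign, exponent, fraction, mantissa

def round_up_overflows(mantissa):
    """True if adding one at the LSB carries out of the 53-bit mantissa."""
    return (mantissa + 1) >> 53 == 1
```

Since 53 is not a multiple of 4, a 53-bit mantissa cannot be aligned with both ends on 4-bit slice boundaries, which is the alignment problem noted above. The second function shows the overflow case: a mantissa of all ones becomes a 54-bit value when one is added at the LSB.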
The IEEE standard format is described in COMPUTER, March 1981, pages 51-62.
The above-referenced article includes Draft 8.0 of IEEE Task P754, which recommends carrying out the floating point arithmetic operation in the expanded precision floating point format shown in FIG. 1c. For the round processing, it recommends checking the three bits following the LSB, that is, the guard bit, the round bit and the sticky bit (which is a logical OR function of all bits following the third bit from the LSB), together with the LSB and the sign bit, as shown in FIG. 3, and carrying out the rounding processing in accordance with the rules shown in FIGS. 4a-4d for the one of four round modes (RN, RP, RM and RZ) which the user specifies and for the designated one of the three precisions shown in FIGS. 1a-1c.
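The rounding decision can be sketched as a function of the five inputs named above. The rules of FIGS. 4a-4d are not reproduced in this text, so the sketch assumes the commonly understood meanings of the four modes: RN rounds to nearest with ties to even, RP rounds toward plus infinity, RM rounds toward minus infinity and RZ rounds toward zero. The function name `round_increment` is hypothetical, and the mantissa is assumed to be held as a magnitude with a separate sign.

```python
def round_increment(mode, sign, lsb, guard, rnd, sticky):
    """Return 1 if one must be added to the mantissa LSB, else 0.

    All bit arguments are 0 or 1; sign is 1 for a negative value.
    The mode semantics are assumed, not taken from FIGS. 4a-4d.
    """
    inexact = guard | rnd | sticky          # any discarded bit is nonzero
    if mode == "RN":
        # Above half: round up. Exactly half: round up only if LSB is odd.
        return guard & (rnd | sticky | lsb)
    if mode == "RP":
        # Toward +infinity: grow the magnitude only for positive values.
        return inexact & (sign ^ 1)
    if mode == "RM":
        # Toward -infinity: grow the magnitude only for negative values.
        return inexact & sign
    if mode == "RZ":
        # Toward zero: always truncate.
        return 0
    raise ValueError(f"unknown round mode: {mode}")
```

In this model the guard bit carries the weight of half an LSB, and the round and sticky bits together indicate whether the discarded part exceeds exactly one half.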
FIG. 2 shows a processing flow of a conventional floating point add/subtract operation in accordance with the IEEE standard. The difference between the exponents of the two operands is calculated (10). The mantissa of the operand having the smaller exponent is shifted toward the LSB (to the right) by that difference, and the operands are added or subtracted (20). In the shift operation, the logical OR of all bits shifted beyond the round bit should be reflected in the sticky bit. This bit indicates whether the value shifted out to the right is zero or not. The operation result of step 20 is shifted to the left by the number of leading zero bits, and this number is subtracted from the exponent (30). The result of step 30 is rounded (40). Finally, whether the mantissa has overflowed as a result of the round processing is checked, and if it has, the mantissa is shifted to the right by one bit and one is added to the exponent (50).
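Steps 10 to 50 can be traced with a small software model. The sketch below rests on several assumptions that are not in the text: operands are unsigned magnitudes given as (exponent, mantissa) pairs with value mantissa * 2**exponent, the mantissa is a normalized 8-bit integer (a width chosen only for readability), subtraction requires the first operand to be the larger one, and only round-to-nearest-even is modeled in step 40. The names `add_sub` and `shift_right_sticky` are hypothetical.

```python
P = 8        # mantissa width of this sketch (assumed, for readability)
EXTRA = 3    # guard, round and sticky positions below the LSB

def shift_right_sticky(m, n):
    """Shift m right by n bits, ORing all bits shifted out into the new LSB."""
    lost = m & ((1 << n) - 1)
    return (m >> n) | (1 if lost else 0)

def add_sub(a, b, subtract=False):
    (ea, ma), (eb, mb) = a, b
    # Step 10: difference between the exponents.
    if ea < eb:
        if subtract:
            raise ValueError("sketch assumes a >= b for subtraction")
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    diff = ea - eb
    # Step 20: align the smaller operand (with sticky), then add or subtract.
    big = ma << EXTRA
    small = shift_right_sticky(mb << EXTRA, diff)
    total = big - small if subtract else big + small
    if total < 0:
        raise ValueError("sketch assumes a >= b for subtraction")
    if total == 0:
        return (0, 0)
    exp = ea
    # Step 30: normalize to P + EXTRA bits and adjust the exponent.
    width = P + EXTRA
    if total.bit_length() > width:          # carry out of the addition
        total = shift_right_sticky(total, 1)
        exp += 1
    else:                                   # leading zeros after subtraction
        lead = width - total.bit_length()
        total <<= lead
        exp -= lead
    # Step 40: round to nearest even using guard, round and sticky.
    mant = total >> EXTRA
    guard, rnd, sticky = (total >> 2) & 1, (total >> 1) & 1, total & 1
    mant += guard & (rnd | sticky | (mant & 1))
    # Step 50: mantissa overflow caused by the rounding.
    if mant >> P:
        mant >>= 1
        exp += 1
    return (exp, mant)
```

For example, `add_sub((0, 255), (-8, 192))` models 255 + 0.75: the rounding in step 40 carries out of the 8-bit mantissa, and step 50 shifts it back to 128 with the exponent raised to 1. The point to note is that rounding and the overflow check follow normalization as separate sequential steps.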