This invention relates to a method of converting the format of a binary floating point number conforming to IEEE (Institute of Electrical and Electronics Engineers) Standard 754 or to a standard conforming thereto, and to a device employing the method.
Recently, as scientific applications and graphics processing on computers have become more complicated, high-speed, precise floating point operation has come to be desired. Errors in the results of floating point operations caused by the computer hardware construction are eliminated by following IEEE Std 754.
In IEEE Std 754, a single-precision floating point number is specified as a format of 32 bits in total, consisting of a 1-bit sign S, an 8-bit exponent E and a 23-bit fraction F. Usually, normalization is performed by adjusting the value of the exponent E so that an implicit non-zero bit and the radix point are located immediately above the most significant bit (MSB) of the fraction F. Here, the exponent E is the value obtained by adding 127, as a bias $B_s$, to the actual exponent, so that the exponent E is a positive value. Namely, a real number $R_1$ expressed as a normalized number is as follows.
$$R_1 = (-1)^S \cdot 2^{E-127} \cdot (1.F) \qquad (1)$$
Here, 1.F in equation (1) is the mantissa M. However, if normalization is performed even when a result is close to 0, the calculation precision drops drastically. Therefore, in such a case, IEEE Std 754 expresses the real value as a denormalized number. In other words, the exponent E is set to 0, and the fraction F is shifted so that the weight of the zero-valued bit one position above the radix point is $2^{-126}$. In this case, the real value $R_2$ expressed as a denormalized number is as follows.
$$R_2 = (-1)^S \cdot 2^{-126} \cdot (0.F) \qquad (2)$$
In this case, the mantissa M is 0.F.
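As an illustration of equations (1) and (2), the following minimal sketch evaluates a 32-bit single-precision bit pattern directly from its S, E and F fields. The function name `decode_single` is hypothetical and not part of the source; the sketch assumes that E = 0 selects the denormalized form and does not handle the reserved exponent E = 255 (infinity and NaN).

```python
def decode_single(bits: int) -> float:
    """Evaluate equation (1) or (2) for a 32-bit single-precision pattern."""
    s = (bits >> 31) & 0x1        # 1-bit sign S
    e = (bits >> 23) & 0xFF       # 8-bit biased exponent E
    f = bits & 0x7FFFFF           # 23-bit fraction F
    if e == 0xFF:
        # Infinity and NaN encodings are outside the scope of this sketch.
        raise ValueError("E = 255 is not handled here")
    frac = f / 2**23              # value of the fraction bits, 0 <= frac < 1
    sign = -1.0 if s else 1.0
    if e == 0:
        # Denormalized number, equation (2): mantissa M = 0.F
        return sign * 2.0**-126 * frac
    # Normalized number, equation (1): mantissa M = 1.F
    return sign * 2.0**(e - 127) * (1.0 + frac)
```

For example, the pattern `0x3F800000` has S = 0, E = 127 and F = 0, so equation (1) yields 1.0, while the pattern `0x00000001` has E = 0 and is evaluated by equation (2).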
In IEEE Std 754, the format specified for a 64-bit double-precision floating point number is a numerical representation composed of a 1-bit sign S, an 11-bit exponent E and a 52-bit fraction F. In this case, a real number $R_3$ is as follows.
$$R_3 = (-1)^S \cdot 2^{E-1023} \cdot (1.F) \qquad (3)$$
The exponent E is the value obtained by adding 1023, as a bias $B_d$, to the actual exponent. The mantissa M is 1.F.
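The double-precision field layout can be inspected with a short sketch such as the following, which splits a value into S, E and F and recovers the actual exponent by subtracting the bias of 1023. The helper name `split_double` is hypothetical; the sketch relies on Python's `struct` module to obtain the 64-bit pattern and assumes the value is a normalized number.

```python
import struct

BIAS_D = 1023  # bias B_d for the double-precision exponent

def split_double(x: float):
    """Return (S, E, F) for the 64-bit double-precision pattern of x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = (bits >> 63) & 0x1               # 1-bit sign S
    e = (bits >> 52) & 0x7FF             # 11-bit biased exponent E
    f = bits & ((1 << 52) - 1)           # 52-bit fraction F
    return s, e, f

s, e, f = split_double(-2.5)
actual_exponent = e - BIAS_D             # exponent before the bias is added
```

Here -2.5 equals -1.25 times 2 to the power 1, so the actual exponent is 1 and the stored exponent E is 1024.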
IEEE Std 754 does not specify a format for fixed point numbers. The format usually employed is an integer-type fixed point number whose most significant bit (MSB) expresses the sign, so that negative numbers are expressed in 2's complement, followed by an integer part of predetermined bit length. In a 32-bit integer, the MSB is the sign bit, the other 31 bits compose the integer part, and the radix point is located immediately below the least significant bit (LSB) of the integer part.
A floating point arithmetic instruction set includes instructions for various format conversions, so that conversions between different formats are executed as required.
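The 32-bit integer format described above can be illustrated by a small sketch that interprets a raw 32-bit pattern as a 2's complement value. The function name `int32_from_bits` is hypothetical and introduced only for illustration.

```python
def int32_from_bits(bits: int) -> int:
    """Interpret a 32-bit pattern as a 2's complement integer."""
    if not 0 <= bits < 2**32:
        raise ValueError("expected a 32-bit pattern")
    sign_bit = (bits >> 31) & 0x1        # the MSB is the sign bit
    low31 = bits & 0x7FFFFFFF            # the remaining 31 bits
    # In 2's complement the sign bit carries a weight of -2**31.
    return low31 - (sign_bit << 31)
```

With this interpretation the pattern `0xFFFFFFFF` denotes -1 and `0x80000000` denotes the most negative value, -2**31.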
A comparison of equation (1) and equation (3) shows that the range of real values expressed by a normalized single-precision floating point number is narrower than that expressed by a double-precision floating point number. Therefore, there are cases where the denormalized format according to equation (2) must be employed in a format conversion from a double-precision floating point number to a single-precision floating point number. In a conventional computer, however, the hardware is optimized for operations on normalized floating point numbers, so that when an operation result is a denormalized number, the hardware procedure is interrupted, the case is treated as an exception, and the processing of the denormalized number is entrusted to software. Consequently, there arises a problem that the format conversion from the double-precision floating point number to the single-precision floating point number is performed at a lower speed.
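The situation described above can be reproduced with a short sketch. The value 2 to the power -130 is a normalized double-precision number, but it lies below the smallest normalized single-precision magnitude of 2 to the power -126, so its single-precision form has E = 0. The helper names are hypothetical, and the conversion itself is delegated to Python's `struct` module rather than to any method described in this document.

```python
import struct

def double_exponent(x: float) -> int:
    """Biased 11-bit exponent E of the double-precision pattern of x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF

def single_fields(x: float):
    """(S, E, F) of x after conversion to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 31) & 0x1, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def is_denormal_single(x: float) -> bool:
    """True when the single-precision result has E = 0 and F != 0."""
    _, e, f = single_fields(x)
    return e == 0 and f != 0

x = 2.0 ** -130
print(double_exponent(x))      # 893, i.e. 1023 - 130: normalized as a double
print(single_fields(x))        # (0, 0, 524288): denormalized as a single
```

The fraction 524288 equals 2 to the power 19, which matches equation (2): 2 to the power -126 times 2 to the power 19 divided by 2 to the power 23 gives 2 to the power -130.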
On the other hand, the number of procedures to be executed to obtain a final conversion result differs according to the object of the format conversion, i.e., the value of the operand. For example, when an operand expressed as a double-precision floating point number is converted to a single-precision floating point number, under IEEE Std 754 the number of significant bits of the mantissa of the resulting single-precision floating point number (=24, including the implicit bit one position above the radix point) is always smaller than the number of significant bits of the operand (=53), so a rounding procedure on the mantissa is always required. In addition, the exponent and the mantissa may need to be corrected when the rounding procedure causes a carry. In contrast, when an operand expressed as a 32-bit integer is converted to a single-precision floating point number, the operand must be replaced by its absolute value when it is negative. Likewise, the rounding procedure is required only when the operand is a positive value and the number of significant bits in its 31-bit integer part is larger than 24, the number of significant bits of the single-precision floating point number. When the rounding procedure causes a carry, the exponent and the mantissa of the resulting single-precision floating point number must be corrected. Conventionally, however, every procedure is executed for a format conversion instruction even when some procedures could be omitted, so conversion efficiency remains low.
The object of the present invention is to execute format conversions involving floating point numbers at high speed and with high efficiency.
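The operand-dependent steps of the integer conversion can be sketched as follows. This is an illustrative software sketch, not the method or device of this document: the function name `int32_to_single_bits` is hypothetical, round-to-nearest-even is assumed as the rounding mode, and the test for whether rounding is needed is applied to the magnitude obtained after the absolute value step.

```python
def int32_to_single_bits(n: int) -> int:
    """Convert a 32-bit integer to a single-precision bit pattern."""
    if not -2**31 <= n < 2**31:
        raise ValueError("operand is not a 32-bit integer")
    if n == 0:
        return 0
    sign = 1 if n < 0 else 0
    mag = -n if n < 0 else n            # absolute value, needed only if negative
    nbits = mag.bit_length()            # number of significant bits
    exp = nbits - 1                     # actual (unbiased) exponent
    if nbits <= 24:
        mant = mag << (24 - nbits)      # fits in 24 bits: rounding is skipped
    else:
        shift = nbits - 24
        mant = mag >> shift             # keep the top 24 bits
        rem = mag & ((1 << shift) - 1)  # discarded low-order bits
        half = 1 << (shift - 1)
        # Round to nearest, ties to even (an assumed rounding mode).
        if rem > half or (rem == half and (mant & 1)):
            mant += 1
        if mant == 1 << 24:             # carry out of the 24-bit mantissa
            mant >>= 1                  # correct the mantissa
            exp += 1                    # and the exponent
    # Drop the implicit leading 1 and add the bias of 127.
    return (sign << 31) | ((exp + 127) << 23) | (mant & 0x7FFFFF)
```

An operand such as 1 needs neither the absolute value step nor rounding, whereas 2**31 - 1 has 31 significant bits, is rounded up, and produces a carry that increments the exponent.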