The present invention relates to floating-point processors, and more particularly to floating-point processors with an operating mode that has improved accuracy and high performance.
In digital processing systems, numerical data is typically expressed using integer or floating-point representation. Floating-point representation is preferred in many applications because of its ability to express a wide range of values and its ease of manipulation for some specified operations. A floating-point representation typically includes three components: a sign bit (sign), a mantissa (mant) that is sometimes referred to as a significand, and an exponent (exp). The represented floating-point number can be expressed as (−1)^sign · mant · 2^exp. Floating-point representations are also defined by “IEEE Standard for Binary Floating-Point Arithmetic,” which is referred to herein as the IEEE-754 standard (or simply, the IEEE standard) and incorporated herein by reference in its entirety for all purposes.
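As an illustration of the three components, the following minimal sketch splits a value into its sign, exponent, and mantissa fields and rebuilds it from (−1)^sign · mant · 2^exp. It assumes the IEEE-754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 stored mantissa bits), which the text above does not specify. The function names are hypothetical, and only finite values are handled.

```python
import struct

def decode_float32(x: float):
    """Return (sign, biased_exp, fraction) of x stored as IEEE-754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    biased_exp = (bits >> 23) & 0xFF  # 8 bits, bias 127 (assumed format)
    fraction = bits & 0x7FFFFF        # 23 stored mantissa bits
    return sign, biased_exp, fraction

def value_from_fields(sign: int, biased_exp: int, fraction: int) -> float:
    """Rebuild (-1)^sign * mant * 2^exp for finite values (no infinities or NaNs)."""
    if biased_exp == 0:
        mant = fraction / 2**23       # 0.xxx form, no implied leading one
        exp = -126
    else:
        mant = 1 + fraction / 2**23   # 1.xxx form, implied leading one
        exp = biased_exp - 127
    return (-1) ** sign * mant * 2.0 ** exp

fields = decode_float32(-6.5)         # -6.5 = -(1.101b) * 2^2
print(fields, value_from_fields(*fields))
```

Here −6.5 decodes to a sign of 1, a biased exponent of 129 (that is, 2 + 127), and a fraction field of 0x500000.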
The IEEE standard defines representations for “normalized” and “denormalized” floating-point numbers. A normalized floating-point number is characterized by a mantissa having a one to the left of the binary point and a 1.xxx--xx format, where each “x” represents one bit that is either a one or a zero. A denormalized floating-point number is characterized by a mantissa having a zero to the left of the binary point and a format of 0.xxx--xx. Floating-point numbers greater than or equal to a positive minimum representable normalized number (i.e., y ≥ +a_min) or less than or equal to a negative minimum representable normalized number (i.e., y ≤ −a_min) are represented using normalized numbers. Floating-point numbers in the range between the negative and positive minimum normalized numbers other than zero (i.e., −a_min < y < +a_min and y ≠ ±0) may be represented using denormalized numbers. Zero is represented by a mantissa having a value of zero and an exponent also having a value of zero.
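The three cases can be told apart from the stored fields alone. The sketch below is illustrative only: it assumes single precision, where the minimum normalized magnitude is 2^−126, handles finite inputs only, and uses a hypothetical function name.

```python
import struct

A_MIN = 2.0 ** -126  # minimum positive normalized value (single precision assumed)

def classify_float32(y: float) -> str:
    """Classify a finite value as 'zero', 'denormalized', or 'normalized'."""
    bits = struct.unpack(">I", struct.pack(">f", y))[0]
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 0:
        # Exponent field of zero: the value is either zero or a 0.xxx mantissa.
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"  # 1.xxx mantissa (infinities and NaNs not considered)

for y in (1.0, A_MIN, A_MIN / 2, -A_MIN / 16, -0.0):
    print(y, classify_float32(y))
```

The field-based result agrees with the range test in the text: a nonzero value is classified as denormalized exactly when its magnitude is below `A_MIN`.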
Many operations can be performed on floating-point numbers, including arithmetic operations such as addition, subtraction, multiplication, division, and reciprocation. For arithmetic operations, the IEEE standard provides guidelines to be followed to generate a unique answer for each floating-point operation. In particular, the IEEE standard describes the processing to be performed on the result from a particular operation (e.g., add, multiply), the precision of the resultant output, and the data format to be used. For example, the IEEE standard defines several rounding modes available for the results from add and multiply operations, and the bit position at which the rounding is to be performed. The requirements ensure identical results from different implementations of IEEE-compliant floating-point processors.
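The effect of a rounding mode and a rounding bit position can be sketched with exact rational arithmetic. The four modes below (round to nearest with ties to even, toward zero, toward positive infinity, toward negative infinity) are those of IEEE 754-1985. The function name, the mode strings, and the use of Python's `fractions` module are choices made for this illustration, not part of the standard.

```python
import math
from fractions import Fraction

def round_to_bits(x: Fraction, frac_bits: int, mode: str) -> Fraction:
    """Round x to a multiple of 2**-frac_bits under the given rounding mode."""
    scaled = x * 2**frac_bits          # rounding position becomes the units place
    lo, hi = math.floor(scaled), math.ceil(scaled)
    if mode == "nearest_even":
        n = round(scaled)              # Fraction rounds halfway cases to even
    elif mode == "toward_zero":
        n = lo if scaled >= 0 else hi
    elif mode == "toward_positive":
        n = hi
    elif mode == "toward_negative":
        n = lo
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return Fraction(n, 2**frac_bits)

x = Fraction(5, 8)                     # 0.101b, halfway between 0.10b and 0.11b
for mode in ("nearest_even", "toward_zero", "toward_positive", "toward_negative"):
    print(mode, round_to_bits(x, 2, mode))
```

Because every mode yields a single well-defined result for a given exact input and bit position, two implementations that follow the same mode produce identical outputs.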
The rounding modes defined by the IEEE standard provide improved accuracy for some operations, but are complicated to implement and also increase the processing time for an arithmetic operation. To obtain an output that fulfills IEEE rounding requirements, post-processing of a preliminary result from an arithmetic operation is typically performed. The post-processing includes possible denormalization and rounding of the preliminary result in accordance with one of the rounding modes defined by the IEEE standard. Denormalization is performed on a number having an absolute value less than +a_min (i.e., −a_min < y < +a_min) to place it in a proper format such that rounding can be performed at the bit location specified by the IEEE standard. The post-processing (or more specifically, the denormalization and rounding) typically leads to increased circuit complexity and increases processing time. The IEEE rounding modes are thus generally implemented in applications requiring high accuracy.
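The denormalization and rounding step can be sketched at the bit level as a right shift of the mantissa followed by a rounding decision on the bits shifted out. The sketch assumes single precision (minimum exponent −126, 23 fraction bits), a preliminary result held as a 24-bit 1.xxx mantissa with an unbiased exponent below −126, and round-to-nearest-even. All names are hypothetical, and an actual hardware datapath would differ.

```python
E_MIN = -126     # minimum normalized exponent (single precision assumed)
FRAC_BITS = 23   # stored mantissa bits (single precision assumed)

def denormalize_and_round(mant: int, exp: int) -> int:
    """Return the denormal fraction field for mant * 2**(exp - FRAC_BITS).

    mant is a 24-bit integer holding a 1.xxx mantissa; exp is the unbiased
    exponent and must be below E_MIN. The result is in units of
    2**(E_MIN - FRAC_BITS), rounded to nearest with ties to even.
    """
    shift = E_MIN - exp
    if shift <= 0:
        raise ValueError("value is already normalized")
    kept = mant >> shift                    # denormalize: align to exponent E_MIN
    remainder = mant & ((1 << shift) - 1)   # bits shifted out
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1                           # round up; halfway cases go to even
    return kept

print(hex(denormalize_and_round(0xC00000, -127)))  # 1.5 * 2^-127 -> 0.11b * 2^-126
```

The variable shift distance and the conditional increment are the parts that add circuit complexity and latency relative to simply discarding the result.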
To reduce hardware complexity and improve processing time, many floating-point processors implement an operating mode in which numbers (e.g., preliminary results) within the range of negative and positive minimum normalized numbers (i.e., −a_min < y < +a_min) are set or flushed to zero. The “flush-to-zero” mode is simple to implement and only marginally increases the processing time. However, the flush-to-zero mode suffers a loss in accuracy since the mantissa is flushed to zero.
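A minimal sketch of the flush-to-zero behavior follows, again assuming single precision so that the minimum normalized magnitude is 2^−126. Keeping the sign of the flushed value is an assumption of this sketch; the text above states only that the value is set to zero.

```python
import math

A_MIN = 2.0 ** -126  # minimum positive normalized value (single precision assumed)

def flush_to_zero(y: float) -> float:
    """Replace any value strictly between -A_MIN and +A_MIN with zero."""
    if -A_MIN < y < A_MIN:
        return math.copysign(0.0, y)  # sign preservation is an assumption
    return y

small = 0.75 * A_MIN                  # a value in the denormalized range
flushed = flush_to_zero(small)
print(flushed, abs(small - flushed))  # the error equals the entire value
```

The example shows the accuracy loss directly: for a flushed value the absolute error is the magnitude of the value itself, whereas a properly rounded denormalized result would differ from it by at most half of the smallest representable step.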
For many applications such as embedded processors, reduced cost and improved processing time are desirable. For these applications, an operating mode that is simple to implement, marginally increases the processing time (if at all), and has improved accuracy over the flush-to-zero mode is highly desirable.