1. Field of Invention
This invention relates generally to a method for processing floating-point numbers. More specifically, but not by way of limitation, it is directed to a technique for representing floating-point numbers in a memory register format and floating-point register format in a manner that allows the handling of overflow and underflow floating-point numbers, thereby eliminating denormalization in memory and exponent modification from rounding.
2. Related Art
In an effort to unify the methods used in computer systems for binary floating-point arithmetic, the IEEE standardized computer floating-point numbers in the early 1980s. Such binary floating-point numbers allow both large and small numbers to be manipulated with great precision, and are therefore often used in scientific calculations. They typically take either single-precision or double-precision format, with single precision operating on 32-bit operands and double precision on 64-bit operands. Both single- and double-precision numbers are bit strings with three fields: a single sign bit, several exponent bits, and several fraction (mantissa) bits. The sign bit is the most significant bit, the exponent bits are the next most significant, and the mantissa bits are the least significant.
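As an illustration of the three-field layout just described, the following minimal sketch splits a value into its sign, exponent, and fraction fields. The function name `split_fields` is hypothetical, and the field widths used (8 and 23 bits for single precision, 11 and 52 bits for double precision) are the standard IEEE 754 values rather than figures taken from this paragraph.

```python
import struct

def split_fields(x, double=False):
    """Return the (sign, exponent, fraction) bit fields of x as integers."""
    if double:
        # Assumed standard IEEE 754 double layout: 1 + 11 + 52 bits.
        exp_bits, frac_bits = 11, 52
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    else:
        # Assumed standard IEEE 754 single layout: 1 + 8 + 23 bits.
        exp_bits, frac_bits = 8, 23
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
    fraction = bits & ((1 << frac_bits) - 1)               # least significant field
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)  # middle field
    sign = bits >> (exp_bits + frac_bits)                   # most significant bit
    return sign, exponent, fraction

print(split_fields(-1.5))               # single-precision fields
print(split_fields(-1.5, double=True))  # double-precision fields
```

The exponent field is returned in its stored (biased) form, so the same value yields different exponent fields in the two formats.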
A normalized nonzero number X in the single format has the form

$$X = (-1)^{S} \times 2^{E-127} \times (1.F)$$

where

S = sign bit
E = 8-bit exponent biased by 127
F = X's 23-bit fraction which, together with an implicit leading 1, yields the significant digit field "1. - - -"

Objects of the invention include:

1. A method and apparatus for handling underflow in floating-point numbers that is straightforward and efficient to implement in hardware.
2. A method and apparatus for handling underflow in floating-point numbers that is more efficient than IEEE standards.
3. A method and apparatus for simplifying rounding of floating-point numbers that does not generate a denormalized or infinite number.
4. A method and apparatus for simplifying rounding of floating-point numbers that eliminates carry propagation and exponent adjustment in rounding.
5. A method and apparatus for handling underflow in floating-point numbers that simplifies real-time software design and validation.
6. A method and apparatus for representing overflow that provides a symmetrical situation regarding the exponent range.
7. A method and apparatus for handling floating-point numbers that gives the ability to provide accuracy indication.

In the conventional floating-point representation, the boundary between the exponent and mantissa parts is fixed, so a constant number of bits represents the exponent. As such, the range of values that can be represented is limited. Even if there are unused exponent or mantissa bits, such free bits cannot be used for other purposes. Therefore, conventional floating-point number representation is not flexible enough to use the unused space of the exponent part to improve the precision of the mantissa, or to use any unused space of the mantissa part to improve the precision of the exponent.

In other floating-point representations, the exponent or mantissa has a variable length. This variable length allows for underflow representation of numbers very close to zero, and for overflow representation of very large numbers which would normally produce an infinite number. IEEE floating-point standards have a feature called gradual underflow which handles numbers close to zero. The format employed, termed denormalized numbers, is difficult to implement in hardware. In some cases, such as in rounding, the adjustment of the exponent to handle underflow and overflow numbers creates a denormalized floating-point number. Denormalized numbers require additional processing, which conflicts with the efficient processing of the vast majority of floating-point numbers.

According to IEEE Standard 754, a denormalized number is a nonzero floating-point number whose exponent has a reserved value, usually the format's minimum, and whose explicit or implicit leading significant bit is zero. As such, the number it represents is

$$X = (-1)^{S} \times 2^{-126} \times (L.F)$$

where

S = sign bit
L = leading bit (0)
F = fraction (nonzero)
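The normalized and denormalized single-format encodings can be evaluated directly from a 32-bit pattern, as in the sketch below. The function name `decode_single` is hypothetical, and the treatment of an all-ones exponent field (infinity or NaN) is an assumption from the standard IEEE 754 single format, since the text above does not cover those encodings.

```python
def decode_single(bits):
    """Evaluate a 32-bit pattern with the normalized/denormalized formulas."""
    s = bits >> 31            # sign bit S
    e = (bits >> 23) & 0xFF   # 8-bit biased exponent E
    f = bits & 0x7FFFFF       # 23-bit fraction F
    if e == 0xFF:
        # Assumption: all-ones exponent is infinity or NaN, outside these formulas.
        raise ValueError("infinity or NaN is not covered by these formulas")
    if e == 0:
        # Denormalized (or zero): leading bit L = 0, exponent fixed at -126.
        return (-1) ** s * 2.0 ** -126 * (f / 2 ** 23)
    # Normalized: implicit leading 1, exponent biased by 127.
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)

print(decode_single(0x3FC00000))  # a normalized pattern
print(decode_single(0x00000001))  # the smallest positive denormalized pattern
```

The two branches differ only in the leading bit and in whether the exponent comes from the E field or is pinned at -126, which is the special case a hardware implementation must detect.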
Two correlated events contribute to underflow. One is the creation of a tiny nonzero result between $\pm 2^{E_{\min}}$, where $E_{\min}$ is the format's minimum exponent, which, because it is so tiny, may later cause some other exception such as overflow upon division. The other is the extraordinary loss of accuracy when such tiny numbers are approximated by denormalized numbers. Loss of accuracy may be detected either as a denormalization loss, when the delivered result differs from what would have been computed were the exponent range unbounded, or as an inexact result, when the delivered result differs from what would have been computed were both the exponent range and the precision unbounded. IEEE Standard 754 does not track accuracy other than to require single and double precision.
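The loss of accuracy in the denormalized range can be observed numerically. The sketch below, with hypothetical helper names `to_single` and `relative_error`, rounds the same fraction to single precision at two magnitudes, one in the normalized range and one below $2^{-126}$, and compares the relative error of each. It uses Python's `struct` module to perform the rounding and is only an illustration of conventional IEEE behavior.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def relative_error(x):
    """Relative error introduced by storing x in single precision."""
    return abs(to_single(x) - x) / abs(x)

normal = 1.0 / 3.0                 # well inside the normalized range
tiny = (1.0 / 3.0) * 2.0 ** -140   # below 2**-126, so stored denormalized

print(relative_error(normal))
print(relative_error(tiny))
```

The second error is several orders of magnitude larger than the first, because the denormalized encoding has only a handful of significant bits left at that magnitude. With an unbounded exponent range, both values would round with the same relative error.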
For digital computer processors, the complexities of IEEE Standard 754 increase the design and manufacturing cost of the processor. Additional hardware and processing time are required to handle denormalized numbers in order to maintain the integrity of the data. For some computer applications, such as real-time signal processing, it is desirable to have a fixed time duration for computations. The seldom-used circuitry needed to handle underflow and overflow numbers adds cost and generates variations in compute time which, in turn, make the computer system difficult to design and validate. At the microprocessor level, it is desirable that operations take fixed time durations and that seldom-used circuitry be avoided.
In the current era of RISC microprocessors with pipelined floating-point data, there are specific floating-point registers and a dedicated floating-point unit. Data movement, both to and from memory, is handled by floating-point load and store instructions. This presents the opportunity for the memory floating-point format to be optimized separately from the register floating-point format. Memory format optimization involves information density, i.e., providing the greatest accuracy in a specific memory element. Register format optimization involves supporting the most efficient floating-point arithmetic unit.
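The separation between a memory format and a register format can be illustrated with the conventional IEEE formats themselves. In the sketch below, a 32-bit single-precision image stands in for the memory format and a 64-bit double-precision image for the register format. This pairing and the function names `load_single_as_double` and `store_double_as_single` are assumptions made for illustration, not the formats of the invention.

```python
import struct

def load_single_as_double(mem_bits):
    """Model a load: 32-bit memory image to 64-bit register image."""
    (value,) = struct.unpack(">f", struct.pack(">I", mem_bits))
    (reg_bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return reg_bits

def store_double_as_single(reg_bits):
    """Model a store: 64-bit register image to 32-bit memory image."""
    (value,) = struct.unpack(">d", struct.pack(">Q", reg_bits))
    (mem_bits,) = struct.unpack(">I", struct.pack(">f", value))
    return mem_bits

mem = 0x00000001                     # smallest positive single denormal, 2**-149
reg = load_single_as_double(mem)
reg_exponent = (reg >> 52) & 0x7FF   # biased exponent field of the register image

print(reg_exponent)                  # nonzero: the value is normalized in the register
print(store_double_as_single(reg) == mem)
```

Because the wider format has a larger exponent range, a value that is denormalized in the 32-bit image is normalized once loaded, and the conversion work is confined to the load and store paths.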
The present invention has been designed to provide a method for efficiently handling such overflow and underflow numbers. It provides a means whereby rounding of the result does not adjust the exponent value. It also supplies a means wherein memory data has its precision encoded at the cost of one bit of mantissa. In addition, it provides a means whereby both floating-point memory and register data are always normalized, with such normalization accomplished using minimal hardware in the load/store paths. The present invention also allows conditional actions to be performed in hardware in parallel, thereby minimizing processing duration and avoiding the need for seldom-used circuitry. In reshaping the IEEE format, the present invention retains most of its benefits while offering the microprocessor designer and the real-time programmer a much cleaner and more efficient implementation.
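For contrast, in conventional IEEE arithmetic a rounding step can carry out of the mantissa and change the exponent. The sketch below shows that conventional behavior only; it does not model the invention's format, whose details are not given in this section. The helper names `single_exponent` and `double_exponent` are hypothetical.

```python
import struct

def double_exponent(x):
    """Unbiased exponent of x as held in double precision."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return ((bits >> 52) & 0x7FF) - 1023

def single_exponent(x):
    """Unbiased exponent of x after rounding to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return ((bits >> 23) & 0xFF) - 127

# Just below 2.0: the mantissa is all ones well past the 23rd fraction bit.
x = 2.0 - 2.0 ** -30

print(double_exponent(x))  # exponent before rounding
print(single_exponent(x))  # exponent after rounding to 23 fraction bits
```

Rounding to 23 fraction bits carries through the whole mantissa, so the rounded value is exactly 2.0 and its exponent is one greater than before rounding. This is the carry propagation and exponent adjustment that a rounding scheme with a fixed exponent would avoid.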