1. Field of the Invention
The invention relates to the field of floating point numbers, and more particularly to processing of denormal floating point numbers in a digital computer system.
2. Art Background
An exemplary microprocessor, such as the Pentium(TM) brand processor, which is a product of Intel(R) Corporation, Santa Clara, Calif., represents real numbers of the form $(-1)^s \, 2^E \, (b_0.b_1 b_2 b_3 \ldots b_{p-1})$ where:
    $s$ = 0 or 1
    $E$ = any integer between $E_{min}$ and $E_{max}$, inclusive
    $b_i$ = 0 or 1
    $p$ = number of bits of precision

    The biased floating-point exponent is stored at its smallest value. For single precision numbers, this minimum exponent value is -126. For double precision, the minimum exponent value is -1022. For the extended precision format, the minimum exponent value is -16382. For all formats, when the number is denormal the minimum exponent is encoded with a bit pattern of all zeros.
    The integer bit of the significand (whether explicit or implicit) is zero.

    The processor avoids creating denormals whenever possible. In other words, it always normalizes real numbers except in the case of tiny numbers.
    The processor provides the unmasked underflow exception to permit programmers to detect cases when denormals would be created.
    The processor provides the unmasked denormal operand exception to permit programmers to provide a tailored response in the presence of denormal operands.

    Detection of denormal numbers in the originating format.
    Generating an exception when the input operand is a denormal number and the denormal exception is unmasked.
    Normalization of the denormal number in the event that the input operand is a denormal number and the denormal exception is masked.
    Examination of the input operand to check whether it is encoded to have a special interpretation, such as a signaling Not-A-Number (NaN) encoding. If this is true, the FPU delivers an interrupt for the invalid operation exception when the invalid operation exception is unmasked.
Table 1a summarizes the parameters for each of the three real-number formats. The Pentium brand processor stores real numbers in a three-field binary format that resembles scientific, or exponential, notation. The significand field, $b_0 b_1 b_2 b_3 \ldots b_{p-1}$, holds the number's significant digits. (The term "significand" is analogous to the term "mantissa" used to describe floating-point numbers on some computers.) The exponent field, e = E + bias, locates the binary point within the significant digits (and therefore determines the number's magnitude). (The term "exponent" is analogous to the term "characteristic" used to describe floating-point numbers on some conventional computers.) A 1-bit sign field indicates whether the number is positive or negative. Negative numbers differ from positive numbers only in the sign bits of their significands.
TABLE 1a
______________________________________________________
                        Single    Double    Extended
______________________________________________________
Total Format Width      32        64        80
p (bits of precision)   24        53        64
Exponent bits           8         11        15
Emax                    +127      +1023     +16383
Emin                    -126      -1022     -16382
Exponent Bias           +127      +1023     +16383
______________________________________________________
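By way of illustration only, the three fields described above can be extracted from a single format number with a few shifts and masks. The following Python sketch assumes the 32-bit single format layout (1 sign bit, 8 exponent bits, 23 stored fraction bits); the helper name split_single is hypothetical and is not part of any processor interface.

```python
import struct

SINGLE_BIAS = 127  # exponent bias of the single format


def split_single(x: float):
    """Return (sign, biased_exponent, fraction) for x stored in single format."""
    # Round-trip through a 4-byte big-endian float to get the raw bit pattern.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                       # 1-bit sign field
    biased_exponent = (bits >> 23) & 0xFF   # 8-bit exponent field, e = E + bias
    fraction = bits & 0x7FFFFF              # 23 stored fraction bits
    return sign, biased_exponent, fraction


sign, e, fraction = split_single(-6.5)
# -6.5 = (-1)^1 * 2^2 * 1.625; the integer bit is implicit for a normal number.
print(sign, e - SINGLE_BIAS, 1 + fraction / 2**23)  # prints: 1 2 1.625
```

The sketch reconstructs the significand by adding the implicit integer bit, which is valid only for normalized numbers.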
The single real format is appropriate for applications that are constrained by memory, but it should be recognized that this format provides a smaller margin of safety. It is useful for the debugging of algorithms, because roundoff problems will manifest themselves more quickly in this format. It is often used in graphics applications as well. For most microcomputer applications over the last decade, the double real format has provided sufficient range and precision to return correct results with a minimum of programmer attention. Most processors have optimized their computational paths to provide the maximum performance on operations on the double real format. The extended real format was originally developed with an intent to hold intermediate results, loop accumulations, and constants. Its extra length was designed to shield final results from the effects of rounding and overflow/underflow in intermediate calculations.
As microprocessor performance increases (by taking advantage of the improvements in the technology of Very Large Scale Integration), applications develop that exploit this increase in performance to deliver more utility. These new applications operate on larger data sets and invoke more complex calculations that are more prone to roundoff errors. The extended format is useful in these applications, not just as an intermediate format, but also as a format for input and output operands. With the need to support the extended format as outlined above, future processors must now be designed to support computation on three real number floating point formats in their computational paths.
The floating point unit (FPU) of the processor usually retains floating point numbers in normalized form. This means that, except for the value zero, the significand contains an integer bit and fraction bits as follows:
    1.fff...ff
where "." indicates an assumed binary point. The number of fraction bits varies according to the real format: 23 for single, 52 for double, and 63 for extended real. By normalizing real numbers so that their integer bit is always a 1, the processor eliminates leading zeros in small values. This technique maximizes the number of significant digits that can be accommodated in a significand of a given width. Note that, in the single and double formats, the integer bit is implicit and is not actually stored in memory; the integer bit is physically present (explicit) in the extended format only.
If one were to examine only the significand with its assumed binary point, all normalized real numbers would have values greater than or equal to one and less than two. The exponent field locates the actual binary point in the significant digits. Just as in decimal scientific notation, a positive exponent has the effect of moving the binary point to the right, and a negative exponent effectively moves the binary point to the left, inserting leading zeros as necessary. An unbiased exponent of zero indicates that the position of the assumed binary point is also the position of the actual binary point.
The exponent field, then, determines a real number's magnitude. In order to simplify comparing real numbers (e.g., for sorting), the processor stores exponents in a biased form. This means that a constant, called a bias, is added to the true exponent described above. As Table 1a shows, the value of this bias is different for each real format. The bias is chosen so as to force the biased exponent to be a positive value. A number's true exponent can be determined simply by subtracting the bias value of its format. In the 80x86(TM) family of processors, a product of Intel Corporation, the single and double real formats exist in memory only. If a number in one of these formats is loaded into an FPU register, it is automatically converted to extended format, the format used for all internal operations. Likewise, data in registers can be converted to single or double real for storage in memory.
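The benefit of the biased form for comparison can be illustrated with a minimal sketch. For non-negative single format values, comparing the raw bit patterns as unsigned integers yields the same order as comparing the numbers themselves, because the biased exponent occupies the high-order bits and is never negative. The helper single_bits is hypothetical; negative values, which this sketch does not cover, require additional handling of the sign bit.

```python
import struct


def single_bits(x: float) -> int:
    """Raw 32-bit pattern of x when stored in single format."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


values = [3.0, 0.5, 1e-3, 1024.0, 1.5]

# Sorting on the integer bit patterns agrees with sorting on the values.
by_bits = sorted(values, key=single_bits)
assert by_bits == sorted(values)

# The true exponent is recovered by subtracting the bias (127 for single).
true_exponent_of_8 = ((single_bits(8.0) >> 23) & 0xFF) - 127   # 8.0 = 2^3
```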
When a numeric value becomes very close to zero, normalized floating point storage cannot be used to express the value. A number R is said to be tiny (also commonly referred to as subnormal) when $-2^{E_{min}} < R < 0$ or $0 < R < +2^{E_{min}}$. (For a typical case, Emin is -126 for single format, -1022 for double format, and -16382 for extended format.) In other words, a nonzero number is tiny if its exponent would be too negative to store in the destination format, while retaining the number in normalized form.
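The definition of a tiny number can be expressed as a short test. The following sketch is illustrative only; it uses exact rational arithmetic so that the extended format threshold, which is too small for an ordinary Python float, can also be represented. The names E_MIN and is_tiny are assumptions made for this example.

```python
from fractions import Fraction

# Minimum exponent of each destination format, as stated in the text.
E_MIN = {"single": -126, "double": -1022, "extended": -16382}


def is_tiny(r, fmt: str) -> bool:
    """True when r is nonzero and its magnitude is below 2**Emin for fmt."""
    r = Fraction(r)                              # exact value of r
    threshold = Fraction(1, 2 ** -E_MIN[fmt])    # 2**Emin as an exact rational
    return r != 0 and abs(r) < threshold


print(is_tiny(1e-40, "single"))   # True: 1e-40 is below 2**-126
print(is_tiny(1e-40, "double"))   # False: well above 2**-1022
```

The same value may therefore be tiny for one destination format and normal for another.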
To accommodate these instances, the processor can store and operate on real numbers that are not normalized, i.e., whose significands contain one or more leading zeros. Denormals arise when the result of a calculation yields a value that is tiny.
Denormal values have the following properties:
It is important to note that interpretation of the exponent encoding for denormal numbers differs from the interpretation of the exponent encoding for normalized numbers. For denormalized numbers, the exponent is encoded with the bit pattern of all zeros, although this pattern is interpreted to have a value which is the minimum exponent value (which is -126 for single format, -1022 for double real format, and -16382 for the extended real format). Hence, interpreting such denormal numbers by merely subtracting the bias of the format from the exponent encoding of the denormal number will produce an exponent value that is off by one. Denormals and true zeros both have exponents encoded with all zeros, although the interpretation of these encodings differs.
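The off-by-one hazard described above can be made concrete. The following is a minimal sketch for the single format only; true_exponent is a hypothetical helper, and its treatment of zero, infinity and NaN (raising an error) is a choice made for this example rather than processor behavior.

```python
import struct

BIAS, E_MIN = 127, -126  # single format parameters


def true_exponent(x: float) -> int:
    """Unbiased exponent of a finite, nonzero single format value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    encoded = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if encoded == 0xFF:
        raise ValueError("infinity or NaN has no finite exponent")
    if encoded == 0:
        if fraction == 0:
            raise ValueError("true zero: all-zero exponent, all-zero significand")
        # Denormal: the all-zero encoding stands for Emin (-126),
        # not for encoded - BIAS, which would give -127.
        return E_MIN
    return encoded - BIAS


print(true_exponent(2.0 ** -126))  # smallest normal: -126
print(true_exponent(2.0 ** -127))  # denormal 0.1b x 2**-126: also -126
```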
As a number becomes smaller, it gradually transitions from a normal representation to a denormal representation. Table 1b below illustrates this process for a single precision number.
TABLE 1b
_____________________________________________________________________________________________________________
        Exponent   Significand Value         Exponent    Significand Encoding   Description
Entry   Value      (includes explicit bit)   Encoding    (no explicit bit)      of number
_____________________________________________________________________________________________________________
1.      0x3f81     1.0000000...01            00000001    0000000...01
2.      0x3f81     1.0000000...00            00000001    0000000...00           Smallest single precision normal
3.      0x3f81     0.1111111...11            00000000    1111111...11           Largest single precision denormal
4.      0x3f81     0.1111111...10            00000000    1111111...10
5.      0x3f81     0.0000000...01            00000000    0000000...01           Smallest single precision denormal
6.      0x0000     0.0000000...00            00000000    0000000...00           True Zero
_____________________________________________________________________________________________________________
Entry one in Table 1b shows a normal number which is very close to becoming denormal. Entry two shows the smallest possible normal number which can be stored in the single format. Entry three shows the denormal number which results when the normal number in entry two loses a value equal to one digit in the last place. The exponent of the number is encoded as zero, although its value remains at the minimum exponent for a single precision number. The significand bits are set to all ones. Entry five shows the smallest denormal number which can be represented by the single precision format.
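The transition described in Table 1b can be reproduced, by way of example, by decoding the corresponding 32-bit patterns. The hexadecimal patterns below are derived here from the exponent and significand encodings in the table with an assumed sign bit of zero; they do not appear in the table itself, and single_from_bits is a hypothetical helper.

```python
import struct


def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as a single format number."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x


# Entry number -> assumed bit pattern (sign 0, 8 exponent bits, 23 fraction bits).
entries = {
    1: 0x00800001,  # normal, one unit in the last place above the smallest normal
    2: 0x00800000,  # smallest normal, 2**-126
    3: 0x007FFFFF,  # largest denormal, 2**-126 - 2**-149
    4: 0x007FFFFE,
    5: 0x00000001,  # smallest denormal, 2**-149
    6: 0x00000000,  # true zero
}

for number, bits in entries.items():
    print(number, f"{bits:#010x}", single_from_bits(bits))
```

Each step from entry two to entry three, and from entry three to entry four, reduces the value by the same amount, 2**-149, which is the sense in which the transition is gradual.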
Denormals typically receive special treatment by processors in three respects:
Denormalizing means incrementing the true result's exponent by a certain amount, and inserting a corresponding number of leading zeros in the significand, shifting the rest of the significand by the same amount to the right. The denormalization process causes loss of precision if significant low-order bits are shifted off the right end of the significand field. In a severe case, all the significant bits of the true result are shifted off and replaced by the leading zeros. In this case, the result of denormalization yields a zero. Clearly, a significant amount of processing is required to handle denormal numbers in a computer system. When applications generate a large number of denormals, they can often tolerate a loss of precision while benefitting from increased performance due to faster denormal processing. For these applications, it is advantageous from a performance standpoint if the processing of denormal numbers is made faster, even at the cost of some loss in precision.
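The denormalization step described above may be sketched as follows. This is an illustration under stated assumptions and not a description of any particular hardware: the significand is held as an integer whose most significant bit is the integer bit, bits shifted off the right end are simply discarded (no rounding mode is modeled), and the name denormalize is hypothetical.

```python
def denormalize(significand: int, exponent: int, e_min: int):
    """Raise exponent to e_min by shifting the significand right.

    Returns (significand, exponent, precision_lost).
    """
    shift = e_min - exponent
    if shift <= 0:
        return significand, exponent, False    # not tiny: nothing to do
    lost = significand & ((1 << shift) - 1)    # low-order bits shifted off
    return significand >> shift, e_min, lost != 0


# 24-bit significand 1.0101110...0 with true exponent -129 (single, Emin = -126):
# three leading zeros are inserted and no significant bits are lost.
print(denormalize(0xAE0000, -129, -126))

# Severe case: every significant bit is shifted off and the result is zero.
print(denormalize(0x800000, -150, -126))
```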
Typical prior art implementations map floating point data loaded from memory to the FPU from the originating format in memory to the extended format in the FPU registers. This mapping in the prior art has entailed, on a load instruction, a full conversion of the data from the originating format into the extended precision format. Likewise, on the store instruction, this has entailed a complete conversion of the data from the extended precision format (in the FPU register file) to the destination format of the result in memory.
In the prior art, conversion of the data on the load instruction typically includes the following:
As can be seen from the above, prior art implementations must examine the input operand being loaded in order to determine whether there will be any exceptions. Exceptions include the denormal operand exception response and the invalid operation exception response. Thus, prior art implementations incur data-related exceptions upon the loading of floating point operands.
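The operand examination performed on a prior art load can be summarized, for the single format, by a sketch such as the following. It is illustrative only. The function name and the returned strings are hypothetical, the signaling NaN test assumes the convention in which a clear most significant fraction bit marks a signaling NaN, and the response to a signaling NaN when the invalid operation exception is masked is not specified in the text and is merely labeled here.

```python
def classify_single_load(bits: int, denormal_masked: bool, invalid_masked: bool) -> str:
    """Decide what a load must do with a 32-bit single format operand."""
    encoded = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF

    # Detection of a denormal in the originating format.
    if encoded == 0 and fraction != 0:
        if denormal_masked:
            return "normalize"                      # masked: normalize the operand
        return "exception: denormal operand"        # unmasked: raise the exception

    # Check for a special encoding such as a signaling NaN (assumed convention).
    is_nan = encoded == 0xFF and fraction != 0
    if is_nan and (fraction >> 22) == 0:
        if invalid_masked:
            return "masked invalid response"        # not detailed in the text
        return "exception: invalid operation"

    return "convert"                                # ordinary format conversion


print(classify_single_load(0x00000001, True, True))    # normalize
print(classify_single_load(0x7F800001, True, False))   # exception: invalid operation
```

Every branch depends on the value of the loaded data, which is why the outcome cannot be known until the data has returned.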
In the event that the denormal operand exception is masked, prior art processors must normalize input operands which are denormal numbers. This normalization operation requires, among other circuits, a hardware shifter. Modern processors typically execute several load instructions in parallel, requiring potentially several dedicated shifters on the chip. Since dedicated hardware for multiple shifters is expensive in terms of silicon die cost, some implementations may use schemes by which several load paths to the FPU arbitrate for a single shifter, adding design complexity. A more common alternative is to complete the normalization process by invoking on-chip microcode. Upon determining that the data being loaded is denormal, a micro-exception delivers control to a microcode handler. The microcode handler uses existing shifters in the FPU (shifters necessary for supporting the floating point add operation, for example) to execute the normalization of the loaded operand. Thus, prior art implementations need to provide either dedicated shifters for each load path, added design complexity to arbitrate for a single shifter, or take a micro-exception to enable microcode to complete the normalization.
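The work performed by the shifter can be illustrated with a minimal sketch that normalizes a single format denormal into an extended format exponent and significand. The sketch assumes a 64-bit significand with an explicit integer bit in the most significant position and an extended exponent bias of 16383; the function name is hypothetical, and the left shift computed below stands in for whatever shifter or microcode sequence an implementation would actually use.

```python
EXT_BIAS = 16383       # exponent bias of the extended format
SINGLE_E_MIN = -126    # exponent value implied by a single format denormal


def normalize_single_denormal(fraction: int):
    """Return (biased extended exponent, 64-bit significand) for a denormal.

    fraction is the nonzero 23-bit fraction field of the single format operand.
    """
    if not 0 < fraction < (1 << 23):
        raise ValueError("fraction must be a nonzero 23-bit value")
    # Place the fraction just below the (zero) integer bit: 0.fff...f
    significand = fraction << 40
    # Shift left until the integer bit (bit 63) is one, lowering the exponent
    # by one for each position shifted.
    shift = 64 - significand.bit_length()
    significand <<= shift
    exponent = SINGLE_E_MIN - shift
    return exponent + EXT_BIAS, significand


biased, significand = normalize_single_denormal(0x7FFFFF)   # largest denormal
print(hex(biased), hex(significand))
```

The shift distance depends on the number of leading zeros in the operand, which is why a general shifter, rather than fixed wiring, is needed for this step.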
Modern pipelined processors employ techniques that include deep pipelining, as well as parallel instruction execution. These modern processors execute several instructions concurrently at each stage of the pipeline. Typically, a load operation on these processors takes several cycles to complete its execution. A common prior art technique is to enable execution of instructions following a load instruction even before the loaded data is returned, as long as the subsequent instructions do not depend upon the loaded data. To complete the execution of these subsequent instructions, and to update the architectural state of the FPU with the result of these instructions, it is important to determine that there are no exceptions or micro-exceptions on all prior instructions.
Because the FPU takes several cycles to complete a load instruction, and because the load instruction may incur data-related exceptions or micro-exceptions, it is necessary to temporarily retain the execution results of instructions following the load instruction in a buffer (sometimes called a retirement buffer). The results are retained until such time as any data-related exceptions or micro-exceptions incurred by the load instruction are determined. Because it takes several cycles to make this determination, and because modern processors execute several instructions in parallel, the number of instructions subsequent to the load that will execute before the determination is made may be very large. A very large retirement buffer is then required to store these pending results. The size of this buffer poses an appreciable cost both in terms of silicon die cost as well as design complexity.
Especially with applications that generate a large number of denormals, it would be advantageous from a performance and hardware complexity standpoint to eliminate the need to take denormal exceptions on operand load, or during operand execution, even at the cost of some precision.
Some specialized numerical applications generate a great many denormals. Denormals are likely to arise when an application generates a great many intermediate computational values. In some parallel processing applications, a computational task is often divided into subtasks to execute on multiple processors in parallel. If one subtask produces a disproportionate quantity of denormals, and the other subtasks depend upon its timely completion, the subtask with more denormals may become a bottleneck to the completion of the overall task at hand. Especially in these applications, improvements in the speed of the handling of denormals by the processor can produce substantial performance benefits.