The demand for ever-faster computers requires that state-of-the-art microprocessors execute instructions in the minimum amount of time. Microprocessor speeds have been increased in a number of different ways, including increasing the speed of the clock that drives the processor, reducing the number of clock cycles required to perform a given instruction, implementing pipeline architectures, and increasing the efficiency with which internal operations are performed. This last approach usually involves reducing the number of steps required to perform an internal operation.
Efficiency is particularly important in mathematical calculations, especially floating point calculations that are performed by a data coprocessor. The relative throughput of a processor (i.e., integer unit pipeline) that drives a coprocessor (i.e., floating point unit pipeline) may change drastically depending on the program being executed.
In floating point representation, every number may be represented by a significand (or mantissa) field, a sign bit, and an exponent field. Although the size of these fields may vary, the ANSI/IEEE standard 754-1985 (IEEE-754) defines the most commonly used floating point notation and forms the basis for floating point units (FPUs) in x86 type processors. The IEEE-754 standard includes a single precision format, a single extended precision format, a double precision format, and a double extended precision format. Single precision format comprises 32 bits: a sign bit, 8 exponent bits, and 23 significand bits. Single extended precision format comprises 44 bits: a sign bit, 11 exponent bits, and 32 significand bits. Double precision format comprises 64 bits: a sign bit, 11 exponent bits, and 52 significand bits. Double extended precision format comprises 80 bits: a sign bit, 15 exponent bits, and 64 significand bits.
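As an illustration of the single and double precision layouts described above, the following Python sketch splits a value into its sign, exponent, and significand fields. The function names are hypothetical and are not part of any FPU described here. The extended formats are omitted because Python's standard struct module provides no encoding for them.

```python
import struct


def decode_single(value: float):
    """Split a value, encoded as 32-bit single precision, into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8 exponent bits (biased by 127)
    significand = bits & 0x7FFFFF     # 23 stored significand bits
    return sign, exponent, significand


def decode_double(value: float):
    """Split a value, encoded as 64-bit double precision, into its fields."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63                        # 1 sign bit
    exponent = (bits >> 52) & 0x7FF          # 11 exponent bits (biased by 1023)
    significand = bits & ((1 << 52) - 1)     # 52 stored significand bits
    return sign, exponent, significand
```

For example, decode_single(1.0) yields a sign of 0, a biased exponent of 127, and a stored significand of 0, since the leading 1 of a normal number is implicit in the single and double precision formats.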
It can be advantageous in a load-store implementation of IEEE-754 to represent all numeric values contained in the register files in the floating point unit as properly rounded values. Complete implementations of the IEEE-754 floating-point standard must perform rounding and status generation for all possible results, including tiny (denormal) results. The base number for IEEE floating-point standards is understood to be binary. A “normal” floating-point number is one which begins with the first non-zero digit in front of the binary “decimal” point and a denormal number is one that begins with the first non-zero digit after the decimal point. The accuracy or precision of the number is determined by the number of digits after the decimal point.
Data processors typically manipulate numbers in binary format. When operating in floating-point binary format, a microprocessor expects a normal floating-point binary number. As noted above, the normal floating-point binary number in the IEEE-754 format is understood to have an exponent greater than zero, a mantissa that begins with a 1, followed by the binary point, followed by subsequent binary ones (1s) and zeroes (0s). Thus, the characterization of the mathematical result as denormal (i.e., very tiny) is a function of the exponent being zero (0) and the mantissa beginning with a 0, followed by subsequent binary ones (1s) and zeroes (0s).
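The distinction between normal and denormal numbers can be sketched in Python by inspecting the exponent field of a single precision encoding. The function name is hypothetical. The handling of the all-ones exponent (infinity and NaN) is not discussed in the text above and is included only so that the sketch covers every IEEE-754 encoding.

```python
import struct


def classify_single(value: float) -> str:
    """Classify a value by the fields of its 32-bit single precision encoding."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent == 0:
        # Exponent field of zero: no implicit leading 1 in the mantissa.
        return "zero" if significand == 0 else "denormal"
    if exponent == 0xFF:
        # All-ones exponent is reserved by IEEE-754 (added detail, see lead-in).
        return "infinity" if significand == 0 else "nan"
    # Any other exponent: mantissa has an implicit leading 1.
    return "normal"
```

Under this sketch, 1.0 is classified as normal, while 1e-40, which lies below the smallest normal single precision magnitude of roughly 1.18e-38, is classified as denormal.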
Unfortunately, denormal results may cause unique problems in a pipelined floating point unit (FPU). A conventional FPU execution pipeline typically comprises an operand stage, which retrieves operands from the register files of a register array and receives FPU opcodes from a dispatch unit. The FPU execution pipeline typically also comprises an exponent align stage, a multiply stage, an add stage, a normalize stage, and a round stage. The last stage of a conventional FPU execution pipeline is typically a writeback stage that writes results back to the register files in the register array or to a data cache.
In most applications, denormal results occur very rarely. Conventional (i.e., prior art) data processors frequently handle denormal results using microcode or software exceptions. However, in a pipelined floating point unit (FPU), no assumptions are made about the frequency of denormal results. Thus, every instruction that enters the FPU pipeline is operated on by every FPU stage. This includes the round stage after the normalize stage. Performing a conventional rounding operation on a denormal number gives an erroneous result.
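One plausible way to see why conventional rounding can be erroneous for a denormal result is sketched below in Python. The sketch assumes, as an illustration only, that a conventional round stage rounds a normalized result to 24 significant bits, whereas a single precision denormal must be rounded at the fixed position 2^-149. The function name, the chosen input, and the assumed second rounding step are hypothetical and do not describe any particular FPU.

```python
from fractions import Fraction

DENORMAL_ULP = Fraction(1, 2**149)  # spacing of single precision denormals


def round_to_multiple(value: Fraction, quantum: Fraction) -> Fraction:
    """Round to the nearest multiple of quantum, ties to even."""
    return round(value / quantum) * quantum


def compare_rounding(exact: Fraction):
    """Return (correct, double_rounded) for a positive tiny exact result."""
    # Correct: a single rounding at the denormal position.
    correct = round_to_multiple(exact, DENORMAL_ULP)

    # Assumed conventional path: find e = floor(log2(exact)), then round
    # to 24 significant bits as if the result were a normal number.
    e = exact.numerator.bit_length() - exact.denominator.bit_length()
    if Fraction(2) ** e > exact:
        e -= 1
    normalized = round_to_multiple(exact, Fraction(2) ** (e - 23))

    # The value must still be fitted to the denormal grid: a second rounding.
    double_rounded = round_to_multiple(normalized, DENORMAL_ULP)
    return correct, double_rounded


# Exact result slightly below 5.5 denormal units.
exact_result = (Fraction(11, 2) - Fraction(1, 2**40)) * DENORMAL_ULP
correct, double_rounded = compare_rounding(exact_result)
```

With this input, the single rounding at the denormal position gives 5 denormal units, while rounding first at the normalized position gives exactly 5.5 units, which the second rounding then carries to 6 units. The two paths therefore disagree by one unit in the last place.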
One way to correct this problem would be to halt and flush out the entire execution pipeline, reload the instruction that caused the denormal result a second time, and disable the normalize stage the second time the instruction goes through. The other flushed instructions are then reloaded and processing continues. This approach greatly reduces performance, especially if a particular application generates an abnormally large number of denormal results.
Another way to correct this problem would be to add an additional hardware stage to correct the error caused by the round stage, or to disable the round stage when a denormal result is detected in the normalize stage. This approach also reduces performance because every instruction must be processed by the additional stage, even though the vast majority of instructions in most applications do not produce denormal results. This approach also increases the size and power consumption of the FPU execution pipeline.
Thus, the processing of tiny numbers introduces delays in the associated pipelines and may even require additional stages and chip area to accommodate the tiny result processing requirements. In effect, all additions and multiplications are penalized by handling frequent tiny results.
Therefore, there is a need in the art for improved microprocessor architectures capable of handling denormal results more efficiently. In particular, there is a need for improved microprocessor architectures containing pipelined floating point units that are capable of handling denormal results efficiently without requiring complex rounding units in each pipeline to handle the rounding of denormal numbers.