1. Field of the Invention
The present invention relates to performing certain floating point arithmetic operations in a processor. More particularly, the invention relates to simplifying the completion of floating point arithmetic operations on two pre-normalized operands by performing in parallel the steps of normalizing and rounding the arithmetic result.
2. Description of the Related Art
Within a processor, a given number may be stored in a format known as floating point. The operations of multiply, divide, add, and subtract may be performed on floating point numbers. An American national standard has been developed in order to provide a uniform system of rules for governing the implementation of floating point arithmetic systems. This standard is identified as ANSI/IEEE Standard No. 754-1985, and is incorporated by reference in this application. In the design of floating point arithmetic systems and algorithms, it is a principal objective to achieve results that are consistent with this standard and enable users of such systems and algorithms to achieve conformity in the calculations and solutions to problems even though the problems are solved using different computer systems.
The typical floating point arithmetic operation may be accomplished in single precision, double precision, or extended precision format. Each of these formats utilizes a sign, exponent, and fraction field, where the respective fields occupy predefined portions of the floating point number. In addition, the extended precision format includes a mantissa field, which includes the fraction field plus an additional bit, the L bit, that is merely implied in the single- and double-precision formats.
FIG. 1 illustrates the IEEE format for a 32-bit single precision number where the sign field is a single bit occupying the most significant bit position; the exponent field is an 8-bit quantity occupying the next-most significant bit positions; and the fraction field occupies the least significant 23 bit positions. In the case of a double precision floating point number, the sign field is a single bit occupying the most significant bit position; the exponent field is an 11-bit field occupying the next-most significant bit positions; and the fraction field is a 52-bit field occupying the least significant bit positions. The format of the extended precision floating point number requires a single sign bit, a 15-bit exponent field, and a 64-bit mantissa field that includes the fraction and the L bit.
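The single and double precision field layouts described above can be illustrated with a minimal Python sketch. The function names `split_fields`, `split_single`, and `split_double` are hypothetical helpers introduced only for this illustration; they are not part of the described hardware.

```python
import struct


def split_fields(bits, exp_bits, frac_bits):
    """Split a raw bit pattern into (sign, exponent, fraction) fields."""
    fraction = bits & ((1 << frac_bits) - 1)                 # least significant bits
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)   # next-most significant bits
    sign = (bits >> (frac_bits + exp_bits)) & 1              # most significant bit
    return sign, exponent, fraction


def split_single(value):
    """Fields of a 32-bit single: 1 sign bit, 8 exponent bits, 23 fraction bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return split_fields(bits, 8, 23)


def split_double(value):
    """Fields of a 64-bit double: 1 sign bit, 11 exponent bits, 52 fraction bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return split_fields(bits, 11, 52)
```

For example, `split_single(1.0)` yields a sign of 0, a biased exponent of 127, and a fraction of 0.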
In designing the hardware and logic for performing floating point arithmetic operations in conformance with the ANSI/IEEE standard 754-1985, it is necessary to incorporate certain additional indicator bits into the floating point hardware operations. For example, an "implicit" bit I is created by control logic when the exponent of the floating point number has a nonzero value. This bit, also called the "L" bit, can be created at the time a floating point number is written into the arithmetic registers where the implicit bit occupies the first bit position to the left of the fraction field of the number. Since, for non-zero exponents, the L bit is always one, it is "implied" and is not explicitly represented in the IEEE representation for single and double precision floating point numbers.
The L bit is represented internally within the floating point unit of the processor. The L bit is included, along with the fraction, in the mantissa of the internal representation of floating point numbers. While the L bit is only implied for IEEE single- and double-precision formats, the L bit is explicitly represented in registers containing extended precision floating point numbers.
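As a minimal sketch of how the internal mantissa might be formed from a stored single or double precision number, the following Python function places the L bit immediately to the left of the fraction whenever the exponent field is nonzero. The name `internal_mantissa` and its parameters are assumptions made for illustration; special exponent encodings are not modeled.

```python
def internal_mantissa(exponent, fraction, frac_bits=23):
    """Prepend the implied L bit to the fraction to form the mantissa.

    Assumption: L is 1 exactly when the exponent field is nonzero,
    as stated in the text; no other encodings are considered.
    """
    l_bit = 1 if exponent != 0 else 0
    return (l_bit << frac_bits) | fraction
```

With the default 23-bit fraction, a nonzero exponent and a zero fraction give a 24-bit mantissa of `0x800000`.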
An additional indicator bit, a "guard" bit G, is set by the floating point logic during certain arithmetic operations as an indicator of the loss of precision of the floating point number being processed. In the case of addition and subtraction, the G bit is set when a right shift, required for alignment, shifts a significant bit off the right side of the register capacity.
Additional indicator bits, a "round" bit R and a "carry" bit C, are similarly used for certain floating point operations and are set by the floating point logic. Finally, a "sticky" bit S is an indicator bit that is set in certain floating point arithmetic operations when any lower precision bit is a "1" as an indicator that the floating point number has lost some precision. In the standard prior art systems, the G, R and S bits are used exclusively for rounding operations, after the result has been normalized. The G and R bits are treated as if they are a part of the fraction; they are shifted with the rest of the fraction, and included in all arithmetic operations. The S bit is not shifted with the fraction but is included in the arithmetic. It acts as a "catcher" for 1's shifted off the right of the fraction. When a 1 is shifted off the right side of the fraction, the S bit will remain set until normalization and rounding are finished.
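The behavior of the G, R, and S bits during a right shift can be sketched as follows. This is an illustrative model only: it assumes the bits sit in the order G, R, S to the right of the mantissa, and the function name `shift_right_sticky` is hypothetical.

```python
def shift_right_sticky(mantissa, g, r, s, count):
    """Shift the mantissa right by `count` bits, tracking G, R and S.

    G and R shift along with the mantissa. S is not shifted; it
    "catches" any 1 that falls off the right of R and then stays set.
    """
    for _ in range(count):
        s |= r              # a 1 leaving R is remembered in S
        r = g               # G moves into R
        g = mantissa & 1    # the mantissa's low bit moves into G
        mantissa >>= 1
    return mantissa, g, r, s
```

Shifting `0b1011` right by three positions, for instance, leaves a mantissa of 1 with G clear and both R and S set.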
Floating point arithmetic operations require round logic well-known in the art to create a round control bit or signal indicating whether or not rounding is required. If none of the G, R, and S bits are set to a binary "1", no rounding will be required and the round control bit will not be set. Otherwise, the round control bit will be set or reset as required by the full set of round logic inputs.
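One possible form of the round logic is sketched below. The text does not specify the round logic inputs, so the rounding modes modeled here (the four IEEE 754 modes) and the treatment of G as the first discarded bit are assumptions; `round_control` and its mode names are hypothetical.

```python
def round_control(lsb, g, r, s, sign, mode="nearest"):
    """Return True when the result must be incremented.

    Assumptions: G is the first bit below the least significant bit,
    and R and S together represent everything below G.
    """
    inexact = bool(g or r or s)
    if not inexact:
        return False                      # no G, R or S set: never round
    if mode == "nearest":                 # round to nearest, ties to even
        return bool(g and (r or s or lsb))
    if mode == "zero":                    # truncate
        return False
    if mode == "up":                      # toward plus infinity
        return sign == 0
    if mode == "down":                    # toward minus infinity
        return sign == 1
    raise ValueError("unknown rounding mode: " + mode)
```

Under these assumptions, an exact tie (G set, R and S clear) rounds up only when the least significant bit is 1.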
After each floating point intermediate arithmetic result is developed, it must be normalized and then rounded if the round control bit is set. In the prior art, floating point units generally perform normalization and rounding functions in series. First the fraction portion of the unnormalized intermediate result of a floating point arithmetic operation is passed to a normalizer circuit where normalization is performed. Then, after normalization, rounding is performed.
In the prior art, normalization refers to the process of manipulating the exponent and fraction of an unnormalized intermediate floating point result so that the most significant binary "1" of the mantissa resides in the L bit, which is the most significant bit of the mantissa. Bit L is labeled as 24 in FIGS. 1 and 6. The exponent is decremented for each 1-bit left-shift of the mantissa. During normalization, the G and R bits are also shifted, with zeros shifted into the round bit. A single precision example of prior art normalization is shown below in Table 1, where variables W, X, Y, and Z represent any value. N represents any exponent value greater than or equal to Emin+1, where Emin is the minimum exponent capable of representation in the floating point unit.
TABLE 1
EXP     L   REMAINING BITS OF FRACTION   G   R   S
N       0   1XXXXXXXXXXXXXXXXXXXXXX      W   Y   Z
N - 1   1   XXXXXXXXXXXXXXXXXXXXXXW      Y   Z   0
In Table 1, the top fraction has the most significant binary "1" of the fraction residing one bit to the right of the L bit. The top fraction thus represents a value that requires a 1-bit shift left in order to be normalized according to the IEEE standard. The bottom fraction of Table 1 shows a normalized fraction with the most significant binary "1" of the fraction shifted into the L bit. After the one-bit left shift the top fraction of Table 1 is in the IEEE normalized format. During normalization, the exponent of the top fraction is decremented by one for each one-bit left shift. The top and bottom mantissas of Table 1 are equivalent if the value of the bottom floating point number's exponent is one less than the value of the top number's exponent.
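The prior art normalization step may be sketched in Python as a loop of 1-bit left shifts. The sketch follows the prose description of the shift: G and R move with the fraction, a zero enters the round bit, and S is left in place. The name `normalize_prior_art` and the `width` parameter (24 for a single precision mantissa including L) are assumptions; the Emin limit is not modeled.

```python
def normalize_prior_art(exp, mantissa, g, r, s, width=24):
    """Left-shift until the most significant 1 sits in the L bit.

    The exponent is decremented once per shift. A value with no 1
    in the mantissa, G or R is returned unchanged.
    """
    l_mask = 1 << (width - 1)           # position of the L bit
    keep = (1 << width) - 1
    if mantissa == 0 and not g and not r:
        return exp, mantissa, g, r, s   # nothing to normalize
    while not mantissa & l_mask:
        mantissa = ((mantissa << 1) | g) & keep   # G enters the fraction
        g, r = r, 0                               # R moves to G, zero enters R
        exp -= 1
    return exp, mantissa, g, r, s
```

A mantissa whose leading 1 lies one position to the right of L therefore needs exactly one shift, and its exponent drops by one.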
Rounding is then performed in the prior art on the normalized intermediate fraction. Rounding is performed by incrementing the normalized intermediate result if required. Since normalization and rounding are performed sequentially in the prior art, the latency of the execution pipeline includes the delay of both the normalizer and rounder circuits.
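The sequential rounding increment can be sketched as below. The handling of a carry out of the L bit (a 1-bit right shift with an exponent increment) is an assumption added for completeness and is not stated in the text; `round_normalized` is a hypothetical name.

```python
def round_normalized(exp, mantissa, round_control, width=24):
    """Increment a normalized mantissa when the round control bit is set.

    Assumption: if the increment carries out of the L bit, the
    mantissa is shifted right once and the exponent is incremented.
    """
    if not round_control:
        return exp, mantissa
    mantissa += 1
    if mantissa >> width:      # carry out of the L bit
        mantissa >>= 1
        exp += 1
    return exp, mantissa
```

Because this step consumes the output of the normalizer, its delay adds directly to the normalizer delay in a serial design.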
A method and apparatus that allows the normalization and rounding functions to operate in parallel eliminates the delay of the rounder from the execution pipeline. U.S. Pat. No. 4,926,370 to Brown et al. (Brown '370) describes an implementation for performing the normalization and rounding functions in parallel. Brown '370 incorporates another patent, U.S. Pat. No. 4,941,120 to Brown et al. (Brown '120). The prior art performs parallel normalization and rounding in the following manner.
The first two bits of the mantissa of the intermediate result of a floating point arithmetic operation are examined to determine the format of the mantissa. Also, a check is made to determine what type of arithmetic operation has been performed. Under certain format/operation combinations the intermediate mantissa is loaded directly into a register shown as register 16 on FIG. 2. Under certain other conditions the first of two potential shifts is performed on the intermediate mantissa before it is loaded into the register 16. The present invention does not have this first shift.
From register 16 (of FIG. 2) the intermediate mantissa is simultaneously passed to a normalizer circuit and a rounder circuit. The first two bit positions of the intermediate mantissa residing in register 16 are examined. If the 2-bit format of the first two mantissa bits in register 16 is 1.X, the rounder circuit is activated. If the 2-bit format is neither 1.X nor 0.1X, the normalizer circuit is activated. If the 2-bit format is 0.1X the intermediate result mantissa is shifted left one position and the rounder circuit is then activated. The present invention does not have this second shift.
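The selection made on the first two mantissa bits in register 16 can be summarized with a small sketch. The function name `prior_art_path`, the returned labels, and the `width` parameter are hypothetical and serve only to restate the three cases described above.

```python
def prior_art_path(mantissa, width=24):
    """Choose the prior art path from the first two mantissa bits."""
    first = (mantissa >> (width - 1)) & 1    # bit left of the binary point
    second = (mantissa >> (width - 2)) & 1   # first bit right of the point
    if first:
        return "round"                   # format 1.X
    if second:
        return "shift-left-then-round"   # format 0.1X
    return "normalize"                   # neither 1.X nor 0.1X
```

The middle case is the one that requires the second preliminary shift in the prior art.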
Therefore, the prior art requires two additional multiplexers in order to accomplish the preliminary normalization and rounding shifts that may be required. This preliminary shifting requirement and its attendant multiplexers are not necessary in the present invention.
The present invention reduces the latency of the floating point execution pipeline by allowing the normalization and rounding functions to be performed in parallel, eliminating the delay of the rounder from the total execution pipeline delay. The present invention also presents an improvement over the prior art because it does not require shifting of the intermediate mantissa prior to normalization and rounding. Instead, the intermediate result mantissa of an extended precision floating point arithmetic operation (or the intermediate result fraction of a single or double precision floating point arithmetic operation) is transferred directly into a register 610 (of FIG. 6), without a check of the operation type or first two bits of the intermediate result mantissa or fraction. The present invention transfers the intermediate mantissa or fraction directly into the register 610 without any intermediate shifting, eliminating the first additional multiplexer 15 (of FIG. 2) present in the prior art. The rounder circuit and the normalizer circuit of the present invention receive the intermediate mantissa or fraction from the register 610 in parallel. The present invention does not require a left-shift prior to rounding for an intermediate mantissa beginning with a 2-bit format of 0.1X, thus eliminating the need for the second additional multiplexer 53 of the prior art. The present invention performs a novel type of normalization. The normalizer circuit of the present invention performs normalization of the intermediate mantissa or fraction by 1) pre-incrementing the exponent by 1, 2) shifting the most significant binary "1" of the mantissa into the C bit rather than the L bit, and 3) decrementing the exponent in accordance with the number of left shifts performed for normalization. The present invention then selects and formats the correct result mantissa or fraction.
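The three normalization steps listed above may be illustrated with the following Python sketch. It is one interpretation under stated assumptions, not the circuit of the invention: the register is modeled as a C bit followed by the L bit and the fraction, the G, R, and S bits are omitted, and the name `normalize_into_c` and the `width` parameter (25 for C plus a 24-bit single precision mantissa) are hypothetical.

```python
def normalize_into_c(exp, reg, width=25):
    """Normalize so the most significant 1 lands in the C bit.

    `reg` holds C in its top bit, then L, then the fraction.
    """
    if reg == 0:
        return exp, reg            # nothing to normalize
    c_mask = 1 << (width - 1)      # position of the C bit
    exp += 1                       # step 1: pre-increment the exponent
    while not reg & c_mask:
        reg <<= 1                  # step 2: shift toward the C bit
        exp -= 1                   # step 3: one decrement per left shift
    return exp, reg
```

Under this model, a mantissa already normalized with its leading 1 in L is shifted once and ends with its original exponent, while a mantissa whose leading 1 is already in C needs no shift and keeps the incremented exponent. In both cases the represented value is unchanged if the binary point is read one position further left after normalization.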