1. Field of the Invention
The present invention relates to improving the performance of floating point execution units in a processor. More particularly, the present invention provides a method and apparatus that produces control signals for the normalizer in parallel with the addition operation, thus reducing the latency of the floating point execution pipeline by eliminating the delay of the leading zero/one detector.
2. Description of the Related Art
Within a processor, arithmetic operations may be performed on operands stored in a format known as floating point. An American national standard has been developed in order to provide a uniform system of rules for governing the implementation of floating point arithmetic systems. This standard is identified as ANSI/IEEE Standard No. 754-1985, and is incorporated by reference in this application. As discussed in further detail below, ANSI/IEEE 754-1985 includes rules for representing and storing floating point operands, rules for manipulating them to perform arithmetic operations, and rules for rounding and expressing the result of the arithmetic operation(s).
According to the standard, the typical floating point arithmetic operation may be accomplished in single precision, double precision, or extended precision format. Each of these formats utilizes a sign, exponent, and fraction field, where the respective fields occupy predefined portions of the floating point number. In addition, the extended precision format includes a mantissa field, which includes the fraction field plus an additional bit, the L bit, that is merely implied in the single- and double-precision formats.
The L bit is created by control logic when the exponent of the floating point number has a nonzero value. The L bit is written into the arithmetic registers in the first bit position to the left of the fraction field of floating point numbers expressed in the extended precision format. For single- and double-precision floating point numbers that have nonzero exponents, the L bit is not explicitly represented in the IEEE representation, but rather is understood by the control logic to be present and to have a value of 1.
FIG. 1 illustrates the IEEE format for a 32-bit single precision number, where the sign field is a single bit occupying the most significant bit position; the exponent field is an 8-bit quantity occupying the next-most significant bit positions; and the fraction field occupies the least significant 23 bit positions. In the case of a double precision floating point number, the sign field is a single bit occupying the most significant bit position; the exponent field is an 11-bit field occupying the next-most significant bit positions; and the fraction field is a 52-bit field occupying the least significant 52 bit positions. The format of the extended precision floating point number requires a single sign bit, a 15-bit exponent field, and a 64-bit mantissa field that includes the fraction and the L bit.
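The single- and double-precision field layouts above can be checked with a short Python sketch. It unpacks a value's raw bits with the standard `struct` module and restores the implied L bit using the rule described earlier (L = 1 when the exponent field is nonzero). The function name `decode` and the `FORMATS` table are illustrative only; extended precision is not covered because Python has no native 80-bit type.

```python
import struct

# struct codes and field widths for the two IEEE 754 formats Python packs
# natively (extended precision, with its explicit L bit, is not covered).
FORMATS = {
    "single": (">f", ">I", 8, 23),   # 1 sign, 8 exponent, 23 fraction bits
    "double": (">d", ">Q", 11, 52),  # 1 sign, 11 exponent, 52 fraction bits
}

def decode(x: float, fmt: str = "single"):
    """Split x into (sign, exponent, fraction, L bit, mantissa)."""
    float_code, int_code, ebits, fbits = FORMATS[fmt]
    (bits,) = struct.unpack(int_code, struct.pack(float_code, x))
    sign = bits >> (ebits + fbits)                    # most significant bit
    exponent = (bits >> fbits) & ((1 << ebits) - 1)   # next-most significant bits
    fraction = bits & ((1 << fbits) - 1)              # least significant bits
    l_bit = 1 if exponent != 0 else 0                 # implied, never stored
    mantissa = (l_bit << fbits) | fraction            # L bit placed left of the fraction
    return sign, exponent, fraction, l_bit, mantissa
```

For example, `decode(1.0)` yields sign 0, biased exponent 127, fraction 0, and an L bit of 1.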
After each floating point intermediate arithmetic result is developed, it must be normalized and rounded if a round control bit is set. Normalization refers to the process of manipulating the exponent and fraction of an unnormalized intermediate floating point result so that the most significant binary "1" of the mantissa resides in the L bit, which is the most significant bit of the mantissa. The exponent is decremented for each 1-bit left shift of the mantissa.
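A minimal sketch of the normalization step, assuming the mantissa is held as an unsigned integer of a fixed, hypothetical `width` and ignoring exponent underflow:

```python
def normalize(mantissa: int, exponent: int, width: int):
    """Shift a nonzero `width`-bit mantissa left until its most significant
    bit (the L bit position) is 1, decrementing the exponent once per bit.
    Returns (normalized mantissa, adjusted exponent, shift count)."""
    if not 0 < mantissa < (1 << width):
        raise ValueError("mantissa must be nonzero and fit in `width` bits")
    shift = 0
    while not (mantissa >> (width - 1)) & 1:
        mantissa <<= 1   # 1-bit left shift; cannot overflow because the MSB was 0
        exponent -= 1    # one exponent decrement per shifted bit
        shift += 1
    return mantissa, exponent, shift
```

For instance, the 8-bit value 00010110 with exponent 10 normalizes to 10110000 with exponent 7 after a shift of 3.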
To implement the rounding rules required by ANSI/IEEE standard 754-1985, certain additional indicator bits may be set by the floating point logic during arithmetic operations. These bits generally indicate a loss of precision of a floating point number, such as might occur when an operand is right-shifted to align it for addition and one or more bits are shifted off the right side of the register. These lost precision bits are known as the "guard" bit G, the "round" bit R, and the "sticky" bit S. The G and R bits are treated as if they are a part of the fraction; they are shifted with the rest of the fraction during alignment and during normalization, and they are included in all arithmetic operations. The S bit is not shifted with the fraction but is included in the arithmetic. It acts as a "catcher" for 1s shifted off the right of the fraction. When a 1 is shifted off the right side of the fraction, the S bit will remain set until normalization and rounding are finished. Setting, interpreting, and using the G, R, and S bits to create a round control bit or a signal indicating whether or not rounding is required is well known in the art.
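The behavior of G, R, and S during an alignment shift can be sketched as follows. The helper names are hypothetical, and the rounding decision shown is only one example (the IEEE round-to-nearest-even mode); for brevity it is applied directly to the alignment output, skipping the normalization that would normally come first.

```python
def align_right(mantissa: int, shift: int):
    """Right-shift a mantissa by `shift` places for alignment.
    Returns (kept mantissa, G, R, S): G and R are the first two bits shifted
    out; S is the OR of every bit shifted out beyond R."""
    extended = mantissa << 2      # two extra low-order positions hold G and R
    sticky = 0
    for _ in range(shift):
        sticky |= extended & 1    # a 1 leaving the R position is caught by S and stays set
        extended >>= 1
    guard = (extended >> 1) & 1
    round_bit = extended & 1
    return extended >> 2, guard, round_bit, sticky

def round_up_nearest_even(kept: int, guard: int, round_bit: int, sticky: int) -> bool:
    """Round-to-nearest-even decision: increment when the discarded part is
    more than half an ULP, or exactly half and the kept LSB is odd."""
    return bool(guard and (round_bit or sticky or (kept & 1)))
```

Shifting 1011 right by three places keeps 1 and loses 011, giving G = 0, R = 1, S = 1.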
A typical floating point addition unit 10 is shown in FIG. 2. In the FIG. 2 addition unit 10, one of two input operands A and B may first be shifted in the Aligner 16, and then added together in the Adder 18 to produce an unnormalized intermediate result (A+B). This intermediate result is then passed to a leading zero/one detector (LZD) 20, which produces shift control signals for the normalizer 22. The normalizer 22 produces a normalized intermediate result by shifting the unnormalized mantissa result left by an amount specified by the LZD shift control signals. The exponent is decremented by one for each bit position that the mantissa is shifted to the left until the most significant bit position of the mantissa (the leading bit) becomes a one. The rounder 24 increments the normalized intermediate result, which is then typically passed to a multiplexer 26, where either the incremented result or the non-incremented result is selected to produce the final result, depending upon the ANSI/IEEE standard 754-1985 rounding scheme appropriate for the operation.
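A sketch of the sequential FIG. 2 dataflow for one case, an effective subtraction, follows. The 8-bit width, the function names, and the restriction to ea >= eb with the larger magnitude first are assumptions made to keep the example short; the point is the dependency chain, in which `lzd` can only run once the adder has produced `raw`.

```python
WIDTH = 8  # hypothetical mantissa width, kept small for readability

def lzd(value: int, width: int = WIDTH) -> int:
    """Leading zero detector: zeros above the most significant 1 of a nonzero value."""
    return width - value.bit_length()

def fig2_effective_subtract(ma: int, ea: int, mb: int, eb: int):
    """FIG. 2 dataflow for |A| - |B|, assuming ea >= eb and ma >= aligned mb.
    Each stage consumes the previous stage's output. Rounding (rounder 24,
    multiplexer 26) is omitted."""
    aligned_b = mb >> (ea - eb)        # aligner 16; shifted-off bits are simply dropped
    raw = ma - aligned_b               # adder 18: unnormalized intermediate result
    if raw == 0:
        return 0, 0                    # exact cancellation; zero is handled separately
    shift = lzd(raw)                   # LZD 20 waits on the adder output
    return raw << shift, ea - shift    # normalizer 22: left shift, exponent decrement
```

Subtracting 10100000 from 10110000 at equal exponents leaves 00010000, so the LZD reports 3 and the normalizer returns 10000000 with the exponent reduced by 3.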
In conventional floating point addition units, as shown in FIG. 2, the arithmetic operation, leading zero/one detection, and normalization have usually been performed sequentially. This causes the latency of the execution pipeline to include the full delay of both the adder and the LZD circuits. In an effort to improve floating point performance, designers have employed various techniques to reduce the latency of the floating point execution pipeline, including predicting the location of the leading zero and/or one ("leading zero/one anticipation," or LZA). For example, U.S. Pat. No. 4,926,369 to Hokenek et al., U.S. Pat. No. 5,493,520 to Schmookler et al., and U.S. Pat. No. 5,633,819 to Brashears et al. all describe various LZA implementations based upon the intermediate propagate and generate signals within the carry lookahead adder. U.S. Pat. No. 5,317,527 to Britton et al. describes an LZD technique that can be performed in parallel with the adder, based upon the input operands. These techniques do improve the performance of floating point units, because they eliminate the majority of the LZD delay.
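To illustrate what leading-one anticipation means, the sketch below predicts the position of the leading 1 of a - b (for a > b >= 0) from the operand bits alone, without waiting for the carry chain. It is a simple signed-digit predictor written for illustration; it is not the method of Hokenek et al., Schmookler et al., Brashears et al., Britton et al., or the present invention. Like many LZA schemes, it can be off by one position: the true leading 1 is at the predicted index or one place to its right.

```python
def predict_leading_one(a: int, b: int, width: int) -> int:
    """Predict the index (0 = MSB) of the leading 1 of a - b, for a > b >= 0,
    using only operand bits. The true position is the prediction or the
    next position to its right."""
    def digit(i: int) -> int:
        # signed digit a_i - b_i in {-1, 0, 1}; positions past the LSB read as 0
        if i >= width:
            return 0
        pos = width - 1 - i
        return ((a >> pos) & 1) - ((b >> pos) & 1)

    for i in range(width):
        # each indicator depends only on digits i and i+1, so all of them can
        # be formed in parallel; a priority encoder then picks the first
        if digit(i) != 0 and digit(i + 1) != -1:
            return i
    raise ValueError("requires a > b")

def actual_leading_one(value: int, width: int) -> int:
    """What a leading-one detector would report after the subtraction."""
    return width - value.bit_length()
```

For a = 100 and b = 001 (3 bits), the prediction is index 0, while the difference 011 has its leading 1 at index 1: the one-position error that a final correction step must absorb.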
The present invention is an efficient LZA method and apparatus that is implemented in three levels of N-NARY logic, and operates in parallel with, but independently of, the adder. The present invention generates dit-level propagate-generate-zero (PGZ) patterns and carry-out signals from the input dits of the adder operands. The present invention produces a find-zero and a find-one output signal for each two-dit group of the adder result by combining PGZ patterns for the two dits within the group with the carry-out signal from the dit immediately preceding the two-dit group. Find-zero and find-one output signals for each two-dit group are then combined to produce find-one and find-zero coarse and medium shift select signals required by the normalizer.
N-NARY logic is described in a copending patent application, U.S. patent application Ser. No. 09/019,355, filed Feb. 5, 1998, now U.S. Pat. No. 6,066,965, and titled "Method and Apparatus for a N-NARY logic Circuit Using 1 of 4 Signals" (hereafter, "the N-NARY Patent"). As described in the N-NARY Patent, N-NARY logic uses a bundle of N wires routed together between different logic circuits, where information is encoded in the N wires, and where at most one wire of the bundle is true during an evaluation cycle. For example, a 1-of-4 N-NARY signal is a bundle of 4 wires that is capable of being encoded to represent 4 different values, and where at most one wire within the 4-wire bundle is true during an evaluation cycle. As explained in the N-NARY Patent, a 1-of-4 N-NARY signal C, which comprises output wires C3, C2, C1, and C0, can be encoded to represent two Boolean bits A and B, as follows:

A  B  |  Decimal value  |  C3  C2  C1  C0
0  0  |        0        |   0   0   0   1
0  1  |        1        |   0   0   1   0
1  0  |        2        |   0   1   0   0
1  1  |        3        |   1   0   0   0
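A sketch of the 1-of-4 encoding of two Boolean bits, assuming the conventional mapping in which wire Cn is asserted when the value 2A + B equals n. The function names are hypothetical, and the sketch models only the wire values, not the circuit-level timing of an evaluation cycle.

```python
def encode_1of4(a: int, b: int):
    """Encode Boolean bits A (more significant) and B as wires (C3, C2, C1, C0).
    Assumes wire Cn is asserted when 2*A + B == n."""
    value = 2 * a + b
    return tuple(int(n == value) for n in (3, 2, 1, 0))

def decode_1of4(wires):
    """Recover (A, B) from a 1-of-4 signal that has finished evaluating."""
    if sum(wires) != 1:
        raise ValueError("an evaluated 1-of-4 signal has exactly one true wire")
    value = 3 - wires.index(1)   # wires[0] is C3, wires[3] is C0
    return value >> 1, value & 1
```

Because exactly one wire is true after evaluation, decoding reduces to finding which wire it is; no wire needs to be compared against any other.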
The present invention is capable of functioning in parallel with, but independent of, adders such as the 32-bit N-NARY adder described in copending patent application U.S. patent application Ser. No. 09/206,463, filed Dec. 7, 1998, now U.S. Pat. No. 6,269,387, entitled "Method and Apparatus for 3-stage 32-bit Adder/Subtractor" (hereafter, "the Adder Patent"), which is a carry-lookahead adder. The fast carry propagate techniques described in the Adder Patent are also used in the present invention to predict whether a carry will be generated or propagated in specific groups of bits. As described in the Adder Patent, a carry will be generated for a specific bit Si of the sum S when the corresponding bits of the operands, Ai and Bi, are both 1. A carry-in will be propagated across a specific bit Sj of the sum S when one and only one of the corresponding bits of the operands, Aj and Bj, is 1. A carry-in will not propagate across a specific bit Sk of the sum S (it will "halt") when the corresponding bits of the operands, Ak and Bk, are both 0. In other words,
Gi=Ai AND Bi
Pj=Aj XOR Bj
Hk=Ak NOR Bk
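The three relations can be exercised in a ripple-style Python sketch: G forces a carry out of a bit, H kills it, and P passes the incoming carry through. A carry-lookahead adder such as the one in the Adder Patent resolves the same carries in parallel by combining these signals over groups of bits; the loop here is only a reference model, and its names are hypothetical.

```python
def classify(a_bit: int, b_bit: int) -> str:
    """Return 'G', 'P', or 'H' for one bit position; exactly one applies."""
    if a_bit & b_bit:      # G = A AND B
        return "G"
    if a_bit ^ b_bit:      # P = A XOR B
        return "P"
    return "H"             # H = A NOR B

def add_with_gph(a: int, b: int, width: int, carry_in: int = 0) -> int:
    """Adder driven only by G/P/H classification. The sum bit is
    (A XOR B) XOR carry-in; the carry out is then set by G, cleared by H,
    and left unchanged by P."""
    total, carry = 0, carry_in
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        kind = classify(ai, bi)
        total |= ((ai ^ bi) ^ carry) << i   # uses the carry into bit i
        if kind == "G":
            carry = 1
        elif kind == "H":
            carry = 0
        # kind == "P": the incoming carry propagates unchanged
    return total | (carry << width)
```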
In 1-of-4 N-NARY logic, where the operands and the sum are encoded at the dit level rather than the bit level, a carry will be generated for a given dit of the sum S if the sum of the corresponding operand dits is greater than 3. Likewise, a carry will propagate across a given dit of the sum S only if the sum of the corresponding operand dits equals 3, and will propagate across a block of dits only if every dit in the block propagates. Those unfamiliar with the workings of the 32-bit N-NARY adder, including the methodologies for resolving generate and propagate signals, are encouraged to refer to the Adder Patent for a complete understanding of the fast carry techniques described there and utilized in the present invention. The Adder Patent also provides a complete description of the "shorthand" N-NARY notation that is used herein to depict the various gates implemented in the present invention in N-NARY logic. Both the N-NARY Patent and the Adder Patent are hereby incorporated by reference into this disclosure for all purposes.
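The dit-level rules can be sketched the same way, treating each dit as a base-4 digit (0 to 3) and classifying it by the sum of the corresponding operand dits. The helper names are hypothetical. Combining a block follows the usual lookahead rule: the most significant dit that does not propagate decides whether the block generates or halts, and the block propagates only if every dit propagates.

```python
def to_dits(x: int, n: int):
    """Split x into n base-4 digits (dits), least significant first."""
    return [(x >> (2 * i)) & 3 for i in range(n)]

def dit_signal(a_dit: int, b_dit: int) -> str:
    """Generate if the dit sum exceeds 3, propagate if it equals 3, else halt."""
    s = a_dit + b_dit
    return "G" if s > 3 else "P" if s == 3 else "H"

def block_signal(a: int, b: int, n: int) -> str:
    """Combine per-dit signals into one signal for an n-dit block."""
    result = "P"   # an empty block simply passes its carry-in
    for a_dit, b_dit in zip(to_dits(a, n), to_dits(b, n)):
        sig = dit_signal(a_dit, b_dit)
        if sig != "P":   # a more significant G or H overrides the lower dits
            result = sig
    return result
```

The block's carry-out is then 1 for "G", 0 for "H", and equal to the block's carry-in for "P".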
The present invention is a leading zero/leading one anticipator that can operate in parallel with a floating point adder and that produces coarse and medium shift select signals for the coarse and medium shifters in the normalizer. In one embodiment, the present invention can be implemented in three levels of N-NARY logic, wherein the first logic level examines input dits of the adder operands and generates a PGZ pattern for the corresponding dit of the adder result. The first logic level also generates carry-out signals that correspond to certain dit positions of the adder result. The second logic level produces a find-zero and a find-one output signal for each two-dit group of the adder result by combining PGZ patterns for the two dits within the group with the carry-out signal from the dit immediately preceding the two-dit group. Where PGZ patterns correspond to the two-dit Boolean value 0000, 1111, or 1110 and the carry-out from the prior dit is indeterminate, the present invention assumes that a carry-out always occurs when generating the find-one output, and assumes that a carry-out never occurs when generating the find-zero output. These assumptions may result in a one-bit misprediction, which can be corrected by the fine shifter in the normalizer. The third logic level combines find-zero and find-one output signals for each two-dit group of the adder result to produce a find-one coarse shift select signal, a find-zero coarse shift select signal, a plurality of find-one medium shift select signals, and a plurality of find-zero medium shift select signals.
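The role of the fine shifter in absorbing a one-bit misprediction can be sketched as follows. The 64-bit width and the 16-bit coarse and 4-bit medium step sizes are hypothetical choices, not values stated in this description, and the sketch assumes the predicted shift is either exact or one position short. The coarse, medium, and fine stages apply the prediction; if the leading 1 has not yet reached the most significant bit, the fine stage shifts once more.

```python
WIDTH = 64        # hypothetical mantissa width
COARSE_STEP = 16  # hypothetical coarse shifter granularity
MEDIUM_STEP = 4   # hypothetical medium shifter granularity

def staged_normalize(value: int, predicted: int, width: int = WIDTH):
    """Left-shift `value` by `predicted` places split across coarse, medium,
    and fine stages, then correct a one-position under-prediction.
    Assumes predicted <= true shift <= predicted + 1. Returns (result, total shift)."""
    mask = (1 << width) - 1
    coarse = predicted - predicted % COARSE_STEP                 # multiple of 16
    medium = predicted % COARSE_STEP - predicted % MEDIUM_STEP   # 0, 4, 8, or 12
    fine = predicted % MEDIUM_STEP                               # 0 to 3
    v = (value << coarse) & mask   # coarse shifter
    v = (v << medium) & mask       # medium shifter
    v = (v << fine) & mask         # fine shifter, predicted portion
    total = coarse + medium + fine
    if not (v >> (width - 1)) & 1:  # leading 1 one place short of the MSB
        v = (v << 1) & mask         # fine shifter absorbs the misprediction
        total += 1
    return v, total
```

With the leading 1 at bit 40 (a true shift of 23), a prediction of either 22 or 23 produces the same normalized result and a total shift of 23.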