The present invention generally relates to digital circuits and processors, and more particularly, to a selection based rounding system and method for eliminating the need for post increment based rounding in floating point (FP) arithmetic operations. Eliminating the post increment based rounding operation significantly increases speed. Although not limited to this particular application, the selection based rounding system and method of the invention are particularly suited for implementation in connection with a FP fused multiply adder of a high performance chip-based microprocessor or other digital circuit.
Many arithmetic operations in present microprocessors are sped up by an on-board floating point (FP) processor, which implements FP mathematics (i.e., mathematics involving operation upon expressions having a significand and an exponent, where the value of each expression is equal to its significand multiplied by 2^exponent), typically on very large numbers. These FP processors can include a fused multiply adder to increase the performance of the FP operations.
Fused multiply adders are well known in the art. In a typical fused multiply adder, two operands, for example, A and B, are multiplied together, and added to another operand C, so that the result R=A*B+C or the result R=A*B−C. Generally, in the circuitry, the operands A and B are first multiplied together, while the other operand C is shifted, and then the product of A and B is added to the shifted C. Next, the sum is normalized by a shifting operation, and finally, the shifted sum is rounded.
As in many FP operations, it is frequently required that a result of a FP operation be rounded. IEEE and other industry standards specify different types of rounding processes, for example, round to zero, round to nearest, round to negative infinity, and round to positive infinity. The computation of whether the resulting FP number needs to be rounded, and the rounding process itself, can significantly degrade the performance and increase the hardware complexity of the fused multiply adder.
The result R is provided in a form that is unincremented or that is incremented, in order to satisfy the rounding requirement. For example, if there were a rounding requirement of either round to zero or round to negative infinity, then the unincremented result R would be output. If there were a rounding requirement of round to positive infinity, then the incremented result R would be output. Further, if the rounding requirement were round to nearest, then either the incremented or unincremented result R would be output.
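The relationship between rounding requirement and the selected result can be illustrated with a minimal Python sketch. It is an illustration only, not the circuitry described herein: the function name `select_result`, the mode strings, and the `discarded` and `discarded_width` parameters (the bits dropped below the least significant bit of the result) are hypothetical, a positive result is assumed, and ties under round to nearest are assumed to resolve to an even significand.

```python
def select_result(unincremented: int, discarded: int,
                  discarded_width: int, mode: str) -> int:
    """Return the unincremented or incremented significand (positive result assumed)."""
    incremented = unincremented + 1
    half = 1 << (discarded_width - 1)  # value of exactly one half of an LSB

    if mode in ("zero", "neg_inf"):
        # Truncation: the unincremented result is output.
        return unincremented
    if mode == "pos_inf":
        # Assumption: an exact result (no discarded bits set) needs no increment.
        return incremented if discarded != 0 else unincremented
    if mode == "nearest":
        if discarded > half:
            return incremented
        if discarded == half:
            # Tie: assumed to resolve toward an even significand.
            return incremented if unincremented & 1 else unincremented
        return unincremented
    raise ValueError(f"unknown rounding mode: {mode}")
```

For example, with two discarded bits, `select_result(0b1011, 0b10, 2, "nearest")` is a tie on an odd significand and yields the incremented value `0b1100`.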
To more specifically explain the rounding/incrementing process, consider an example of a FP fused multiply adder with rounding capabilities shown in FIG. 1 and generally denoted by reference numeral 5. The fused multiply adder 5 of FIG. 1 is designed to operate upon the significand portions (nonexponent part) of FP numbers. As is well known in the art, the exponent portions of such FP numbers are processed separately from the significand portions, and such processing is not described here for simplicity. As shown in FIG. 1, the fused multiply adder 5 includes a multiplier 11 that receives and multiplies two numbers A, B (for example, 64-bits each). Shifter 12 shifts the operand C by a predetermined amount in order to normalize it with respect to the mathematical product of A and B and to thereby enable it to be appropriately combined with the product of A and B at a later time.
The sum and carry outputs (for example, 128 bits each) of the multiplier 11 and the output of the shifter 12 are input into carry save adder 13, the design and operation of which is well known in the art. The sum and carry data from multiplier 11 are input to the carry save adder 13 as the addend and augend, respectively. The input from the shifter 12 is considered the carry-in from a less significant stage of the FP fused multiply adder 5. The carry save adder 13 generates a sum output and a carry output. Both the sum and carry outputs are input into a carry propagation adder 14 and a leading bit anticipator 15. The carry propagation adder 14 combines the sum and carry output from the carry save adder 13 to produce a FP number that is input into shifter 16. The design and operation of a carry propagation adder is also well known in the art.
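The behavior of a carry save adder followed by a carry propagation adder can be sketched in Python as follows. This is a bit-level model on unbounded integers, not a description of the particular adders 13 and 14; the function names are hypothetical. The carry save stage reduces three inputs to a sum word and a carry word without propagating carries, and the carry propagation stage then adds those two words.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three addends to a (sum, carry) pair with no carry propagation."""
    sum_word = x ^ y ^ z                               # per-bit sum, carries ignored
    carry_word = ((x & y) | (x & z) | (y & z)) << 1    # per-bit majority, moved up one place
    return sum_word, carry_word


def carry_propagate_add(sum_word: int, carry_word: int) -> int:
    """Resolve the (sum, carry) pair into a single number."""
    return sum_word + carry_word
```

The invariant is that `sum_word + carry_word` always equals `x + y + z`; the carry save stage only defers the carry propagation to the later adder.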
The leading bit anticipator 15 computes a shift number that is equal to the number of significant bits to be shifted out to eliminate the leading zeros in the FP number generated by the carry save adder 13. The leading bit anticipator 15 also computes the shift number in a particular direction. This is done in order to determine the normalization of the sum and carry output of the carry save adder 13, for add, subtract, multiply or divide operations. An example of one of many possible architectures for the leading bit anticipator 15 is described in U.S. Pat. No. 5,798,952 to Miller et al.
The shift number generated by the leading bit anticipator 15 is input into shifter 16. Shifter 16 then shifts the FP number to the right or to the left, as directed by the shift number, by a number of bits equal to the shift number. This eliminates the leading zeros of the FP number (i.e., it normalizes the resulting FP number). The resulting normalized FP number is input into incrementor 17, rounding logic 18, and multiplexer (MUX) 19.
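The normalizing shift can be sketched in Python as follows. The sketch is a simplification under stated assumptions: the shift number is computed exactly from the already-summed value, whereas the leading bit anticipator 15 derives it from the sum and carry words; the function name `normalize` and the `width` parameter are hypothetical.

```python
def normalize(value: int, width: int) -> tuple[int, int]:
    """Shift value so that its leading one sits in the top bit of a width-bit field.

    Returns (normalized_value, shift), where a positive shift means a left
    shift and a negative shift means a right shift.
    """
    if value == 0:
        return 0, 0                      # nothing to normalize
    shift = width - value.bit_length()   # number of leading zeros if non-negative
    if shift >= 0:
        return value << shift, shift
    # Right shift: the bits shifted out would feed the sticky logic in hardware;
    # this sketch simply drops them.
    return value >> -shift, shift
```

For an 8-bit field, `normalize(0b00010110, 8)` removes three leading zeros and returns `(0b10110000, 3)`.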
The incrementor 17 increments the normalized FP number to provide an incremented normalized FP number. The incrementor 17 inputs the incremented normalized FP number into MUX 19.
The rounding logic 18 determines whether the normalized number output from shifter 16 requires rounding, and which type of rounding, based upon an examination of the guard, round, and sticky bits associated with the output from shifter 16. The rounding logic 18 directs MUX 19 to select either the unincremented number or the incremented number for ultimate output from the FP fused multiply adder 5.
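The conventional path through incrementor 17, rounding logic 18, and MUX 19 can be sketched in Python for the round to nearest case. This is an illustrative model only: the function names, the `extra_bits` parameter (the number of bits kept below the least significant bit, assumed to be at least 2), and the tie-to-even rule are assumptions, and overflow of the incremented significand is not handled.

```python
def split_grs(value: int, extra_bits: int) -> tuple[int, int, int, int]:
    """Split a normalized value into (kept, guard, round, sticky); extra_bits >= 2."""
    kept = value >> extra_bits                        # significand bits that are output
    guard = (value >> (extra_bits - 1)) & 1           # first bit below the LSB
    round_bit = (value >> (extra_bits - 2)) & 1       # second bit below the LSB
    sticky = int(value & ((1 << (extra_bits - 2)) - 1) != 0)  # OR of all remaining bits
    return kept, guard, round_bit, sticky


def conventional_round_nearest(value: int, extra_bits: int) -> int:
    """Increment after normalization, then select (models 17, 18, and 19)."""
    kept, guard, round_bit, sticky = split_grs(value, extra_bits)
    incremented = kept + 1                            # incrementor 17
    # Rounding logic 18: round up when above one half, or on a tie with an odd LSB.
    select_incremented = bool(guard and (round_bit or sticky or (kept & 1)))
    return incremented if select_incremented else kept  # MUX 19
```

Note that `incremented` cannot be formed until `kept` is available, which is the serial dependency that the conventional architecture incurs.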
A major problem with the rounding architecture for a conventional FP fused multiply adder is that until the number resulting from a FP operation is normalized, it is very difficult, if not impossible, to determine whether the normalized result requires rounding. Since the incrementing of a result of a FP operation is performed after the normalization, extra time is needed to complete the FP operation. Furthermore, the incrementor is disadvantageous, as it can add many undesirable gate delays, i.e., at least log2(N) gate delays, where N is the number of bits. Both of the foregoing significantly compromise the performance of the fused multiply adder 5.
Thus, a heretofore unaddressed need exists in the industry for a way to address the aforementioned deficiencies and inadequacies, particularly, a way to better perform rounding, or incrementing, in a fused multiply adder 5.
The present invention provides a selection based rounding system and method for eliminating the need for post increment based rounding in floating point (FP) arithmetic operations. Eliminating the post increment based rounding operation significantly increases speed. Although not limited to this particular application, the selection based rounding system and method of the invention are particularly suited for implementation in connection with a FP fused multiply adder of a high performance chip-based microprocessor or other digital circuit.
Generally, in an FP fused multiply adder that employs the selection based rounding system, an unincremented result and an incremented result are produced substantially concurrently, in parallel, and then either one of the foregoing is selected as a rounded result based upon specified rounding criteria, thereby eliminating the need for an incrementor to perform rounding at or near the end of the FP fused multiply adder.
A specific preferred embodiment (intended to be a nonlimiting example; other implementations are possible) of the fused multiply adder that employs the selection based rounding system includes: (1) a multiplier designed to combine first and second operands A, B to produce a product; (2) a first shifter designed to shift a third operand C so that the third operand can be combined with the product; (3) a carry save adder designed to combine the product and the shifted third operand to produce a first sum and a first carry; (4) a leading bit anticipator (LBA) designed to determine an approximate leading bit location in the first sum and to produce an LBA word that defines a one approximately in a least significant bit position of the first sum; (5) a first carry propagation adder designed to combine the first sum and the first carry to produce an unincremented result; (6) a second carry propagation adder designed to combine the LBA word with the first sum and the first carry to produce an approximate incremented result; (7) a second shifter designed to normalize the unincremented result; (8) a third shifter designed to normalize the approximate incremented result; (9) a least significant bit fixup mechanism designed to convert the approximate incremented result into an accurate incremented result; (10) a MUX designed to receive the unincremented result and the accurate incremented result; and (11) rounding logic designed to select either the unincremented result or the accurate incremented result, one at a time, by controlling the MUX, based upon rounding indicia associated with the unincremented result. The result is R=A*B+C or R=A*B−C.
The present invention can also be viewed as providing one or more methods. One such method can be broadly conceptualized as a process for a FP fused multiply adder having the following steps: computing in parallel an unincremented result and an incremented result; and selecting either the unincremented result or the incremented result as a FP rounded number.
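The two steps of the foregoing method can be sketched in Python as follows. The sketch is a deliberately simplified model, not the embodiment itself: it assumes the least significant bit position of the result is known exactly, so that no least significant bit fixup is needed, and it omits the normalizing shifts. The function names and the `lsb_position` parameter are hypothetical.

```python
def compute_both_results(sum_word: int, carry_word: int,
                         lsb_position: int) -> tuple[int, int]:
    """Form the unincremented and incremented results from the same inputs.

    In hardware the two additions would proceed side by side; here they are
    simply two independent expressions over the same operands.
    """
    lba_word = 1 << lsb_position                     # a single one at the result LSB (assumed exact)
    unincremented = sum_word + carry_word            # first carry propagation adder
    incremented = sum_word + carry_word + lba_word   # second carry propagation adder
    return unincremented, incremented


def select_rounded(unincremented: int, incremented: int,
                   select_incremented: bool) -> int:
    """Select one of the two precomputed results as the rounded number."""
    return incremented if select_incremented else unincremented
```

Because neither addition depends on the other, no incrementor is needed after the sum is formed; the rounding decision only chooses between two values that already exist.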
Other systems, methods, features, and advantages of the present invention will become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included herein within the scope of the present invention.