The present invention relates to arithmetic processing circuits in a computer system and, in particular, to a circuit in a floating point processor having fused multiply/add circuitry.
State-of-the-art arithmetic processing circuitry for binary numbers typically employs floating point arithmetic in accordance with the IEEE 754 standard. For addition, multiplication, and division, floating point arithmetic first normalizes the binary operands by shifting them until, for a positive number, the first non-zero digit (i.e., 1) is immediately to the left of the radix point, so that the mantissa is greater than or equal to one and less than two. A negative binary number has leading ones; to normalize a negative number, it must therefore be shifted so that the first zero is immediately to the left of the radix point.
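Normalization of a positive mantissa can be sketched as follows. This is an illustrative model only, not the circuit described here: it assumes a hypothetical `width`-bit unsigned mantissa field with the radix point directly to the right of its top bit, and compensates each left shift by decrementing the exponent so that the represented value `mant * 2**exp` is unchanged.

```python
def normalize_positive(mant: int, exp: int, width: int) -> tuple[int, int]:
    """Shift a non-zero unsigned mantissa left until its leading 1 sits in bit width-1.

    Assumption (hypothetical layout): the radix point lies directly right of
    bit width-1, so after normalization mant / 2**(width-1) is in [1, 2).
    """
    if mant <= 0 or mant.bit_length() > width:
        raise ValueError("expected a non-zero mantissa that fits in width bits")
    shift = width - mant.bit_length()  # number of leading zeros in the field
    # Shift left and lower the exponent by the same amount: value is preserved.
    return mant << shift, exp - shift
```

For example, `normalize_positive(0b0011, 5, 4)` returns `(0b1100, 3)`: two leading zeros are removed and the exponent drops by two.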
For multiplication, the normalized binary numbers are then multiplied and their exponents are added. For division, the normalized binary numbers are divided and their exponents are subtracted. For addition and subtraction, the normalized numbers must be shifted (i.e., aligned) so that their exponents are equal, then the numbers are added or subtracted.
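The exponent handling for multiplication and addition can be illustrated on values represented as an integer mantissa and an exponent, `m * 2**e`. This sketch is an assumption-laden simplification: to stay exact, it aligns by shifting the operand with the larger exponent left, whereas hardware typically shifts the operand with the smaller exponent right and tracks the bits shifted out.

```python
def fp_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply values given as (mantissa, exponent) pairs meaning m * 2**e."""
    (ma, ea), (mb, eb) = a, b
    return ma * mb, ea + eb  # multiply mantissas, add exponents


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add (mantissa, exponent) pairs after aligning them to a common exponent."""
    (ma, ea), (mb, eb) = a, b
    e = min(ea, eb)
    # Alignment: bring both operands to exponent e, then add the mantissas.
    return (ma << (ea - e)) + (mb << (eb - e)), e
```

For example, `fp_add((3, 2), (5, 0))` gives `(17, 0)`, i.e. 3·4 + 5 = 17.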
To normalize a binary number, the number of leading zeros (for a positive number) or leading ones (for a negative number) must be determined quickly so that the normalizing shift can be performed without delay for the next arithmetic operation. Ideally, the leading one or leading zero count (LZC) is performed in parallel with the arithmetic operation so that shifting can start immediately after the arithmetic operation completes.
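A behavioral model of the leading zero / leading one count on a two's-complement word is shown below. It is a sketch, not a hardware description; `width` is a hypothetical word size, and the count includes the sign bit itself, so the corresponding normalizing shift in a signed representation would be one less.

```python
def leading_sign_count(x: int, width: int) -> int:
    """Count leading zeros (x >= 0) or leading ones (x < 0) of a width-bit
    two's-complement value, including the sign bit."""
    if not -(1 << (width - 1)) <= x < (1 << (width - 1)):
        raise ValueError("x does not fit in width bits")
    if x < 0:
        x = ~x  # leading ones of x become leading zeros of ~x (= -x - 1 >= 0)
    return width - x.bit_length()
```

For an 8-bit word, `0b00010110` has three leading zeros and `-2` (`0b11111110`) has seven leading ones.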
A fused multiply/add floating point unit (FPU) of the above type and operation is disclosed in U.S. Pat. No. 5,993,051.
To provide a very fast arithmetic processing circuit that generates the binary number, calculates the number of leading ones or zeros in it, and then shifts it to produce a normalized binary result for the next floating point arithmetic operation, the cited U.S. patent proposes a combined leading zero anticipator (LZA) and leading one anticipator (LOA) connected to an output of a carry save adder (CSA) of the multiply/add circuitry. The combined LZA/LOA calculates the leading ones or leading zeros of the adder output in parallel with the adder adding the sum and carry bits, so the LZA input comes from the CSA.
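The idea of anticipating the leading digit from the CSA outputs, before the carry-propagate addition finishes, can be illustrated with a deliberately simple toy. The 3:2 carry-save adder below is standard; the anticipator, however, is a much simpler scheme than the LZA/LOA of the cited patent and is valid only for effective addition of non-negative operands (it does not handle cancellation or negative results). It relies on max(s, c) <= s + c <= 2·max(s, c), so the true leading one is either at the predicted position or one above it.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on non-negative integers: x + y + z == s + c."""
    s = x ^ y ^ z                           # per-bit sum, carries not propagated
    c = ((x & y) | (x & z) | (y & z)) << 1  # per-bit majority = carry, one place left
    return s, c


def anticipate_msb(s: int, c: int) -> int:
    """Toy anticipator: predict the leading-one position of s + c from s | c.

    Assumes s, c >= 0; the true position is the prediction or one higher.
    """
    return (s | c).bit_length() - 1


def true_msb(v: int) -> int:
    """Reference: leading-one position after the full carry-propagate add."""
    return v.bit_length() - 1
```

In hardware terms, the off-by-one uncertainty of such an estimate is what a final one-bit correction shift after normalization absorbs.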
FIG. 1 (prior art) illustrates the fused multiply-add floating point unit disclosed in the cited U.S. patent, in which the addition of the addend B and the product A*C is split into an addition part 14, where the operands overlap, and an incrementer part 16, where they do not overlap.
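The split into an overlapping adder part and a non-overlapping incrementer part can be modeled arithmetically as below. This is a sketch under stated assumptions, not the patented datapath: operands are unsigned (effective addition), the product fits in a hypothetical `w` bits, and the aligned addend may be wider. Only the low `w` bits of the addend go through the adder; the high part only needs a +1 when the adder carries out.

```python
def split_add(addend_aligned: int, product: int, w: int) -> int:
    """Add an aligned addend to a w-bit product using an adder plus an incrementer.

    Assumes addend_aligned >= 0 and 0 <= product < 2**w (hypothetical widths).
    """
    mask = (1 << w) - 1
    high = addend_aligned >> w                   # incrementer part: no overlap
    low_sum = (addend_aligned & mask) + product  # adder part: overlapping bits
    carry = low_sum >> w                         # carry-out of the adder, 0 or 1
    return ((high + carry) << w) | (low_sum & mask)
```

Because the high part is only incremented, it never waits for a full-width carry-propagate addition, which is the motivation for the split.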
If the exponent of the addend is greater than the exponent of the product, and both operands are normalized, the addend is aligned such that its part that does not overlap the product passes through the incrementer, and the significant part of the intermediate result lies somewhere in the range at position 1. All bits in the low part are relevant only for the sticky bit.
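The statement that the low-order bits matter only for the sticky bit can be made concrete: those bits are dropped, and only the OR of all of them is kept for rounding. The function below is a hypothetical sketch; `drop` stands for the number of low-order bits outside the retained window.

```python
def truncate_with_sticky(value: int, drop: int) -> tuple[int, int]:
    """Discard the low `drop` bits of a non-negative value, keeping their OR as sticky."""
    kept = value >> drop
    sticky = int((value & ((1 << drop) - 1)) != 0)  # 1 if any discarded bit was set
    return kept, sticky
```

For example, `truncate_with_sticky(0b1011_0100, 3)` returns `(0b10110, 1)`, whereas `truncate_with_sticky(0b1011_0000, 4)` returns `(0b1011, 0)`.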
If both operands are normalized and the addend exponent is equal to or smaller than the product exponent, the fractions of the two operands overlap in such a way that the significant part of the intermediate result must be taken from the adder output at position 2. Non-relevant bits are then located in the high part of the intermediate result 17.
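The two cases can be modeled by deciding, from the aligned addend alone, which part of the intermediate result carries the significant bits. The sketch below is hypothetical and not taken from the cited patent; it reuses the unsigned, effective-addition assumptions with a `w`-bit product. If the non-overlapping high part of the addend is non-zero (the position 1 case), the leading-one position follows from the addend alone, exact or one too low because of a possible incrementer carry; otherwise (the position 2 case) it is taken from the adder output.

```python
def leading_one_estimate(addend_aligned: int, product: int, w: int) -> tuple[str, int]:
    """Return the region holding the leading one of addend_aligned + product
    and a position estimate that is exact or one too low.

    Assumes addend_aligned >= 0 and 0 <= product < 2**w (hypothetical widths).
    """
    high = addend_aligned >> w
    if high:
        # Significant bits come down through the incrementer; the estimate
        # depends only on the addend, not on the adder result.
        return "incrementer", (high.bit_length() - 1) + w
    # Operands overlap: significant bits come from the adder output.
    low_sum = (addend_aligned & ((1 << w) - 1)) + product
    return "adder", low_sum.bit_length() - 1
```

For example, `leading_one_estimate(0b1010, 0b0011, 4)` returns `("adder", 3)`, since 10 + 3 = 0b1101.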
In prior-art FPU designs of this type, the normalization 20 is controlled from long LOA and LZA vectors that become available too late. These designs do not exploit the possibility of calculating, and using, the information about which part of the intermediate result the significant bits come from.
With continuously increasing processor clock rates, and thus shorter processing cycles, the above approach has the drawback that the output of the leading zero anticipator (LZA) circuitry is provided too late to coincide with the output of the CSA, especially when the data width is large (for example, double precision). Consequently, the information whether a leading zero or a leading one is present in the leading part of the addend (after alignment with the product) is not generated, even though the cited U.S. patent already regards calculating the LZA in parallel with the addition as an advantage over its own prior art.