The invention relates generally to computer hardware, and deals more particularly with circuitry for rapidly performing an XY+B operation in floating point format.
Most computer systems include electronic circuitry/hardware for performing arithmetic operations such as multiplication, division, addition and subtraction. Often, the computer systems use a floating point format in which each number participating in or resulting from the arithmetic operation is represented by a mantissa, radix and exponent. The mantissa is a set of digits preceded by a decimal point (such that the mantissa is less than one). The radix is the base of the number, and the exponent is the power of the base. For example, the floating point representation for binary 100.1 is 0.1001 × 2 to the power of 3, where "0.1001" is the mantissa, "2" is the radix and "3" is the exponent.
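The representation described above can be illustrated with a minimal sketch. The function name `to_float_format` and the use of exact fractions are assumptions made for illustration only; the sketch handles positive values and ignores sign and field widths.

```python
from fractions import Fraction

def to_float_format(value, radix=2):
    """Return (mantissa, exponent) such that value == mantissa * radix**exponent
    and 1/radix <= mantissa < 1, i.e. the first digit after the point is non-zero."""
    value = Fraction(value)
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = 0
    # Shift the point left until the value is less than one.
    while value >= 1:
        value /= radix
        exponent += 1
    # Shift the point right until the leading fraction digit is non-zero.
    while value < Fraction(1, radix):
        value *= radix
        exponent -= 1
    return value, exponent

# Binary 100.1 is 4.5; its mantissa is binary 0.1001 (9/16) with exponent 3.
mantissa, exponent = to_float_format(Fraction(9, 2))
```

Here `mantissa` is 9/16, which is binary 0.1001, and `exponent` is 3, matching the example in the text.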
One known technique for performing the function XY+B is to first perform the multiplication XY and then add the result to B. The multiplication can be divided into four phases. In the first phase, each digit of the multiplier is multiplied with the complete multiplicand to yield partial products which are grouped into a matrix. To expedite this phase, well known Booth encoders can be used to combine some of the multiplier digits together to reduce the number of partial products. In the second phase, the partial products within the matrix are added together to yield a sum vector and a carry vector. This addition can be expedited by the well known Wallace tree or the circuitry illustrated in FIG. 3 of U.S. Pat. No. 4,969,118. In the third phase, the sum and carry vectors are added together in a two-to-one adder. In the fourth phase, the output of the two-to-one adder is normalized, i.e. any leading zero digits in the mantissa are omitted, the first non-zero digit in the mantissa is positioned to the immediate right of the decimal point and the exponent is adjusted accordingly. After the multiplication, "B" is shifted left or right such that the resultant exponent of B equals the exponent of the normalized result of the multiplication, and the normalized result is added to B in another two-to-one adder. Finally, the result of this other two-to-one adder is normalized, and excess digits are truncated to the limits of the architecture to yield the final result. While this technique is effective in performing the XY+B function and provides maximum precision, it is slow because all steps are performed serially.
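The first three multiplication phases described above can be sketched in software as follows. This is only an illustration on integer mantissas: the function names are hypothetical, Booth encoding and normalization are omitted, and the reduction is a simple chain of 3-to-2 carry-save steps instead of a true Wallace tree.

```python
def partial_products(multiplicand, multiplier, radix=16):
    """Phase 1: one row per multiplier digit, weighted by the digit's position."""
    rows, weight = [], 1
    while multiplier > 0:
        rows.append((multiplier % radix) * multiplicand * weight)
        multiplier //= radix
        weight *= radix
    return rows

def carry_save_add(a, b, c):
    """3-to-2 compression: a + b + c == sum_vec + carry_vec, with no carry propagation."""
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec

def reduce_rows(rows):
    """Phase 2: reduce the matrix to one sum vector and one carry vector."""
    rows = list(rows) + [0, 0]  # padding guarantees two vectors remain
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(a, b, c))
    return rows[0], rows[1]

def multiply(x, y, radix=16):
    sum_vec, carry_vec = reduce_rows(partial_products(x, y, radix))
    # Phase 3: the two-to-one (carry-propagate) adder.
    return sum_vec + carry_vec

product = multiply(0x800000, 0x2468AD)  # two 6-digit hexadecimal mantissas
```

With these operands, `product` equals `0x123456800000`, a 12-digit result that would still have to be normalized and, at the end of the XY+B sequence, truncated.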
U.S. Pat. No. 4,969,118 discloses a faster circuit for performing the XY+B function. As in the foregoing technique, circuitry generates the partial product matrix. While this is occurring, however, other circuitry within U.S. Pat. No. 4,969,118 calculates the required shifting of "B". Then, the shifted B is added with the partial products of the matrix. This technique is faster than the previously described serial process because the requisite shifting is calculated and implemented in parallel with the partial product generation instead of serially. Also, the addition of the shifted B with the partial products minimally adds to the duration of this addition step yet avoids the final addition step. Next, the sum and carry vectors which result from the summation of the partial products are added together in a two-to-one adder. Finally, the result of the two-to-one adder is normalized and truncated.
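The idea of adding the shifted B as one more row of the matrix can be sketched as follows. This is not the circuit of the cited patent, only a software illustration under stated assumptions: mantissas are 6-digit hexadecimal integers, a number's value is taken as mantissa × 16^(exponent − 6), all function names are hypothetical, and a plain `sum` stands in for the carry-save reduction and the two-to-one adder.

```python
RADIX = 16

def partial_product_rows(x_mant, y_mant):
    """One row per hexadecimal digit of the multiplier, weighted by digit position."""
    rows, weight = [], 1
    while y_mant > 0:
        rows.append((y_mant % RADIX) * x_mant * weight)
        y_mant //= RADIX
        weight *= RADIX
    return rows

def alignment_shift(x_exp, y_exp, b_exp, digits=6):
    """Digits to shift B left (negative: right) to match the product's scale.
    It depends only on the exponents, so it does not have to wait for the rows."""
    return b_exp - (x_exp + y_exp) + digits

def fused_multiply_add(x_mant, x_exp, y_mant, y_exp, b_mant, b_exp, digits=6):
    rows = partial_product_rows(x_mant, y_mant)
    shift = alignment_shift(x_exp, y_exp, b_exp, digits)
    if shift >= 0:
        b_row = b_mant * RADIX ** shift
    else:
        # Assumption of this sketch: digits shifted out on the right are dropped.
        b_row = b_mant // RADIX ** (-shift)
    rows.append(b_row)   # the shifted B joins the partial product matrix
    total = sum(rows)    # stands in for the carry-save tree and two-to-one adder
    # Unnormalized, untruncated 12-digit fraction and its exponent.
    return total, x_exp + y_exp

total, exp = fused_multiply_add(0x800000, 0, 0x2468AD, 0, 0x800000, -6)
```

In this example the alignment shift is zero, and `total` is `0x123457000000` with exponent 0; only one carry-propagating addition is needed before normalization and truncation.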
As noted above, both of the foregoing techniques for performing the XY+B function conclude with normalization and then truncation of excess digits, and provide maximum precision. For example, if the architecture supports a maximum of 6 hexadecimal digits or 24 binary bits (IBM ES/390™ "short" precision), then the product of XY can be up to 12 hexadecimal digits. Because B is added to the result of the multiplication before the truncation, all digits resulting from the multiplication are considered in the addition of XY with B, and the final result is accurate to 6 hexadecimal digits.
In a prior art IBM System/390 architecture and other prior art architectures, the foregoing four multiplication steps are performed serially and then the B operand is added to the result of the multiplication. However, unlike the foregoing process, the result of the multiplication is truncated before the B operand is added. Consequently, if the architecture supports 6 hexadecimal digits and the shifted B operand comprises a non-zero digit in the 7th position, which is as significant as the most significant digit truncated from the result of the multiplication before addition with the B operand, then the accuracy of the final result may be only 5 digits. This is because the 7th position digit from the result of the multiplication, if not truncated, could affect the 6th digit when added to the B operand. Nevertheless, 5 hexadecimal digits of accuracy are plenty for the vast majority of applications, and the truncation of the result of the multiplication before addition greatly simplifies the hardware.
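The difference between truncating after the addition and truncating before it can be seen in a small numeric sketch. The function names and operand values are chosen for illustration only. The sketch assumes 6-digit hexadecimal mantissas held as integers, a product that needs no normalization shift, and a B operand already aligned to the 12-digit product so that its only non-zero digit sits in the 7th position.

```python
RADIX = 16
DIGITS = 6                   # mantissa digits supported by the architecture
LOW_HALF = RADIX ** DIGITS   # weight of the 6th digit within a 12-digit fraction

def truncate_to_digits(fraction12):
    """Keep the six most significant hexadecimal digits of a 12-digit fraction."""
    return fraction12 // LOW_HALF

def add_then_truncate(x, y, b_aligned):
    """Full-precision approach: every product digit takes part in the addition."""
    return truncate_to_digits(x * y + b_aligned)

def truncate_then_add(x, y, b_aligned):
    """Truncate-first approach: product digits 7 through 12 are discarded before B is added."""
    product6 = truncate_to_digits(x * y) * LOW_HALF
    return truncate_to_digits(product6 + b_aligned)

x, y = 0x800000, 0x2468AD    # x * y == 0x123456800000 (7th digit is 8)
b_aligned = 0x000000800000   # B has the digit 8 in the 7th position

full = add_then_truncate(x, y, b_aligned)    # 0x123457
early = truncate_then_add(x, y, b_aligned)   # 0x123456
```

The two 7th-position digits sum to a carry into the 6th digit, which the full-precision approach keeps (`0x123457`) and the truncate-first approach loses (`0x123456`). The results agree in the first 5 digits, consistent with the accuracy stated in the text.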
A general object of the present invention is to provide circuitry which performs an XY+B function more rapidly than the prior art IBM System/390 architecture but yields an identical result.