The present invention relates to a method for calculating the result of a division with a floating-point unit having a fused multiply-add dataflow structure and a separate subtraction-based divide processor.
A floating-point unit with a fused multiply-add dataflow is described in G. Gerwig et al., “The IBM eServer z990 floating point unit”, IBM J. Res. & Dev., Vol. 48, No. 3/4, 2004. Part of the function of this floating-point unit is to execute hexadecimal divide instructions. DD, DE, DER, and DXR are examples, as defined in the z/Architecture Principles of Operation (IBM SA 22-7832).
A dividend D is divided by a divisor V to give the quotient Q as the result. The quotient is built by normalizing and rounding the raw quotient according to the equation:
Q = Round(Norm(D : V))
The normalization is done in steps of 4 bits (= 1 hexadecimal digit). The rounding is done by truncation (round toward zero).
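The normalization and truncation steps can be sketched as follows. This is only an illustration under stated assumptions: the function name hex_normalize_truncate, the use of exact rationals, the unbiased exponent, the default of 6 hexadecimal digits (24 bits), and the restriction to positive values are all choices made for this sketch and are not taken from the hardware described here.

```python
from fractions import Fraction


def hex_normalize_truncate(raw, digits=6):
    """Return (fraction, exponent) so that raw ~ fraction / 16**digits * 16**exponent.

    Hypothetical helper: positive values only, unbiased exponent.
    """
    raw = Fraction(raw)
    if raw <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = 0
    # Normalize in steps of one hexadecimal digit (4 bits) until the
    # value lies in [1/16, 1), i.e. the leading hex digit is nonzero.
    while raw >= 1:
        raw /= 16
        exponent += 1
    while raw < Fraction(1, 16):
        raw *= 16
        exponent -= 1
    # Round by truncation: keep `digits` hex digits and drop the rest.
    fraction = int(raw * 16 ** digits)
    return fraction, exponent
```

For example, hex_normalize_truncate(Fraction(3, 64)) shifts the value left by one hexadecimal digit and returns the fraction 0xC00000 with exponent −1.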
The following examples assume a width of 32 bits; widths of 64 or 128 bits are also common.
For the basic division, the SRT algorithm is used. It is named after Sweeney, Robertson, and Tocher, who independently proposed the algorithm.
This method requires the divisor to be bit-normalized in order to guarantee convergence.
Normalization of the dividend may be useful but is not required, as long as the full width of the dividend is considered in the computation. There is a degree of freedom in choosing the alignment of the dividend. Part of the invention is to use this freedom to obtain a prealigned quotient, which avoids an extra post-processing step for hexadecimal alignment.
For the SRT divide algorithm, the partial remainder for the next iteration is calculated with the following recurrence:
P_{i+1} = (r * P_i) − q_{i+1} * V
where P_i is the partial remainder in iteration i and r is the radix of the SRT algorithm (r = 4 in the shown example). The resulting quotient is the concatenation of all 1 . . . n partial quotient digits q_i. The first partial quotient digit q_0 is placed in the most significant position of the quotient register. The next lower quotient digit q_1 is concatenated to the right of it, and so on, until the final width is reached.
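The recurrence can be illustrated with a minimal radix-4 sketch. Several points are assumptions made for illustration and are not stated in the text: the redundant digit set {−2, ..., 2}, digit selection by rounding the exact ratio (real hardware would typically inspect only a few leading bits), the pre-scaling of the dividend by 1/r to bound the first partial remainder, and the function name srt_radix4_divide.

```python
from fractions import Fraction


def srt_radix4_divide(dividend, divisor, iterations):
    """Return (digits, quotient, remainder) for dividend / divisor.

    Hypothetical sketch of P_{i+1} = r * P_i - q_{i+1} * V with r = 4.
    """
    r = 4
    V = Fraction(divisor)
    D = Fraction(dividend)
    if not Fraction(1, 2) <= V < 1:
        raise ValueError("divisor must be bit-normalized to [1/2, 1)")
    if not 0 <= D < 1:
        raise ValueError("this sketch expects a dividend in [0, 1)")
    # Assumed pre-scaling: P_0 = D / r keeps |P_0| <= (2/3) * V.
    P = D / r
    digits = []
    quotient = Fraction(0)
    for i in range(1, iterations + 1):
        shifted = r * P
        # Assumed digit selection: nearest integer, clamped to -2..2.
        q = max(-2, min(2, round(shifted / V)))
        P = shifted - q * V  # the recurrence from the text
        digits.append(q)
        quotient += Fraction(q, r ** i)
    # Undo the pre-scaling of the dividend.
    return digits, quotient * r, P
```

With these assumptions, the partial remainder stays within (2/3) * V in magnitude, so after n iterations the returned quotient differs from D / V by at most (2/3) * 4^(1−n). For instance, srt_radix4_divide(Fraction(3, 8), Fraction(3, 4), 4) yields the digits [0, 2, 0, 0] and the exact quotient 1/2.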
The number of iterations depends on the radix of the SRT division and the width of the quotient. In our example, the radix is 4 and the width of the raw quotient is 24+4 bits. The “+4” is needed because one guard digit has to be considered.
So 13 iterations would be needed to calculate the raw quotient for a 24-bit-wide HFP operand fraction.