Generally, mathematical operations in a computer such as (i) division of a dividend by a divisor to produce a quotient and (ii) square root of a radicand to produce a root are slow. Such division and square root operations are slow because they require iteratively generating a series of partial remainders and quotient or root digits, respectively.
Therefore, the speed of the division or square root operation is dependent on the amount of time it takes to complete one iteration and the total number of iterations required. The total number of iterations is dependent on the number of quotient or root mantissa digits required to provide an accurate quotient or root. For example, in floating point division, twenty-four mantissa digits are required for single precision and fifty-three mantissa digits are required for double precision. The time required to generate each of the required quotient digits is therefore critical to the overall speed of the division operation.
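The dependence of the iteration count on the mantissa width can be illustrated with a small calculation. This is a minimal sketch that assumes each iteration retires a fixed number of quotient bits and ignores any extra guard or rounding bits; the function name `iterations_needed` and the parameter `bits_per_iteration` are hypothetical and do not come from the text above.

```python
import math


def iterations_needed(mantissa_bits: int, bits_per_iteration: int) -> int:
    """Number of loop iterations needed to produce `mantissa_bits` result bits.

    Assumption: every iteration yields exactly `bits_per_iteration` bits
    (1 for radix-2, 2 for radix-4, and so on); guard bits are ignored.
    """
    if mantissa_bits <= 0 or bits_per_iteration <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(mantissa_bits / bits_per_iteration)


# Single precision (24 bits) and double precision (53 bits).
single_radix2 = iterations_needed(24, 1)
double_radix2 = iterations_needed(53, 1)
double_radix4 = iterations_needed(53, 2)
```

Under these assumptions the total latency is roughly the iteration count multiplied by the time of one iteration, which is why both factors matter.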
Typically, in each iteration of a square root operation, a root digit and a correction term are computed after examining a current partial remainder. The succeeding partial remainder, i.e., the partial remainder for the next iteration, is computed by subtracting the correction term from the current partial remainder and scaling the result of the subtraction. In each iteration of a division operation, a quotient digit is computed after comparing a current partial remainder and the divisor. The partial remainder for the next iteration is computed by subtracting a multiple of the divisor from the current partial remainder and scaling the result of the subtraction.
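The division iteration described above can be sketched as a simple radix-2 recurrence. This is an illustrative sketch only: it assumes unsigned operands with the dividend smaller than twice the divisor, quotient digits restricted to 0 and 1, and exact rational arithmetic in place of hardware registers. The function name `divide_radix2` is hypothetical.

```python
from fractions import Fraction


def divide_radix2(dividend, divisor, n_digits):
    """Generate `n_digits` quotient digits, one per iteration.

    Each iteration compares the partial remainder with the divisor to pick
    a digit, subtracts digit * divisor, and scales the result by 2.
    Returns (digits, final_partial_remainder); digits[0] has weight 2**0.
    """
    divisor = Fraction(divisor)
    remainder = Fraction(dividend)
    if divisor <= 0 or not (0 <= remainder < 2 * divisor):
        raise ValueError("requires divisor > 0 and 0 <= dividend < 2 * divisor")
    digits = []
    for _ in range(n_digits):
        # Quotient digit selection: compare partial remainder and divisor.
        digit = 1 if remainder >= divisor else 0
        # Next partial remainder: subtract a multiple of the divisor, then scale.
        remainder = 2 * (remainder - digit * divisor)
        digits.append(digit)
    return digits, remainder


def digits_to_value(digits):
    """Value of the digit string, with digits[j] weighted by 2**(-j)."""
    return sum(Fraction(d, 2 ** j) for j, d in enumerate(digits))


x, d = Fraction(3, 4), Fraction(5, 8)
q_digits, r_final = divide_radix2(x, d, 8)
```

With these definitions the dividend equals the accumulated quotient times the divisor plus the final partial remainder scaled back by `2**n_digits`, which is a convenient check on the recurrence.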
Thus, the computation of the partial remainder for the next iteration for both the square root operation and the division operation requires a subtraction operation. Typically, the subtraction is performed through the use of Carry Propagate Adders (“CPAs”) or Carry Save Adders (“CSAs”). CPAs are relatively slow because a carry bit must be propagated from the Least Significant Bit (“LSB”) position to the Most Significant Bit (“MSB”) position. CSAs are much faster, but because they present the partial remainder as separate sum and carry binary numbers which must be added, examination of the partial remainder is slower and more complicated.
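The difference between the two adder styles can be sketched in software. The following is a bit-level model, not a hardware description: `ripple_carry_add` models a CPA whose carry passes through every bit position in turn, `carry_save_add` models a CSA that reduces three words to a sum word and a carry word with no propagation, and `carry_save_subtract` shows one common way to subtract a divisor multiple from a carry-save remainder using two's complement. All function names and the 8-bit width are assumptions chosen for illustration.

```python
def ripple_carry_add(a: int, b: int, width: int) -> int:
    """Model of a carry-propagate adder: the carry ripples from LSB to MSB."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (x & carry) | (y & carry)
    return result  # sum modulo 2**width


def carry_save_add(a: int, b: int, c: int):
    """Model of a carry-save adder: every bit position is independent."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def carry_save_subtract(rem_sum: int, rem_carry: int, multiple: int, width: int):
    """Compute (rem_sum + rem_carry - multiple) in carry-save form.

    Uses -m == ~m + 1 (mod 2**width); the +1 is placed in the carry word's
    least significant bit, which is always free after the left shift.
    """
    mask = (1 << width) - 1
    inverted = ~multiple & mask
    sum_word = (rem_sum ^ rem_carry ^ inverted) & mask
    majority = (rem_sum & rem_carry) | (rem_sum & inverted) | (rem_carry & inverted)
    carry_word = ((majority << 1) | 1) & mask
    return sum_word, carry_word


s, c = carry_save_add(13, 7, 9)
total = ripple_carry_add(s, c, 8)          # a final CPA resolves the pair
ds, dc = carry_save_subtract(80, 20, 37, 8)  # remainder 80 + 20, minus 37
```

The final `ripple_carry_add` call illustrates the cost noted above: the true value of a carry-save number is only visible after the sum and carry words are added.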
The tradeoff between examination speed and subtraction speed (CPAs versus CSAs) is a long-standing issue faced by designers of computer dividers and square root units. Another long-standing issue is the accumulation of root digits and quotient digits. The rate of accumulation of partial roots and/or quotients needs to be fast enough to support the rate of the main square root/division loop. This in turn determines how fast the overall square root/division operation is performed.
The present invention describes a method and apparatus for accumulating quotient and/or square root digits in an efficient manner. In particular, the present invention accumulates the quotient in carry-save form along with proper sign extension, using only one carry-save adder. By using minimal logic in the accumulation loop, the present invention provides a method and apparatus for accumulating partial quotients at a rate fast enough to support the rate of fast dividers.
In the preferred embodiment, a digital processor performs a division operation on a dividend in a main loop. From this, quotient digits (i.e., partial quotients) are produced. A quotient accumulator receives and properly reconciles the quotient digits across all iterations in an efficient manner as follows.
The quotient accumulator is formed of a set of multiplexers coupled to a single carry-save adder. The multiplexers receive as inputs the prior accumulated quotient digits, the partial quotient digits output from the main loop, and sign extension digits corresponding to the partial quotient digits. The number of outputs of the multiplexers is less than the number of inputs.
The single carry-save adder receives as inputs the outputs from the multiplexers, whose number is within the range accepted by the carry-save adder. The carry-save adder produces an appropriate accumulated quotient, preferably at a rate fast enough to support the rate of the main loop.
Preferably, the partial quotient digits output from the main loop and input to the multiplexers are in carry-save format. The partial quotient digits may include sum bits and carry bits from one iteration of the main loop and carry bits delayed from a prior iteration.
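The general idea of accumulating a quotient in carry-save form with one carry-save adder per iteration can be sketched as follows. This is a simplified model under stated assumptions, not the accumulator of the preferred embodiment: each quotient digit is taken as an ordinary signed integer (for example in the range -2 to 2 for radix 4) rather than in carry-save format, the multiplexer stage is omitted, and sign extension is done by masking to the register width. The names `csa` and `accumulate_quotient` and the 16-bit width are hypothetical.

```python
def csa(a: int, b: int, c: int, mask: int):
    """Single 3-input carry-save adder, truncated to the register width."""
    sum_word = (a ^ b ^ c) & mask
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word


def accumulate_quotient(digits, bits_per_digit: int = 2, width: int = 16) -> int:
    """Accumulate signed quotient digits, most significant digit first.

    The running quotient is held as a (sum, carry) pair. Each iteration
    shifts both words left by one digit position and adds the new digit,
    sign-extended to `width` bits, using exactly one carry-save addition.
    """
    mask = (1 << width) - 1
    acc_sum, acc_carry = 0, 0
    for digit in digits:
        extended = digit & mask  # two's complement, sign-extended to width
        acc_sum, acc_carry = csa(
            (acc_sum << bits_per_digit) & mask,
            (acc_carry << bits_per_digit) & mask,
            extended,
            mask,
        )
    # A carry-propagate addition is needed only once, after the last digit.
    total = (acc_sum + acc_carry) & mask
    if total >= 1 << (width - 1):
        total -= 1 << width  # reinterpret as a signed value
    return total


q = accumulate_quotient([1, -2, 2, -1])  # radix-4 digits
```

In this sketch the only logic inside the loop is a shift and one carry-save addition, so the accumulation does not wait on carry propagation; the width must be large enough to hold the final quotient.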
In accordance with one feature of the present invention, the sign extension digits are bits (possibly fragmented bit strips) from a single constant value representing sign extensions of all partial quotients. Further included in the sign extension digits are switch bits for changing a strip of logic ones to logic zeros.
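One well-known way to replace per-digit sign extension with a single constant can be sketched as follows. The sketch relies on the identity that a k-bit two's complement value equals the same bits with the sign bit inverted, minus 2^(k-1); summing the subtracted terms over all digit positions gives one constant that does not depend on the data. Whether the constant described above is built exactly this way is an assumption. The function names, the 3-bit digit width, and the 12-bit register width are hypothetical.

```python
def sign_extend(value: int, k: int, width: int) -> int:
    """Reference model: sign-extend a k-bit two's complement value."""
    sign = (value >> (k - 1)) & 1
    extension = ((1 << width) - (1 << k)) if sign else 0
    return value | extension


def sum_with_extension(digits, k: int, shift: int, width: int) -> int:
    """Sum shifted digits using explicit sign extension of every digit."""
    mask = (1 << width) - 1
    return sum(sign_extend(d, k, width) << (j * shift)
               for j, d in enumerate(digits)) & mask


def extension_constant(n_digits: int, k: int, shift: int, width: int) -> int:
    """Data-independent constant standing in for all sign extensions."""
    mask = (1 << width) - 1
    return -sum(1 << (k - 1 + j * shift) for j in range(n_digits)) & mask


def sum_with_constant(digits, k: int, shift: int, width: int) -> int:
    """Sum shifted digits with inverted sign bits plus one shared constant."""
    mask = (1 << width) - 1
    flipped = sum((d ^ (1 << (k - 1))) << (j * shift)
                  for j, d in enumerate(digits))
    return (flipped + extension_constant(len(digits), k, shift, width)) & mask


# 3-bit digits -1, 2, -2 at bit positions 0, 2, 4 (least significant first).
digits = [0b111, 0b010, 0b110]
a = sum_with_extension(digits, 3, 2, 12)
b = sum_with_constant(digits, 3, 2, 12)
```

In this formulation the inverted sign bit of a non-negative digit is a one; adding it at the bottom of a run of ones clears that run modulo the register width, which is one possible reading of the switch bits mentioned above.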