High-speed computation is a critical design factor in many systems, such as computers, signal processors, and process controllers. These systems increasingly rely on large-scale integration (LSI) circuits employing highly parallel designs to provide high-speed calculations. Specialized integrated circuits that perform integer and floating-point multiplication using parallel techniques are widely available. However, integrated circuits that perform the division and square root functions are generally serial in nature, thereby reducing their effectiveness.
Division can be performed by high-speed multipliers, in conjunction with other circuitry, using well-known convergence algorithms. Generally, a high-speed parallel multiplier can be divided into two major parts. The first part contains the partial product generators and an adder array that reduces the partial products to a sum stream and a carry stream. The second part contains a final adder that sums the carry and sum streams together. Because the second part involves a carry chain, the final addition consumes approximately the same amount of time as the partial product generation and addition. A pipeline register is often inserted between the two parts to increase the throughput of the multiplier, since the first part can start the next operation while the second part completes the preceding calculation.
As a result, high-speed parallel multipliers require at least three clock cycles for each iteration of the convergence algorithm. Hence, division and square root calculations require a substantial amount of time relative to other calculations.
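To illustrate why each iteration is costly, the following is a minimal sketch of one well-known convergence algorithm, Newton-Raphson reciprocal iteration. The text above does not name a specific algorithm, so this choice is an assumption, as are the function names `reciprocal` and `divide`, the linear seed constants, and the iteration count. Note that each iteration contains two multiplications, and the second cannot begin until the first has passed through the entire multiplier.

```python
import math


def reciprocal(d, iterations=4):
    """Approximate 1/d using only multiplication and subtraction.

    Illustrative sketch (assumed algorithm: Newton-Raphson iteration
    x <- x * (2 - m * x), which converges quadratically to 1/m).
    """
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    # Scale |d| to a mantissa m in [0.5, 1) so that a fixed seed works.
    m, e = math.frexp(abs(d))          # abs(d) == m * 2**e
    # Linear seed for 1/m on [0.5, 1); relative error is at most 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        t = m * x                      # first multiplication
        x = x * (2.0 - t)              # second multiplication, depends on t
    # Undo the scaling and restore the sign of d.
    return math.copysign(math.ldexp(x, -e), d)


def divide(n, d, iterations=4):
    """Compute n / d as n * (1/d), with one further multiplication."""
    return n * reciprocal(d, iterations)
```

Because the relative error is roughly squared on every pass, four iterations from this seed are enough for double precision in the sketch. In hardware, the data dependency between the two multiplications is what prevents the pipelined multiplier from overlapping them within a single iteration.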
Therefore, a need has arisen in the industry for a processor capable of high-speed division and square root calculations.