With the recent rapid growth of computer applications that require complex mathematical computation, such as graphics rendering, computer-aided design (CAD), and digital signal processing (DSP), nearly all high-performance microprocessors are expected to support floating-point addition, multiplication, division, square root, and similar operations in accordance with the IEEE 754-1985 floating-point standard.
In a typical data processing system, division and square root operations occur less frequently than addition or multiplication. However, whereas addition or multiplication may have a latency of about 3 cycles, division and square root may each require more than 20 cycles, which can considerably degrade overall system performance, at least in terms of latency.
The most common division process is the SRT division process. SRT is an acronym for Sweeney, Robertson, and Tocher, who independently proposed similar processes at about the same time. The SRT division process speeds up non-restoring division by admitting zero (0) as a quotient digit, for which no addition or subtraction is required. The same principle can also be applied to the square root operation. The SRT process is described in detail in Behrooz Parhami, “Computer Arithmetic: Algorithms and Hardware Designs,” Oxford University Press, 2000.
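A minimal sketch of the radix-2 SRT recurrence follows, assuming operands already normalized so that 1/2 ≤ d < 1 and |x| ≤ d. It uses exact rational arithmetic (Python's `fractions` module) in place of the redundant partial remainder and truncated comparison a hardware divider would use, and the function name `srt_radix2_divide` is hypothetical. Each step compares the shifted remainder 2w only against the constants ±1/2, and whenever the zero digit is selected the step is a pure shift with no addition or subtraction.

```python
from fractions import Fraction


def srt_radix2_divide(x, d, n_bits):
    """Radix-2 SRT division sketch with quotient digits in {-1, 0, 1}.

    Assumes normalized operands: 1/2 <= d < 1 and |x| <= d.
    Returns (digits, quotient, remainder) such that
    x == quotient * d + remainder * 2**-n_bits and |remainder| <= d.
    """
    x, d = Fraction(x), Fraction(d)
    if not (Fraction(1, 2) <= d < 1) or abs(x) > d:
        raise ValueError("expected 1/2 <= d < 1 and |x| <= d")
    half = Fraction(1, 2)
    w = x  # partial remainder w[0]
    digits = []
    for _ in range(n_bits):
        two_w = 2 * w  # shift the partial remainder left by one bit
        # Digit selection compares 2w only against the constants +1/2 and -1/2.
        if two_w >= half:
            q = 1
        elif two_w < -half:
            q = -1
        else:
            q = 0  # zero digit: the step is a pure shift, no add/subtract
        w = two_w - q * d  # w[j+1] = 2*w[j] - q[j+1]*d
        digits.append(q)
    # Convert the signed-digit (redundant) quotient to an ordinary value.
    quotient = sum(Fraction(q, 2 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, w


digits, q, r = srt_radix2_divide(Fraction(5, 16), Fraction(3, 4), 12)
print(digits)
print(float(q), 5 / 12)
```

Because 1/2 ≤ d, the overlapping selection intervals let the comparison constants be fixed rather than dependent on d, which is what makes the radix-2 selection logic cheap.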
The traditional radix-2 SRT process is easy to implement in hardware, but it requires many iterations to obtain a quotient or square root. In contrast, the radix-4 SRT process halves the number of iterations relative to radix-2, but it is more difficult to implement in hardware and has a longer delay per iteration. U.S. Pat. No. 5,258,944 discloses a method of consulting a lookup table to select the quotient digit at each iteration of the radix-4 SRT process.
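A minimal sketch of the radix-4 recurrence follows, assuming a textbook minimally redundant design: digit set {-2, -1, 0, 1, 2}, 1/2 ≤ d < 1, and |x| ≤ (2/3)d. It does not reproduce the lookup table of U.S. Pat. No. 5,258,944. Here digit selection compares the shifted remainder 4w exactly against ±d/2 and ±3d/2, whereas a hardware divider would select the digit from a table indexed by a few leading bits of the remainder and divisor. The function name `srt_radix4_divide` is hypothetical.

```python
from fractions import Fraction


def srt_radix4_divide(x, d, n_digits):
    """Radix-4 SRT division sketch with quotient digits in {-2, ..., 2}.

    Assumes 1/2 <= d < 1 and |x| <= (2/3) * d, the remainder bound for
    this digit set. Each iteration retires two quotient bits.
    Returns (digits, quotient, remainder) with
    x == quotient * d + remainder * 4**-n_digits.
    """
    x, d = Fraction(x), Fraction(d)
    bound = Fraction(2, 3) * d
    if not (Fraction(1, 2) <= d < 1) or abs(x) > bound:
        raise ValueError("expected 1/2 <= d < 1 and |x| <= 2d/3")
    w = x  # partial remainder w[0]
    digits = []
    for _ in range(n_digits):
        four_w = 4 * w  # shift the partial remainder left by two bits
        # Exact selection against +-d/2 and +-3d/2 (not a lookup table).
        if four_w >= Fraction(3, 2) * d:
            q = 2
        elif four_w >= d / 2:
            q = 1
        elif four_w >= -d / 2:
            q = 0
        elif four_w >= -Fraction(3, 2) * d:
            q = -1
        else:
            q = -2
        w = four_w - q * d  # w[j+1] = 4*w[j] - q[j+1]*d, stays within +-2d/3
        digits.append(q)
    quotient = sum(Fraction(q, 4 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, w


# 16 quotient bits: 8 radix-4 iterations instead of 16 radix-2 iterations.
digits, q, r = srt_radix4_divide(Fraction(1, 4), Fraction(7, 8), 8)
print(digits, float(q), 2 / 7)
```

Restricting digits to magnitude at most 2 means every multiple q·d is either d or a one-bit shift of d, so no 3d multiple has to be precomputed. The price is a selection function that depends on d as well as on w, which is why practical designs resort to a table such as the one described above.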