Floating point divide and square root processors occupy a considerable area of an integrated circuit (IC). To conserve space, divide and square root functions are usually implemented with iterative algorithms, so that the same hardware is reused over multiple cycles to produce the final result. Depending on the particular design, each iteration of the algorithm produces either a single bit or multiple bits of the divide or square root result. Generally, processors that produce multiple bits each iteration reach the final result in fewer iterations than processors that produce a single bit each iteration. However, the amount of hardware generally increases with the number of bits produced each iteration. Consequently, processors that produce multiple bits during each iteration require more IC area than processors that produce a single bit. Moreover, as the amount of hardware increases, the operating frequency is reduced. Hence, IC designers try to design divide and square root processors that strike a compromise between area and speed.
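The tradeoff between bits per iteration and iteration count can be illustrated with a minimal software sketch. It is not a description of any particular hardware design: it assumes unsigned integer operands and a simple restoring scheme, whereas practical floating point dividers operate on mantissas and commonly use more elaborate digit selection. The function name `iterative_divide` and the parameters `width` and `bits_per_iter` are hypothetical names chosen for illustration.

```python
def iterative_divide(dividend, divisor, width, bits_per_iter=1):
    """Restoring digit-recurrence division on unsigned integers.

    Produces `bits_per_iter` quotient bits per loop iteration and returns
    (quotient, remainder, iterations). Illustrative sketch only.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in `width` bits")
    if width % bits_per_iter != 0:
        raise ValueError("width must be a multiple of bits_per_iter")

    radix = 1 << bits_per_iter  # radix 2 for 1 bit, radix 4 for 2 bits
    remainder = 0
    quotient = 0
    iterations = 0
    for shift in range(width - bits_per_iter, -1, -bits_per_iter):
        # Bring down the next group of dividend bits into the partial remainder.
        group = (dividend >> shift) & (radix - 1)
        remainder = (remainder << bits_per_iter) | group
        # Select the quotient digit: the largest d < radix with d * divisor <= remainder.
        # A higher radix means more candidate multiples to compare per iteration,
        # which is the software analogue of the extra hardware.
        digit = 0
        while digit + 1 < radix and (digit + 1) * divisor <= remainder:
            digit += 1
        remainder -= digit * divisor
        quotient = (quotient << bits_per_iter) | digit
        iterations += 1
    return quotient, remainder, iterations


if __name__ == "__main__":
    print(iterative_divide(200, 7, width=8, bits_per_iter=1))  # one bit per iteration
    print(iterative_divide(200, 7, width=8, bits_per_iter=2))  # two bits per iteration
```

With an 8-bit dividend, the one-bit variant runs eight iterations and the two-bit variant runs four, but each two-bit iteration must compare the partial remainder against up to three multiples of the divisor instead of one.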
Most divide processors use a Radix-4 algorithm that produces up to two quotient bits during each iteration. However, the Radix-4 algorithm is significantly more complex than a Radix-2 algorithm, which produces one quotient bit each iteration. Moreover, the Radix-4 algorithm requires more hardware than the Radix-2 algorithm.
The algorithms that implement divide and square root functions are often similar. To conserve space on the IC, designers try to reuse as much of the divide hardware as possible in implementing the square root function. However, these techniques have not been altogether successful. More recently, designers have implemented the divide function using the multiplier array of the arithmetic unit of the CPU, with additional support for square root. However, the multiplier array also consumes a large amount of area on the IC and can be frequency limited, so little is gained with this approach. Accordingly, there is a need for a divide and square root processor with reduced area requirements.