The present invention relates generally to calculators for computing square roots of numbers and, more specifically, to an improved method and calculator for taking square roots of binary numbers.
The ability to perform fast square root extraction of binary numbers is crucial to the speed of many digital signal processors (DSPs) and image and graphics processing circuits, where vector and matrix operations, complex number computations and coordinate conversions frequently take place. Typical examples of such systems are Sobel edge detection processors commonly used in radar signal processing, image recognition and target tracking systems. Other examples are rasterizing systems that convert Cartesian coordinates to radial coordinates, DSP power spectrum analyzers, and digital correlators and cross-correlators.
In general, three classes of algorithms for calculating square roots are currently in use: (a) subtractive algorithms; (b) multiplicative algorithms; and (c) divisive algorithms. Algorithms of types (b) and (c) are frequently used in conjunction with look-up tables, which speed up execution by providing a more accurate seed for the algorithm's iterative process.
The subtractive algorithms are primarily used for hardware-based square root extraction circuits, since they lend themselves best to iterative trial-and-error root extraction. They are quite similar to the restoring division algorithms, and typically require N+2 steps to calculate an integer square root of an N-bit unsigned or positive integer. In a simple implementation, only a perfect-square approximation of the square root and a remainder are provided, rather than a full-precision fixed- or floating-point result. Full-precision fractional results can also be obtained using this method at the expense of additional approximation steps.
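By way of illustration only, a subtractive trial-and-error extraction of this general kind may be sketched in Python as follows. The function name `isqrt_subtractive` and the parameter `n_bits` are hypothetical, and the sketch follows a common textbook bit-by-bit formulation rather than any particular prior art circuit; its loop count is not intended to reproduce the N+2 step figure given above.

```python
def isqrt_subtractive(a, n_bits=16):
    """Return (root, remainder) such that root*root + remainder == a."""
    if a < 0 or a >= (1 << n_bits):
        raise ValueError("operand must fit in n_bits as an unsigned integer")
    root = 0
    rem = a
    # Highest even power of two that fits in the operand width.
    bit = 1 << ((n_bits - 1) & ~1)
    while bit:
        trial = root + bit
        if rem >= trial:
            # Trial subtraction succeeds: keep it and set this root bit.
            rem -= trial
            root = (root >> 1) + bit
        else:
            # Trial fails: leave the remainder untouched (the "restore").
            root >>= 1
        bit >>= 2
    return root, rem


print(isqrt_subtractive(26))  # (5, 1)
```

Only the integer root and a remainder are returned, which corresponds to the simple implementation described above.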
Another class of algorithms, called multiplicative algorithms, is used to perform full-precision fractional and floating-point square rooting. These algorithms are based either on the Newton-Raphson approximation formula or on one of several series expansion formulas (such as Taylor series). Because of the Newton-Raphson method's rapid convergence (i.e., the fewest iterations required to achieve the desired precision compared to other known methods), it is the most common approach to fast computation of binary square roots. The formula:
$$X(n+1) = 0.5 \cdot X(n) \cdot [3 - A \cdot X(n) \cdot X(n)] \qquad (1)$$
produces a 16-bit precision square root of A in approximately four iterations. However, it requires four multiplications, one subtraction and one shift per iteration, for a total of 16 multiplication steps per 16-bit result, plus the overhead required to determine the seed value X(0).
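A minimal floating-point sketch of formula (1) is given below for illustration. Iterated as written, formula (1) converges to 1/sqrt(A), so the sketch multiplies the final iterate by A to recover the root; that final multiplication, the function names and the seed used in the example are assumptions of this sketch and are not taken from the text.

```python
def rsqrt_newton(a, x0, iterations=4):
    """Iterate formula (1); the iterates approach 1/sqrt(a)."""
    x = x0
    for _ in range(iterations):
        x = 0.5 * x * (3.0 - a * x * x)
    return x


def sqrt_multiplicative(a, x0, iterations=4):
    """Recover sqrt(a) as a * (1/sqrt(a)); an assumption of this sketch."""
    return a * rsqrt_newton(a, x0, iterations)


# The seed 0.7 is a hand-picked estimate of 1/sqrt(2), standing in for a
# look-up table entry.
print(sqrt_multiplicative(2.0, 0.7))
```

The quality of the seed `x0` determines how many iterations are needed, and a seed that is too far from 1/sqrt(A) can cause the iteration to diverge.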
Other multiplicative methods based on series expansion also require sixteen or more iteration steps before the desired precision is attained, and almost always call for coefficient look-up tables to speed up the computation.
The divisive algorithms are based on the original Newton-Raphson approximation formula:
$$X(n+1) = 0.5 \cdot [X(n) + A / X(n)] \qquad (2)$$
Because a division must be performed at each iteration step, however, this algorithm is used for software-based square root computations, where division consumes approximately the same number of steps as multiplication.
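Formula (2) may be illustrated by the following floating-point sketch. The function name `sqrt_divisive`, the seed and the iteration count are hypothetical choices made for the example.

```python
def sqrt_divisive(a, x0, iterations=5):
    """Iterate formula (2): one division, one addition and one halving per step."""
    x = x0
    for _ in range(iterations):
        x = 0.5 * (x + a / x)
    return x


print(sqrt_divisive(2.0, 1.0))
```

Each pass through the loop contains exactly one division, which is the operation that makes this form less attractive for hardware implementation.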
The execution time of most square root algorithms for a given precision of the result (except for those based on series expansion) is directly related to the precision of the starting estimate of the square root, X(0). Depending on the accuracy with which the algorithm determines the starting value X(0) (called a "seed" value), the algorithm will require a lower or higher number of iterations to arrive at a result with the desired precision. Because of this, Newton-Raphson based algorithms often rely upon look-up tables to determine more accurate seed values X(0) before the iterative process is started. The use of look-up tables, however, has its drawbacks, such as the need for large on-chip ROMs and the extra overhead involved in the look-up. Thus, look-up tables are used only for high-precision (such as 32-bit and greater) square root computations, where the look-up overhead is small compared to the iteration time saved.
Thus, it is an object of the present invention to provide a method of calculating square roots which offers a significant speed advantage over those currently in use, for both software- and hardware-based square root calculations, by reducing the number of iterations required to attain the desired precision of the result.
Another object of the present invention is to provide increased speed over prior art square root calculations with a resulting precision equal to that of the initial operand.
A further object of the present invention is to provide a square root calculator and method which can compute integer square roots as well as fractional and floating-point square roots with equal precision.
A still further object of the present invention is to provide a square root calculator which is suitable for implementation by double multiplier arrays.
These and other objects are achieved by performing a plurality of iterations of the following:
$$X(n+1) = X(0) + X(n) \cdot X(n) \cdot 2^{-0.5 \cdot \mathrm{max} - 1}$$
where X(0) is the seed value and max is the weight of the most significant bit of the smallest perfect binary square higher than the most significant bit of the operand A. The square root R is then calculated as follows:
$$R = 2^{0.5 \cdot \mathrm{max}} - X(\mathrm{last})$$
The seed value X(0) is calculated as follows:
$$X(0) = (2^{\mathrm{max}} - A) \cdot 2^{-0.5 \cdot \mathrm{max} - 1}$$
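The relationship among the three expressions can be seen by substituting R = 2^(0.5*max) - X into R*R = A and solving for the linear X term, which gives X = (2^max - A)*2^(-0.5*max-1) + X*X*2^(-0.5*max-1), i.e., the seed plus the iterated correction. A minimal floating-point sketch of the iteration follows. The function names are hypothetical, and taking max as the smallest even exponent for which 2^max exceeds A is this sketch's reading of the definition, labeled here as an assumption; the iteration counts in the example are chosen conservatively and do not reflect any step count stated in the text.

```python
def smallest_even_exponent_above(a):
    """Smallest even m with 2**m > a (assumed reading of 'max')."""
    m = 0
    while (1 << m) <= a:
        m += 2
    return m


def sqrt_iterative(a, iterations):
    """Iterate X(n+1) = X(0) + X(n)^2 * 2^(-0.5*max-1), then R = 2^(0.5*max) - X."""
    if a <= 0:
        raise ValueError("operand must be a positive integer")
    m = smallest_even_exponent_above(a)
    half_power = 2.0 ** (0.5 * m)      # 2^(0.5*max)
    scale = 2.0 ** (-0.5 * m - 1)      # 2^(-0.5*max-1)
    x0 = ((1 << m) - a) * scale        # seed value X(0)
    x = x0
    for _ in range(iterations):
        x = x0 + x * x * scale         # only a multiply, a shift and an add
    return half_power - x


print(sqrt_iterative(9, 20))
```

Each step uses one squaring, one scaling by a power of two and one addition, with no division. In this sketch, the closer A lies to 2^max, the fewer iterations are needed for a given precision.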
To reduce the number of iterations required, the operand can be initially upscaled and then downscaled in the final operation. The equations above thus become:
$$X(0) = (2^{\mathrm{MAX}} - A \cdot 2^{L}) \cdot 2^{-0.5 \cdot \mathrm{MAX} - 1}$$
$$X(n+1) = X(0) + X(n) \cdot X(n) \cdot 2^{-0.5 \cdot \mathrm{MAX} - 1}$$
$$R = (2^{0.5 \cdot \mathrm{MAX}} - X(\mathrm{last})) \cdot S$$
where:
MAX = maximum binary width of the input operands (e.g., 16- or 32-bit precision);
L = exponent of the initial upscaling factor, i.e., the number of leading zeroes in the input operand A; and
S = final downscaling factor, equal to $2^{-0.5 \cdot L}$.
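The scaled form of the equations may be sketched in floating point as follows. The function name `sqrt_scaled`, the parameter names and the default iteration count are hypothetical; the default of 30 iterations is chosen conservatively for floating-point accuracy and is not the step count stated in the text. When L is odd, S is not a pure binary shift, and this sketch simply evaluates it as a floating-point power.

```python
def sqrt_scaled(a, width=16, iterations=30):
    """Upscale A by 2^L, iterate, then downscale the result by S = 2^(-0.5*L)."""
    if not 0 < a < (1 << width):
        raise ValueError("operand must be a positive integer that fits in width bits")
    lead = width - a.bit_length()          # L: leading zeroes of A
    half_power = 2.0 ** (0.5 * width)      # 2^(0.5*MAX)
    scale = 2.0 ** (-0.5 * width - 1)      # 2^(-0.5*MAX-1)
    x0 = ((1 << width) - (a << lead)) * scale
    x = x0
    for _ in range(iterations):
        x = x0 + x * x * scale
    s = 2.0 ** (-0.5 * lead)               # final downscaling factor S
    return (half_power - x) * s


print(sqrt_scaled(9))
```

Upscaling places the most significant bit of the operand at the top of the MAX-bit word, so the scaled operand is always at least half of 2^MAX regardless of how small A is.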
Applying these formulas yields a result with full 16-bit precision in only 6 iteration steps for a 16-bit binary number.
In addition to a software implementation, a calculator for performing the method includes a seed calculator, scaling factor circuitry, iteration circuitry, an output format adjuster, and iteration sequencer and control logic. These circuits use multiplexers, shifters, registers, multiplier arrays, adders, subtractors and two's complementers.
Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.