There are many situations in which a computer system needs to perform a square root operation. To give just some examples, numerical analysis, complex number computations, statistical analysis, computer graphics, and signal processing are among the fields where square root operations are often performed by computer systems. There are many different ways in which a computer system may implement a square root operation. For example, a square root may be computed in a digit-by-digit manner, such as in restoring, non-restoring, and SRT (named after its creators: Sweeney, Robertson and Tocher) techniques. However, iterative converging approximation methods are often faster in determining the result of a square root operation to a defined number of bits of accuracy. Examples of iterative converging approximation techniques are the Newton-Raphson and Goldschmidt techniques, which start with an initial estimate of the square root or its inverse and then iteratively converge on a better solution. Also, the detailed implementation of these iterative techniques can be done with different factorizations of the basic equations. Further, initial approximations may be obtained by various methods, such as bipartite lookup tables and ITY (Ito-Takagi-Yajima) initial approximation algorithms.
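As an illustration of the digit-by-digit family of methods mentioned above, the following is a minimal sketch (not taken from the source; the function name and restoring-style structure are assumptions for illustration) of a binary integer square root that produces one bit of the result per step:

```python
def isqrt_bitwise(n: int) -> int:
    """Digit-by-digit (binary) square root: one bit of the result per step.

    A sketch of the restoring style of computation; real hardware
    implementations would keep a partial remainder in registers instead.
    """
    result = 0
    bit = 1
    while bit * 4 <= n:                  # highest power of four not exceeding n
        bit *= 4
    while bit:
        if n >= result + bit:
            n -= result + bit            # trial subtraction succeeds: set this bit
            result = (result >> 1) + bit
        else:
            result >>= 1                 # trial subtraction would fail: bit is 0
        bit >>= 2
    return result
```

Each pass of the loop decides one bit of the result, so the number of steps grows linearly with the desired precision, which is why the iterative converging methods described below are often preferred.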
In general, the Newton-Raphson technique can be used to find the value of a function, say g(z), for some particular input value of z, called b. It may be the case that the function g(z) cannot easily be computed directly (e.g. if the function is a square root operation), and in that case a different function, f(x), is used wherein f(g(b))=0. The Newton-Raphson technique is an example of an iterative converging approximation technique which is well suited to finding a zero of a function, and if it is applied to the function f(x), then a value for g(b) can be determined by finding the value of x at which f(x)=0. For example, if the function g(z) is a square root operation, g(z)=√z, then the function f(x) may be chosen to be f(x)=b−x², because this function equals zero when x=√b. There are other choices of function f(x) that would equal zero when x=√b.
The general principles of the Newton-Raphson method are well known in the art, but a brief explanation is given here to aid the understanding of the following examples. The Newton-Raphson method starts with an initial guess (denoted p0) for a zero of the function f(x). The initial guess is typically close to, but not exactly equal to, the correct answer, such that f(p0)≠0, so p0≠g(b). From the point (p0, f(p0)), the tangent to the curve f(x) is determined and then the value of x at which the tangent intersects the x-axis is found. The slope of the curve f(x) is given by the derivative of f(x), by the equation:
f′(x) = (d/dx)(f(x))  (1)
The point (p,q)=(p_0, f(p_0)) and the slope m=f′(p_0) determine the tangent line according to the equation:

y = mx + (q − mp) = x f′(p_0) + f(p_0) − p_0 f′(p_0)  (2)
The straight line defined by equation 2 is a local approximation to the curve f(x). Thus, the value of x at which this line crosses the x-axis is close to the value of x at which f(x) crosses the x-axis, and is a better approximation than p0 to the value of x at which f(x)=0. So the value of x where the line crosses the x-axis is used as the next approximation, p1, of the zero of f(x), i.e. the next approximation of g(b). To find where the line of equation 2 intersects the x-axis, y is set to zero and the equation is solved for x, giving:
x = p_0 − f(p_0)/f′(p_0)  (3)
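To make equations 2 and 3 concrete, the following small sketch (the values b=9 and p0=2 are chosen for illustration and are not from the source) computes the x-intercept of the tangent both from the line of equation 2 and directly from equation 3, for f(x) = b − x²:

```python
b, p0 = 9.0, 2.0
f = lambda x: b - x * x        # f(x) = b - x^2, zero at x = sqrt(b)
fp = lambda x: -2.0 * x        # its derivative f'(x)

# Equation 2: tangent line y = m*x + (q - m*p) at the point (p0, f(p0))
m, q = fp(p0), f(p0)
# Setting y = 0 in equation 2 and solving for x:
x_from_line = (m * p0 - q) / m
# Equation 3 directly:
x_newton = p0 - f(p0) / fp(p0)

print(x_from_line, x_newton)   # both 3.25, closer to sqrt(9) = 3 than p0 = 2
```

Both routes give the same value, as expected, since equation 3 is just equation 2 solved at y=0.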
This method is iterated to repeatedly find better approximations of the zero of the function until a desired accuracy of the result is achieved. For example, the desired result may be a single precision floating point number in which case at least 24 bits of precision are desired; or the desired result may be a double precision floating point number in which case at least 53 bits of precision are desired. Therefore, over a sequence of iterations, the method will determine the approximations as:
p_1 = p_0 − f(p_0)/f′(p_0)

p_2 = p_1 − f(p_1)/f′(p_1)

p_3 = p_2 − f(p_2)/f′(p_2)

and in general for the (i+1)th iteration:
p_{i+1} = p_i − f(p_i)/f′(p_i)  (4)
Each iteration provides a better approximation than the previous iteration for the zero of the function f(x).
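The iteration of equation 4 can be sketched as a short routine. The following is a minimal illustration, not a definitive implementation (the function name and the choice of f are assumptions), here applied to f(x) = 2 − x² so that the iterates converge on √2:

```python
def newton_raphson(f, f_prime, p0, num_iterations):
    """Repeatedly apply equation 4: p_{i+1} = p_i - f(p_i) / f'(p_i)."""
    p = p0
    for _ in range(num_iterations):
        p = p - f(p) / f_prime(p)
    return p

# Approximate sqrt(2) as the zero of f(x) = 2 - x^2, starting from p0 = 1.5
approx = newton_raphson(lambda x: 2.0 - x * x, lambda x: -2.0 * x, 1.5, 5)
```

Because each iteration roughly squares the error, only a handful of iterations are needed to reach the 24 or 53 bits of precision mentioned above, provided the initial guess is reasonable.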
As well as ensuring that an accurate solution is provided, other considerations when choosing how to implement an operation in a computer system are how long the operation will take (i.e. the latency of the operation) and the power consumed in performing the operation. These considerations are particularly important in computer systems which have limited processing resources, e.g. mobile devices, for which the processing power is preferably kept low to avoid draining a battery and/or to avoid excess heat generation. Furthermore, the operations often need to be performed in real time, e.g. when a user is waiting for a response which depends upon the result of the operation, such as when the user is playing a game and a graphics processor needs to perform a square root operation; in these cases the latency of the operation is important. Therefore, any improvement to the speed and/or power consumption of operations, such as square root operations, performed on computer systems may be of significant benefit.
Some mathematical operations, such as addition, subtraction, multiplication and shifting, are simple to perform in hardware. However, other mathematical operations, such as division and square root, are not so simple to perform in hardware. If an iterative converging approximation technique such as the Newton-Raphson technique is used to find the result of a square root operation, some known functions used by the Newton-Raphson technique for performing a square root would involve performing division computations. For example, if the Newton-Raphson method is performed on the function f(x)=b−x², then equation 4 becomes:
p_{i+1} = p_i + b/(2p_i) − p_i/2  (5)

Implementing equation 5 would involve a division by p_i in each iteration, and as such is not simple to compute in a computer system.
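As a sketch of why this matters, a direct implementation of equation 5 performs one division per iteration (the function name and the crude initial guess are assumptions for illustration):

```python
def sqrt_via_equation_5(b: float, num_iterations: int = 8) -> float:
    """Newton-Raphson on f(x) = b - x^2; each step needs a division by p_i."""
    p = max(b, 1.0)                      # crude initial guess p0 (assumption)
    for _ in range(num_iterations):
        p = p + b / (2.0 * p) - p / 2.0  # equation 5: a division every iteration
    return p
```

Avoiding this per-iteration division, e.g. by iterating on the inverse of the square root instead, is one motivation for the alternative factorizations of the basic equations mentioned above.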