As computer applications grow increasingly complex, hardware architectures face growing pressure to run faster and occupy less space without becoming unworkably complex. One way to reduce resource usage is to reuse components and functionalities for multiple purposes. The present application considers the sharing of components/functionalities to perform both division and square root calculations. To understand the components/functionalities involved, division is discussed first below.
Like most calculations performed by processors, division is implemented as an iterative process. One category of iterative division processes, or algorithms, is the digit recurrence algorithms, which use subtraction to obtain the quotient/remainder. “Restoring” digit recurrence algorithms are similar to the iterative process of division by paper and pencil, where it is sometimes required to restore the original dividend by adding the divisor back to it. Intuitively, it can be seen that this requires a certain amount of memory and, when dividing two n-digit numbers, can result in up to 2n additions/subtractions being performed.
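By way of illustration only, the restoring process described above can be sketched in Python as follows. This is a minimal sketch assuming unsigned n-bit binary operands; the function name `restoring_divide` and the operation counter `ops` are hypothetical and are introduced only to make the bound of 2n additions/subtractions visible.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int, int]:
    """Restoring division of unsigned n-bit integers.

    Returns (quotient, remainder, ops), where ops counts the additions and
    subtractions performed. All names here are illustrative assumptions.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n):
        raise ValueError("dividend must fit in n bits")

    remainder = 0
    quotient = 0
    ops = 0
    for i in range(n - 1, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction of the divisor.
        remainder -= divisor
        ops += 1
        if remainder < 0:
            # The subtraction overshot: restore by adding the divisor back.
            remainder += divisor
            ops += 1
            quotient = quotient << 1          # quotient bit 0
        else:
            quotient = (quotient << 1) | 1    # quotient bit 1
    return quotient, remainder, ops
```

For example, `restoring_divide(13, 3, 4)` performs four trial subtractions and three restorations, which stays within the bound of 2n = 8 operations.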
“Nonrestoring” digit recurrence algorithms eliminate the restoration cycles, and only require up to n additions. This is accomplished by representing the quotient with a digit set of positive and negative integers, such as, e.g., {−n, . . . , −1, 0, +1, . . . , +n}, which is then converted into binary form. In this way, small errors in one iteration can be corrected in subsequent iterations.
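By way of illustration only, a nonrestoring process can be sketched in Python as follows. This is a minimal sketch assuming unsigned n-bit binary operands and the smallest signed digit set {−1, +1}; the function name `nonrestoring_divide` is hypothetical. Note that this particular sketch applies a single correction after the loop so that the returned remainder is non-negative, which is an assumption of the sketch rather than a requirement stated above.

```python
def nonrestoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int, list[int]]:
    """Nonrestoring division of unsigned n-bit integers with digits in {-1, +1}.

    Returns (quotient, remainder, digits). Names are illustrative assumptions.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n):
        raise ValueError("dividend must fit in n bits")

    partial = 0
    digits = []
    for i in range(n - 1, -1, -1):
        bit = (dividend >> i) & 1
        shifted = 2 * partial + bit
        if partial >= 0:
            # Non-negative partial remainder: subtract, quotient digit +1.
            partial = shifted - divisor
            digits.append(+1)
        else:
            # Negative partial remainder: add instead of restoring, digit -1.
            partial = shifted + divisor
            digits.append(-1)

    # Convert the signed-digit quotient into ordinary binary form.
    quotient = sum(d << (n - 1 - k) for k, d in enumerate(digits))

    # Single final correction (assumption of this sketch) so that the
    # remainder is non-negative.
    if partial < 0:
        partial += divisor
        quotient -= 1
    return quotient, partial, digits
```

For example, `nonrestoring_divide(13, 3, 4)` produces the signed digits [+1, −1, +1, −1], which convert to 5 and are corrected to the quotient 4 with remainder 1. An overshoot in one iteration is compensated by a negative digit in the next rather than by a restoration cycle.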
Sweeney, Robinson, and Tocher (SRT) division, which is widely used in computing, is a special class of nonrestoring digit recurrence algorithms that uses a lookup table (LUT) rather than performing certain calculations in each iteration. In SRT division, the quotient q can be represented and rewritten as shown in Equations (1)(a)-(1)(d):
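$$q = \frac{\mathrm{dividend}}{\mathrm{divisor}} \qquad (1)(a)$$

$$\mathrm{dividend} = (q \times \mathrm{divisor}) + \mathrm{remainder} \qquad (1)(b)$$

such that

$$|\mathrm{remainder}| < |\mathrm{divisor}| \times \mathrm{ulp} \qquad (1)(c)$$

and

$$\mathrm{sign}(\mathrm{remainder}) = \mathrm{sign}(\mathrm{dividend}) \qquad (1)(d)$$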
where the input operands are given by dividend and divisor, and the results are q and remainder. The precision of the quotient is defined by the unit in the last position (ulp), where for an integer quotient ulp = 1, and for a fractional quotient using a binary representation ulp = $2^{-n}$, assuming an n-digit quotient. The radix r of the algorithm, typically chosen to be a power of 2, determines how many quotient bits are retired in each iteration, such that $r = 2^b$. Accordingly, a radix-r algorithm requires $\lceil n/b \rceil$ iterations to compute an n-digit quotient.
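By way of illustration only, the relationship between the radix and the iteration count can be sketched as follows. This is a minimal sketch; the function name `srt_iterations` is hypothetical, and n is assumed to be the quotient length in bits.

```python
def srt_iterations(n: int, radix: int) -> int:
    """Iterations needed for an n-bit quotient when radix = 2**b.

    Illustrative helper only; the name and signature are assumptions.
    """
    b = radix.bit_length() - 1          # b such that radix == 2**b
    if radix < 2 or (1 << b) != radix:
        raise ValueError("radix must be a power of 2 that is at least 2")
    if n < 0:
        raise ValueError("n must be non-negative")
    return -(-n // b)                   # integer ceiling of n / b
```

For example, a 53-bit quotient takes 53 iterations at radix 2, 27 iterations at radix 4, and 18 iterations at radix 8.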
The following recurrence, as shown by Equations (2)(a)-(2)(b), is used at every iteration:

$$rP_0 = \mathrm{dividend} \qquad (2)(a)$$

and

$$P_{j+1} = rP_j - q_{j+1}\,\mathrm{divisor} \qquad (2)(b)$$

where $P_j$ is the partial remainder, or residual, at iteration j. In each iteration, one digit of the quotient is determined. See, e.g., Oberman and Flynn, Minimizing the Complexity of SRT Tables, IEEE Transactions on VLSI Systems, vol. 6, no. 1, pp. 141-149 (March 1998), the entire contents of which are incorporated herein by reference.
Using Equations (2)(a) and (2)(b), each iteration of the SRT division recurrence comprises the following steps:
1) determine the next quotient digit $q_{j+1}$ by the quotient-digit selection function;
2) generate the product $q_{j+1} \times \mathrm{divisor}$; and
3) subtract $q_{j+1} \times \mathrm{divisor}$ from $r \times P_j$ to form the next partial remainder.
The quotient-digit function in step 1 is implemented by a LUT, known as a partial remainder-divisor (PD) table, as the LUT is based on the partial remainder and the divisor calculated in each iteration.
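By way of illustration only, the three steps above can be sketched in Python for the simplest case of radix 2 with the digit set {−1, 0, +1}. This is a minimal sketch under stated assumptions: the divisor is normalized to [1/2, 1), the dividend lies in [0, 1), exact rational arithmetic stands in for hardware registers, and the hypothetical function `select_digit` uses fixed comparisons against ±1/2 as a stand-in for a PD table lookup. Because the first digit in this sketch has weight 2^0, the ulp after a given number of iterations is 2^(1 − iterations). The final correction enforces the sign condition of Equation (1)(d).

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def select_digit(shifted_partial: Fraction) -> int:
    """Step 1: quotient-digit selection (stand-in for a PD table lookup)."""
    if shifted_partial >= HALF:
        return 1
    if shifted_partial < -HALF:
        return -1
    return 0


def srt_radix2_divide(dividend, divisor, iterations: int):
    """Radix-2 SRT division sketch. Returns (quotient, remainder, digits).

    All names and the operand ranges are assumptions of this sketch.
    """
    dividend, divisor = Fraction(dividend), Fraction(divisor)
    if not (HALF <= divisor < 1 and 0 <= dividend < 1):
        raise ValueError("expects 1/2 <= divisor < 1 and 0 <= dividend < 1")

    r = 2
    partial = dividend / r            # Equation (2)(a): r * P_0 = dividend
    quotient = Fraction(0)
    weight = Fraction(1)              # weight of the digit being produced
    digits = []
    for _ in range(iterations):
        shifted = r * partial
        q = select_digit(shifted)     # step 1: select q_{j+1}
        product = q * divisor         # step 2: q_{j+1} * divisor
        partial = shifted - product   # step 3: Equation (2)(b)
        quotient += q * weight
        weight /= r
        digits.append(q)

    ulp = weight * r                  # weight of the last digit produced
    remainder = partial * ulp
    if remainder < 0:
        # Correction so that sign(remainder) == sign(dividend).
        quotient -= ulp
        remainder += divisor * ulp
    return quotient, remainder, digits
```

For example, `srt_radix2_divide(Fraction(3, 4), Fraction(5, 8), 8)` returns a quotient and remainder that satisfy Equations (1)(b)-(1)(d) with ulp = 2^(−7). A higher-radix implementation would replace `select_digit` with a lookup indexed by a few leading bits of the partial remainder and the divisor.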
The square root function can be similarly implemented by an iterative process, in which the radicand is similar to the dividend, the partial radicand is similar to the partial remainder, and the root is similar in formation to the quotient. See, e.g., J. Fandrianto, Algorithm for High Speed Shared Radix 4 Division and Radix 4 Square Root, IEEE Symposium on Computer Arithmetic 1987, pp. 73-79, and J. Fandrianto, Algorithm for High Speed Shared Radix 8 Division and Radix 8 Square Root, IEEE Symposium on Computer Arithmetic 1989, pp. 68-75, the entire contents of both of which are incorporated herein by reference.
However, unlike SRT division, the square root operation does not have anything similar to the divisor, in terms of the starting point of the iterative process. In other words, a starting square root estimate must be generated in order to start the square root iterative process. Moreover, that starting estimate must ensure that the iterative process converges upon the square root using the same PD table as the SRT division operation.
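By way of illustration only, the absence of a fixed divisor can be seen in a simple digit-by-digit integer square root, sketched below in Python. This is a minimal sketch and is not the shared SRT square root of the references cited above; the function name `digit_recurrence_isqrt` is hypothetical, and the radicand is assumed to be an unsigned integer of at most 2n bits. The value subtracted in each iteration, `trial`, plays the role of the divisor but is rebuilt from the root computed so far.

```python
def digit_recurrence_isqrt(radicand: int, n: int) -> tuple[int, int]:
    """Digit-by-digit square root producing n root bits.

    Returns (root, remainder) with radicand == root * root + remainder.
    Names are illustrative assumptions.
    """
    if not 0 <= radicand < (1 << (2 * n)):
        raise ValueError("radicand must fit in 2n bits")

    root = 0
    partial = 0                       # partial radicand
    for i in range(n - 1, -1, -1):
        # Bring down the next two radicand bits, most significant first.
        partial = (partial << 2) | ((radicand >> (2 * i)) & 0b11)
        # Divisor-like term 4 * root + 1, which depends on the root so far.
        trial = (root << 2) | 1
        root <<= 1
        if partial >= trial:
            partial -= trial
            root |= 1                 # root bit 1
    return root, partial
```

For example, `digit_recurrence_isqrt(27, 3)` returns the root 5 with remainder 2. Because `trial` is zero-based at the first iteration, there is no operand available to index a PD table until some root digits exist, which is consistent with the need for a starting estimate noted above.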