This invention relates to signal processing. In particular, this invention relates to a method and apparatus for calculating the reciprocal or inverse of a number.
Calculating the reciprocal plays an important role in the division operation, especially with floating-point decimal numbers. By using a reciprocal, the result of the division of two numbers can be obtained by multiplying the dividend by the inverse of the divisor. This division method can be used to increase the speed of processing complex calculations in digital processing devices such as computers and in application-specific integrated circuits such as digital signal processing (DSP) processors.
According to IEEE Standard for Binary Floating-Point Arithmetic 0754P-1985, which is incorporated herein by reference, standard floating-point numbers are packed within 32 bits with a 24-bit significand (mantissa) in single precision, or within 64 bits with a 53-bit significand in double precision.
Several interpolation and iteration methods are widely used by developers for calculating reciprocals, including direct approximation, linear interpolation, square interpolation, cubic interpolation, and so on.
In the direct approximation method of obtaining the reciprocal of a number, all possible mantissas for reciprocals are stored in a ROM table. Using this method the result can be obtained quickly, but this method requires an extremely large memory capacity. For example, obtaining a reciprocal in the IEEE standard 754 single precision floating-point format requires $2^{23} \times 23 = 184$ Mbits of memory.
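The memory figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the table holds one entry per possible 23-bit stored fraction, that each entry is itself 23 bits wide, and that 1 Mbit means $2^{20}$ bits; the variable names are illustrative only.

```python
# Direct-approximation table size for IEEE 754 single precision.
# Assumption: only the 23 explicitly stored fraction bits address the table,
# because the leading 1 of a normalized significand is implicit.
FRACTION_BITS = 23

entries = 2 ** FRACTION_BITS        # one entry per possible stored mantissa
bits_per_entry = FRACTION_BITS      # each entry holds a 23-bit reciprocal mantissa
total_bits = entries * bits_per_entry

print(total_bits)                   # 192937984
print(total_bits / 2 ** 20)         # 184.0 (Mbits, taking 1 Mbit = 2**20 bits)
```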
The linear interpolation method is based on the mean value theorem from calculus, and can be summarized for the calculation of a reciprocal as follows:

$$\frac{1}{x} = \frac{1}{x_0} - \frac{1}{\xi^2}\,(x - x_0) \qquad (1)$$

where $\xi \in [x_0, x]$ and $x \ge x_0$.
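As a numerical illustration of equation (1), the sketch below uses the fact that for the function 1/x the mean-value point satisfies $\xi^2 = x_0 x$, so the identity holds exactly with that choice. A practical table cannot know $\xi$ in advance, so the example also shows the error when $\xi$ is simply replaced by $x_0$. The function name and the sample values are assumptions made for illustration.

```python
import math

def reciprocal_mean_value(x0: float, x: float) -> float:
    """Evaluate the right-hand side of equation (1) using the exact xi."""
    xi = math.sqrt(x0 * x)          # for f(x) = 1/x, xi**2 = x0 * x
    return 1.0 / x0 - (x - x0) / xi ** 2

x0, x = 1.5, 1.625                  # illustrative values with x >= x0

exact = 1.0 / x
with_xi = reciprocal_mean_value(x0, x)
with_x0 = 1.0 / x0 - (x - x0) / x0 ** 2   # xi approximated by x0

print(exact, with_xi)               # the two values agree up to rounding
print(abs(with_x0 - exact))         # error introduced by replacing xi with x0
```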
It is also possible to use square interpolation, cubic interpolation and other interpolation methods to obtain a reciprocal with the required precision. However, all of these methods require additional multiplication operations, and require additional memory to store the correction coefficients. The main disadvantage of interpolation methods is that as the desired precision increases, the amount of memory required to store the necessary data also increases.
In digital computers the Newton-Raphson iteration method is widely used for calculating reciprocals. This method gives the solution of the equation
$$f(z) = 0 \qquad (2)$$

based on employing the recurrent formula

$$z_{i+1} = z_i - \frac{f(z_i)}{f'(z_i)} \qquad (3)$$
The values $z_i$ obtained after iteration i converge quadratically toward z, so the corresponding errors $\epsilon$ after iteration i and iteration i+1 are related by the expression:

$$\epsilon(z_{i+1}) \le \epsilon^2(z_i) \qquad (4)$$

Employing the Newton-Raphson method for calculating the reciprocal $x = \frac{1}{a}$ produces the following expression:

$$x_{i+1} = x_i\,(2 - a\,x_i) \qquad (5)$$
As can be seen from equation (5), every iterative step of this method involves two multiplication operations performed in sequence, and one two's complement operation. The precision of a reciprocal thus doubles after each iterative step. The disadvantage of the Newton-Raphson iteration method by itself is that it can require multiple iteration steps to obtain a reciprocal with the required precision.
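The doubling of precision per step can be observed with a short floating-point sketch of the iteration in equation (5). The function name, the starting guess and the sample divisor are illustrative assumptions. The quantity 1 - a*x printed for each step is the relative error, which is squared by every iteration.

```python
def newton_reciprocal(a: float, x0: float, steps: int) -> list:
    """Return the iterates x_0 ... x_steps of x_{i+1} = x_i * (2 - a * x_i)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x * (2.0 - a * x))   # two multiplications and one subtraction from 2
    return xs

a = 1.75                               # illustrative divisor in [1, 2)
xs = newton_reciprocal(a, x0=0.5, steps=4)

for i, x in enumerate(xs):
    # 1 - a*x is the relative error of x as an approximation of 1/a
    print(i, x, 1.0 - a * x)
```

With a coarse starting guess such as 0.5, several steps are needed before the error falls below single-precision resolution, which is why a better initial approximation reduces the iteration count.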
To overcome the above disadvantages, methods have evolved that use some type of interpolation to obtain the initial approximation of a reciprocal, and then employ an iteration method based on this approximation. As an example, it has been proposed to use inverse tables to obtain the initial values for consecutive iterations.
The present invention provides a method and apparatus for dividing a value which can deliver the inverse (reciprocal) of a number quickly and with a high precision.
According to the method of the invention, linear interpolation is employed to obtain an approximation of the reciprocal of a number. This approximation may then be used as an input value for Newton-Raphson iterations to calculate a reciprocal with high precision.
Unlike prior art methods, the method of the invention provides a formula for calculating a minimum number of entries in a look-up table to obtain the approximation of a reciprocal with required precision. The method of the invention also provides formulas for calculating initial approximations and correction coefficients for composing entries in look-up tables. An apparatus for implementing the method of the invention comprises a look-up table memory for storing these values, an integer multiplier, and a subtracter.
The present invention thus provides a method for generating an output signal representing an output value approximating a reciprocal of an input value D having a normalized mantissa M (where $1 \le M < 2$) represented by an input signal, the input signal comprising a set of $N_0$ most significant bits and the output signal approximating the reciprocal with a desired precision $\epsilon = 2^{-N}$ where $N \le N_0$, comprising the steps of: a. for a set of P most significant bits of the input signal, generating a number n of entries in a plurality of lookup tables where $n = 2^P$, including the sub steps of: i. generating a set of input entries $y_i$ comprising a set of N significant bits in a first lookup table, where $i = 0, \ldots, n-1$; and ii. generating a set of input entries $K_i$ comprising a set of $(N-P)$ significant bits in a second lookup table, where $i = 0, \ldots, n-1$; b. finding the entries $y_i$ and $K_i$ in the lookup tables corresponding to the set of P most significant bits of the input signal; c. multiplying $K_i$ by a signal comprising a set of $(N-P)$ significant bits following the set of P most significant bits of the input signal; and d. subtracting a set of $(N-P)$ most significant bits from the set of N significant bits of the entry $y_i$.
In further aspects of the method of the invention: the step of generating n entries in the lookup tables comprises the sub steps of: iii. calculating the minimum number l of lookup table entries necessary to obtain a precision higher than the desired precision, where

$$\frac{2l+1}{2l+2} - \sqrt{\frac{l}{l+1}} < \epsilon \quad \text{and} \quad \frac{2l-1}{2l} - \sqrt{\frac{l-1}{l}} \ge \epsilon$$

and iv. finding a required minimum number n of lookup table entries for $n = 2^P$, where $2^{P-1} < l$ and $2^P \ge l$; the step of generating a set of input entries in the first lookup table comprises the sub steps of: A. calculating

$$\hat{y}_i = \frac{\sqrt{x_i\left(x_i + \frac{1}{n}\right)} + \frac{1}{2n}}{x_i\left(x_i + \frac{1}{n}\right)}$$

where $i = 0, \ldots, n-1$, $x_0 = 1$, and $x_{i+1} = x_i + \frac{1}{n}$, and B. finding entries $y_i$ comprising a set of N significant bits and approximating a mantissa of $\hat{y}_i$ for $i = 0, \ldots, n-1$; and/or the step of generating a set of input entries in the second lookup table comprises the sub steps of: calculating

$$\hat{K}_i = \frac{2^{N-P}}{x_i\left(x_i + \frac{1}{n}\right)}$$

where $i = 0, \ldots, n-1$, $x_0 = 1$, and $x_{i+1} = x_i + \frac{1}{n}$, and finding entries $K_i$ comprising a set of $(N-P)$ significant bits and approximating integer parts of $\hat{K}_i$ for $i = 0, \ldots, n-1$.
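The table-sizing and table-generation sub steps can be sketched as follows. This is a floating-point illustration under stated assumptions, not the patented implementation. It assumes the entry-count condition compares $\frac{2l+1}{2l+2} - \sqrt{\frac{l}{l+1}}$ against $\epsilon$, and that $\hat{y}_i$ has $\sqrt{x_i(x_i + 1/n)} + \frac{1}{2n}$ as its numerator; these forms correspond to a minimax straight-line fit of 1/x on each interval. It further assumes that $y_i$ is stored as an N-bit fraction obtained by truncating $\hat{y}_i \cdot 2^N$. All function names are hypothetical.

```python
import math

def interpolation_error(l: int) -> float:
    """Worst-case interpolation error on the first interval [1, 1 + 1/l]."""
    return (2 * l + 1) / (2 * l + 2) - math.sqrt(l / (l + 1))

def min_table_size(N: int):
    """Return (l, P, n): minimum entry count l, address width P, size n = 2**P."""
    eps = 2.0 ** -N
    l = 1
    while interpolation_error(l) >= eps:   # smallest l whose error is below eps
        l += 1
    P = (l - 1).bit_length()               # smallest P with 2**P >= l
    return l, P, 1 << P

def build_tables(N: int, P: int):
    """Return the integer tables (y_i) and (K_i) for i = 0 .. 2**P - 1."""
    n = 1 << P
    y_table, k_table = [], []
    for i in range(n):
        x_i = 1.0 + i / n                  # x_0 = 1, x_{i+1} = x_i + 1/n
        prod = x_i * (x_i + 1.0 / n)
        y_hat = (math.sqrt(prod) + 1.0 / (2 * n)) / prod
        k_hat = 2.0 ** (N - P) / prod
        y_table.append(int(y_hat * 2 ** N))  # assumed scaling: N-bit fraction
        k_table.append(int(k_hat))           # integer part, fits in N - P bits
    return y_table, k_table

N = 16                                     # illustrative target precision
l, P, n = min_table_size(N)
y_table, k_table = build_tables(N, P)
print(l, P, n)                             # 90 7 128
```

For a 16-bit target the sketch arrives at 128 entries per table, compared with the $2^{16}$ entries a direct table of the same precision would need.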
The present invention further provides an apparatus comprising at least one processor for calculating an inverse I having a precision $\epsilon = 2^{-N}$ of an input value D with normalized mantissa M (where $1 \le M < 2$) comprising a set of $N_0$ most significant bits where $N_0 \ge N$, the apparatus comprising a first memory forming a lookup table addressed as a function of P most significant bits of the mantissa M and having an output $I_0$ comprising a set of N significant bits; a second memory forming a lookup table addressed as a function of P most significant bits of the mantissa M and having an output K comprising a set of $(N-P)$ significant bits; a multiplier of size $(N-P) \times (N-P)$ having two inputs of a set of $(N-P)$ significant bits following the set of P most significant bits of the mantissa M and of the output K, and an output MU comprising a set of $(N-P) \times (N-P)$ significant bits; and an adder/subtracter having an output I and having two inputs connected to respectively receive the output $I_0$ and the set of $(N-P)$ most significant bits of the output MU.
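The datapath described above (two lookups, one multiplication, one subtraction) can be simulated in software. The sketch below is a behavioural model under several assumptions that are not stated in the text: N = 16 and P = 7 are chosen for illustration, the address is formed from the fraction bits that follow the implicit leading 1 of M, the table formulas use the square-root form of a minimax linear fit, and table values are truncated to integers. The names `ROM`, `table_entry` and `reciprocal_mantissa` are hypothetical.

```python
import math

N, P = 16, 7                  # assumed output precision and address width
n = 1 << P
SHIFT = N - P                 # width of the low-order field and of K

def table_entry(i: int):
    """Return (I0, K) for table address i."""
    x_i = 1.0 + i / n
    prod = x_i * (x_i + 1.0 / n)
    i0 = int((math.sqrt(prod) + 1.0 / (2 * n)) / prod * 2 ** N)  # N-bit I0
    k = int(2 ** SHIFT / prod)                                    # (N-P)-bit K
    return i0, k

ROM = [table_entry(i) for i in range(n)]   # both memories combined in one table

def reciprocal_mantissa(m_bits: int) -> int:
    """m_bits holds the N fraction bits of M, so M = 1 + m_bits / 2**N.

    The return value approximates (1 / M) * 2**N.
    """
    addr = m_bits >> SHIFT                 # P most significant bits
    low = m_bits & ((1 << SHIFT) - 1)      # the following N - P bits
    i0, k = ROM[addr]
    mu = k * low                           # (N-P) x (N-P) multiplier
    return i0 - (mu >> SHIFT)              # subtract the top N - P bits of MU

m_bits = 0x8000                            # M = 1.5
approx = reciprocal_mantissa(m_bits) / 2 ** N
refined = approx * (2.0 - 1.5 * approx)    # one optional Newton-Raphson step
print(approx, refined, 1.0 / 1.5)
```

Because the table values and the product are truncated, the model's result can differ from the exact reciprocal by a few units in the last place; a subsequent Newton-Raphson step, as in the last lines of the sketch, roughly doubles the number of correct bits.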
In further aspects of the apparatus of the invention: the first and second memories are combined into a storage device which stores both $I_0$ and K and is addressed as a function of P most significant bits of the mantissa M; the apparatus further comprises a device for performing a programmed Newton-Raphson iteration based on I; the first memory comprises a read only memory (ROM); the second memory comprises a read only memory (ROM); the storage device comprises at least one read only memory (ROM); and/or the apparatus is included in a digital signal processing device.