This invention relates to transcendental function evaluation, and, more particularly, to a method and apparatus for ultrafast evaluation of trigonometric functions using CORDIC rotations.
The CORDIC algorithm was introduced by Volder [1] for computations of trigonometric functions and their inverses. In the classic reference [1], he also showed that the CORDIC method can be used for multiplication/division, as well as for conversion between binary and mixed radix number systems. Walther then demonstrated a xe2x80x9cunifiedxe2x80x9d CORDIC algorithm [2] that could be used to calculate trigonometric, exponential and square root functions along with their inverses; as well as to carry out multiply/divide operations. In essence, the CORDIC method evaluates elementary functions merely by table look-up, shift and addition operations (Following the literature and standard design practice in this area, the term xe2x80x9cadditionxe2x80x9d is also assumed to include subtraction).
To evaluate an elementary function accurate to n bits, n pre-calculated fixed constants are all that are required to be stored in the look-up table. In some methods the number of constants to be stored can be n+1 or n+2. Each value stored in the look-up table must be accurate up to n+1 bits where the target output precision is n bits. This is a very small storage requirement and the look-up table can be realized easily in hardware.
The CORDIC algorithm has nice geometrical interpretations [1, 2]: trigonometric, exponential and multiply functions are evaluated via rotations in the circular, hyperbolic and linear coordinate systems, respectively. Their inverses (i.e., inverse trigonometric functions, logarithm and division) can be implemented in a xe2x80x9cvectoringxe2x80x9d mode in the appropriate coordinate system. Since its inception, CORDIC methods have been investigated by many researchers and have been employed in software and/or hardware for a variety of applications as summarized by the inventor hereof [3].
In this invention, I demonstrate an ultrafast double-stepping method applicable in the CIRCULAR ROTATION mode (used to evaluate sine/cosine/tangent) functions. I begin with an explanation of the circular rotations which are based on the iteration [5]
Xi+1=Xixe2x88x92si Yi2xe2x88x92ixe2x80x83xe2x80x83(1)
Yi+1=Yi+si Xi2xe2x88x92ixe2x80x83xe2x80x83(2)
Zi+1=Zixe2x88x92si arctan 2xe2x88x92i where sixcex5{xe2x88x921, 0, +1}xe2x80x83xe2x80x83(3)
and the angles arctan 2xe2x88x92i, i=0, 1, 2, . . . are known as xe2x80x9celementary anglesxe2x80x9d which are pre-computed and stored in a look-up table in typical CORDIC implementations. Using complex numbers, equations (1) and (2) can be rewritten as [6]
Wi+1=Wixc2x7(1+jsi2xe2x88x92i)xe2x80x83xe2x80x83(4)
where complex variable
Wi=(Xi+jxc2x7Yi) and j={square root over (xe2x88x921 )}xe2x80x83xe2x80x83(5)
Using the polar form of complex numbers,
W+1=Wi{square root over (1+(si2xe2x88x92i)2 )}xc2x7ejxcex8i=Wixc2x7Kixc2x7ejxcex8i wherexe2x80x83xe2x80x83(6)
xcex8i=arctan(si2xe2x88x92i) and Ki={square root over (1 +(si2xe2x88x92i)2)}xe2x80x83xe2x80x83(7)
Starting with W0, if m iterations are carried out, then
Wm=W0xc2x7Kxc2x7ejxcex8 wherexe2x80x83xe2x80x83(8)
                    K        =                                            ∏                              i                =                0                                            m                -                1                                      ⁢                          xe2x80x83                        ⁢                          K              i                                =                                    ∏                              i                =                0                                            m                -                1                                      ⁢                          xe2x80x83                        ⁢                                                            1                  +                                                            (                                                                        s                          i                                                ⁢                                                  2                                                      -                            i                                                                                              )                                        2                                                              ⁢                              xe2x80x83                            ⁢              and                                                          (        9        )                                θ        =                                            ∑                              i                =                0                                            m                -                1                                      ⁢                          xe2x80x83                        ⁢                          θ              i                                =                                    ∑                              i                =                0                                            m                -                1                                      ⁢                          arctan              ⁡                              (                                                      s                    i                                    ⁢                                      2                                          -                      i                                                                      )                                                                        (        10        )            
If Si, i=0, . . . , mxe2x88x921, are selected so that                                           ∑                          i              =              0                                      m              -              1                                ⁢                      arctan            ⁡                          (                                                s                  i                                ⁢                                  2                                      -                    i                                                              )                                      →                              θ            0                    ⁢                      xe2x80x83                    ⁢          then                                    (        11        )            xe2x80x83Wmxe2x86x92W0xc2x7K(cos xcex80+j sin xcex80)=(X0+jY0)xc2x7K(cos xcex80+j sin xcex80) orxe2x80x83xe2x80x83(12)
Xmxe2x86x92K(X0 cos xcex80xe2x88x92Y0 sin xcex80) andxe2x80x83xe2x80x83(13)
Ymxe2x86x92K(X0 sin xcex80+Y0 cos xcex80)xe2x80x83xe2x80x83(14)
If the initial values X0 and Y0 are set to 1 and 0, respectively, then
Xmxe2x86x92K cos xcex80 and Ymxe2x86x92K sin xcex80xe2x80x83xe2x80x83(15)
In general, the coefficients si at each step of the CORDIC iteration can take any of the three values {xe2x88x921, 0, +1}. If si=0 is allowed, then the scaling factor K is not a constant, but depends on the actual sequence of si values. On the other hand, if si can be restricted to xc2x11, then K is a constant (since the number of iterations m that are to be executed for a given precision are known ahead of time).
In this case, selecting X0={fraction (1/K)} and Y0=0 yieldsxe2x80x83xe2x80x83(16)
Xmxe2x86x92cos xcex80 and Ymxe2x86x92sin xcex80xe2x80x83xe2x80x83(17)
To evaluate the sine and cosine of an argument xcex80 up to n bits of precision, the constant-scale-factor CORDIC method therefore sets Z0=xcex80, X0={fraction (1/K)} and Y0=0. It then executes n iterations of equation (3), where the si values at each step i=1, . . . , n are selected so as to take the Zi+1 value as close to zero as possible. In other words, initial angle xcex80 gets xe2x80x9czeroed-outxe2x80x9d. For every iteration of equation (3), once the si value for that iteration i is determined, one iteration of equations (1) and (2) is carried out in tandem. At the end of n iterations, the X and Y variables contain the cosine and sine of argument xcex80, accurate up to n bits.
This method falls under the category of xe2x80x9cadditive normalizationxe2x80x9d since the initial angle Z0=xcex80 is zeroed out (i.e., it has at least mxe2x88x922 leading zeroes in a non-redundant representation, if m iterations are executed) by adding xc2x1arctan 2xe2x88x92i, i=0, . . . , (mxe2x88x921). In other words, in CORDIC, n+1 angles (arctan 20, . . . , arctan 2xe2x88x92n) must be utilized for n bits of precision. This can be deduced using the following identities                               arctan          ⁢                      xe2x80x83                    ⁢                      2                          -              n                                       less than                               ∑                          k              =                              n                +                1                                      ∞                    ⁢                      arctan            ⁢                          xe2x80x83                        ⁢                          2                              -                k                                                     less than                   2                      -            n                                              (        18        )                                                                    arctan              ⁢                              xe2x80x83                            ⁢                              2                                  -                  n                                                      -                                          ∑                                  k                  =                                      n                    +                    1                                                                    n                  +                  p                                            ⁢                              arctan                ⁢                                  xe2x80x83                                ⁢                                  2                                      -                    k                                                                                ≤                                    ∑                              k                =                                  n                  +                  p                  +                  1                                            ∞                        ⁢                          arctan              ⁢                              xe2x80x83                            ⁢                              2                                  -                  k                                                               less than                                     ∑                              k                =                                  n                  +                  p                  +                  1                                            ∞                        ⁢                          2                              -                k                                                    =                  2                      -                          (                              n                +                p                            )                                                          (        19        )            
At step i if the residual angle Zi greater than 0 then the algorithm selects si=+1 and if Zi less than 0 then si=xe2x88x921 so that the magnitude of the residual angle is constantly being reduced toward zero. For this method to work, the initial angle must satisfy                                           "LeftBracketingBar"                          Z              0                        "RightBracketingBar"                    ≤                                    ∑                              k                =                0                            ∞                        ⁢                          arctan              ⁢                              xe2x80x83                            ⁢                              2                                  -                  k                                                                    =        1.74328662                            (        20        )            
This range covers all angles of practical interest since {fraction (xcfx80/2)}≈1.5708 less than xcexa3k=0∞ arctan 2xe2x88x92k.
With the introduction of (redundant) signed-digit representations [3] the addition becomes carry-free. That is, the addition takes a small fixed amount of time, irrespective of the wordlength, thus offering a potential for significant speedup. To fully exploit the speed advantage gained by using signed digits, the sign detection of the residual angle also must be performed in a constant (and small) time delay independent of the precision/wordlength. The next action depends on whether the current residual angle is positive or negative. This in turn implies that only a fixed number of leading digits can be processed to determine the sign of the residual angle.
In most methods [4, 5] a window of three (leading) digits turns out to be sufficient to determine the sign. At each iteration, the window shifts right by one digit position. If at least one of the digits in the window of interest is non-zero, the sign of the residual angle can be determined to be xc2x11. If the sign is xc2x11, the next elementary angle (arctan 2xe2x88x92i at step i) should be subtracted; if the sign is xe2x88x921, the next elementary angle should be added (for the zeroing process). A problem occurs when all the digits in the window of interest are zero or in other words when the residual angle has many leading zeroes, so that merely analyzing the window of three (leading) digits cannot determine whether its sign is +1 or xe2x88x921. Ideally, in this case, one should select si=0 and neither add nor subtract the elemental angle for that step. However, if the coefficients si are restricted to {xe2x88x921, +1} (to render the scaling factor K to be a constant) then sign detection of the angle being zeroed becomes the bottleneck: it could require the examination of a large number of digits (possibly the entire word length). In essence, if the residual angle Zi has many leading zeroes, the decision whether to add or subtract the next elementary angle arctan 2xe2x88x92i can be made only after examining the first non-zero digit. In general this might imply scanning the entire word length, in which case the advantage due to constant time addition using signed digits is lost.
Takagi, Asada and Yajima proposed two different methods [4] viz., the method of xe2x80x9cdouble rotationxe2x80x9d and the method of xe2x80x9ccorrecting rotationsxe2x80x9d to overcome this problem. However, these methods require extra iterations which make them slow.
Duprat and Muller proposed a version of CORDIC algorithm [5] that lets the coefficient si be restricted to xc2x11, without the need for extra rotations. This is achieved by performing two CORDIC rotations in parallel at each step and retaining the correct result at the end of each step. Thus, extra hardware is needed. When the residual angle Zi has many leading zeroes, so that its sign cannot be determined by analyzing only a fixed number of most significant digits, Duprat and Muller""s algorithm initiates two sequences of computations: one assuming si=+1 and the other assuming si=xe2x88x921. It might appear that this branching could in turn lead to further branchings down the line. However, they showed that further branchings are not possible and that a branching terminates eventually (with both computation sequences converging) or, if it does not terminate till the end, then both the modules have the correct result (accurate up to the desired number of bits of precision). The main drawback of the Duprat/Muller method is that the additional hardware is idle most of the time. In that method, the two modules perform different computations only when in a branching. Otherwise they perform identical computations, which means one of the modules is not being utilized at all.
The basic idea of executing two sequences of computation in parallel can be improved upon, making it possible to determine two coefficients (si and si+1) at each step of the CORDIC iteration. Hence the name xe2x80x9cDouble Step Branching CORDICxe2x80x9d. In this inventive method, the modules perform distinct computations at each step and a decision block retains the correct result. The amount of additional hardware required by the inventive method is minimal; while the hardware utilization is better and the potential speedup is significant.
The new method and apparatus therefore achieves the following objectives:
(1) to accomplish each add/subtract operation in constant time, independent of the wordlength.
This is achieved by using intermediate signed digit representations for the operands, which makes the addition/subtraction carry free;
(2) to render the scaling factor K to be a constant. This is achieved by restricting si to xc2x11;
(3) to achieve maximum possible speed. The inventive method achieves this speed by using two successive elementary angles arctan 2xe2x88x922i and arctan 2xe2x88x92(2i+1) at each iteration step i, thereby executing two conventional rotations in a single step. For every such pair of angles, their sum and difference {arctan 2xe2x88x922i+arctan 2xe2x88x92(2i+1)} and {arctan 2xe2x88x922ixe2x88x92arctan 2xe2x88x92(2i+1)} is pre-computed and stored in the ROM look-up table to expedite the process by reducing the number of add/subtract operations during the iterations. Thus, the number of iterations is half of that in a conventional method.
(4) to minimize hardware overhead. This is achieved by the novel implementation of the ROM look-up table which comprises a major portion of the hardware. As noted before, instead of storing the individual angles in each pair, their sum and difference are stored. Thus the size of the look-up Table remains the same as in the conventional method. Other components such as signed digit adders, latches, barrel shifters, etc. are also required as in conventional methods. Consequently the only additional hardware required by this method is due to a few latches and due to the fact that the decision block is slightly more complex. This overhead is small since the full word length (i.e., n bits long) adders, shifters, multiplexors and ROM look-up table constitute the major components; and
(5) to maximize hardware utilization. In this method, all modules are always performing useful computations. In contrast, in Duprat and Muller""s conventional method, one of the modules is inactive until that algorithm is in a branching.