This invention relates to calculating trigonometric functions in integrated circuit devices, and particularly in programmable integrated circuit devices such as programmable logic devices (PLDs).
Trigonometric functions are generally defined for the relatively small angular range of 0-360°, or 0-2π radians. For angular values above 2π, the values of the trigonometric functions repeat. Indeed, one could restrict the range to 0-π/2, because various trigonometric identities can be used to derive values for trigonometric functions of any angle between π/2 and 2π from trigonometric functions of angles between 0 and π/2.
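The range reduction described above can be illustrated with a short sketch. The function name `sin_cos_from_first_quadrant` is hypothetical, and the library calls `math.sin` and `math.cos` merely stand in for whatever first-quadrant evaluation core a device might provide; the sketch only shows how periodicity and quadrant identities recover values for any angle.

```python
import math

def sin_cos_from_first_quadrant(theta):
    """Return (sin(theta), cos(theta)), evaluating sin/cos only on [0, pi/2]."""
    # Periodicity: the functions repeat every 2*pi.
    theta = math.fmod(theta, 2.0 * math.pi)
    if theta < 0.0:
        theta += 2.0 * math.pi
    # Split into a quadrant index (0..3) and a residual angle in [0, pi/2].
    quadrant = min(int(theta // (math.pi / 2.0)), 3)
    phi = theta - quadrant * (math.pi / 2.0)
    s, c = math.sin(phi), math.cos(phi)  # stand-in for a first-quadrant core
    if quadrant == 0:
        return s, c
    if quadrant == 1:   # sin(phi + pi/2) = cos(phi), cos(phi + pi/2) = -sin(phi)
        return c, -s
    if quadrant == 2:   # sin(phi + pi) = -sin(phi), cos(phi + pi) = -cos(phi)
        return -s, -c
    # sin(phi + 3*pi/2) = -cos(phi), cos(phi + 3*pi/2) = sin(phi)
    return -c, s
```

For instance, `sin_cos_from_first_quadrant(4.0)` falls in the third quadrant, so both results are the negated first-quadrant values of the residual angle 4.0 − π.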
One method that may be used in integrated circuit devices for calculating trigonometric functions is the CORDIC algorithm, which uses the following three recurrence equations:
$$x_{n+1} = x_n - d_n y_n 2^{-n}$$
$$y_{n+1} = y_n + d_n x_n 2^{-n}$$
$$z_{n+1} = z_n - d_n \tan^{-1}(2^{-n})$$
For example, to calculate a sine or cosine of an input, the x value is initialized to “1”, the y value is initialized to “0”, and the z value is initialized to the angle required. z is then rotated towards zero, which determines the sign of $d_n$, which is ±1. If $z_n$ is positive, then so is $d_n$, as the goal is to bring z closer to 0; if $z_n$ is negative, then so is $d_n$, for the same reason. x and y represent the x and y components of a unit vector. As z rotates, so does that vector, and when z has reached its final rotation to 0, the values of x and y will have converged to the cosine and sine, respectively, of the input angle.
To account for stretching of the unit vector during rotation, a scaling factor is applied to the initial value of x. The scaling factor is:
$$\prod_{n=0}^{\infty} \sqrt{1 + 2^{-2n}} = 1.64676025812106564\ldots$$
The initial x is therefore set to 1/1.64676... = 0.607252935....
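The recurrence and the scaling factor can be exercised together in a small floating-point model. This is only a sketch under stated assumptions: the names `cordic_gain` and `cordic_sin_cos` and the default of 32 iterations are illustrative choices, not taken from the text, and the model ignores the fixed-point effects of a hardware datapath. It assumes the input angle lies within the convergence range of the rotations (roughly ±1.74 radians).

```python
import math

def cordic_gain(iterations):
    """Product of sqrt(1 + 2^(-2n)) for n = 0 .. iterations-1."""
    gain = 1.0
    for n in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * n))
    return gain

def cordic_sin_cos(angle, iterations=32):
    """Floating-point model of CORDIC rotation; returns (sine, cosine)."""
    # Pre-scale x so the stretched vector ends with unit length.
    x, y, z = 1.0 / cordic_gain(iterations), 0.0, angle
    for n in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0        # rotate z towards zero
        x, y, z = (x - d * y * 2.0 ** -n,    # right-hand side uses the old x, y
                   y + d * x * 2.0 ** -n,
                   z - d * math.atan(2.0 ** -n))
    return y, x
```

Calling `cordic_sin_cos(0.5)` should agree closely with `math.sin(0.5)` and `math.cos(0.5)`, and `cordic_gain(32)` approaches the constant quoted above.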
Although CORDIC appears to be easily implemented in integrated circuit devices such as FPGAs, closer analysis shows inefficient use of logic structures. Common FPGA architectures have 4- to 6-input logic functions, each followed by a dedicated ripple-carry adder, followed by a register. When used for calculating floating-point functions, such as single-precision sine or cosine functions, the number of hardware resources required to generate an accurate result for smaller input values can become large.
In one embodiment of a CORDIC implementation, the number of registers will be the datapath precision, multiplied by 3 (for the three datapaths x, y and z), multiplied by the datapath precision again (the depth of the pipeline must be sufficient to include the contributions of all of the bits in the input numbers and the arctangent constants). In other words, the approximate size of the pipeline is $R = 3W^2$, where R is the number of registers and W is the datapath precision.
The amount of logic used is proportional to the square of the precision. For single-precision floating-point arithmetic (e.g., in accordance with the IEEE 754-1985 standard for floating-point arithmetic), the 23-bit mantissa precision requires a much larger fixed-point CORDIC datapath. Assuming a full-range input (which may be restricted to approximately π/2 as discussed above), 23 bits are needed for the mantissa, plus one bit for the implied leading “1” and one bit for the sign bit position. Further bits may be required to the right of the mantissa, as each successive stage adds a smaller fraction of the other datapath. For example, if 30-bit datapath precision is accurate enough for a full-range input, then 41 bits would likely be needed to cover the entire range of possible inputs. Using the $R = 3W^2$ formula, 2700 registers would be needed for a 30-bit datapath, but 5043 registers (almost twice as many) would be needed for a 41-bit datapath.
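The register counts quoted above follow directly from the approximate pipeline-size formula. A minimal check, using a hypothetical helper name `cordic_register_estimate`:

```python
def cordic_register_estimate(width_bits):
    """Approximate register count: 3 datapaths x W bits wide x W stages deep."""
    return 3 * width_bits ** 2

for width in (30, 41):
    print(width, cordic_register_estimate(width))
# prints:
# 30 2700
# 41 5043
```

The ratio 5043/2700 is about 1.87, which is the "almost twice as many" figure.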
In addition, in current FPGA architectures, ripple-carry adders are used as discussed above. Although there are some architectural features in some FPGAs to improve the speed of ripple-carry adders, generally the propagation delay of a ripple-carry adder varies linearly with the precision. In a ripple-carry adder, bit 0 is fixed immediately beside bit 1, which in turn is fixed immediately beside bit 2, and so on—both at the source and the destination. A large number of wide datapaths with a ripple carry adder at each stage would impose a severe routing constraint, reducing system performance because of routing congestion.
For relatively large angles, accuracy will be better for a given wordlength. Smaller angles will be subject to larger errors, for two reasons principally related to the z datapath. First, the initial rotations applied may be much larger than the angle represented. For example, the first rotation, where n=0, is by $\tan^{-1}(1)$, or 45°. Therefore, a number of iterations will be required just to return the z datapath to the original input order of magnitude. Second, the later, smaller rotations may reduce to zero values before the end of the datapath, affecting accuracy at that end.
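The effect of a fixed wordlength on small inputs can be seen in a simple integer model. This is a sketch under assumptions not found in the text: the names `cordic_sin_fixed` and `relative_error`, the 16 fractional bits, and the 16 iterations are illustrative choices only, and the model does not represent any particular hardware implementation.

```python
import math

def cordic_sin_fixed(angle, frac_bits=16, iterations=16):
    """Fixed-point CORDIC rotation on integers; returns sin(angle) as a float."""
    one = 1 << frac_bits
    # Arctangent constants quantized to the datapath precision.
    atan_table = [round(math.atan(2.0 ** -n) * one) for n in range(iterations)]
    gain = 1.0
    for n in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * n))
    x, y, z = round(one / gain), 0, round(angle * one)
    for n in range(iterations):
        # The shifts discard low-order bits, as a fixed-width datapath would.
        if z >= 0:
            x, y, z = x - (y >> n), y + (x >> n), z - atan_table[n]
        else:
            x, y, z = x + (y >> n), y - (x >> n), z + atan_table[n]
    return y / one

def relative_error(angle, frac_bits=16, iterations=16):
    """Relative error of the fixed-point result against math.sin."""
    exact = math.sin(angle)
    return abs(cordic_sin_fixed(angle, frac_bits, iterations) - exact) / abs(exact)
```

Comparing `relative_error(1.0)` with `relative_error(0.001)` shows the trend: the absolute error is bounded by the datapath resolution in both cases, so it is a much larger fraction of the small result than of the large one.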