The present invention is in the field of signal processing and methods for telecommunication, robotics, and control systems. The invention may be implemented in a semiconductor chip. The invention can be used, for example, in digital radio receiver hard and soft demappers such as may be present in a decision feedback equalizer, a frequency error estimator, and a phase error estimator, as well as in multiple-input multiple-output (MIMO) wireless receivers.
A digital radio receiver that receives a single-frequency modulated digital signal, for example an amplitude and phase-shift keying (APSK) signal, needs to interpret a received symbol's amplitude and phase, while the signal may be distorted by noise, echoes, fading, interference, non-linear distortion, and other undesired influences. Prior to and while interpreting the amplitude and phase, the radio receiver aligns the signal's frequency and the symbol's timing.
A digital radio's received signal typically enters the digital domain at a single analog-to-digital converter (ADC), in the case of an intermediate-frequency (IF) or low-IF receiver, or at a pair of ADCs, in the case of a zero-IF (baseband) receiver, where the pair may sample a first signal in phase (I) with, and a second signal in quadrature (Q) to, the received signal's radio-frequency (RF) carrier. The radio synchronizes the frame, the frequency, and the timing of the received signal, followed by equalization. The synchronized and equalized digital I and Q signals are then jointly offered to a demapper that interprets the received signal as encoding one of a limited set of symbols. For example, a 4+12APSK modulation scheme may encode a total of 16 symbols into 4 constellation points in a first ring, of 4 phases, each 90 degrees apart, at a first amplitude level, and 12 constellation points in a second ring, of 12 phases, each 30 degrees apart, at a second amplitude level. The 16 constellation points encircle the origin of the I and Q plane of the corrected received digital signal, and the demapper decides for any pair of I and Q values which of the 16 symbols it is most likely to encode.
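The 4+12APSK decision described above can be illustrated with a minimal Python sketch. The ring radii (1.0 and 2.7) and the phase offsets (45 degrees and 15 degrees) are assumptions chosen only for illustration, since the text specifies the number of points and their spacing but not these values. The function names are hypothetical, and the nearest-point search is only one simple way a hard demapper might decide.

```python
import cmath
import math

# Assumed ring radii; the text gives only the number of points per ring.
R1, R2 = 1.0, 2.7

def apsk_4_12_points(r1=R1, r2=R2):
    """Return the 16 constellation points as complex values I + jQ."""
    # Inner ring: 4 phases, 90 degrees apart (45 degree offset is an assumption).
    inner = [cmath.rect(r1, math.radians(45 + 90 * k)) for k in range(4)]
    # Outer ring: 12 phases, 30 degrees apart (15 degree offset is an assumption).
    outer = [cmath.rect(r2, math.radians(15 + 30 * k)) for k in range(12)]
    return inner + outer

def hard_demap(i, q, points):
    """Return the index of the constellation point nearest to (i, q)."""
    rx = complex(i, q)
    return min(range(len(points)), key=lambda n: abs(rx - points[n]))
```

A received pair such as `hard_demap(0.75, 0.66, apsk_4_12_points())` is then mapped to the index of the closest of the 16 points.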
Radio receivers may employ hard demappers and/or soft demappers. Hard demappers are relatively simple, and they output bits identifying the most likely received symbol. Soft demappers are more complex, as they also output the distance between the received signal and the identified symbol, or multiple pairs of symbols and distances, or of symbols and likelihoods, where a likelihood may be inversely proportional to the distance. A demapper needs to perform one or more rectangular-to-polar conversions to translate a signal from the IQ (rectangular) domain to the phase and amplitude (polar) domain, where a vector to a constellation point in the rectangular IQ domain has an angle with the positive I-axis denoting the phase and a length denoting the amplitude in the polar domain.
A circuit capable of rectangular-to-polar conversion and very suitable for integration into a semiconductor chip is the coordinate rotation digital computer (CORDIC), first proposed by Jack E. Volder, “The CORDIC Trigonometric Computing Technique”, IRE Transactions on Electronic Computers, pp. 330-334, IRE/IEEE (1959). The CORDIC is a digital signal processor (DSP) dedicated to trigonometric calculations. A number of such calculation routines are known as “CORDIC algorithms”, and they can be configured by controlling input values of one or more CORDIC pins.
FIG. 1 illustrates a typical outline of a CORDIC 100. CORDIC 100 has inputs xin, yin, zin and outputs xout, yout, and zout. These interfaces may be used, for example, for two-dimensional input and output vectors {(0, 0) (xin, yin)} and {(0, 0) (xout, yout)}, or for similar three-dimensional input and output vectors, or for input and output signals in the polar domain, etc. CORDIC 100 may further have an input m to select a coordinate system and a mode input to select between a vectoring mode and a rotation mode. CORDIC 100 may comprise one or more CORDIC cells with the same interfaces. If CORDIC 100 comprises a single CORDIC cell, it uses the cell iteratively to perform a CORDIC algorithm, feeding back output signals of one iteration as input signals for the next iteration. Alternatively, CORDIC 100 may be “parallelized”, and comprise a concatenation of multiple stages, each comprising one CORDIC cell whose output signals feed the inputs of the next CORDIC cell. A CORDIC may further be pipelined to increase throughput.
Like any DSP, a CORDIC's quality metrics are its (1) accuracy, for example expressed as its bit width; (2) throughput, for example expressed as operations per second; (3) latency, for example expressed in seconds or in clock cycles; (4) power, for example expressed in W or W/operation; and (5) die area occupied in a semiconductor chip. In case of a digital radio demapper, all five quality metrics are important. To achieve a low latency and a high throughput, a CORDIC performing the demapper function needs to be parallelized, which compromises die area and power. Therefore, there is a need to reduce die area and power in a parallel CORDIC, without sacrificing accuracy, throughput, or latency.
A CORDIC cell performs a clockwise rotation over a positive angle α of a vector {(0, 0) (x1, y1)}, based on the trigonometric formulas:

$$x_2 = x_1\cos(\alpha) + y_1\sin(\alpha)$$
$$y_2 = y_1\cos(\alpha) - x_1\sin(\alpha)$$

Or counter-clockwise:

$$x_2 = x_1\cos(\alpha) - y_1\sin(\alpha)$$
$$y_2 = y_1\cos(\alpha) + x_1\sin(\alpha)$$

This can be reduced to a single set of equations by introducing the sign σ, where σ=1 for clockwise rotation and σ=−1 for counter-clockwise rotation:

$$x_2 = x_1\cos(\alpha) + \sigma\,y_1\sin(\alpha)$$
$$y_2 = y_1\cos(\alpha) - \sigma\,x_1\sin(\alpha)$$

Volder reduced the number of multiplications by dividing all members of the equation by cos(α), thereby allowing his rotated vector to increase in length by a factor of 1/cos(α):

$$x_2' = x_2/\cos(\alpha) = x_1 + \sigma\,y_1\tan(\alpha)$$
$$y_2' = y_2/\cos(\alpha) = y_1 - \sigma\,x_1\tan(\alpha)$$

Lastly, Volder enabled simple digital implementation by allowing only angles α for which tan(α) has simple digital values tan(α)=1, ½, ¼, ⅛, 1/16, etc., or more generally, $\tan(\alpha) = 2^{-i}$, with i=0, 1, 2, 3, etc. This forms a series of available angles $\alpha_i = \arctan(2^{-i})$ = 45°, 26.565°, 14.036°, 7.125°, 3.576°, etc. For these available angles $\alpha_i$, multiplication with $\tan(\alpha_i)$ is effectively a right-shift over i bits of the input value. To allow for rotation over any arbitrary angle, the arbitrary angle must be decomposed into the available angles $\alpha_i$, and the CORDIC cell must be used repeatedly to perform micro-rotations using only the available angles. Alternatively, a parallelized chain of CORDIC cells can perform the series of micro-rotations quasi-simultaneously.
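The decomposition of an arbitrary angle into micro-rotations can be sketched in Python as follows. This is a floating-point illustration under stated assumptions, not a description of any particular hardware: the function name `cordic_rotate`, the default of 16 iterations, and the final division by the accumulated gain are choices made for this example. The multiplication by `2.0 ** -i` stands in for the right-shift over i bits.

```python
import math

def cordic_rotate(x, y, angle_deg, iterations=16):
    """Rotate (x, y) clockwise by angle_deg using micro-rotations
    restricted to angles with tan(alpha_i) = 2**-i.
    Converges for |angle_deg| up to roughly 99.8 degrees."""
    remaining = angle_deg
    gain = 1.0
    for i in range(iterations):
        # sigma = 1 rotates clockwise, sigma = -1 counter-clockwise.
        sigma = 1 if remaining >= 0 else -1
        t = 2.0 ** -i  # tan(alpha_i); a right-shift by i bits in hardware
        x, y = x + sigma * y * t, y - sigma * x * t
        remaining -= sigma * math.degrees(math.atan(t))
        # Each micro-rotation lengthens the vector by 1/cos(alpha_i).
        gain *= math.sqrt(1.0 + t * t)
    # Compensate the accumulated length increase (the CORDIC gain).
    return x / gain, y / gain
```

For instance, `cordic_rotate(1.0, 0.0, 30.0)` approximates the point (cos 30°, −sin 30°), since a clockwise rotation of the unit vector on the positive x-axis moves it below that axis.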
Many researchers and developers have followed in Volder's footsteps to further develop CORDIC architectures and algorithms. Current architectures support two operating modes: rotation mode (as described above) and vectoring mode, where a vector's length and angle α with the positive x-axis are computed by performing a binary search for a series of micro-rotations that rotate the vector to the x-axis, such that the resulting x-coordinate equals the vector's length (to be corrected for the CORDIC gain due to the series of micro-rotations), the y-coordinate equals zero, and the sum of the micro-rotations equals −α. They also support three coordinate systems, using the parameter m, where m=0 for the rectangular, m=1 for the circular, and m=−1 for the hyperbolic coordinate system. Volder's original work supported two-dimensional coordinates, but current CORDICs support three dimensions. Their CORDIC cell equations are:

$$x_2' = x_1 + m\,\sigma\,2^{-i}\,y_1$$
$$y_2' = y_1 - \sigma\,2^{-i}\,x_1$$
$$z_2 = z_1 - \sigma\arctan(2^{-i}) \quad \text{(for } m=1\text{, polar coordinates)}$$
$$z_2 = z_1 - \sigma\,2^{-i} \quad \text{(for } m=0\text{, rectangular coordinates)}$$
$$z_2 = z_1 - \sigma\,\operatorname{arctanh}(2^{-i}) \quad \text{(for } m=-1\text{, hyperbolic coordinates)}$$
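A vectoring-mode computation for the circular case (m=1) might look like the following Python sketch. The function name `cordic_vectoring`, the floating-point arithmetic, and the iteration count are assumptions for illustration. The sign in each step is chosen so that the micro-rotation moves y toward zero under the cell equations as written above (σ=1 when y is non-negative); descriptions that define σ with the opposite rotation sense state this choice with the opposite sign.

```python
import math

def cordic_vectoring(x, y, iterations=16):
    """Return (length, angle_deg) of the vector (x, y), assuming x > 0,
    by rotating the vector onto the positive x-axis (m = 1)."""
    z = 0.0
    gain = 1.0
    for i in range(iterations):
        # Rotate clockwise while the vector lies above the x-axis.
        sigma = 1 if y >= 0 else -1
        t = 2.0 ** -i  # a right-shift by i bits in hardware
        x, y = x + sigma * y * t, y - sigma * x * t
        z -= sigma * math.degrees(math.atan(t))
        gain *= math.sqrt(1.0 + t * t)
    # x now holds the length times the CORDIC gain; z holds -alpha.
    return x / gain, -z
```

Calling `cordic_vectoring(3.0, 4.0)` approximates a length of 5 and an angle of about 53.13 degrees, which is the rectangular-to-polar conversion a demapper needs.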
FIG. 2 illustrates a conventional CORDIC cell 200. CORDIC cell 200 takes the output values x(i), y(i), and z(i) from iteration i or stage i. It determines sign bit σ(i) from either the y(i) or z(i) value, and performs iteration i+1 to calculate x(i+1), y(i+1), and z(i+1). The value of sign bit σ(i) depends on usage in vectoring mode (σ(i)=−sign(y(i))) or in rotation mode (σ(i)=sign(z(i))). Bit-shifter 220 right-shifts input signal y(i) by i bits, an operation that is equal to dividing y(i) by $2^i$. Adder/subtractor 210 adds this result to x(i) if m*σ(i)=1, or subtracts this result from x(i) if m*σ(i)=0. In this manner, adder/subtractor 210 and bit-shifter 220 perform the above function $x_2' = x_1 + m\,\sigma\,2^{-i}\,y_1$. Likewise, bit-shifter 240 right-shifts input signal x(i) by i bits, and adder/subtractor 230 subtracts this result from y(i) if σ(i)=1, or adds this result to y(i) if σ(i)=0. In this manner, adder/subtractor 230 and bit-shifter 240 perform the above function $y_2' = y_1 - \sigma\,2^{-i}\,x_1$. Adder/subtractor 250 takes the fixed value $\arctan(2^{-i})$ or $2^{-i}$ or $\operatorname{arctanh}(2^{-i})$, dependent on the selected coordinate system and the iteration only, and adds it to or subtracts it from the value z(i) to obtain z(i+1).
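The data path of such a cell can be mimicked with integer arithmetic, as in the Python sketch below. The 16-bit fractional fixed-point format, the representation of σ as +1 or −1 rather than as a single bit, and the names `z_constant`, `cordic_cell`, and `rotate_chain` are assumptions made for this illustration and are not taken from FIG. 2. The `>>` operator plays the role of the bit-shifters, and chaining the cell over successive values of i corresponds to a parallelized CORDIC in rotation mode.

```python
import math

FRAC_BITS = 16  # assumed fixed-point format: 16 fractional bits

def z_constant(i, m):
    """Fixed value used by the z adder/subtractor for stage i."""
    if m == 1:
        value = math.atan(2.0 ** -i)    # circular, in radians
    elif m == 0:
        value = 2.0 ** -i               # linear
    else:
        value = math.atanh(2.0 ** -i)   # hyperbolic; requires i >= 1
    return round(value * (1 << FRAC_BITS))

def cordic_cell(x, y, z, i, sigma, m=1):
    """One stage: two shift-and-add paths plus the z accumulator.
    sigma is +1 or -1; x, y, z are fixed-point integers."""
    x_next = x + m * sigma * (y >> i)
    y_next = y - sigma * (x >> i)
    z_next = z - sigma * z_constant(i, m)
    return x_next, y_next, z_next

def rotate_chain(x, y, z, stages=16):
    """Rotation mode: each stage takes its sign from the current z."""
    for i in range(stages):
        sigma = 1 if z >= 0 else -1
        x, y, z = cordic_cell(x, y, z, i, sigma, m=1)
    return x, y, z
```

The outputs of `rotate_chain` are still scaled by the CORDIC gain of the chain, so a caller would divide x and y by that gain, or pre-scale the inputs, to recover the rotated vector.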
Previous versions of parallelized CORDICs have attempted to simplify part of the parallel stages and thereby reduce semiconductor die area or to reduce latency. For example, Tso-Bing Juang, et al., in “Para-CORDIC: Parallel CORDIC Rotation Algorithm”, IEEE Transactions on Circuits and Systems I, pp. 1515-1524, August 2004, proposed an architecture with low latency and high accuracy, but it consumed a large die area. Shaoyun Wang, et al., in “Hybrid CORDIC Algorithms”, IEEE Transactions on Computers, pp. 1202-1207, November 1997, proposed a pipelined CORDIC that reduced the number of constants to store. They achieved a great reduction in area but introduced inaccuracies that may prevent their CORDIC from being used in some communication applications. The present invention overcomes the prior art problems, and balances low latency and high accuracy within a small die area.