Wireless transmission through multiple antennas, often referred to as “MIMO” (Multiple-Input Multiple-Output), currently enjoys great popularity because of the demand for high data rate communication from multimedia services. Many applications are using or considering the use of MIMO to enhance the data rate or the robustness of communication links. These applications include the next generation of wireless LAN networks (such as IEEE 802.11n networks), mobile “WiMax” systems for fixed wireless access (“FWA”), and fourth generation (“4G”) mobile terminals.
MIMO detection is often concerned with estimating the sequence of digitally modulated symbols simultaneously transmitted from multiple sources, such as from multiple transmitters or from a single transmitter with multiple antennas. A MIMO detector often receives as input a version of the sequence of digitally modulated symbols that has experienced co-antenna interference, been distorted by a fading channel, and been corrupted by noise.
In general, a narrow-band MIMO system can be represented by the following linear complex baseband equation:
$$Y = \sqrt{\frac{E_s}{T}}\, H X + N. \tag{1}$$
Here, T represents the number of transmit antennas. Y represents a received vector (size R×1), where R represents the number of receive antennas. X represents a transmitted vector (size T×1). H represents an R×T channel matrix whose entries represent the complex path gains from transmitter to receiver and are samples of zero-mean Gaussian random variables with variance σ² = 0.5 per dimension. N represents a noise vector (size R×1) containing elements that represent samples of independent circularly symmetric zero-mean complex Gaussian random variables with variance N_0/2 per dimension. E_s represents the total per-symbol transmitted energy (under the hypothesis that the average constellation energy is unity). In wideband orthogonal frequency division multiplexing (“OFDM”) systems, Equation (1) may have to be considered valid per subcarrier.
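As an illustration of Equation (1), the following minimal sketch draws one realization of the narrow-band model. The function name `simulate_mimo`, the QPSK alphabet, and the parameter values are assumptions chosen for the example and do not come from the text.

```python
import numpy as np

def simulate_mimo(T, R, Es, N0, constellation, rng):
    """Draw one realization of Y = sqrt(Es/T) * H @ X + N (Equation (1))."""
    # Channel: i.i.d. zero-mean complex Gaussian entries, variance 0.5 per real dimension.
    H = (rng.normal(scale=np.sqrt(0.5), size=(R, T))
         + 1j * rng.normal(scale=np.sqrt(0.5), size=(R, T)))
    # Transmit vector: one symbol per transmit antenna, unit average energy assumed.
    X = rng.choice(constellation, size=T)
    # Noise: circularly symmetric complex Gaussian, variance N0/2 per real dimension.
    N = (rng.normal(scale=np.sqrt(N0 / 2), size=R)
         + 1j * rng.normal(scale=np.sqrt(N0 / 2), size=R))
    Y = np.sqrt(Es / T) * (H @ X) + N
    return Y, H, X

# Hypothetical example: 2 transmit antennas, 3 receive antennas, QPSK symbols.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(0)
Y, H, X = simulate_mimo(T=2, R=3, Es=1.0, N0=0.1, constellation=QPSK, rng=rng)
```

In an OFDM setting, the same function could be called once per subcarrier, each with its own channel matrix.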
Maximum-Likelihood (“ML”) detection is often desirable to achieve high performance in a communication system, as this is the optimal detection technique in the presence of additive white Gaussian noise (“AWGN”). ML detection typically involves finding the transmitted vector X that minimizes the squared norm of the error vector, which can be expressed as follows:
$$\tilde{X} = \arg\min_{X} \left\lVert Y - \sqrt{\frac{E_s}{T}}\, H X \right\rVert^2. \tag{2}$$
Here, the notation corresponds to the commonly used linear MIMO channel, where independent and identically distributed (“IID”) Rayleigh fading and ideal channel state information (“CSI”) at the receiver are assumed. ML detection typically involves an exhaustive search over all of the possible S^T sequences of digitally modulated symbols, where S is a Quadrature Amplitude Modulation (“QAM”) or Phase Shift Keying (“PSK”) constellation size and T is the number of transmit antennas. Because the number of candidate sequences grows exponentially with T, ML detection often becomes increasingly infeasible as the spectral efficiency grows.
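The exhaustive search of Equation (2) can be sketched as follows. This is a brute-force illustration only; the function name `ml_detect` and the noise-free test channel are assumptions made for the example.

```python
import itertools
import numpy as np

def ml_detect(Y, H, Es, constellation):
    """Exhaustive ML search: evaluate all S**T candidate vectors (Equation (2))."""
    T = H.shape[1]
    scale = np.sqrt(Es / T)
    best, best_metric = None, np.inf
    for cand in itertools.product(constellation, repeat=T):
        X = np.array(cand)
        metric = np.linalg.norm(Y - scale * (H @ X)) ** 2
        if metric < best_metric:
            best, best_metric = X, metric
    return best, best_metric

# Hypothetical 2x2 QPSK example without noise, so the search must return X_true.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(1)
H = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) * np.sqrt(0.5)
X_true = QPSK[[0, 3]]
Y = np.sqrt(1.0 / 2) * (H @ X_true)
X_hat, metric = ml_detect(Y, H, Es=1.0, constellation=QPSK)
n_candidates = len(QPSK) ** H.shape[1]   # S**T sequences visited by the loop
```

For 64-QAM and T = 4, the same loop would visit 64^4 (about 16.8 million) candidates per received vector, which illustrates why the exhaustive search scales poorly.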
Because of their reduced complexity, sub-optimal linear detection algorithms, such as Zero-Forcing (“ZF”) or Minimum Mean Square Error (“MMSE”) algorithms, are widely employed in wireless communications. These algorithms belong to the class of linear combinatorial nulling detectors. This means that estimates of each modulated symbol are obtained by considering the other symbols as interferers and performing a linear weighting of the signals received by all of the receive antennas.
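A minimal sketch of the two linear detectors is given below, assuming the model of Equation (1) with unit-energy symbols and total noise variance N_0 per receive antenna. The function names `zf_detect`, `mmse_detect`, and `slice_to` are hypothetical.

```python
import numpy as np

def zf_detect(Y, H, Es):
    """Zero-forcing: apply the pseudo-inverse of the scaled channel to Y."""
    T = H.shape[1]
    Heff = np.sqrt(Es / T) * H
    return np.linalg.pinv(Heff) @ Y

def mmse_detect(Y, H, Es, N0):
    """MMSE: solve (Heff^H Heff + N0 I) x = Heff^H Y, assuming unit-energy symbols."""
    T = H.shape[1]
    Heff = np.sqrt(Es / T) * H
    G = Heff.conj().T @ Heff + N0 * np.eye(T)
    return np.linalg.solve(G, Heff.conj().T @ Y)

def slice_to(constellation, z):
    """Map each soft estimate to the nearest constellation point."""
    idx = np.argmin(np.abs(z[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

# Hypothetical 3x2 QPSK example without noise.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(2)
H = (rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))) * np.sqrt(0.5)
X_true = QPSK[[1, 2]]
Y = np.sqrt(1.0 / 2) * (H @ X_true)
X_zf = slice_to(QPSK, zf_detect(Y, H, Es=1.0))
X_mmse = slice_to(QPSK, mmse_detect(Y, H, Es=1.0, N0=1e-3))
```

Each symbol estimate is a weighted sum of all R received samples, which is the linear weighting described above.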
To improve their performance, nonlinear detectors based on a combination of linear detectors and spatially ordered decision-feedback equalization (“O-DFE”) have been proposed. In these techniques, the principles of interference cancellation and layer ordering were established. The terms “layer” and “antenna” and their derivatives may be used interchangeably in this document. In these detectors, a stage of ZF or MMSE linear detection, also called interference “nulling”, is applied to determine T symbol estimates. Based on the “post-detection” signal-to-noise ratio (“SNR”), the first layer is detected. After that, each sub-stream in turn is considered the desired signal, and the other sub-streams are considered “interferers.” Interference from the already detected signals is cancelled from the received signal, and nulling is performed on modified received vectors where fewer interferers are effectively present. This process is often called “interference cancellation (IC) and nulling” or “spatial DFE.”
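The nulling and cancellation loop can be sketched as follows for the ZF case. This is a simplified illustration, not a reference implementation: the name `zf_odfe` is hypothetical, the channel matrix is assumed to absorb the sqrt(E_s/T) scaling, and the layer with the smallest nulling-vector norm is taken as the one with the highest post-detection SNR.

```python
import numpy as np

def zf_odfe(Y, H, constellation):
    """ZF nulling with ordered interference cancellation (spatial DFE sketch)."""
    T = H.shape[1]
    y = Y.astype(complex)
    remaining = list(range(T))
    X_hat = np.zeros(T, dtype=complex)
    while remaining:
        # Nulling matrix for the layers that have not been detected yet.
        W = np.linalg.pinv(H[:, remaining])
        # ZF post-detection SNR is proportional to 1 / ||w_k||^2: pick the best layer.
        k = int(np.argmin(np.sum(np.abs(W) ** 2, axis=1)))
        z = W[k] @ y
        s = constellation[np.argmin(np.abs(z - constellation))]
        layer = remaining.pop(k)
        X_hat[layer] = s
        # Cancel the detected layer's contribution from the received vector.
        y = y - H[:, layer] * s
    return X_hat

# Hypothetical 4x3 QPSK example without noise.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(3)
H = (rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))) * np.sqrt(0.5)
X_true = QPSK[[0, 2, 3]]
Y = H @ X_true
X_hat = zf_odfe(Y, H, QPSK)
```

A wrong decision on an early layer would be subtracted from `y` and would affect all later layers, which is the error propagation effect associated with this class of detectors.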
For interference cancellation, the order in which the transmit signals are detected may be critical for the performance of the detector. An optimal criterion has been established that corresponds to maximizing the minimum SNR (“maxi-min” criterion) over all possible orderings. Fortunately, for T transmit antennas, it can be demonstrated that only T*(T+1)/2 dispositions of layers have to be considered to determine the optimal ordering, instead of all possible T! dispositions.
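One way to read the T(T+1)/2 count, assumed here rather than stated in the text, is a greedy procedure that compares T candidate layers at the first stage, T−1 at the second, and so on. The sketch below compares that count with the T! possible orderings; both function names are hypothetical.

```python
import math

def greedy_ordering_evaluations(T):
    """Candidates compared by a stage-by-stage selection: T + (T-1) + ... + 1."""
    return sum(range(1, T + 1))

def exhaustive_orderings(T):
    """Number of distinct layer orderings of T layers."""
    return math.factorial(T)

for T in (2, 4, 8):
    print(T, greedy_ordering_evaluations(T), exhaustive_orderings(T))
```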
A better performing class of detectors may be represented by list detectors (“LDs”), which are based on a combination of the ML and DFE principles. The common idea is to divide the transmit streams to be detected into two groups. First, one or more reference transmit streams are selected, and a corresponding list of candidate constellation symbols is determined. Second, for each sequence in the list, interference is cancelled from the received signal, and the remaining symbol estimates are determined by sub-detectors operating on reduced size sub-channels. Compared to O-DFE, the differences lie in the criterion adopted to order the layers and in the fact that the symbol estimates for the first layer (i.e. prior to interference cancellation) are replaced by a list of candidates. The best performing variant corresponds to searching all possible S cases for a reference stream or layer and adopting spatial DFE for a properly selected set of the remaining T−1 sub-detectors. In this case, the list detector may be able to achieve full receive diversity and an SNR distance from ML on the order of a fraction of a decibel, provided that the layer order is properly selected. A notable property is that this can often be accomplished through a parallel implementation, as the sub-detectors can operate independently. The optimal ordering criterion for list detectors stems from the principle of maximizing the worst-case post-detection SNR (“maxi-min”), as proposed for O-DFE. This results in computing the O-DFE ordering for T sub-channel matrices of size R×(T−1), thus entailing a complexity of O(T^4).
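The two-group idea can be sketched as follows. This is a deliberately simplified illustration: the reference layer is passed in by the caller (the maxi-min ordering is omitted), and a plain ZF slicer stands in for the spatial DFE sub-detector. The name `list_detect` is hypothetical, and the channel matrix is assumed to absorb the sqrt(E_s/T) scaling.

```python
import numpy as np

def list_detect(Y, H, constellation, ref=0):
    """Try every symbol on the reference layer, cancel it, detect the rest."""
    T = H.shape[1]
    others = [j for j in range(T) if j != ref]
    # Simplification: one ZF sub-detector on the R x (T-1) sub-channel.
    W = np.linalg.pinv(H[:, others])
    best, best_metric = None, np.inf
    for s in constellation:                      # S independent branches
        y_ic = Y - H[:, ref] * s                 # cancel the reference candidate
        z = W @ y_ic
        idx = np.argmin(np.abs(z[:, None] - constellation[None, :]), axis=1)
        X = np.empty(T, dtype=complex)
        X[ref] = s
        X[others] = constellation[idx]
        metric = np.linalg.norm(Y - H @ X) ** 2  # Euclidean metric of this branch
        if metric < best_metric:
            best, best_metric = X, metric
    return best, best_metric

# Hypothetical 3x3 QPSK example without noise.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(6)
H = (rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))) * np.sqrt(0.5)
X_true = QPSK[[3, 0, 1]]
Y = H @ X_true
X_hat, metric = list_detect(Y, H, QPSK, ref=0)
```

The loop body does not depend on earlier iterations, so the S branches could be evaluated in parallel, consistent with the property noted above.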
Besides performance (the benchmarks being optimal ML detection at one extreme and linear MMSE and ZF at the other), various features may be key for a MIMO detection algorithm to be effective and implementable in the next generation of wireless communication algorithms. These features may include:
    the overall complexity of the detection algorithm;
    the possibility of generating bit soft-output values (or log-likelihood ratios or “LLRs” if in the logarithmic domain), as this may yield a significant performance gain in wireless systems employing error correction code (“ECC”) coding and decoding algorithms; and
    a parallelizable architecture of the algorithm, which may be fundamental for an Application Specific Integrated Circuit (“ASIC”) implementation or other implementation and for yielding the low latency required by a real-time high data rate transmission.
The various types of detectors mentioned above are often characterized by a number of disadvantages. For example, ZF and MMSE schemes are often highly sub-optimal since they yield a low spatial diversity order. For a MIMO system with T transmit antennas and R receive antennas, the diversity order of these schemes is equal to R−T+1, as opposed to R for an ML detector. Also, in practical applications adopting MIMO-OFDM and ECC in bit-interleaved coded modulation (“BICM”) schemes, a significant performance gap relative to ML is observable for MMSE if R=T.
Moreover, nonlinear ZF or MMSE-based O-DFE schemes may offer only a limited performance improvement over linear ZF or MMSE schemes due to noise enhancement caused by nulling and error propagation caused by interference cancellation. Also, as with the linear detectors, the nonlinear detectors may suffer from ill-conditioned channels. Further, the complexity of the original nonlinear algorithm is very high, O(T^4), as it involves the computation of multiple Moore-Penrose pseudo-inverses of sub-channel matrices of decreasing size. More recent efficient implementations exist, though they still have a complexity of O(T^3). In addition, no strategy to compute the bit soft metrics has been proposed and developed for O-DFE detectors.
List detectors also often suffer from several drawbacks. For example, a “parallel detection” (PD) algorithm used in list detectors suffers from a high computational complexity because T O-DFE detectors acting on R×(T−1) sub-channel matrices have to be computed. This involves the computation of the related Moore-Penrose sub-channel pseudo-inverses. While this could be efficiently implemented through T complex “sorted” QR decompositions, the overall complexity is still in the order of O(T^4). Moreover, known list-based detection algorithms do not incorporate a method to produce soft bit metrics for use in modern coding and decoding algorithms.
Another family of ML-approaching detectors is represented by lattice decoding algorithms, which are applicable if the received signal can be represented as a lattice. The terms “decoder” and “detector” and their derivatives may be used interchangeably in this document. The Sphere Decoder (“SD”) is the most widely known algorithm in this family and can be utilized to attain hard-output ML performances with significantly reduced complexity. The operations of the SD algorithm can be divided into three steps: lattice formulation, lattice pre-processing, and lattice search.
In lattice formulation, the complex baseband model in Equation (1) is translated into the real domain, such as:
$$x = \begin{bmatrix} \operatorname{real}(X) \\ \operatorname{imag}(X) \end{bmatrix}, \qquad y = \begin{bmatrix} \operatorname{real}(Y) \\ \operatorname{imag}(Y) \end{bmatrix} \tag{3}$$
with real vectors of respective sizes m×1 and n×1 (where m=2T and n=2R). The equivalent real channel matrix B can be expressed as follows:
$$B = \begin{bmatrix} \operatorname{real}(H) & -\operatorname{imag}(H) \\ \operatorname{imag}(H) & \operatorname{real}(H) \end{bmatrix} \tag{4}$$
which can be regarded as an n×m “lattice generator” matrix. Neglecting for simplicity possible scalar normalization factors, the SD algorithm typically attempts to find a solution to the following minimization problem:
$$\hat{x} = \arg\min_{x} \lVert y - B x \rVert^2 \tag{5}$$
spanning the set of possible values for the in-phase (I) and quadrature-phase (Q) components of the complex digitally modulated symbols X independently, and restricting the search to a “sphere” of a given radius. To allow this, the complex symbols should belong to a square constellation, such as QAM. Variants of this algorithm exist to deal with PSK constellations, but there is no single algorithm derivation for dealing with both QAM and PSK constellations.
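The lattice formulation can be checked numerically with the short sketch below, which stacks real and imaginary parts and verifies that the real-valued product reproduces the complex one. The helper name `to_real` is hypothetical, and scalar normalization factors are neglected.

```python
import numpy as np

def to_real(H, X, Y):
    """Build the real-domain quantities: x and y stack real over imaginary parts."""
    x = np.concatenate([X.real, X.imag])             # size m = 2T
    y = np.concatenate([Y.real, Y.imag])             # size n = 2R
    B = np.block([[H.real, -H.imag],
                  [H.imag,  H.real]])                # size n x m
    return B, x, y

# Hypothetical 3x2 example with 16-QAM-like symbols (unnormalized).
rng = np.random.default_rng(7)
H = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
X = np.array([1 + 3j, -3 - 1j])
Y = H @ X                                            # noise-free complex model
B, x, y = to_real(H, X, Y)
```

Each entry of x is then a real PAM amplitude, which is why the I and Q components can be searched independently for square constellations.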
In lattice pre-processing, the real-domain channel matrix B is decomposed in order to isolate a triangular matrix factor R. Two known algorithms for doing this are based either on (1) the Cholesky decomposition of the Gram matrix B^T B, as in the original version of SD, or (2) the QR decomposition directly applied to B. Both are different ways of deriving a set of recursive equations to find a solution to the minimization problem in Equation (5).
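The relationship between the two pre-processing options can be illustrated with the sketch below, which assumes a full-column-rank B and shows that both routes lead to the same upper triangular factor once the diagonal is made positive. The function names are hypothetical.

```python
import numpy as np

def triangular_via_qr(B):
    """QR route: B = Q R, with signs flipped so that diag(R) is positive."""
    Q, R = np.linalg.qr(B)                 # reduced QR decomposition
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    # Scaling column j of Q and row j of R by the same sign leaves Q @ R unchanged.
    return Q * signs, signs[:, None] * R

def triangular_via_cholesky(B):
    """Cholesky route: B^T B = L L^T, so R = L^T satisfies R^T R = B^T B."""
    L = np.linalg.cholesky(B.T @ B)
    return L.T

# Hypothetical real-domain channel with n = 6 and m = 4.
rng = np.random.default_rng(8)
B = rng.normal(size=(6, 4))
Q, R_qr = triangular_via_qr(B)
R_chol = triangular_via_cholesky(B)
```

With the QR route, the filtered vector can be obtained as `Q.T @ y`. The Cholesky route works on the Gram matrix, which squares the condition number of B and may therefore be less robust numerically for ill-conditioned channels.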
In lattice search, the SD algorithm includes a set of recursive steps well known to those skilled in the art. If (i) R is an upper triangular square matrix having a size m×m and positive diagonal elements and (ii) y′ is an m×1 vector obtained through a linear filter operation applied to the received vector y (i.e. y′=Ay, with A related to either the QR or Cholesky decomposition), then SD solves the equation:
$$\hat{x} = \arg\min_{x} \lVert y' - R x \rVert^2 \tag{6}$$
restricting the search of sequences x to a sphere of radius C, such as:
$$\lVert y' - R x \rVert^2 \le C^2. \tag{7}$$
From Equation (7), a set of m inequalities can be obtained, where the bounds used to search for a given coordinate depend upon the values assigned to the previous ones. Proceeding in this way, once the algorithm has a candidate solution for the entire vector x, the radius is updated to the distance between the initial point and the new valid lattice point. If the decoder does not find any point in the constellation within the lower and upper bounds for some x_k (assuming coordinates are searched in the order from x_m to x_1), at least one bad candidate choice has been made for x_{k+1}, x_{k+2}, . . . , x_m. The decoder then revises the choice for x_{k+1} by finding another candidate in its range and proceeds again to find a solution for x_k. If no more candidates are available for x_{k+1}, the remaining possible values for x_{k+2} are examined, and so on. The search ends when no possible points in the sphere remain to be evaluated. On average, the SD algorithm converges to the ML solution by searching a number of lattice points much lower than the exhaustive S^T sequences required by a “brute-force” ML detector.
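The search described above can be sketched as a depth-first tree search with pruning and radius update. This is a minimal illustration under several assumptions: a finite real PAM alphabet per coordinate, a fixed enumeration order within each level, and an initial radius that defaults to infinity. The name `sphere_decode` and the 4-PAM test setup are hypothetical.

```python
import itertools
import numpy as np

def sphere_decode(R, y_prime, alphabet, radius=np.inf):
    """Depth-first search for argmin ||y' - R x||^2 with each x_k in `alphabet`."""
    m = R.shape[0]
    best = {"x": None, "d2": radius ** 2, "visited": 0}
    x = np.zeros(m)

    def search(k, partial_d2):
        # Coordinates are assigned from the last (x_m) to the first (x_1).
        for a in alphabet:
            x[k] = a
            resid = y_prime[k] - R[k, k:] @ x[k:]
            d2 = partial_d2 + resid ** 2
            if d2 >= best["d2"]:
                continue                   # outside the current sphere: prune
            best["visited"] += 1
            if k == 0:
                # Full candidate found: record it and shrink the radius.
                best["x"], best["d2"] = x.copy(), d2
            else:
                search(k - 1, d2)

    search(m - 1, 0.0)
    return best["x"], best["d2"], best["visited"]

# Hypothetical example: m = 4 real coordinates, 4-PAM amplitudes, mild noise.
rng = np.random.default_rng(4)
PAM4 = np.array([-3.0, -1.0, 1.0, 3.0])
_, R = np.linalg.qr(rng.normal(size=(4, 4)))
x_true = rng.choice(PAM4, size=4)
y_prime = R @ x_true + 0.1 * rng.normal(size=4)
x_sd, d2_sd, visited = sphere_decode(R, y_prime, PAM4)

# Brute-force reference over all 4**4 sequences, for comparison only.
x_bf = min((np.array(c) for c in itertools.product(PAM4, repeat=4)),
           key=lambda c: np.sum((y_prime - R @ c) ** 2))
```

The value of `visited` depends on the channel, the noise, and the initial radius, which reflects the variable amount of work performed by the search.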
However, the Sphere Decoder often presents a number of disadvantages. For example, the Sphere Decoder is an inherently serial detector. In other words, it spans the possible values for the I and Q pulse amplitude modulation (“PAM”) components of the QAM symbols successively and thus is not suitable for a parallel implementation. Also, the number of lattice points to be searched is variable and sensitive to many parameters, such as the choice of the initial radius, the SNR, and the (fading) channel conditions. This implies a non-deterministic latency (or equivalently throughput) when applied to a practical implementation. In particular, this means it could be unsuitable for applications requiring a real-time response in data communications, such as in high-throughput 802.11n wireless LANs.
In addition, the need to reduce the size of the search before converging to the ML-approaching transmitted sequence in the Sphere Decoder is not always compatible with the need to find a number of (selected) sequences in order to generate bit soft-output information. For example, if M_c is the number of bits per modulated symbol, the “max-log” approximation of bit LLRs may require finding a minimum of two sequences of X for every bit b_k (k = 0, . . . , T·M_c − 1), such as one sequence where b_k=1 and one sequence where b_k=0. By definition, one of the two sequences is the (optimum) hard-decision ML solution. However, there is no guarantee using SD that the other sequence (where the value of the bit under consideration is reversed as compared to the corresponding bit value in the ML sequence) is one of the valid lattice points found by SD during the lattice search. One solution is to build a “candidate list” of points that constitutes a subset of the optimal sequences. However, this solution is approximate and not deterministic, meaning there is no guarantee that the desired sequences will be found unless the candidate list is sufficiently large. This involves a non-negligible trade-off between performance degradation and complexity. Limited simulation results for a soft-output SD have involved very complex iterative combined detection and decoding techniques and a high number of lattice points to be stored in the candidate list (≥512 for T≤4) or a candidate list with thousands of lattice points for 4×4 16-QAM and turbo coded modulation.
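To make the two-sequences-per-bit requirement concrete, the sketch below computes max-log LLRs by exhaustive search, so that both minima are always available. It is a reference computation, not a sphere-decoder candidate list. The sign convention (positive LLR favors b_k=1), the QPSK bit labeling, and the name `max_log_llrs` are assumptions, and the channel matrix is assumed to absorb the sqrt(E_s/T) scaling.

```python
import itertools
import numpy as np

def max_log_llrs(Y, H, N0, symbols, bit_labels):
    """Max-log LLR per bit: (min metric with b_k=0 - min metric with b_k=1) / N0."""
    T = H.shape[1]
    S, Mc = bit_labels.shape
    rows = np.arange(T * Mc)
    d_min = np.full((T * Mc, 2), np.inf)    # best metric per bit index and bit value
    for combo in itertools.product(range(S), repeat=T):
        sel = list(combo)
        d2 = np.linalg.norm(Y - H @ symbols[sel]) ** 2
        bits = bit_labels[sel].reshape(-1)  # the T*Mc bits carried by this sequence
        d_min[rows, bits] = np.minimum(d_min[rows, bits], d2)
    return (d_min[:, 0] - d_min[:, 1]) / N0, d_min

# Hypothetical QPSK labeling: bit 0 set if real part < 0, bit 1 set if imag part < 0.
symbols = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
bit_labels = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
rng = np.random.default_rng(5)
H = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) * np.sqrt(0.5)
tx_idx = [1, 2]
Y = H @ symbols[tx_idx]                     # noise-free, for illustration
llrs, d_min = max_log_llrs(Y, H, N0=0.1, symbols=symbols, bit_labels=bit_labels)
tx_bits = bit_labels[tx_idx].reshape(-1)
```

For every bit, the smaller of the two stored metrics equals the ML metric, while the larger one belongs to the best sequence with that bit reversed. A pruned search is not guaranteed to visit the latter.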
Other ML-approaching algorithms include a reduced set search approach, which may not yield good performance below a 10^-4 bit error rate (“BER”). Yet another is an approximate method, which may involve a high complexity, and no results have been shown beyond a Quadrature Phase Shift Keying (“QPSK”) constellation.