Multiple-input multiple-output (MIMO) systems have received significant attention as a promising method for achieving high spectral efficiency, which makes MIMO the technology of choice in many standards such as IEEE 802.11n, IEEE 802.16e/m, and IEEE 802.20. One of the main challenges in exploiting the potential of MIMO systems is to design low-complexity, high-throughput detection schemes that are suitable for efficient VLSI realization, so that low-power MIMO receivers with near-maximum-likelihood (ML) performance can be implemented.
In spatial multiplexing (SM) MIMO schemes with NT transmit and NR receive antennas (an NT×NR system), where NR ≥ NT, NT data streams are transmitted simultaneously from the NT antennas. This increases the system's spectral efficiency by a factor of NT, assuming that the data streams can be successfully decoded. The mathematical model for an SM MIMO system is:
$$y = Hs + n \qquad (1)$$
where y is the NR×1 received vector, H is the NR×NT channel matrix, s is the NT×1 transmit vector, and n is the NR×1 noise vector at the receiver. The average signal-to-noise ratio (SNR) of all NT streams has to be maintained without increasing the total transmit power compared to single-antenna systems. Moreover, to achieve the maximum spectral efficiency, the interference resulting from the simultaneous transmission of NT data streams has to be suppressed at the receiver using a MIMO detection scheme. The optimum detector, which achieves the full diversity order of NR, is the maximum-likelihood (ML) detector, which finds the transmitted symbol vector by solving the following optimization problem:
$$\hat{s} = \arg\min_{s} \lVert y - Hs \rVert^{2} \qquad (2)$$
where ŝ represents the optimal detected symbol vector at the receiver.
This optimization problem is computationally expensive to solve, especially for high-order constellations and/or MIMO systems with a large number of transmit antennas, because it requires an exhaustive search over all $Q^{N_T}$ possible input vectors, where Q is the modulation level (constellation size). For instance, in a MIMO system with only two transmit antennas using 64-QAM, there are a total of $64^2 = 4096$ symbol vectors to search through. The main downside of the ML detector is that its search space, and hence its complexity, grows as the NT-th power of the modulation level Q, that is, exponentially with the number of bits per symbol. Thus the goal is to design an optimal detector with exact ML performance whose complexity is linear in the modulation level and independent of the SNR and the channel condition.
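The exhaustive search behind equation (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`qam_constellation`, `ml_detect`), the unnormalized square-QAM grid, and the example channel and noise values are hypothetical and do not come from the text.

```python
from itertools import product

def qam_constellation(q):
    """Square Q-QAM points on an unnormalized odd-integer grid (assumed layout)."""
    m = int(round(q ** 0.5))
    assert m * m == q, "q must be a perfect square"
    levels = [2 * i - (m - 1) for i in range(m)]  # e.g. [-3, -1, 1, 3] for 16-QAM
    return [complex(re, im) for re in levels for im in levels]

def ml_detect(y, H, constellation):
    """Brute-force ML: minimize ||y - H s||^2 over all Q^NT candidate vectors.

    Returns the best candidate and the number of candidates visited.
    """
    n_r, n_t = len(H), len(H[0])
    best, best_metric, visited = None, float("inf"), 0
    for s in product(constellation, repeat=n_t):
        visited += 1
        metric = 0.0
        for i in range(n_r):
            residual = y[i] - sum(H[i][j] * s[j] for j in range(n_t))
            metric += abs(residual) ** 2
        if metric < best_metric:
            best, best_metric = s, metric
    return list(best), visited

# Hypothetical 2x2 channel, 64-QAM, and a small noise vector (illustration only).
H = [[0.9 + 0.2j, -0.3 + 0.5j],
     [0.1 - 0.7j, 1.1 + 0.4j]]
const = qam_constellation(64)
s_true = [const[5], const[42]]
noise = [0.05 - 0.02j, -0.03 + 0.04j]
y = [sum(H[i][j] * s_true[j] for j in range(2)) + noise[i] for i in range(2)]

s_hat, visited = ml_detect(y, H, const)
print(s_hat == s_true, visited)
```

The `visited` counter makes the cost explicit: with two transmit antennas and 64-QAM, every one of the 64^2 candidates is evaluated for each received vector.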
In addition, the complexity of the exhaustive-search ML detector grows exponentially with the number of transmit antennas. Therefore, lower-complexity suboptimal receivers are needed in practical applications. The existing approaches to alleviating the high computational complexity of the ML detector fall into the following two main categories:
Linear Receivers:
Zero-forcing (ZF) and minimum mean square error (MMSE) receivers are the most common low-complexity candidates; they remove the spatial interference between the transmitted data streams with linear complexity. However, the diversity order achieved by a linear receiver is only NR−NT+1. This means that in a 2×2 MIMO system there is no diversity gain (the diversity order is 1), which results in a significant performance loss compared to the ML receiver.
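As a rough illustration of linear detection, the sketch below applies the textbook filter (H^H H + reg·I)^-1 H^H to y and then slices each stream to the nearest constellation point. Setting `reg = 0` gives ZF, and setting `reg` to the noise variance divided by the symbol energy gives MMSE. The function names, the `reg` parameter, and the example values are assumptions made for this sketch, not part of the text.

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def linear_detect(y, H, constellation, reg=0.0):
    """ZF (reg == 0) or MMSE (reg == noise variance / symbol energy) detection."""
    Hh = hermitian(H)
    G = matmul(Hh, H)                      # Gram matrix H^H H
    for i in range(len(G)):
        G[i][i] += reg                     # MMSE regularization on the diagonal
    rhs = [sum(Hh[i][k] * y[k] for k in range(len(y))) for i in range(len(Hh))]
    soft = solve(G, rhs)                   # interference-suppressed soft estimates
    # Each stream is sliced independently, which is what keeps the cost low.
    return [min(constellation, key=lambda c: abs(c - z)) for z in soft]

# Hypothetical 2x2 channel with 4-QAM (illustration only).
H = [[0.9 + 0.2j, -0.3 + 0.5j],
     [0.1 - 0.7j, 1.1 + 0.4j]]
const = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
s_true = [1 - 1j, -1 + 1j]
noise = [0.05 - 0.02j, -0.03 + 0.04j]
y = [sum(H[i][j] * s_true[j] for j in range(2)) + noise[i] for i in range(2)]

zf = linear_detect(y, H, const)
mmse = linear_detect(y, H, const, reg=0.01 / 2)  # assumed noise variance 0.01, symbol energy 2
print(zf, mmse)
```

Because each stream is decided on its own after filtering, the per-vector cost does not grow with the number of candidate vectors, at the price of the diversity loss noted above.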
Suboptimal ML Receivers:
These are lower-complexity approximations of the ML detector with close-to-ML performance. The lower complexity results from searching a limited space rather than performing the exhaustive ML search. As a consequence, the optimal ML solution may not be included in the search space, which causes a performance loss. In general, however, these methods outperform the linear receivers. Depending on the non-exhaustive search methodology, the suboptimal algorithms fall into two main categories: depth-first methods and breadth-first methods.
Sphere decoding (SD) is the most attractive depth-first approach; its performance is the same as ML under the assumption of unlimited execution time. However, the actual runtime of the algorithm depends not only on the channel realization but also on the operating SNR. This leads to a variable throughput rate, which adds overhead in a VLSI implementation because of the extra I/O buffers required and the lower hardware utilization.
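A minimal depth-first sketch, assuming the common formulation in which H is QR-decomposed and the tree is searched from the last layer upward, illustrates why the runtime varies. The function names, the closest-first child ordering, the infinite initial radius, and the example values are assumptions of this sketch rather than details taken from the text.

```python
def qr_gram_schmidt(H):
    """Thin QR via modified Gram-Schmidt; Q is returned as a list of columns."""
    n_r, n_t = len(H), len(H[0])
    Q, R = [], [[0j] * n_t for _ in range(n_t)]
    for j in range(n_t):
        v = [H[i][j] for i in range(n_r)]
        for i, q in enumerate(Q):
            R[i][j] = sum(qk.conjugate() * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for qk, vk in zip(q, v)]
        norm = sum(abs(vk) ** 2 for vk in v) ** 0.5
        R[j][j] = complex(norm)
        Q.append([vk / norm for vk in v])
    return Q, R

def sphere_detect(y, H, constellation):
    """Depth-first tree search with pruning; returns (s_hat, visited_nodes)."""
    n_t = len(H[0])
    Q, R = qr_gram_schmidt(H)
    z = [sum(qk.conjugate() * yk for qk, yk in zip(q, y)) for q in Q]
    best = {"metric": float("inf"), "s": None, "visited": 0}

    def search(level, tail, ped):
        # tail holds the symbols already chosen for layers level+1 .. n_t-1
        interf = sum(R[level][level + 1 + m] * tail[m] for m in range(len(tail)))
        scored = sorted(
            ((abs(z[level] - interf - R[level][level] * c) ** 2, c)
             for c in constellation),
            key=lambda t: t[0])            # closest child first
        for inc, c in scored:
            new_ped = ped + inc
            if new_ped >= best["metric"]:
                break                      # outside the sphere: prune remaining children
            best["visited"] += 1
            if level == 0:
                best["metric"], best["s"] = new_ped, [c] + tail  # shrink the radius
            else:
                search(level - 1, [c] + tail, new_ped)

    search(n_t - 1, [], 0.0)
    return best["s"], best["visited"]

# Hypothetical 2x2 channel with 16-QAM (illustration only).
H = [[0.9 + 0.2j, -0.3 + 0.5j],
     [0.1 - 0.7j, 1.1 + 0.4j]]
const = [complex(re, im) for re in (-3, -1, 1, 3) for im in (-3, -1, 1, 3)]
s_true = [complex(-1, 3), complex(3, -1)]
noise = [0.05 - 0.02j, -0.03 + 0.04j]
y = [sum(H[i][j] * s_true[j] for j in range(2)) + noise[i] for i in range(2)]

s_hat, visited = sphere_detect(y, H, const)
print(s_hat == s_true, visited)
```

The number of visited nodes depends on how quickly the radius shrinks, which in turn depends on the channel and the noise; that dependence is the source of the variable throughput described above.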
Among the breadth-first search methods, the best-known approach is the K-Best algorithm. The K-Best algorithm guarantees an SNR-independent, fixed-throughput detector with performance close to ML. Its fixed throughput, together with the fact that breadth-first approaches are feed-forward detection schemes with no feedback, makes it especially attractive for hardware implementation. There have been some efforts to implement the K-Best algorithm; however, it consists of node-expansion and sorting cores, which are both time-consuming and form the bottleneck in hardware, resulting in low-throughput architectures. Moreover, its performance deteriorates in high-SNR regimes.
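The node-expansion and sorting steps can be made concrete with a short breadth-first sketch. It assumes the usual formulation based on a QR decomposition of H, in which each surviving partial vector is expanded into all Q children at every layer and only the K children with the smallest partial Euclidean distance are kept. The function names, the parameter `k`, and the example values are assumptions of this sketch, not details from the text.

```python
def qr_gram_schmidt(H):
    """Thin QR via modified Gram-Schmidt; Q is returned as a list of columns."""
    n_r, n_t = len(H), len(H[0])
    Q, R = [], [[0j] * n_t for _ in range(n_t)]
    for j in range(n_t):
        v = [H[i][j] for i in range(n_r)]
        for i, q in enumerate(Q):
            R[i][j] = sum(qk.conjugate() * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for qk, vk in zip(q, v)]
        norm = sum(abs(vk) ** 2 for vk in v) ** 0.5
        R[j][j] = complex(norm)
        Q.append([vk / norm for vk in v])
    return Q, R

def k_best_detect(y, H, constellation, k):
    """Breadth-first search keeping the k best partial candidates per layer."""
    n_t = len(H[0])
    Q, R = qr_gram_schmidt(H)
    z = [sum(qk.conjugate() * yk for qk, yk in zip(q, y)) for q in Q]
    survivors = [(0.0, [])]  # (partial distance, symbols for layers level+1 .. n_t-1)
    for level in reversed(range(n_t)):
        children = []
        for ped, tail in survivors:
            interf = sum(R[level][level + 1 + m] * tail[m] for m in range(len(tail)))
            for c in constellation:        # node expansion: Q children per survivor
                inc = abs(z[level] - interf - R[level][level] * c) ** 2
                children.append((ped + inc, [c] + tail))
        children.sort(key=lambda t: t[0])  # sorting step
        survivors = children[:k]           # fixed work per layer, independent of SNR
    return survivors[0][1]

# Hypothetical 2x2 channel with 16-QAM (illustration only).
H = [[0.9 + 0.2j, -0.3 + 0.5j],
     [0.1 - 0.7j, 1.1 + 0.4j]]
const = [complex(re, im) for re in (-3, -1, 1, 3) for im in (-3, -1, 1, 3)]
s_true = [complex(-1, 3), complex(3, -1)]
noise = [0.05 - 0.02j, -0.03 + 0.04j]
y = [sum(H[i][j] * s_true[j] for j in range(2)) + noise[i] for i in range(2)]

s_hat = k_best_detect(y, H, const, k=4)
print(s_hat == s_true)
```

Every layer expands at most K·Q children and sorts them, so the work per received vector is fixed, but the expansion and sort are exactly the steps identified above as the hardware bottleneck. The result is only guaranteed to match ML when the ML path survives every truncation.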
Therefore, there is a crucial need for a detector that has the optimal performance of the ML detector, the high-speed feature of the depth-first approaches, and the SNR-independent fixed-throughput architecture of the breadth-first schemes.