1. Field of the Invention
The present invention generally relates to a multiple input/multiple output system and method for detecting received symbols that can be used in a software defined radio context.
2. Description of the Related Technology
Although Moore's Law predicted rapid growth in semiconductor integration, the resulting gains in silicon capability have been quickly consumed by the explosion of signal processing complexity in wireless communications. In recent standards such as 802.11n, WiMAX and 3GPP LTE (long term evolution), the added complexity stems essentially from the application of MIMO (multiple input multiple output) processing, where the drastically increased throughput comes at the expense of a very complex MIMO detector. With SDM (spatial division multiplexing) transmissions, most of the added complexity lies in the MIMO detector.
Among existing MIMO detectors, the ML (maximum-likelihood) and near-ML detectors are superior to traditional linear detectors. In recent years, algorithmic optimizations and implementations of ML/near-ML detectors have attracted considerable interest. Almost all of these implementations are delivered as ASIC (application specific integrated circuit) or FPGA (field programmable gate array) designs.
With the exploding design and processing cost in the deep submicron era, the current trend is to implement as much baseband functionality as possible on programmable or reconfigurable platforms. Recently, tremendous research efforts have been invested in both industry and academia in parallel programmable baseband architectures targeting mobile terminals.
The SDR (software defined radio) paradigm, which was mainly successful in the base station segment, is now also emerging in the handset market. In particular, ILP (instruction level parallel) and DLP (data level parallel) architectures are becoming prevalent. The former, when implemented as a VLIW processor, benefits from mature compilation technologies. With software pipelining, it is possible to achieve very efficient utilization of parallel functional units. In the paper “Design Methodology for a Tightly Coupled VLIW/Reconfigurable Matrix Architecture: A Case Study” (B. Mei et al., Proc. of DATE 2004, pp. 1224-1229), similar compilation techniques are used to enable C-language programming on CGA (coarse grain array) processors, bringing even higher parallelism. Theoretically, these latest developments in ILP and DLP architectures would allow the SDR implementation of complex signal processing algorithms such as near-ML detectors, at rates compatible with emerging wireless standards.
Unfortunately, none of the existing near-ML detectors fits programmable architectures such as ILP or DLP well. Sphere decoders (see below) and most of their variants are essentially sequential and non-deterministic, so that parallelization is difficult. On the other hand, although the K-Best, QRD-M (see below) and their variants have been realized in hardware implementations, they have a fundamental problem when mapped onto parallel programmable architectures. The spanning-sorting-deleting process incurs irregular dataflow, non-deterministic control flow, extensive shuffling and extensive memory rearrangement. These characteristics result in very low resource utilization on ILP and DLP architectures. If these problems are not eliminated at a high level, low-level compiler optimizations can hardly solve them.
A MIMO system is considered wherein Nt different signals are transmitted and arrive at an array of Nr (Nt≦Nr) receivers via a flat-fading channel. With OFDM (orthogonal frequency division multiplexing) transmission such as that in IEEE 802.11n and 3GPP LTE, frequency-selective channels are converted to parallel flat-fading channels. The MIMO detector is arranged to recover a transmitted vector signal from a received vector signal. Popular schemes include linear detection, SIC (successive interference cancelation) and ML/near-ML detectors. Extensive surveys can be found in “An overview of MIMO communications - a key to gigabit wireless” (H. Nabar et al., Proc. IEEE. v92, pp. 198-218).
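The detection problem described above can be illustrated with a minimal sketch of exhaustive ML detection, which tests every candidate transmit vector against the received vector. The function name `ml_detect`, the real-valued two-symbol alphabet and the noise-free 2x2 example channel are assumptions introduced for illustration only; they are not taken from any cited work.

```python
from itertools import product


def ml_detect(H, y, alphabet=(-1.0, 1.0)):
    """Exhaustive ML search: return the candidate s minimizing ||y - H s||^2.

    H is a list of Nr rows with Nt real entries each; y has Nr entries.
    """
    nr, nt = len(H), len(H[0])
    best_s, best_metric = None, float("inf")
    # Every one of the len(alphabet) ** nt candidate vectors is evaluated.
    for s in product(alphabet, repeat=nt):
        metric = 0.0
        for i in range(nr):
            residual = y[i] - sum(H[i][j] * s[j] for j in range(nt))
            metric += residual * residual
        if metric < best_metric:
            best_s, best_metric = s, metric
    return best_s, best_metric


# Hypothetical noise-free example: Nt = Nr = 2.
H = [[1.0, 0.5], [0.2, 1.0]]
sent = (1.0, -1.0)
y = [sum(H[i][j] * sent[j] for j in range(2)) for i in range(2)]
detected, metric = ml_detect(H, y)
```

The number of candidates grows exponentially with Nt and with the constellation size, which is why the near-ML detectors discussed below restrict the search instead of enumerating all candidates.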
Sphere Decoding (SD) is known for example from the paper “Silicon Complexity for maximum likelihood MIMO detection using spherical decoding” (D. Garrett et al., IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1544-1552, September 2004). It solves the maximum likelihood detection problem by applying the QR (Orthogonal-Triangular) decomposition: H=QR, where Q is an orthogonal matrix and R is an upper triangular matrix.
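As a worked illustration of why the decomposition helps, the ML metric can be rewritten in terms of R. The symbols y (received vector), s (candidate transmit vector), Ω (constellation) and ỹ are notation assumed here and do not appear in the text above; a thin QR decomposition is assumed, in which Q has orthonormal columns and R is Nt×Nt (for real-valued matrices, the Hermitian transpose reduces to the ordinary transpose).

```latex
\begin{align*}
\lVert \mathbf{y}-\mathbf{H}\mathbf{s}\rVert^{2}
  &= \lVert \mathbf{Q}^{H}\mathbf{y}-\mathbf{R}\mathbf{s}\rVert^{2}
   + \lVert (\mathbf{I}-\mathbf{Q}\mathbf{Q}^{H})\,\mathbf{y}\rVert^{2}, \\
\hat{\mathbf{s}}_{\mathrm{ML}}
  &= \arg\min_{\mathbf{s}\in\Omega^{N_t}} \lVert \tilde{\mathbf{y}}-\mathbf{R}\mathbf{s}\rVert^{2},
  \qquad \tilde{\mathbf{y}}=\mathbf{Q}^{H}\mathbf{y}, \\
\lVert \tilde{\mathbf{y}}-\mathbf{R}\mathbf{s}\rVert^{2}
  &= \sum_{i=1}^{N_t}\Bigl|\tilde{y}_i-\sum_{j=i}^{N_t} r_{ij}\,s_j\Bigr|^{2}.
\end{align*}
```

The second term of the first line does not depend on s and can be dropped from the minimization. Because R is upper triangular, the i-th term of the final sum depends only on the symbols with index i and above, so the metric can be accumulated level by level, starting from the last symbol. This is what turns the detection into a tree search.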
Various search algorithms have been proposed for sphere decoding. Most of them perform a depth-first serial tree search. They are not suited to parallel programmable architectures such as VLIW, because their run-time behavior is non-deterministic and depends on the channel matrix and the SNR.
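The data-dependent behavior can be seen in a minimal sketch of a depth-first search with pruning. This is a simplified illustration, not the algorithm of any cited paper: the function name `sphere_decode`, the real-valued two-symbol alphabet, the infinite initial radius and the example values are all assumptions, and `z` stands for the received vector after multiplication by the transpose of Q.

```python
def sphere_decode(R, z, alphabet=(-1.0, 1.0)):
    """Depth-first tree search over an upper triangular R with pruning.

    Returns (best symbols, best metric, number of visited tree nodes).
    """
    n = len(R)
    state = {"metric": float("inf"), "symbols": None, "visited": 0}
    s = [0.0] * n

    def search(level, partial):
        for symbol in alphabet:
            s[level] = symbol
            # Row `level` of R only involves symbols level..n-1.
            residual = z[level] - sum(R[level][j] * s[j] for j in range(level, n))
            metric = partial + residual * residual
            state["visited"] += 1
            if metric >= state["metric"]:
                continue  # prune: this branch cannot beat the best leaf so far
            if level == 0:
                state["metric"], state["symbols"] = metric, tuple(s)
            else:
                search(level - 1, metric)

    search(n - 1, 0.0)
    return state["symbols"], state["metric"], state["visited"]


# Hypothetical 2x2 example with the same R and two different inputs.
R = [[1.0, 0.5], [0.0, 1.0]]
clean_symbols, clean_metric, clean_visited = sphere_decode(R, [0.5, -1.0])
noisy_symbols, noisy_metric, noisy_visited = sphere_decode(R, [0.5, 0.1])
```

In this sketch the two calls visit a different number of nodes although R is identical, because the amount of pruning depends on the received values. The run time therefore cannot be fixed at compile time.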
The sub-optimal K-Best (similar to QRD-M) and its variants perform breadth-first searching. The K-Best and its variants are mostly ASIC-oriented algorithms. Both strict sorting and approximate sorting have been proposed. K-Best involves modular and repetitive operations that are easily parallelized in hardware. However, K-Best has many problems on parallel programmable architectures:
(1) extensive shuffling operations;
(2) the execution is neither deterministic nor regular;
(3) intensive memory rearrangement is required;
(4) the complexity of the spanning-sorting-deleting process is still too high.
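The spanning-sorting-deleting process referred to above can be made concrete with a minimal sketch of a breadth-first K-Best search using strict sorting. The function name `k_best_detect`, the real-valued two-symbol alphabet and the example values are assumptions for illustration only; `z` again stands for the received vector after multiplication by the transpose of Q.

```python
def k_best_detect(R, z, k, alphabet=(-1.0, 1.0)):
    """Breadth-first search keeping the k best partial paths per tree level.

    Returns (best symbols, best metric) among the final survivors.
    """
    n = len(R)
    survivors = [(0.0, ())]  # (accumulated metric, symbols decided so far)
    for level in range(n - 1, -1, -1):
        # Spanning: extend every survivor with every constellation symbol.
        candidates = []
        for metric, tail in survivors:
            for symbol in alphabet:
                path = (symbol,) + tail  # path[i] is the symbol at index level + i
                residual = z[level] - sum(
                    R[level][level + i] * path[i] for i in range(len(path))
                )
                candidates.append((metric + residual * residual, path))
        # Sorting: order all candidates by accumulated metric.
        candidates.sort(key=lambda c: c[0])
        # Deleting: keep only the k best candidates for the next level.
        survivors = candidates[:k]
    best_metric, best_path = survivors[0]
    return best_path, best_metric


# Hypothetical 2x2 noise-free example.
R = [[1.0, 0.5], [0.0, 1.0]]
z = [0.5, -1.0]
symbols, metric = k_best_detect(R, z, k=2)
symbols_k1, metric_k1 = k_best_detect(R, z, k=1)
```

The sort and the truncation reorder the candidate list at every level, so the survivors of one level generally end up at different positions than their parents. On a programmable architecture this reordering corresponds to the shuffling and memory rearrangement listed above.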
Consequently, there is a need for further improvement when implementing near-ML detectors on programmable architectures.