The present invention relates to an improved signal processing block and, more particularly, to a processor block for computing the QR decomposition of a channel matrix for the detection/estimation of incoming signals in, for example, a MIMO communication receiver.
Because it provides high spectral efficiency and link reliability, multiple-input multiple-output (MIMO) technology, which uses multiple antennas at both the transmitter and the receiver to improve communication performance, has become a key part of many new wireless communication standards. However, one of the implementation challenges for MIMO systems is the development of high-throughput, low-complexity MIMO receivers and related signal processing blocks.
QR decomposition (QRD) is an essential signal processing task used in most MIMO detection schemes: it decomposes an estimated channel matrix into the product of a unitary matrix Q and an upper triangular matrix R, and the triangular structure of R provides a suitable framework for sequential detection schemes. However, decomposing complex MIMO channel matrices of large dimensions involves high computational complexity, which results in either a large core area or low throughput. Moreover, mobile communication applications that involve fast-varying channels require QRD to be performed with low processing latency.
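For reference, the decomposition and the way it supports sequential detection can be written out for the usual narrowband model y = Hx + n with an N×N channel matrix H. The square channel, this notation and the simple successive-cancellation detector shown are assumptions made here for illustration; S(·) denotes slicing to the nearest constellation point. Because Q is unitary, multiplying by Q^H leaves a system whose last equation involves only x_N, and each earlier equation adds one new unknown:

```latex
\mathbf{H} = \mathbf{Q}\mathbf{R}, \qquad \mathbf{Q}^{H}\mathbf{Q} = \mathbf{I}, \qquad R_{ij} = 0 \;\; (i > j)

\tilde{\mathbf{y}} = \mathbf{Q}^{H}\mathbf{y} = \mathbf{R}\mathbf{x} + \mathbf{Q}^{H}\mathbf{n}

\hat{x}_{N} = \mathcal{S}\!\left(\frac{\tilde{y}_{N}}{R_{NN}}\right), \qquad
\hat{x}_{i} = \mathcal{S}\!\left(\frac{\tilde{y}_{i} - \sum_{j=i+1}^{N} R_{ij}\,\hat{x}_{j}}{R_{ii}}\right), \quad i = N-1, \dots, 1
```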
There are several methods for computing the QRD, such as the Modified Gram-Schmidt orthonormalization (MGS) algorithm, Householder reflections and Givens rotations. Each has its own advantages and disadvantages.
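As a point of reference for the first of these methods, the following is a minimal complex-valued MGS sketch in plain Python. The function name and the list-of-rows matrix layout are hypothetical choices for illustration, and a full-column-rank input is assumed. The sketch makes visible the square root and the division needed for every column.

```python
import math


def mgs_qr(a):
    """Complex-valued QR decomposition by Modified Gram-Schmidt (sketch).

    a: list of m rows, each a list of n complex numbers (m >= n, full column rank).
    Returns (q, r): q is m x n with orthonormal columns, r is n x n upper triangular.
    """
    m, n = len(a), len(a[0])
    # v[k] holds column k of A while it is being orthogonalized.
    v = [[complex(a[i][k]) for i in range(m)] for k in range(n)]
    q_cols = []
    r = [[0j] * n for _ in range(n)]
    for k in range(n):
        # Column norm: one square root per column.
        norm = math.sqrt(sum(abs(x) ** 2 for x in v[k]))
        r[k][k] = complex(norm)
        # Normalization: one division (or reciprocal multiply) per element.
        qk = [x / norm for x in v[k]]
        q_cols.append(qk)
        # "Modified" step: remove the q_k component from every remaining column now.
        for j in range(k + 1, n):
            rkj = sum(qk[i].conjugate() * v[j][i] for i in range(m))
            r[k][j] = rkj
            v[j] = [v[j][i] - rkj * qk[i] for i in range(m)]
    # Reassemble q as a list of rows.
    q = [[q_cols[k][i] for k in range(n)] for i in range(m)]
    return q, r
```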
Straightforward implementations of the MGS process and of Householder reflections require multiplication, division and square-root operations, resulting in high hardware complexity and computation latency. For MGS, C. Singh, S. Prasad, and P. Balsara, in “VLSI Architecture for Matrix Inversion using Modified Gram-Schmidt based QR Decomposition,” International Conference on VLSI Design, pp. 836-841, January 2007, propose using log-domain computations so that these operations can be implemented with low-complexity adders, subtractors and shifters. However, their solution performs frequent data conversions between the log and linear domains, and it requires large storage space to hold the necessary look-up tables. Large amounts of storage increase the die area of the solution and hence its cost. In “Complex-valued QR decomposition implementation for MIMO receivers,” in Proc. IEEE ICASSP 2008, pp. 1433-1436, April 2008, P. Salmela, A. Burian, H. Sorokin, and J. Takala propose a low-complexity approximation to implement the inverse square-root function. However, due to the underlying approximation, it might lead to bit error rate (BER) performance degradation, especially with fixed-precision arithmetic. Householder reflections have the mathematical advantage of nulling multiple rows simultaneously. However, this benefit comes at the cost of a challenging implementation problem when multiple reflections are to be carried out in parallel.
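The multi-row nulling property of Householder reflections can be seen in a short sketch: one reflection maps a whole column vector onto a multiple of the first unit vector, zeroing every entry below the first at once. This is a real-valued sketch only (complex data and the loop over columns that forms a full QRD are omitted), and the function names are hypothetical. Like MGS, it needs a square root and a division.

```python
import math


def householder_vector(x):
    """Return (v, alpha) with (I - 2 v v^T / (v^T v)) x = alpha * e1 (real-valued sketch)."""
    norm_x = math.sqrt(sum(xi * xi for xi in x))  # square root
    # Pick the sign of alpha opposite to x[0] to avoid cancellation in v[0].
    alpha = -norm_x if x[0] >= 0 else norm_x
    v = list(x)
    v[0] -= alpha
    return v, alpha


def apply_householder(v, y):
    """Apply the reflection I - 2 v v^T / (v^T v) to the vector y."""
    vtv = sum(vi * vi for vi in v)
    if vtv == 0.0:  # zero input column: the reflection degenerates to identity
        return list(y)
    scale = 2.0 * sum(vi * yi for vi, yi in zip(v, y)) / vtv  # division
    return [yi - scale * vi for vi, yi in zip(v, y)]
```

Applying the same reflection to the other columns of the matrix updates all of their rows together, which is what makes a parallel multi-reflection datapath hard to organize in hardware.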
Since Givens rotations operate on only two matrix rows at a time, they are more easily parallelized. Furthermore, the Coordinate Rotation Digital Computer (CORDIC) algorithm, in its vectoring and rotation modes, can perform Givens rotations using low-complexity shift and add operations. These two factors make Givens rotations the method of choice for common QRD implementations of small dimensions. However, using the conventional sequence of Givens rotations to decompose matrices with large dimensions leads to high computational complexity, due to the large number of required vectoring and rotation operations. To alleviate this problem, Y. T. Hwang and W. D. Chen, in “A Low Complexity Complex QR Factorization Design for Signal Detection in MIMO OFDM Systems,” in Proc. IEEE ISCAS 2008, pp. 932-935, May 2008, present a modified sequence of Givens rotations that keeps the block-wise symmetry between the sub-matrices intact during the annihilation process. However, this improved sequence still leads to a large number of rotation operations for high-dimensional MIMO systems (e.g., 4×4). Furthermore, the sequential nature of element annihilations for certain sub-matrices, together with the large number of rotations required for each annihilation, causes a throughput bottleneck.
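To make the CORDIC-based Givens approach concrete, here is a real-valued sketch (complex entries, which need additional rotations per element, are not handled). Vectoring mode nulls one element and records the micro-rotation directions; rotation mode replays those directions on the remaining columns of the same two rows. Floating-point `math.ldexp` stands in for a hardware right shift, Q is not formed explicitly, and `N_ITER`, the function names and the conventional bottom-up annihilation order are assumptions for illustration, not the modified sequence of Hwang and Chen.

```python
import math

N_ITER = 24  # assumed number of micro-rotations; sets the angular precision

# Each micro-rotation stretches the vector by sqrt(1 + 2^(-2i)); K undoes the total gain.
K = 1.0
for _i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def shift(v, i):
    """Stand-in for an arithmetic right shift by i bits (v * 2^-i)."""
    return math.ldexp(v, -i)


def cordic_vectoring(x, y):
    """Vectoring mode: rotate (x, y) onto the positive x-axis using shifts and adds.

    Returns the unscaled magnitude, a 180-degree pre-rotation flag and the
    direction bits, which together describe the rotation so it can be replayed."""
    flip = x < 0
    if flip:  # pre-rotate by 180 degrees into the right half-plane
        x, y = -x, -y
    dirs = []
    for i in range(N_ITER):
        d = -1 if y > 0 else 1  # rotate toward the x-axis
        x, y = x - d * shift(y, i), y + d * shift(x, i)
        dirs.append(d)
    return x, flip, dirs


def cordic_rotation(x, y, flip, dirs):
    """Rotation mode: replay a recorded rotation on another (x, y) pair (unscaled)."""
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * shift(y, i), y + d * shift(x, i)
    return x, y


def givens_qr_cordic(a):
    """Return R for a real m x n matrix (m >= n) via CORDIC-based Givens rotations."""
    r = [list(map(float, row)) for row in a]
    m, n = len(r), len(r[0])
    for j in range(n):
        for i in range(m - 1, j, -1):  # annihilate column j from the bottom up
            mag, flip, dirs = cordic_vectoring(r[i - 1][j], r[i][j])
            r[i - 1][j], r[i][j] = K * mag, 0.0
            for k in range(j + 1, n):  # same rotation on the rest of rows i-1 and i
                u, w = cordic_rotation(r[i - 1][k], r[i][k], flip, dirs)
                r[i - 1][k], r[i][k] = K * u, K * w
    return r
```

For an m×n matrix this order performs the sum over j of (m−1−j) vectoring operations, each followed by (n−1−j) rotation operations, so the rotation count grows quickly with the matrix dimensions.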