Multiple input/multiple output (MIMO) antenna deployment has become one of the most important techniques for enhancing system capacity and improving receiver performance in fourth generation (4G) high-speed wireless communication, under the Long Term Evolution (LTE) standard and the IEEE 802.16 WiMAX standard.
Signals received at each antenna port are converted to a digital signal stream. Each digital signal stream is synchronized, has its carrier frequency offset removed, is passed through digital automatic gain control, is scaled, and is associated with an estimated channel coefficient (H). The digital signal streams from the multiple antennas are then equalized through a complex-valued mathematical operation, namely, a matrix inversion.
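By way of illustration only, the role of the matrix inversion in equalization can be sketched for a 2×2 case. The sketch assumes a noise-free zero-forcing style equalizer, in which the received vector y = Hx is multiplied by the inverse of the estimated channel matrix H. The function names, the channel values, and the transmitted symbols are hypothetical and are not taken from the figures.

```python
def invert_2x2(h):
    """Closed-form inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (near) singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def equalize(h, y):
    """Recover the transmitted symbols x from y = H x (noise ignored)."""
    h_inv = invert_2x2(h)
    return [sum(h_inv[i][j] * y[j] for j in range(2)) for i in range(2)]


# Hypothetical channel estimate and transmitted symbols (assumed values).
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 + 0.1j, 0.9 - 0.4j]]
x = [1 + 1j, -1 + 1j]

# Received streams, one per antenna port: y = H x.
y = [sum(H[i][j] * x[j] for j in range(2)) for i in range(2)]

# Equalized streams: x_hat = H^-1 y.
x_hat = equalize(H, y)
```

In this noise-free sketch, x_hat matches x to within floating-point rounding. A practical receiver would also have to account for noise and for errors in the channel estimate.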
The complexity of the matrix inversion depends on the MIMO antenna's dimension (i.e., the number of antennas used). Increasing the MIMO antenna's dimension improves system performance by increasing data throughput, but it also places a greater burden on the algorithm that performs the matrix inversion. The matrix inversion may be performed through software-oriented or hardware-oriented matrix inversion algorithms, each of which has its advantages and shortcomings.
Software-oriented matrix inversion algorithms, such as the Gaussian, Cofactor, and Blockwise algorithms, may be implemented via programmed instructions on a digital signal processor (DSP). Because they are programmable, software-oriented matrix inversion algorithms are flexible and can achieve a high degree of precision. However, they are not suitable for a large antenna matrix size, because significant computing power is required.
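As a minimal sketch of a software-oriented approach, the following shows a Gauss-Jordan elimination with partial pivoting on a complex matrix. It is written in Python for readability rather than as DSP code, and the function name and tolerance are assumptions made for illustration. The three nested passes over the matrix show why the operation count grows roughly with the cube of the matrix dimension.

```python
def gauss_jordan_inverse(a, tol=1e-12):
    """Invert a square complex matrix (nested lists) by Gauss-Jordan elimination."""
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [[complex(v) for v in row] + [1 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row so the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

The same routine handles any matrix dimension without modification, which reflects the flexibility noted above, at the cost of a sequential instruction stream.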
A hardware-oriented matrix inversion approach uses hard-coded pipeline processing elements (PEs) to perform complex-valued matrix inversions. One well-known hardware-oriented matrix inversion application utilizes a systolic array to perform matrix inversion on the multiple input digital signal streams. It is well known in the art that a systolic array processes signals in both the horizontal direction (West to East) and the vertical direction (North to South) simultaneously.
FIGS. 1A and 1B illustrate an application of a systolic array (150) performing a 4×4 matrix inversion on LTE digital signal streams (132a to 132d). More specifically, FIG. 1A illustrates that the systolic array (150) may carry out a known QR decomposition (QRD) based matrix inversion algorithm in two stages, namely, QR decomposition operations in the first triangular processing block (120), followed by back-substitution (BS) and back-substitution delay (BSD) operations in the second triangular processing block (122). A total of 16 processing elements (PEs) and four delay units (−1A to −1D) (see FIG. 1A) may be employed across both processing blocks (120, 122) of the systolic array (150). The PEs for the QRD may be designated as QR-PEs, the PEs for the back-substitution may be designated as BS-PEs, and the PEs for the back-substitution delay may be designated as BSD-PEs. Each PE typically implements four CORDIC cores (not shown), arranged as two parallel pairs of series-cascaded CORDIC cores. The structure and function of a CORDIC core are well known in the art.
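For readers less familiar with CORDIC, the following is a minimal floating-point sketch of the vectoring mode, in which a vector (x, y) is rotated onto the x-axis by a fixed sequence of micro-rotations, yielding its magnitude and angle. It is an illustration only and does not represent the cores of FIG. 1A: a hardware core would use fixed-point shifts and a small arctangent table, and the function name and the iteration count are assumptions.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Return (magnitude, angle) of the vector (x, y) using CORDIC micro-rotations."""
    # Pre-rotate by 180 degrees so the iteration starts in the right half-plane.
    offset = 0.0
    if x < 0:
        offset = math.pi if y >= 0 else -math.pi
        x, y = -x, -y

    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        d = -1.0 if y > 0 else 1.0  # rotate toward the x-axis
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)  # a table lookup in hardware
        gain *= math.sqrt(1.0 + step * step)  # each micro-rotation stretches the vector

    # Remove the accumulated CORDIC gain to obtain the true magnitude.
    return x / gain, angle + offset


magnitude, angle = cordic_vectoring(3.0, 4.0)
```

Each additional iteration contributes roughly one more bit of angular accuracy, so the iteration count sets both the precision and the latency of a core.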
Referring to both FIGS. 1A and 1B, input digital signal streams (132a to 132d) are fed as input matrix A into the systolic array (150) one column at a time at a constant rate. After the 14th step, the first element of the inverse matrix A⁻¹ is output from the systolic array (150) as output streams (142a to 142d).
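The two-stage computation that such an array carries out can be sketched sequentially as follows. The sketch assumes that the QR decomposition is built from complex Givens rotations (a common choice for CORDIC-based arrays, but an assumption here) and that the inverse is then obtained by back-substitution, using A⁻¹ = R⁻¹Qᴴ. It models only the arithmetic, not the timing or the PE layout of FIG. 1A, and all function names are hypothetical.

```python
import math


def qr_givens(a):
    """Stage 1: return (Q^H, R) such that Q^H A = R, with R upper triangular."""
    n = len(a)
    r = [[complex(v) for v in row] for row in a]
    qh = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            p, q = r[j][j], r[i][j]
            norm = math.hypot(abs(p), abs(q))
            if norm == 0.0:
                continue  # nothing to eliminate
            # Unitary 2x2 rotation that maps (p, q) to (norm, 0).
            g00, g01 = p.conjugate() / norm, q.conjugate() / norm
            g10, g11 = -q / norm, p / norm
            for m in (r, qh):
                row_j, row_i = m[j], m[i]
                m[j] = [g00 * u + g01 * v for u, v in zip(row_j, row_i)]
                m[i] = [g10 * u + g11 * v for u, v in zip(row_j, row_i)]
    return qh, r


def back_substitute(r, b):
    """Stage 2: solve R X = B column by column for upper-triangular R."""
    n = len(r)
    x = [[0j] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            acc = b[i][col] - sum(r[i][k] * x[k][col] for k in range(i + 1, n))
            x[i][col] = acc / r[i][i]
    return x


def qrd_inverse(a):
    """Invert A via A^-1 = R^-1 Q^H."""
    qh, r = qr_givens(a)
    return back_substitute(r, qh)
```

In this sequential form the rotations are applied one after another, whereas a systolic array distributes them over its PEs so that several rotations proceed at once.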
FIG. 1B illustrates that the input matrix (106) is deliberately skewed, because every PE of a systolic array (150) needs to be triggered by two inputs (i.e., from the West and from the North directions) simultaneously. In addition, a PE may function as a memory node for the parameters from the previous matrix. Therefore, starting zeros (102) and ending zeros (104) are inserted into the matrix to ensure that every PE is properly reset between two consecutive column matrix inputs (106a, 106b). In addition, four delay units (−1A to −1D) (see FIG. 1A) are used in the systolic array (150) to synchronize the consecutive column matrix inputs (106a, 106b) to each other. Accordingly, the output data streams (142a to 142d) forming the inverse matrix (e.g., 116) from the systolic array (150) are also skewed by corresponding starting zeros (112) and ending zeros (114), respectively.
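The effect of skewing can be illustrated with a short sketch. It assumes one particular layout, in which input stream k is delayed by k steps through k starting zeros and padded with ending zeros so that all streams have equal length. The exact zero pattern of FIG. 1B may differ, and the function name is hypothetical.

```python
def skew_streams(matrix):
    """Delay stream k by k steps, padding with starting and ending zeros."""
    n = len(matrix)
    return [[0] * k + list(row) + [0] * (n - 1 - k) for k, row in enumerate(matrix)]


# Each inner list is one input stream; each position is one time step.
streams = skew_streams([[1, 2, 3, 4],
                        [5, 6, 7, 8],
                        [9, 10, 11, 12],
                        [13, 14, 15, 16]])
```

Under this assumed layout, an n×n matrix occupies 2n − 1 time steps per stream instead of n, which illustrates how the inserted zeros add to the processing delay as the matrix size grows.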
Multi-user MIMO (MU-MIMO) requires matrix inversions to be performed in parallel. In this regard, performing 2×2 matrix inversions in parallel (not shown) for MU-MIMO may require four independent systolic arrays, which may contain a total of 4 QR-PEs, 4 BS-PEs, 8 BSD-PEs and 8 delay units (not shown). When a 16-PE systolic array (150) designed for 4×4 matrix inversion (e.g., see FIG. 1A) is configured to perform 2×2 matrix inversions, only two 2×2 systolic arrays may be obtained from the 16 PEs, thus wasting a large portion of the QR-PEs and BS-PEs. In addition, the latency and the power consumption of running all 64 CORDIC cores (four in each of the 16 PEs) may not be reduced significantly when performing the 2×2 matrix inversion.
Therefore, even though the above systolic array (150) hardware matrix inversion approach has the advantages of high processing speed and high throughput, it nevertheless has at least the following disadvantages: (1) an inflexible architecture, which cannot be scaled or configured to adapt to systems using different MIMO dimensions (i.e., having more or fewer antennas); (2) fixed precision, which cannot be configured to achieve a higher or lower precision based on system architecture or performance requirements; (3) high latency, because the large number of CORDIC cores used, the number of iterations required for the inversion algorithm, and the need to synchronize input matrix columns by inserting starting zeros (102) and ending zeros (104) to reset memories all cause processing delays, especially when the matrix size increases for large MIMO dimensions; and (4) increased area and power consumption due to the large number of CORDIC cores in each PE, especially when the matrix size increases for large MIMO dimensions.