This invention generally relates to solving linear systems. In particular, the invention relates to using array processing to solve linear systems.
Linear system solutions are used to solve many engineering problems. One such problem is joint detection of multiple user signals in a time division duplex (TDD) communication system using code division multiple access (CDMA). In such a system, multiple users send multiple communication bursts simultaneously in a same fixed duration time interval (timeslot). The multiple bursts are transmitted using different spreading codes. During transmission, each burst experiences a channel response. One approach to recover data from the transmitted bursts is joint detection, where all users' data is estimated simultaneously. Such a system is shown in FIG. 1. The joint detection receiver may be used in a user equipment or base station.
The multiple bursts 90, after experiencing their channel response, are received as a combined received signal at an antenna 92 or antenna array. The received signal is reduced to baseband, such as by a demodulator 94, and sampled at a chip rate of the codes or a multiple of a chip rate of the codes, such as by an analog to digital converter (ADC) 96 or multiple ADCs, to produce a received vector, r. A channel estimation device 98 uses a training sequence portion of the communication bursts to estimate the channel response of the bursts 90. A joint detection device 100 uses the estimated or known spreading codes of the users' bursts and the estimated or known channel responses to estimate the originally transmitted data for all the users as a data vector, d.
The joint detection problem is typically modeled by Equation 1.

Ad + n = r    Equation 1

d is the transmitted data vector; r is the received vector; n is the additive white gaussian noise (AWGN); and A is an M×N matrix constructed by convolving the channel responses with the known spreading codes.
Two approaches to solve Equation 1 are a zero forcing (ZF) approach and a minimum mean square error (MMSE) approach. A ZF solution, where n is approximated as zero, is per Equation 2.

d = (A^H A)^(-1) A^H r    Equation 2

A MMSE approach is per Equations 3 and 4.

d = R^(-1) A^H r    Equation 3

R = A^H A + σ^2 I    Equation 4

σ^2 is the variance of the noise, n, and I is the identity matrix.
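As a minimal sketch (not from the patent), forming the MMSE matrix R = A^H A + σ^2 I of Equation 4 can be illustrated in pure Python, with the system matrix stored as a list of complex-valued rows. Setting sigma2 to zero yields the ZF matrix Ã = A^H A of Equation 6.

```python
def hermitian(A):
    """Conjugate transpose (hermitian) A^H of a matrix stored as rows."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    """Plain matrix product A * B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mmse_matrix(A, sigma2):
    """R = A^H A + sigma^2 I per Equation 4."""
    R = matmul(hermitian(A), A)
    for i in range(len(R)):
        R[i][i] += sigma2  # add the noise variance on the diagonal
    return R
```

For example, with A = [[1+1j, 0], [0, 2]] and sigma2 = 0.5, the result is R = [[2.5, 0], [0, 4.5]].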
Since the spreading codes, channel responses and noise variance are estimated or known and the received vector is known, the only unknown variable is the data vector, d. A brute force type solution, such as a direct matrix inversion, to either approach is extremely complex. One technique to reduce the complexity is Cholesky decomposition. The Cholesky algorithm factors a symmetric positive definite matrix, such as Ã or R, into a lower triangular matrix G and an upper triangular matrix G^H per Equation 5.

Ã or R = G G^H    Equation 5

A symmetric positive definite matrix, Ã, can be created from A by multiplying A by its conjugate transpose (hermitian), A^H, per Equation 6.

Ã = A^H A    Equation 6

For shorthand, r̃ is defined per Equation 7.

r̃ = A^H r    Equation 7

As a result, Equation 1 is rewritten as Equation 8 for ZF or Equation 9 for MMSE.

Ãd = r̃    Equation 8

Rd = r̃    Equation 9

To solve either Equation 8 or 9, the Cholesky factor is used per Equation 10.

G G^H d = r̃    Equation 10

A variable y is defined per Equation 11.

G^H d = y    Equation 11

Using variable y, Equation 10 is rewritten as Equation 12.

Gy = r̃    Equation 12

The bulk of the complexity in obtaining the data vector lies in three steps. In the first step, G is created from the derived symmetric positive definite matrix, such as Ã or R, as illustrated by Equation 13.

G = CHOLESKY(Ã or R)    Equation 13

Using G, y is solved by forward substitution in Equation 12, as illustrated by Equation 14.

y = FORWARD SUB(G, r̃)    Equation 14

Using the conjugate transpose of G, G^H, d is solved by backward substitution in Equation 11, as illustrated by Equation 15.

d = BACKWARD SUB(G^H, y)    Equation 15

An approach to determine the Cholesky factor, G, per Equation 13 is the following algorithm, shown for Ã, although an analogous approach is used for R.
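The three steps of Equations 13 through 15 can be sketched in pure Python for a small real-valued symmetric positive definite matrix; the complex (hermitian) case is analogous with conjugates added. This is an illustrative sketch, not the patent's implementation.

```python
def cholesky(A):
    """Return lower-triangular G with A = G * G^T (Equation 13, real case)."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(G[i][k] * G[j][k] for k in range(j))
            if i == j:
                G[i][j] = (A[i][i] - s) ** 0.5  # diagonal: square root of pivot
            else:
                G[i][j] = (A[i][j] - s) / G[j][j]
    return G

def forward_sub(G, r):
    """Solve G y = r for y, G lower triangular (Equation 14)."""
    n = len(G)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    return y

def backward_sub(GT, y):
    """Solve G^T d = y for d, G^T upper triangular (Equation 15)."""
    n = len(GT)
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        d[i] = (y[i] - sum(GT[i][k] * d[k] for k in range(i + 1, n))) / GT[i][i]
    return d
```

For example, with Ã = [[4, 2], [2, 3]] and r̃ = [10, 8], the three steps yield d = [1.75, 1.5].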
for i = 1:N
    for j = max(1, i−P):i−1
        λ = min(j+P, N)
        a(i:λ,i) = a(i:λ,i) − a*(i,j) · a(i:λ,j)
    end for;
    λ = min(i+P, N)
    a(i:λ,i) = a(i:λ,i) / √a(i,i)
end for;
G = Ã or R;

a(d,e) denotes the element in matrix Ã or R at row d, column e; a(d:e,f) denotes the vector of elements from row d to row e in column f; ":" indicates a "to" operator, such as "from j to N"; (·)* indicates a complex conjugate; and (·)^H indicates a conjugate transpose (hermitian) operator. P is the bandwidth of the banded matrix Ã or R.
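The banded algorithm above can be sketched in Python, operating in place on the lower triangle so that on return the lower triangle holds G. This is an illustrative real-valued sketch (the conjugate is a no-op for real entries), not the patent's implementation; indices are the 0-indexed translation of the 1-indexed pseudocode.

```python
import math

def banded_cholesky(a, P):
    """In-place banded Cholesky of a symmetric positive definite matrix a
    (list of lists) with bandwidth P; the lower triangle becomes G."""
    N = len(a)
    for i in range(N):
        # subtract the contributions of the already finished columns j < i
        for j in range(max(0, i - P), i):
            lam = min(j + P, N - 1)
            for k in range(i, lam + 1):
                a[k][i] -= a[i][j].conjugate() * a[k][j]
        # normalize column i by the square root of its pivot
        lam = min(i + P, N - 1)
        piv = math.sqrt(a[i][i])
        for k in range(i, lam + 1):
            a[k][i] /= piv
    return a
```

For a tridiagonal example (P = 1), Ã = [[4, 2, 0], [2, 5, 2], [0, 2, 5]] factors so that the lower triangle becomes G = [[2, ·, ·], [1, 2, ·], [0, 1, 2]].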
Another approach to solve for the Cholesky factor uses N parallel vector-based processors. Each processor is mapped to a column of the Ã or R matrix. Each processor's column is identified by a variable μ, where μ = 1:N. The processing performed by processor μ can be viewed as the following subroutine.
j = 1
while j < μ
    recv(g(j:N), left)
    if μ < N
        send(g(j:N), right)
    end
    a(μ:N,μ) = a(μ:N,μ) − g*(μ) · g(μ:N)
    j = j + 1
end
a(μ:N,μ) = a(μ:N,μ) / √a(μ,μ)
if μ < N
    send(a(μ:N,μ), right)
end

recv(·, left) is a receive from the left processor operator; send(·, right) is a send to the right processor operator; and g(j:N) is a vector received from the left neighboring processor.
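A sequential simulation of this column-processor dataflow can be sketched in Python (an illustrative sketch, not the patent's hardware): column μ consumes each finished column passed from its left neighbors, applies the rank-1 update, then normalizes and would "send" its own column rightward.

```python
import math

def parallel_cholesky_sim(a):
    """Sequentially simulate the N column processors on the symmetric
    positive definite matrix a (list of lists); the lower triangle of a
    becomes G. Real-valued sketch (conjugate is a no-op for reals)."""
    N = len(a)
    for mu in range(N):
        # messages received from processors 0..mu-1: their finished columns
        for j in range(mu):
            g = [a[k][j] for k in range(N)]
            for k in range(mu, N):
                a[k][mu] -= g[mu].conjugate() * g[k]
        # normalize this processor's column by the square root of its pivot
        piv = math.sqrt(a[mu][mu])
        for k in range(mu, N):
            a[k][mu] /= piv
    return a
```

Running it on Ã = [[4, 2, 0], [2, 5, 2], [0, 2, 5]] produces the same lower-triangular G = [[2, ·, ·], [1, 2, ·], [0, 1, 2]] as the banded algorithm.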
This subroutine is illustrated using FIGS. 2a–2h. FIG. 2a is a block diagram of the vector processors and associated memory cells of the joint detection device. Each processor 501 to 50N (50) operates on a column of the matrix. Since the G matrix is lower triangular and Ã or R is completely defined by its lower triangular portion, only the lower triangular elements, a(k,l), are used.
FIGS. 2b and 2c show two possible functions performed by the processors on the cells below them. In FIG. 2b, the pointed down triangle function 52 performs Equations 16 and 17 on the cells (a(μ,μ) to a(N,μ)) below that μ processor 50.

v ← a(μ:N,μ) / √a(μ,μ)    Equation 16

a(μ:N,μ) := v    Equation 17

"←" indicates a concurrent assignment; ":=" indicates a sequential assignment; and v is a value sent to the right processor 50.
In FIG. 2c, the pointed right triangle function 54 performs Equations 18 and 19 on the cells below that μ processor 50.

v ← u    Equation 18

a(μ:N,μ) := a(μ:N,μ) − v*(μ) · v(μ:N)    Equation 19

u is the value received from the left processor 50, and v(k) indicates the value associated with the kth processor 50.
FIGS. 2d–2g illustrate the data flow and functions performed for a 4×4 G matrix. As shown in FIGS. 2d–2g, for each stage, 1 through 4, of processing, the left most processor 50 drops out and the pointed down triangle function 52 moves left to right. To implement FIGS. 2d–2g, the pointed down triangle can physically replace the processor to the right or virtually replace the processor to the right by taking on the function of the pointed down triangle.
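The two triangle functions and the stage-by-stage flow of FIGS. 2b–2g can be sketched in Python (an illustrative real-valued sketch, not the patent's array): at stage s the down triangle sits on column s and its normalized column flows rightward through the right-triangle processors.

```python
import math

def down_triangle(col):
    """FIG. 2b function (Eqs. 16-17): divide the column by the square
    root of its leading (diagonal) element; the result v is both stored
    back into the cells and sent to the right."""
    piv = math.sqrt(col[0])
    v = [x / piv for x in col]
    return v, v  # (new cell contents, value sent right)

def right_triangle(cells, v):
    """FIG. 2c function (Eqs. 18-19): pass v through unchanged and apply
    the rank-1 update to the cells below the processor."""
    new = [cells[k] - v[0].conjugate() * v[k] for k in range(len(cells))]
    return new, v

def run_stages(a):
    """Drive the per-stage dataflow of FIGS. 2d-2g on the lower triangle
    of the symmetric positive definite matrix a, leaving G in place."""
    N = len(a)
    for s in range(N):
        col = [a[k][s] for k in range(s, N)]  # cells under processor s
        v, _ = down_triangle(col)
        for k in range(s, N):
            a[k][s] = v[k - s]
        for mu in range(s + 1, N):            # message flows rightward
            cells = [a[k][mu] for k in range(mu, N)]
            new, _ = right_triangle(cells, v[mu - s:])  # uses v_mu .. v_N
            for k in range(mu, N):
                a[k][mu] = new[k - mu]
    return a
```

On Ã = [[4, 2, 0], [2, 5, 2], [0, 2, 5]] the stages leave G = [[2, ·, ·], [1, 2, ·], [0, 1, 2]] in the lower triangle, matching the other formulations.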
These elements are extendable to an N×N matrix and N processors 50 by adding processors 50 (N−4 in number) to the right of the fourth processor 504 and by adding cells of the bottom matrix diagonal (N−4 in number) to each of the processors 50 as shown in FIG. 2h for stage 1. The processing in such an arrangement occurs over N stages.
The implementation of such a Cholesky decomposition using either vector processors or a direct decomposition into scalar processors is inefficient, because large amounts of processing resources go idle after each stage of processing.
Accordingly, it is desirable to have alternate approaches to solve linear systems.