Consider any linear channel of the form y=Hx+n, where H is an M×N matrix of channel gains, x is an N×1 vector of channel inputs, and n is an M×1 vector of independent and identically distributed (IID) complex Gaussian noise variables with mean 0 and covariance matrix N0I.
We are interested in soft-input, soft-output detection of the vector x. This can readily be done with the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm, but the complexity per detected symbol is O(|Ω|^K), where Ω is the modulation alphabet and K is the memory of the channel matrix H. Memory is a central theme, so we define the term next. Take a QL factorization of the channel matrix, H=QL, where Q has orthonormal columns and L is lower triangular. Without loss of generality, we can then work with the filtered observation r=Q*y=Lx+w, where w=Q*n is again IID complex Gaussian with covariance N0I (i.e., w≡n in distribution). This leads to the following definition of the memory of any channel H.

Definition 1: If the non-zero elements of L are confined to the first K+1 diagonals of L (the main diagonal and the first K sub-diagonals), then the memory of H is K, and optimal demodulation has complexity O(|Ω|^K).
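A minimal sketch of Definition 1, assuming NumPy and a tall, full-column-rank H (M ≥ N): the QL factorization is obtained from an ordinary QR factorization of H with its rows and columns reversed, and the memory is read off as the largest sub-diagonal offset i-j that carries a non-negligible entry of L. The helper names `ql`, `memory` and `conv_matrix`, the tolerance `rtol`, and the 3-tap channel are hypothetical choices for illustration. For an ISI convolution matrix, the computed memory equals the number of taps minus one.

```python
import numpy as np

def ql(H):
    """QL factorization H = Q @ L for an M x N matrix with M >= N.

    A reduced QR of the row- and column-reversed matrix, J_M H J_N = Q' R',
    gives H = (J_M Q' J_N)(J_N R' J_N); the second factor is lower triangular.
    """
    Qr, R = np.linalg.qr(H[::-1, ::-1])
    return Qr[::-1, ::-1], R[::-1, ::-1]

def memory(L, rtol=1e-9):
    """Memory K per Definition 1: largest offset i - j with a non-negligible L[i, j]."""
    tol = rtol * np.abs(L).max()          # hypothetical relative tolerance
    rows, cols = np.nonzero(np.abs(L) > tol)
    return int((rows - cols).max())

def conv_matrix(h, N):
    """(N + len(h) - 1) x N Toeplitz matrix of a full convolution with taps h."""
    H = np.zeros((N + len(h) - 1, N), dtype=complex)
    for j in range(N):
        H[j:j + len(h), j] = h
    return H

h = np.array([1.0, 0.5 + 0.3j, 0.2])      # hypothetical 3-tap ISI channel
H = conv_matrix(h, N=8)
Q, L = ql(H)
print(np.allclose(Q @ L, H), memory(L))    # True 2
```

Reversing rows and columns turns the upper-triangular factor of the QR into a lower-triangular L while keeping Q orthonormal, so no dedicated QL routine is needed.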
Note that we have made no claims about what the channel matrix H represents. This innovation assumes an arbitrary H, so it encompasses, e.g., inter-symbol interference (ISI) channels (encountered, for example, in satellite transmission), multiple-input multiple-output (MIMO) channels (e.g., the Long Term Evolution (LTE) downlink), MIMO-ISI channels (e.g., the LTE uplink), and inter-channel interference (ICI) (e.g., the LTE downlink with high Doppler spread). The reader may find it helpful to keep the ISI example in mind, where the memory K is simply the number of taps of the channel impulse response minus one.
Another central theme within information and communication theory is channel capacity, defined as the highest rate that can be carried through the channel with arbitrarily small error probability. As we have not defined the concept of "time", we measure channel capacity in nats per channel use that can be reliably transmitted. Achieving capacity requires optimizing the input distribution of x and matching it to the actual channel H (via waterfilling), but in this innovation we assume that no knowledge of H is available at the transmitter, so such optimization is not possible. Strictly speaking, the word capacity is then not correct, but we stick with it, although it is a slight abuse of terminology.

Fact 1: The capacity of the linear Gaussian vector channel, measured in nats per channel use, is
$$C = \log\det\left(\mathbf{I} + \frac{P}{N_0} H H^*\right),$$

where the inputs x are distributed as CN(0, PI).
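Fact 1 can be evaluated numerically as in the following sketch (NumPy assumed; `capacity_nats` is a hypothetical helper name and the random 4×3 channel is illustrative only). A log-determinant routine is used instead of taking the logarithm of a determinant to avoid overflow for large matrices or high SNR; because I + (P/N0)HH* is Hermitian positive definite, its determinant is real and positive.

```python
import numpy as np

def capacity_nats(H, P, N0):
    """Fact 1: C = log det(I + (P/N0) H H^*) in nats per channel use."""
    A = np.eye(H.shape[0]) + (P / N0) * (H @ H.conj().T)
    _sign, logdet = np.linalg.slogdet(A)    # A is Hermitian positive definite
    return float(logdet)

# Hypothetical random complex channel with M = 4 outputs and N = 3 inputs.
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))) / np.sqrt(2)
C = capacity_nats(H, P=1.0, N0=0.1)
print(C, C / np.log(2))                     # nats and bits per channel use
```

Dividing the result by log 2 converts nats to bits.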
This capacity can be rewritten as follows.

Fact 2: Straightforward manipulations using the chain rule of mutual information yield
$$C = \sum_{n=1}^{N} I\left(x_n; y \mid x_{n-1}, \ldots, x_1\right),$$

where I(x; y) is the standard mutual information operator.
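For Gaussian inputs, each term of Fact 2 has a closed form: once x_1, ..., x_{n-1} are known, their contribution can be subtracted from y, so I(x_n; y | x_{n-1}, ..., x_1) = log det(I + ρ H_{n:}H_{n:}*) - log det(I + ρ H_{n+1:}H_{n+1:}*), where H_{n:} collects columns n, ..., N of H and ρ = P/N0. The sketch below (NumPy assumed; helper names and the random channel are hypothetical, indices are 0-based) evaluates these terms and shows that they telescope to the C of Fact 1.

```python
import numpy as np

def logdet_term(A, rho):
    """log det(I + rho A A^*) for an M x k matrix A; equals 0 when k = 0."""
    return float(np.linalg.slogdet(np.eye(A.shape[0]) + rho * (A @ A.conj().T))[1])

def chain_rule_terms(H, rho):
    """Gaussian-input terms I(x_n; y | x_{n-1}, ..., x_1), n = 1..N (0-based loop)."""
    N = H.shape[1]
    # Before detecting x_n, columns n..N-1 are still unknown; afterwards n+1..N-1.
    return np.array([logdet_term(H[:, n:], rho) - logdet_term(H[:, n + 1:], rho)
                     for n in range(N)])

rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
rho = 10.0                                   # P / N0, hypothetical value
terms = chain_rule_terms(H, rho)
C = logdet_term(H, rho)
print(terms, terms.sum(), C)                 # the terms sum to C
```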
The notion of memory can be incorporated into Fact 2 to obtain the following.

Fact 3: If the memory of the channel H is K, then
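$$C = \sum_{n=1}^{N} I\left(x_n; y \mid x_{n-1}, \ldots, x_{n-K}\right),$$

where $x_k = \emptyset$ for $k \le 0$, i.e., conditioning variables with non-positive index are absent.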
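Fact 3 can be checked numerically on a small example. The sketch below (NumPy assumed; the 3-tap banded channel, the block length and ρ = P/N0 are hypothetical values) uses a square lower-triangular H with memory K = 2, so that Q = I and L = H directly. For Gaussian inputs, I(x_n; y | x_S) is computed as a difference of log-determinants over the columns of H outside the conditioning set S. Conditioning on the K most recent symbols recovers C, whereas conditioning on none gives a strictly smaller sum.

```python
import numpy as np

def logdet_term(A, rho):
    """log det(I + rho A A^*) for an M x k matrix A; equals 0 when k = 0."""
    return float(np.linalg.slogdet(np.eye(A.shape[0]) + rho * (A @ A.conj().T))[1])

def cond_mi(H, rho, n, known):
    """Gaussian-input I(x_n; y | x_k for k in known), 0-based indices, rho = P/N0."""
    unknown = [k for k in range(H.shape[1]) if k not in known]
    others = [k for k in unknown if k != n]
    return logdet_term(H[:, unknown], rho) - logdet_term(H[:, others], rho)

def memory_limited_sum(H, rho, K):
    """Sum over n of I(x_n; y | x_{n-1}, ..., x_{n-K}); indices below 0 are absent."""
    return sum(cond_mi(H, rho, n, set(range(max(0, n - K), n)))
               for n in range(H.shape[1]))

# Hypothetical banded lower-triangular channel with memory K = 2 (square: Q = I, L = H).
taps = np.array([1.0, 0.6, -0.3])
N, K, rho = 6, 2, 5.0
H = sum(np.diag(np.full(N - d, taps[d]), -d) for d in range(K + 1))
C = logdet_term(H, rho)
print(memory_limited_sum(H, rho, K), C)      # equal, as Fact 3 states
print(memory_limited_sum(H, rho, 0))         # no conditioning: strictly smaller
```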
Finally, the following result is well known.

Fact 4: With an optimal detector of x given y, the capacity of the channel can be reached.
Let us summarize: every linear channel H has an associated memory and channel capacity. The channel capacity can be reached if the receiver uses an optimal detector for x, and this detector has a complexity that is exponential in the memory K. Note that our measure of receiver complexity assumes discrete inputs, while the capacity expression assumes Gaussian-distributed inputs. However, the Gaussian-input capacity typically approximates the rate achievable with discrete inputs very well up to a certain signal-to-noise ratio (SNR) threshold that depends on the cardinality of the input alphabet. Therefore, the value C is of operational interest even if the system uses discrete rather than Gaussian inputs.
Problems arise if the channel memory K is so large that the complexity |Ω|^K exceeds the allowed complexity budget. This is a common situation in practice, so reduced-complexity techniques must be considered. One example is the LTE link, where the channel memory is often around 10; with 64-quadrature amplitude modulation (QAM) inputs this yields a complexity of 64^10=2^60 per transmitted symbol. This is one of the reasons why orthogonal frequency-division multiplexing (OFDM) is typically preferred over single-carrier transmission in LTE. Another example is the Global System for Mobile Communications (GSM) link, where the channel memory is around 7-10. In GSM, contrary to the LTE case, this memory is indeed handled, by a reduced-trellis receiver.
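The complexity arithmetic above is easy to reproduce. The sketch below (plain Python; `trellis_states`, `max_memory` and the budget of 10^6 states are hypothetical) computes |Ω|^K and, conversely, the largest memory that fits a given state budget.

```python
import math

def trellis_states(alphabet_size, memory):
    """Per-symbol BCJR complexity order |Omega|^K (number of trellis states)."""
    return alphabet_size ** memory

def max_memory(alphabet_size, budget):
    """Largest memory K such that alphabet_size**K <= budget (hypothetical budget)."""
    K = 0
    while alphabet_size ** (K + 1) <= budget:
        K += 1
    return K

lte = trellis_states(64, 10)               # 64-QAM with memory 10
print(lte == 2 ** 60, math.log2(lte))      # True 60.0
print(max_memory(64, 10 ** 6))             # 3, since 64**3 = 262144 <= 10**6 < 64**4
```

Integer arithmetic is used in `max_memory` so the comparison against the budget is exact.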
The amount of research that has been devoted to constructing a low-complexity, yet high performance, receiver solution for large memories is massive. One such group of receivers is the channel shortening (CS) receivers. The rationale behind the CS receiver is the following:
1. Filter the received signal y with a pre-filter W, r=Wy.
2. The aim of W is to "compress" the memory of the channel H to a smaller value.
3. The effective channel is now T=WH, which by assumption has a memory smaller than K; denote it by L. We now have r=Wy=Tx+Wn, where Wn is filtered noise.
4. Apply the BCJR algorithm to the signal r, where the memory is L by definition.
The outcome of steps 1-4 is near-optimal detection, but with complexity O(|Ω|^L) instead of O(|Ω|^K).
How close to optimal the detection actually is depends on the particular choices of W and T.
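Steps 1-4 can be sketched end to end on a toy ISI channel. Every design choice here is an assumption for illustration (NumPy assumed, helper names hypothetical): a real-valued channel with BPSK inputs, the pre-filter W = Q* from a QL factorization, and a model T obtained by simply truncating L to its main diagonal and first L sub-diagonals. This crude truncation is not the optimized Falconer-Magee choice, and T only approximates WH, so the neglected diagonals act as extra interference. To keep the code short, a hard-output minimum-distance Viterbi search over the same memory-L trellis stands in for the soft-output BCJR. In the code, `Lfac` is the QL factor and `L_mem` the reduced memory, to avoid the clash between the two uses of L.

```python
import numpy as np

def ql(H):
    """QL factorization H = Q @ L (M >= N) via QR of the row/column-reversed H."""
    Q, R = np.linalg.qr(H[::-1, ::-1])
    return Q[::-1, ::-1], R[::-1, ::-1]

def band(Lfac, mem):
    """Keep the main diagonal and the first `mem` sub-diagonals, zero the rest."""
    i, j = np.indices(Lfac.shape)
    return np.where((i - j >= 0) & (i - j <= mem), Lfac, 0)

def viterbi_bpsk(r, T, mem):
    """Hard-output minimum-distance sequence search for BPSK on r ~ T x.

    Stand-in for the soft-output BCJR of step 4. A state holds the last `mem`
    symbols (x_{n-mem}, ..., x_{n-1}); zeros represent symbols before the block.
    """
    survivors = {(0.0,) * mem: (0.0, [])}
    for n in range(len(r)):
        nxt = {}
        for state, (cost, path) in survivors.items():
            for x in (-1.0, 1.0):
                full = state + (x,)                      # full[mem - d] = x_{n-d}
                pred = sum(T[n, n - d] * full[mem - d]
                           for d in range(mem + 1) if n - d >= 0)
                c = cost + abs(r[n] - pred) ** 2
                key = full[1:]
                if key not in nxt or c < nxt[key][0]:
                    nxt[key] = (c, path + [x])
        survivors = nxt
    return np.array(min(survivors.values(), key=lambda s: s[0])[1])

# Toy real ISI channel with K = 3 (hypothetical taps), reduced memory L_mem = 1.
rng = np.random.default_rng(2)
h = np.array([1.0, 0.7, 0.5, 0.3])
N, N0, L_mem = 12, 0.05, 1
H = np.zeros((N + len(h) - 1, N))
for j in range(N):
    H[j:j + len(h), j] = h                      # Toeplitz convolution matrix
x = rng.choice([-1.0, 1.0], size=N)
y = H @ x + np.sqrt(N0) * rng.standard_normal(H.shape[0])   # real noise for simplicity

Q, Lfac = ql(H)
W = Q.conj().T              # step 1: pre-filter (crude choice, assumed)
r = W @ y                   # r = Lfac x + W n
T = band(Lfac, L_mem)       # steps 2-3: memory-L_mem model of W H (truncation, assumed)
x_hat = viterbi_bpsk(r, T, L_mem)   # step 4: 2**L_mem states instead of 2**3
print("symbol error rate:", np.mean(x_hat != x))
```

With L_mem = 1 the search uses 2 trellis states instead of the 8 required for the full memory K = 3; the printed error rate depends on the random draw and on how much energy the truncated diagonals carry.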
Recall that the notation here is purely matrix-valued. In the case of an ISI channel, the channel matrix H is a Toeplitz matrix representing a convolution, and, as is well known, the pre-filter W is then also a convolution matrix.
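The ISI special case can be illustrated directly: building the Toeplitz convolution matrices for a channel and for a pre-filter shows that their product WH is again a convolution matrix, namely that of the convolved impulse responses. NumPy is assumed; `conv_matrix` and the tap values are hypothetical.

```python
import numpy as np

def conv_matrix(h, N):
    """(N + len(h) - 1) x N Toeplitz matrix with conv_matrix(h, N) @ x == np.convolve(h, x)."""
    H = np.zeros((N + len(h) - 1, N), dtype=np.result_type(h, float))
    for j in range(N):
        H[j:j + len(h), j] = h
    return H

h = np.array([1.0, 0.5, 0.25])   # hypothetical channel taps
w = np.array([1.0, -0.5])        # hypothetical pre-filter taps
N = 5
H = conv_matrix(h, N)
W = conv_matrix(w, H.shape[0])   # pre-filter acting on the length-(N + 2) output
T = W @ H                        # effective channel, again a convolution matrix
print(np.allclose(T, conv_matrix(np.convolve(w, h), N)))   # True
```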
This CS receiver scheme was invented as early as 1973 by Falconer and Magee, who gave an explicit choice of the two CS parameters W and T. A block diagram of the CS idea, showing the order of operations, is provided in FIG. 1: (i) based on the channel matrix H, the noise density N0 and the BCJR memory L, the pre-filter W and the effective channel T are computed; (ii) the received signal y is filtered by the pre-filter W, which produces the vector r; (iii) finally, a BCJR whose operations are specified by the matrix T is applied to the vector r. Note that if we set L=0, the standard zero-forcing equalizer and the minimum mean square error (MMSE) equalizer fall within the CS framework.
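The L=0 remark can be made concrete. With L=0 the trellis has a single state, so the BCJR reduces to symbol-by-symbol detection on r. The sketch below (NumPy assumed; helper names and the random channel are hypothetical) computes the textbook zero-forcing and linear MMSE pre-filters for x distributed as CN(0, PI): the zero-forcing choice gives WH = I, while the MMSE pre-filter gives a WH whose diagonal lies strictly between zero and one (the familiar MMSE bias) and approaches zero forcing as N0 tends to zero.

```python
import numpy as np

def zf_prefilter(H):
    """Zero-forcing pre-filter W = (H^* H)^{-1} H^*, so that W H = I (memory 0)."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh)

def mmse_prefilter(H, P, N0):
    """Linear MMSE pre-filter W = (H^* H + (N0/P) I)^{-1} H^* for x ~ CN(0, P I)."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H + (N0 / P) * np.eye(H.shape[1]), Hh)

# Hypothetical random 6 x 4 complex channel.
rng = np.random.default_rng(3)
H = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(2)
W_zf = zf_prefilter(H)
W_mmse = mmse_prefilter(H, P=1.0, N0=0.1)
print(np.allclose(W_zf @ H, np.eye(4)))         # True: WH = I, a single-state trellis
print(np.round(np.diag(W_mmse @ H).real, 3))    # biased diagonal, each entry below 1
```

`np.linalg.solve` is used rather than an explicit matrix inverse, which is the numerically preferable way to apply (H*H)^{-1}.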