In communications systems, signals transmitted, wirelessly for example, may be subjected to fading, jamming, and other elements that may cause errors to be introduced in the signal. The coding of signals before transmission helps to overcome the effects of channel noise, fading, and jamming, by allowing errors introduced during the transmission to be detected and corrected when the signal is decoded at a receiver.
Parallel concatenated convolutional codes (PCCC), or “Turbo codes,” have been recognized as a breakthrough in coding schemes. Turbo codes provide powerful resistance to errors generated during transmission, with high coding gains and bit error rates as low as $10^{-7}$. Because turbo codes provide outstanding error correction, they are very useful in applications where the signal-to-noise ratio (SNR) is generally low (e.g., wireless communications).
A turbo encoder may include a parallel concatenation of two recursive systematic convolutional (RSC) encoders linked by an interleaver. The two RSC encoders provide the component codes of a turbo code. The interleaver changes the order of the data stream before it is input to the second RSC encoder. Because one data stream is interleaved, the resulting codes have time-variant characteristics that provide for the high coding gains obtained from turbo coders.
A serial turbo decoder may include a pair of soft-input soft-output (SISO) decoders, a receiver buffer, an interleaver, and a deinterleaver. In operation, an incoming block of data (also called a data frame) is processed once and then recirculated several times to achieve a desired coding gain. Although turbo codes exhibit high resistance to errors, they are not ideally suited for many practical applications because of an inordinately high latency, which results from the turbo encoder's use of interleavers (which introduce delay) and from the turbo decoder's computationally complex iterative algorithm. Turbo codes usually work with large block sizes (e.g., N>5000 bits). The soft inputs for an entire block must be stored in memory to facilitate the iterative decoding; in other words, the soft inputs are used repeatedly in each decoding phase. As a result, turbo decoders are memory intensive, which may render them impractical or too expensive for many applications.
In general, latency of serial turbo decoders may be marginally improved by using specially designed high-speed hardware to implement the turbo decoders; however, only incremental improvement in latency is provided at the cost of increased expense and device complexity, in addition to increased power dissipation (which may be unacceptable in many low power wireless devices).
An alternative approach to overcoming the high latency of turbo decoding is to use parallel decoding architectures. Parallel decoding can greatly improve throughput and latency. Two basic parallel schemes are available. Parallelism may be achieved by decoding multiple received signals at the same time or by dividing a received signal into blocks and decoding the blocks in parallel. While throughput may be increased and latency reduced using parallel decoding, the large memory requirement is not reduced. In addition, hardware complexity and cost are increased. Therefore, parallel schemes that are memory efficient and hardware (or area) efficient are needed for practical implementation of turbo codes.
An example of a communications system 100 is shown in FIG. 1. As is conventional, the communications system 100 includes a transmitter 101 that may be used to send signals to a receiver 102 through a transmission medium including a communications channel 105. In transmitter 101, a source encoder 111 removes redundant parts of a signal for transmission. An encryptor 113 may be used to encrypt the signal, ensuring privacy of the information transmitted in the signal. The encrypted signal 114 is provided to the turbo encoder 115, discussed previously, which encodes the signal to protect against perturbations introduced by the communications channel 105. The encoded signal 116 is supplied as an input to modulator 117. The modulator 117 modulates the encoded signal for suitable transmission across the channel 105.
In the receiver 102, a demodulator 122 demodulates the signal received from the channel 105. The demodulated signal is provided to the turbo decoder 124, discussed previously. The turbo decoder 124 decodes the signal, checks the signal for errors during transmission, and corrects for errors if possible. The decoded signal may be sent to a decryptor 126 if the signal is encrypted. Finally, the signal is decoded by a source decoder 128.
An exemplary conventional encoder, shown in FIG. 2A, may be used to implement the turbo encoder 115 shown in FIG. 1. The turbo encoder 115 may include two recursive systematic convolutional (RSC) encoders 201, 203, and an interleaver 205. The interleaver 205 may be implemented using a conventional block interleaver or a random interleaver. The interleaver works on block-based data. Once a block of data is received in sequence, the interleaver outputs the data in a permuted order. The interleaver 205 changes the order of the data stream 114 ($u_k$) before it is input to the second RSC encoder 203. Because one data stream is interleaved, the resulting codes have time-variant characteristics that provide for the high coding gains obtained from turbo coders.
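As an illustration of the block interleaving described above, the following is a minimal sketch rather than the interleaver 205 itself. It assumes the common row-write, column-read arrangement; the function names and the `rows`/`cols` parameters are hypothetical and do not come from the source.

```python
def block_interleave(bits, rows, cols):
    """Write the block row by row into a rows x cols array, read it column by column."""
    if len(bits) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(bits, rows, cols):
    """Invert block_interleave: restore the original row-by-row order."""
    if len(bits) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out
```

For a six-element block with `rows=2` and `cols=3`, the sequence 0, 1, 2, 3, 4, 5 is output as 0, 3, 1, 4, 2, 5, and the deinterleaver restores the original order.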
The turbo encoder 115 receives input signal 114 ($u_k$). The systematic bit $x_k^s$ at time k is one output of the encoder. The first RSC encoder 201 encodes bits in the signal $u_k$ in their original order. The second RSC encoder 203 encodes an interleaved information sequence received from interleaver 205. For each bit $x_k^s$ at time index k, the first RSC encoder 201 generates parity bit $x_k^{p1}$ and the second RSC encoder 203 generates parity bit $x_k^{p2}$. Parity bits $x_k^{p1}$ and $x_k^{p2}$ may be punctured (i.e., deleted from the output data stream) before being sent to the modulator 117, according to a desired coding rate. The outputs $x_k^s$, $x_k^{p1}$, and $x_k^{p2}$ form the output 116 to the modulator 117. The output bits are then transmitted over a communication channel using a variety of transmission techniques as described in “Digital Communications,” by John Proakis, McGraw-Hill, 1983, ISBN 0-07-050937-9.
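The encoder structure described above may be sketched as follows. This is an illustrative sketch under stated assumptions, not the encoder of FIG. 2A: the source does not give the generator polynomials, so a 4-state RSC code with feedback polynomial 1+D+D² and feedforward polynomial 1+D² is assumed, and trellis termination and puncturing are omitted. The names `rsc_parity`, `turbo_encode`, and `perm` are hypothetical.

```python
def rsc_parity(bits):
    """Parity stream of an assumed 4-state RSC encoder.

    Assumed polynomials: feedback 1 + D + D^2, feedforward 1 + D^2.
    The encoder starts in the all-zero state and is not terminated.
    """
    s1 = s2 = 0  # shift-register contents: a_{k-1} and a_{k-2}
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # feedback: a_k = u_k + a_{k-1} + a_{k-2} (mod 2)
        parity.append(a ^ s2)  # feedforward: p_k = a_k + a_{k-2} (mod 2)
        s1, s2 = a, s1
    return parity


def turbo_encode(u, perm):
    """Return (systematic, parity1, parity2) for input bits u.

    perm is the interleaver permutation: interleaved[j] = u[perm[j]].
    """
    if sorted(perm) != list(range(len(u))):
        raise ValueError("perm must be a permutation of range(len(u))")
    interleaved = [u[i] for i in perm]
    return list(u), rsc_parity(u), rsc_parity(interleaved)
```

Without puncturing, the three output streams together give a rate-1/3 code; deleting alternate bits from the two parity streams would raise the rate to 1/2.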
The turbo decoder 124 of FIG. 1 may be implemented using a conventional serial turbo decoder 124 shown in FIG. 2B. Signal 123 input to turbo decoder 124 may include bits $y_k^s$, $y_k^{p1}$, and $y_k^{p2}$, which correspond to bits $x_k^s$, $x_k^{p1}$, and $x_k^{p2}$ produced by turbo encoder 115 (see FIG. 2A). The turbo decoder 124 includes two soft-input soft-output (SISO) decoders 221, 222, two interleavers 231, 233, and one deinterleaver 240. Each decoder 221 and 222 has two outputs at each time instant: (1) extrinsic information, denoted as $L_{ex}^i(k)$, where k represents the time and i corresponds to the first or second SISO decoder, and (2) the log likelihood ratio (LLR), denoted as $L_{lr}^i(k)$ or $LR_i(k)$. The extrinsic information output from one constituent decoder is used as a priori information for the other constituent decoder after interleaving/de-interleaving. The MAP algorithm may be used to compute the soft outputs. The decision bits ($u_k=+1$ or $u_k=-1$) are determined depending on the signs of the LLR values.
A maximum a-posteriori (MAP) algorithm may be used to implement the SISO decoders 221 and 222 in turbo decoder 124. A detailed discussion of the MAP algorithm and its derivation may be found in “Near optimum error correcting coding and decoding: turbo codes” by C. Berrou et al., IEEE Trans. on Communications, vol. 44, pp. 1261–1271, October 1996 and “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain” by P. Robertson et al., IEEE Int. Conf. on Communications, pp. 1009–1013, 1995.
If $R_1^N = (R_1, R_2, \ldots, R_k, \ldots, R_N)$ denotes a received noise-corrupted signal and $S_k$ denotes the state of the encoder at time k, then, using Bayes' rule (and taking into account that events after time k are not influenced by the observation of $R_1^k$ and bit $u_k$ if state $S_k$ is known), the LLR of $u_k$ can be derived as
$$LR(u_k) = \log \frac{\sum_{m} \sum_{m'} \alpha_{k-1}(m') \cdot \beta_k(m) \cdot \gamma_1(y_k, m, m')}{\sum_{m} \sum_{m'} \alpha_{k-1}(m') \cdot \beta_k(m) \cdot \gamma_0(y_k, m, m')} \tag{1}$$

where the forward recursion metric α, the backward recursion metric β and the branch metric $\gamma_i$ are defined as

$$\alpha_k(m) = P(S_k = m \mid R_1^k) \tag{2}$$

$$\beta_k(m) = \frac{P(R_{k+1}^N \mid S_k = m)}{P(R_{k+1}^N \mid R_1^k)} \tag{3}$$

$$\gamma_i(R_k, m, m') = P(u_k = i, S_k = m, R_k \mid S_{k-1} = m') \tag{4}$$

respectively.
In prior applications of the MAP algorithm to turbo decoding, a frame of received data is decoded by first calculating the forward recursion metrics α across the entire frame of data. The results of each forward recursion metric α computation are saved in a memory. After the forward recursion metric α computations are complete, the backward recursion metrics β are computed beginning at the end of the data frame. After each backward recursion metric β calculation is completed, an LLR computation may be performed using the corresponding saved forward recursion metric α. This direct implementation of the MAP algorithm is referred to as the global recursion approach.
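The global recursion approach may be sketched as follows. This is a minimal probability-domain illustration, not a definitive implementation: the function name `map_llr_global` and the layout of `gamma` are hypothetical, the branch metrics are assumed to be supplied by the caller, the metrics are normalised at each stage to avoid underflow, and the end state of the trellis is assumed unknown. Stages are indexed from 0, so `alpha[k]` plays the role of the forward metric at time k−1 in the LLR expression.

```python
import math


def map_llr_global(gamma, n_states, start_state=0):
    """Global-recursion MAP sketch.

    gamma[k][i][mp][m] is the branch metric for input bit i on the
    transition from state mp (entering stage k) to state m (leaving it).
    Returns one LLR per trellis stage.
    """
    n = len(gamma)

    def step(k, mp, m):
        # Total branch metric over both bit hypotheses.
        return gamma[k][0][mp][m] + gamma[k][1][mp][m]

    # Forward pass: every alpha vector is kept, which is the memory cost
    # of the global recursion approach.
    alpha = [[0.0] * n_states for _ in range(n + 1)]
    alpha[0][start_state] = 1.0
    for k in range(n):
        row = [sum(alpha[k][mp] * step(k, mp, m) for mp in range(n_states))
               for m in range(n_states)]
        total = sum(row)
        alpha[k + 1] = [a / total for a in row]

    # Backward pass from the end of the frame. Assumption: the end state
    # is unknown, so beta starts uniform.
    beta = [1.0 / n_states] * n_states
    llr = [0.0] * n
    for k in range(n - 1, -1, -1):
        sums = [0.0, 0.0]
        for i in (0, 1):
            sums[i] = sum(alpha[k][mp] * gamma[k][i][mp][m] * beta[m]
                          for mp in range(n_states) for m in range(n_states))
        llr[k] = math.log(sums[1] / sums[0])
        prev = [sum(step(k, mp, m) * beta[m] for m in range(n_states))
                for mp in range(n_states)]
        total = sum(prev)
        beta = [b / total for b in prev]
    return llr
```

Note that the list `alpha` grows with the frame length, while only a single `beta` vector is live at any time; this asymmetry is what the memory discussion of the global recursion approach refers to.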
A drawback of using the MAP algorithm is that a large amount of memory is required to decode a block of data (i.e., a frame), since all of the forward recursion metrics α must be stored from the starting trellis stage (i.e., the beginning of the data block) to the end trellis stage (i.e., the end of the data block), at which point the recursive computation of the backward recursion metrics β begins. Another drawback is the resulting long decoding delay (or latency), since LLR calculations cannot be made until the first backward recursion metric β has been determined. For example, under the global recursion approach, decoding a 4-state turbo code with a frame length of N=1024 bits, where a finite word length of 9 bits is used to represent each state metric, requires 4×1024×9=36,864 bits of storage for an entire frame.
One known technique to reduce latency and the memory requirements associated with the global recursion approach is the sliding window approach. The sliding window approach initializes each state as equi-probable. The recursion operations are continued for a number of trellis stages. As a result, the state metrics at the last time index may be assumed to be reliable.
The sliding window approach may be implemented in different versions. A timing diagram for one version is shown in FIG. 3 for decoding a data block or frame 300. The data block or frame 300 includes a number of sliding windows or sub-blocks of data (e.g., 301, 302, 303, 304, 305, 306, 307, 308, 309, and 310). The sliding window length is normally chosen to be 5 to 7 times the constraint length (i.e., the memory size plus one) of the RSC encoders. In this example, two backward recursion units (performing a pre-computation backward recursion metric β1 computation 310 and a backward recursion metric β2 computation 312) and one forward recursion unit α (performing a forward recursion metric computation 313) are employed for computing the state metrics. It may be assumed that the first two time slots ($2T_{sw}$) are used for computing the branch metrics for the first two sliding windows, where the time slot $T_{sw}$ denotes the processing time for a sliding window within one decoding phase. It also may be assumed that the trellis starts and ends on the same known state, such as, for example, State 0.
At the beginning of the third time slot t3 330, the α unit starts forward recursive computation 313 on the first sliding window 301. The pre-computation β1 unit starts backward recursive computation 310 on the second sliding window 302 with a constant initial value assigned to all states. The branch metrics are computed for the third sliding window in the meantime. By the end of the period t3, the computed forward state metrics are stored in a small buffer and will be used for computation of the LLR and extrinsic information Lex in the next time slot. The pre-computation backward state metrics at the last time index within a time slot are used as the initial values for the computation of the backward state metrics in the next time slot.
During the fourth time slot t4 335, the backward recursion metric β2 unit starts its backward recursion computation 312 with the initial values from the pre-computation backward recursion metrics determined during t3. At each decoding cycle (or time period), the previously saved forward state metrics at time index k−1 and presently computed backward state metrics at time index k are used for computation of the outputs LLR and extrinsic information Lex at time index k. During the fourth time slot 335, the LLR and extrinsic information are computed for the bits in the first sliding window 301. In the meantime, the α unit continues its forward recursion computations 313 with the initial values equal to the forward state metric at the last time index. This process is repeated until the end of a frame 300 is reached.
If it is assumed that there is a total of S sliding windows in a frame, then during the (S+3)-rd time slot $T_{sw}$ the β2 unit starts its backward recursion from the ending stage. The LLR and extrinsic information for the bits in the last sliding window 310 are output during this time period.
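The timing described above may be tabulated with a short sketch. The mapping of units to windows is one reading of the preceding description, not a reproduction of FIG. 3: in time slot t, branch metrics are computed for window t, the β1 pre-computation runs on window t−1 (starting with the second window), the α unit runs on window t−2, and the β2 unit together with the LLR and extrinsic output runs on window t−3. The function name and the task labels are hypothetical.

```python
def sliding_window_schedule(num_windows):
    """Return a list of (slot, tasks) pairs for a frame of num_windows windows.

    tasks maps a unit label to the 1-based window it processes in that slot.
    The schedule spans num_windows + 3 time slots.
    """
    schedule = []
    for slot in range(1, num_windows + 4):
        tasks = {}
        if slot <= num_windows:
            tasks["branch"] = slot            # branch metric computation
        if 2 <= slot - 1 <= num_windows:
            tasks["beta1_pre"] = slot - 1     # backward pre-computation
        if 1 <= slot - 2 <= num_windows:
            tasks["alpha"] = slot - 2         # forward recursion
        if 1 <= slot - 3 <= num_windows:
            tasks["beta2_llr"] = slot - 3     # backward recursion and LLR/Lex output
        schedule.append((slot, tasks))
    return schedule
```

Under this reading, for a frame of ten windows the first output appears in the fourth slot and the last window is output in slot 13, that is, slot S+3 with S=10.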
Using the sliding window approach, only the forward state metrics within one sliding window need to be stored. This represents a dramatic savings in memory over the global recursion approach, which requires that the forward state metrics for the entire frame be stored until the last sub-block is reached, at which point the backward recursion computations begin.
The sliding window length may be chosen to be 5 to 7 times the constraint length (K) of the code. For example, for a 4-state (K=3) turbo code, the sliding window size may be chosen to be 16 bits for the benefit of, for example, VLSI implementation. The required memory is 16×4×9+1×4×9=612 bits. This is a substantial improvement over the 36,864 bits required for the global recursion approach. In addition, the performance degradation of using the sliding window approach is negligible when compared with the global recursion approach. Moreover, the latency for the sliding window approach is reduced to 3 decoding cycles instead of S decoding cycles in the global recursion approach, where S generally may be much larger than 10.
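The two storage figures quoted in this description can be reproduced with a few lines of arithmetic. The function names below are hypothetical, and the extra term in the sliding window figure simply mirrors the 1×4×9 term in the text (one additional set of state metrics); the source does not state what that set holds.

```python
def global_recursion_bits(n_states, frame_length, word_length):
    """State-metric storage when forward metrics are kept for the whole frame."""
    return n_states * frame_length * word_length


def sliding_window_bits(n_states, window_length, word_length):
    """State-metric storage for one sliding window plus one extra metric set."""
    window_store = window_length * n_states * word_length
    extra_set = 1 * n_states * word_length  # mirrors the 1x4x9 term in the text
    return window_store + extra_set
```

With 4 states, 9-bit metrics, a 1024-bit frame, and a 16-bit window, these return 36,864 and 612 bits respectively, a reduction by a factor of about 60.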