The Turbo Code is an error correcting code (ECC) commonly used in wireless communications systems, one example being in the physical layer of the 3GPP standards for wireless cellular communications. Turbo codes are chosen due to their robustness, efficiency and relative ease of implementation. Reference may be made to the following.
[Reference 1] “Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes” by C. Berrou, A. Glavieux, P. Thitimajshima. IEEE International Conference on Communications (ICC 93), Geneva, Technical Program, Conference Record, Vol. 2 (1993), pp. 1064-1070.
[Reference 2] 3GPP TS 25.212 (Multiplexing and channel coding) section 4.2.3.2 (Turbo Coding) and 4.2.7 (rate matching).
[Reference 3] Turbo-coding and puncturing interactions on HS-DSCH in R5 HSDPA. Document #R1-030444 for discussion at 3GPP TSG-RAN Working Group 1 Meeting #32, Paris, France May 19-23 2003.
[Reference 4] An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes. by A. J. Viterbi. IEEE J. Sel. Areas Commun., vol. 16, no. 2, pp. 260-264, February 1998.
[Reference 5] Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate. by L. R. Bahl, J. Cocke, F. Jelinek, J. Raviv. IEEE Transactions on Information Theory, IT-20, pp. 284-287, March 1974.
[Reference 6] U.S. Pat. No. 5,933,462: “Soft decision output decoder for decoding convolutionally encoded codewords”, Andrew J. Viterbi et al.
[Reference 7] US patent application publication no. 2006/0067229: “Transmitting data in a wireless network”, Frank Frederiksen.
The operation of a Turbo code is based on the parallel or serial concatenation of two component codes separated by an interleaver. The component codes can be systematic or non-systematic, depending on the presence of the input information bits in the encoded output stream. The following description refers to the case of a parallel concatenation of systematic convolutional codes.
A Turbo Code works by redundantly encoding two differently ordered (interleaved) copies of the data to be sent. Some information (not necessarily all) from both encoders is then transmitted over a noisy channel. FIG. 1 schematically illustrates a transmitter 100 comprising a Turbo encoder 102 and associated puncturing module 108. The Turbo encoder 102 comprises a first constituent encoder 104a, a second constituent encoder 104b, and an interleaver 106 with its output coupled to the input of the second constituent encoder 104b. 
The first constituent encoder 104a forms a first branch arranged in parallel with a second branch formed of the second constituent encoder 104b and its associated interleaver 106, such that the inputs of the first constituent encoder 104a and of the interleaver 106 are each arranged to receive the same input data S. This input data S comprises the information bits, i.e. the actual data desired to be transmitted. The input data S is preferably supplied in “systematic” form, which means it either is or contains the actual data bits in an explicit or verbatim form. Preferably, it may directly contain the actual data bits plus additional check-sum bits, e.g. in the presence of a cyclic redundancy check (CRC) code.
The interleaver 106 re-orders the systematic input data S and supplies the re-ordered systematic data to the input of the second constituent encoder 104b. Thus the second constituent encoder 104b encodes a differently ordered version of the same data S as encoded by the first constituent encoder 104a. The constituent encoders 104a and 104b are preferably convolutional encoders which output parity bits for use in error correction at the decoder 115 (discussed shortly). The first constituent encoder 104a outputs non-interleaved parity bits Pn and the second constituent encoder 104b outputs interleaved parity bits Pi. The parity bits are redundant bits added in order to improve error correction at the decoder 115 (N.B. these redundant parity bits are in addition to the check-sum bits optionally introduced by a CRC; the CRC code and turbo codes are separate entities).
The systematic data bits Sk and parity bits Pnk and Pik (where the integer k is the information bit index=1, 2, 3, etc.) are then transmitted in a sequence such as:
S1, Pn1, Pi1, S2, Pn2, Pi2, S3, Pn3, Pi3, . . . etc.
Each combination of Sk, Pnk, Pik for a given bit index k forms a symbol related to the uncoded information bit at index k. So in this example, three bits are transmitted to represent every one actual information bit, giving a coding rate of R=⅓.
The output of each of the constituent encoders 104a and 104b may then be coupled to a puncturing module 108 in order to “puncture” the parity bits Pn and Pi before transmission. This means certain ones of the parity bits are removed, thus varying the ratio of redundant information to actual information, i.e. variably increasing the coding rate R. Puncturing will be discussed in more detail later.
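The effect of puncturing on the transmitted sequence and on the coding rate can be illustrated with a minimal Python sketch. The function name `puncture` and the periodic keep-masks `keep_pn` and `keep_pi` are assumptions made purely for illustration; the actual 3GPP rate-matching rule (Reference 2) is not a simple periodic mask.

```python
def puncture(s, pn, pi, keep_pn, keep_pi):
    """Multiplex S, Pn and Pi, dropping parity bits whose mask entry is 0.

    keep_pn / keep_pi are hypothetical periodic masks (1 = transmit).
    With all-ones masks the output is S1, Pn1, Pi1, S2, Pn2, Pi2, ...
    """
    out = []
    for k, sk in enumerate(s):
        out.append(sk)                       # systematic bit is always sent
        if keep_pn[k % len(keep_pn)]:
            out.append(pn[k])                # non-interleaved parity, if kept
        if keep_pi[k % len(keep_pi)]:
            out.append(pi[k])                # interleaved parity, if kept
    return out


# Arbitrary example bits (not produced by a real encoder).
s = [1, 0, 1, 1]
pn = [1, 1, 0, 0]
pi = [0, 1, 1, 0]

unpunctured = puncture(s, pn, pi, [1], [1])        # 12 bits sent, R = 1/3
punctured = puncture(s, pn, pi, [1, 0], [0, 1])    # 8 bits sent, R = 1/2
rate_unpunctured = len(s) / len(unpunctured)
rate_punctured = len(s) / len(punctured)
```

In this sketch, removing half of the parity bits from each branch raises the rate from R=⅓ to R=½.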
The systematic data S and parity bits Pn and Pi are transmitted over a noisy channel 110 to a receiver 101, typically wirelessly, e.g. over a wireless cellular network. The noise may represent any disturbance due to the process of transmitting the signal over the wireless propagation channel and from the conceptual operation and implementation of the communication receiver 101, so it may be due for example to any or all of: the modulation process; other transmission (Tx) processes or components; the air interface itself; or the equalisation, demodulation or log-likelihood ratio estimation processes at the receiver 101; or other processes or components of the receiver 101. Therefore the data S, Pn and Pi as received at the decoder 115 are not likely to be exactly as output by the encoder 102: the received data will contain errors, and so an error-correcting decoding process such as Turbo decoding is required.
The Turbo decoding process in the receiver 101 is an iterative process whereby each copy of the received data is decoded in alternating turns by a soft-input/soft-output (SISO) decoder. At each Turbo decoder iteration some “soft” information concerning the likely value of the original data is passed on as the “extrinsic” input to the next iteration. After a number of such Turbo iterations, the soft information is converted into a “hard” decision. In the presence of a CRC code, a checksum is then performed on the decoded data to verify that all errors were corrected successfully.
An example receiver 101 is illustrated schematically in FIG. 1. The receiver 101 comprises a front-end 116 which samples each of the received systematic data bits S and parity bits Pn and Pi. However, since these are to be used in a soft decoder, the front-end does not simply sample the received bits as definite logic-ones and logic-zeros. Instead, for each received bit S, Pn or Pi, it samples a measurement representing the ratio of the probability that the received bit was transmitted as a logic-one to the probability that the received bit was transmitted as a logic-zero. The calculation of the probability ratio as a function of input sampled in-phase/quadrature signal amplitudes is dependent on the modulation scheme used (e.g. 16-QAM, 64-QAM etc.) and the estimated channel noise variance. E.g. if this soft measurement was represented by an eight-bit variable then it could take any value from −127 meaning “very likely logic-zero”, through 0 meaning “equally likely logic-one or logic-zero”, to +127 meaning “very likely logic-one”. These “soft” values are referred to as likelihood ratios. In fact these are typically expressed as log likelihood ratios, so mathematically S=ln(Prob(bit=1)/Prob(bit=0)). When the bit is equally likely to be one or zero, Prob(bit=1)=Prob(bit=0), so S=ln(1)=0. The use of logs simplifies the decoder since multiplication may be replaced by addition of logs.
The receiver 101 further comprises an interleaver 114 with an input arranged to receive the received systematic data S, and a de-puncturing module 112 with an input arranged to receive the received parity data Pn and Pi. For the case of parallel concatenated turbo code, the received systematic data S is split into two parallel branches, one of which is interleaved by the interleaver 114 in a corresponding manner to the interleaver 106 which was applied at the transmitter 100. Thus a series of non-interleaved bit log likelihood ratios Sn and interleaved bit log likelihood ratios Si are determined at the receiver. The de-puncturing module 112 re-inserts the pattern of any parity bits Pn and/or Pi that were removed by the puncturing module 108 at the transmitter 100. Since no information on the punctured parity bits is available at the receiver, the positions corresponding to those parity bits are filled with log likelihood ratios representing “equally likely logic-one or logic-zero”.
The receiver further comprises a Turbo decoder 115, which comprises a first SISO constituent decoder 117a, a further interleaver 119, a second constituent SISO decoder 117b, a de-interleaver 123, and a hard-decision module 125. The first SISO decoder 117a has one input coupled to the front-end and thus arranged to receive the non-interleaved systematic bit log likelihood ratios Sn, another input coupled to an output of the de-puncturing module 112 and thus arranged to receive the non-interleaved parity bit log likelihood ratios Pn, and another input coupled in an iterative feedback loop to the output of the de-interleaver 123. The output of the first SISO decoder 117a is coupled to the input of the further interleaver 119. The second SISO decoder 117b has an input coupled to the output of the interleaver 114 and thus arranged to receive the interleaved systematic bit log likelihood ratios Si, another input coupled to an output of the de-puncturing module 112 and thus arranged to receive the interleaved parity bit log likelihood ratios Pi, and another input coupled to the output of the further interleaver 119. The output of the second SISO decoder 117b is coupled to the input of the de-interleaver 123. The output of the de-interleaver 123 is coupled back to the input of the first SISO decoder 117a, and also coupled on to the input of the hard-decision module 125. The output of the hard decision module 125 is arranged to supply the hard output of the decoder 115.
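The soft values described above can be sketched in Python as follows. This is illustrative only: the function names, the BPSK mapping (logic-one sent as +1, logic-zero as −1) over an additive white Gaussian noise channel, and the quantisation `scale` are assumptions not taken from this description, and a real front-end for 16-QAM or 64-QAM would need a different LLR calculation.

```python
import math


def llr_from_prob(p1):
    """Log likelihood ratio ln(Prob(bit=1) / Prob(bit=0)) for 0 < p1 < 1."""
    return math.log(p1 / (1.0 - p1))


def bpsk_llr(r, noise_var):
    """LLR of a received sample r, ASSUMING BPSK (1 -> +1, 0 -> -1) in AWGN."""
    return 2.0 * r / noise_var


def quantise_llr(llr, scale=16.0):
    """Map an LLR to a signed 8-bit value in [-127, +127].

    The scale factor is a hypothetical design choice; values beyond the
    range saturate at "very likely logic-zero" / "very likely logic-one".
    """
    return max(-127, min(127, int(round(llr * scale))))


equally_likely = llr_from_prob(0.5)      # ln(1) = 0
saturated = quantise_llr(100.0)          # clipped to +127
```

Working in the log domain means that independent pieces of evidence about the same bit are combined by adding their LLRs rather than multiplying probability ratios.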
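De-puncturing can be sketched as below. The function name `depuncture` and the periodic `keep_mask` are hypothetical; the only requirement taken from the description is that each punctured position is filled with an LLR of zero.

```python
def depuncture(received, keep_mask, block_len):
    """Re-insert a zero LLR at every punctured parity position.

    keep_mask is a hypothetical periodic mask (1 = the bit was transmitted),
    assumed to be identical to the pattern used at the transmitter.
    """
    it = iter(received)
    return [next(it) if keep_mask[k % len(keep_mask)] else 0.0
            for k in range(block_len)]


# Two parity LLRs were received for a block of four information bits.
pn_llr = depuncture([3.1, -2.4], keep_mask=[1, 0], block_len=4)
```

Here `pn_llr` becomes `[3.1, 0.0, -2.4, 0.0]`, so the decoder sees one parity value per trellis stage regardless of the puncturing pattern.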
In operation, the first SISO decoder 117a performs a soft decoding process based on the non-interleaved systematic and parity bits Sn and Pn (and input extrinsic Ein—see below); thus outputting a soft decision Eout for each decoded bit. The soft decision Eout is typically expressed as an “extrinsic” value. Note the distinction between “extrinsic” values Eout/Ein and log likelihood ratios. The extrinsics Eout/Ein passed between the component decoders are not true log likelihood ratios. In fact for decoder 117a in the log domain the extrinsic Eout=LLRout−(Sn+Ein), where LLRout is the output log likelihood ratio from each decoder. Intuitively the extrinsic measures the contribution to each decoder's log likelihood ratio estimate from the parity information available only to that decoder. Typically only that information is passed between decoders during the iterative decoding process. The extrinsic is also referred to in the literature as “a priori” probability information for each decoder while the LLR is the complete “a posteriori” probability.
The further interleaver 119 then interleaves the data Eout output by the first SISO decoder 117a in a corresponding manner to the interleaver 106 which was applied at the transmitter 100 and other interleaver 114 applied at the receiver 101, in order to supply the input extrinsic Ein to the second SISO decoder 117b. The second SISO decoder 117b then performs a soft decoding process on the interleaved data based on the interleaved systematic and parity bits Si and Pi (and input extrinsic Ein), and thus outputs another soft-decision (output extrinsic) Eout for each decoded bit. The de-interleaver 123 then reverses the interleaving applied by the further interleaver 119, and the de-interleaved soft data output by the de-interleaver 123 is fed back as the input extrinsic Ein to the input of the first SISO decoder 117a to undergo one or more further Turbo decoder iterations, by repeating the process outlined in this paragraph.
Once a sufficient or predetermined number of Turbo iterations have been completed, the de-interleaved soft data output by the de-interleaver 123 is supplied to the input of the hard-decision making module 125, which converts the soft extrinsics into definite binary values of either logic-one or logic-zero, depending on which is finally determined to be more likely. That is, on the final iteration, the true log-likelihood ratio information from the final SISO decoding (LLRout not Eout) is passed to the de-interleaver 123 and hard-decision process 125. Hence LLRout is shown in FIG. 1 as an additional output from the second SISO decoder 117b. 
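The exchange of extrinsics between the two SISO decoders can be summarised in a short Python sketch. It is a structural outline under stated assumptions, not an implementation of FIG. 1: the `siso` argument is a caller-supplied function with a hypothetical interface that returns the full output LLR per bit, the extrinsic is formed outside it using Eout=LLRout−(S+Ein) for both decoders, and the permutation convention of `interleave` is chosen arbitrarily.

```python
def interleave(x, perm):
    """Output position k takes the value at input position perm[k]."""
    return [x[p] for p in perm]


def deinterleave(x, perm):
    """Inverse of interleave()."""
    out = [0.0] * len(x)
    for k, p in enumerate(perm):
        out[p] = x[k]
    return out


def turbo_decode(sn, pn, pi, perm, siso, n_iter=4):
    """Run n_iter (>= 1) Turbo iterations and return hard decisions.

    siso(sys_llr, par_llr, ein) is a hypothetical interface returning the
    output LLR (LLRout) for each bit of the block.
    """
    si = interleave(sn, perm)                 # interleaver 114
    ein1 = [0.0] * len(sn)                    # no prior knowledge at the start
    for _ in range(n_iter):
        llr1 = siso(sn, pn, ein1)             # first SISO decoder 117a
        eout1 = [l - (s + e) for l, s, e in zip(llr1, sn, ein1)]
        ein2 = interleave(eout1, perm)        # further interleaver 119
        llr2 = siso(si, pi, ein2)             # second SISO decoder 117b
        eout2 = [l - (s + e) for l, s, e in zip(llr2, si, ein2)]
        ein1 = deinterleave(eout2, perm)      # de-interleaver 123
    llr_out = deinterleave(llr2, perm)        # final LLRout, not Eout
    return [1 if l > 0 else 0 for l in llr_out]   # hard decision 125
```

Only the extrinsic is passed between the decoders, so that a decoder is not fed back the systematic and a priori information it already had; the full LLR is used only for the final hard decision.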
The decoding of both interleaved and non-interleaved versions of the data, and the multiple iterations, improves the reliability of the decoded data.
Turning to the details of the constituent encoders 104a and 104b, the constituent encoding algorithm can in principle be any error correcting code for which a SISO decoder can be built. However, the Turbo algorithm 102, 115 just described is so effective that in most cases a small, short constraint length, recursive convolutional encoder is used. This makes SISO decoding relatively inexpensive, which is very important since the Turbo decoding algorithm can require several SISO decoder iterations.
The constituent encoder is often a recursive systematic convolutional encoder with just 8 possible states (or sometimes 16). An 8-state convolutional encoder is illustrated schematically at the top of FIG. 2. A respective instance of this could be used to implement each of the constituent encoders 104a and 104b. The convolutional encoder comprises a shift register comprising three sequential 1-bit data elements D, and a plurality of modulo-2 adders (+), with the arrowed lines in FIG. 2 showing coupling and direction of data flow. Since there are three data elements D, each constituent encoder 104a or 104b can at any one time take one of only 2^3=8 possible states.
The adders (+) are exclusive-OR (XOR) gates (such that 0+0=0, 0+1=1, 1+0=1, and 1+1=0). The input systematic data S is input through the left-most XOR gate in FIG. 2 and then shifted through the data elements D of the shift register, with the other input of the left-most XOR gate being the XOR of the two right-most data elements D. Thus each successive state is dependent on the input data S and the preceding state of the shift register. Since the data elements D are connected in the form of a shift register and fed in this manner from input S, the bits in data elements D cannot transition arbitrarily from one state to another. Instead, only certain state transitions are possible (and the XOR circuitry maps a symbol S,P onto each of the state transitions). This is illustrated schematically at the bottom of FIG. 2, which is a portion of a “trellis diagram” showing some allowed transitions (with the right-hand data element corresponding to the most significant bit). For example, state 000 will remain at 000 if the next input bit S is a 0, or will transition to 001 if the next input bit is 1. However, a transition from 000 to any other state, e.g. 101, is impossible. As another example, state 001 will transition to state 010 if the next input bit S is a 0, or will transition to state 011 if the next input bit S is a 1. However, a transition from 001 to any other state, e.g. 110 or 000, is impossible. Thus only certain “paths” through the trellis diagram are possible.
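The state transitions described above can be reproduced with a small Python sketch of the 8-state encoder. The feedback (XOR of the two right-most data elements) and the state numbering (right-hand element as most significant bit) follow the description; the parity output taps are not stated in the text, so the taps used here (generator 1+D+D^3, as in the 3GPP turbo code of Reference 2) are an assumption, as are the function names.

```python
def rsc_step(state, s):
    """Advance the 8-state recursive systematic encoder by one input bit.

    state = d1 + 2*d2 + 4*d3, where d1 is the left-most data element.
    Returns (next_state, parity_bit).
    """
    d1, d2, d3 = state & 1, (state >> 1) & 1, (state >> 2) & 1
    a = s ^ d2 ^ d3                          # left-most XOR: input plus feedback
    p = a ^ d1 ^ d3                          # ASSUMED parity taps (1 + D + D^3)
    next_state = a | (d1 << 1) | (d2 << 2)   # shift register moves one place
    return next_state, p


def encode(bits, state=0):
    """Return the parity bits for a block, starting from the given state."""
    parity = []
    for s in bits:
        state, p = rsc_step(state, s)
        parity.append(p)
    return parity


# Allowed transitions: for each state, the next state for input 0 and input 1.
trellis = {st: [rsc_step(st, s)[0] for s in (0, 1)] for st in range(8)}
```

With this numbering, `trellis[0b000]` is `[0b000, 0b001]` and `trellis[0b001]` is `[0b010, 0b011]`, matching the examples in the text; every state has exactly two successors and two predecessors.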
A “trellis” decoder uses this fact to provide error correction at the receiver. Each SISO decoder 117a and 117b comprises a respective instance of such a trellis decoder. In the trellis decoder, a “state-metric” represents the probability of each of the 8 possible states for each transition between symbols of the encoded data received. That is, for each received symbol transition, the trellis decoder 117a or 117b determines the probability that the respective encoder 104a or 104b transitioned to each of the 8 possible states after it transmitted the symbol in question (so each symbol corresponds to a set of 8 state metrics). This works on the basis that an erroneous symbol S,P (corrupted by noise) will result in a deviation from an allowed path through the trellis diagram. By analysing possible solutions, the probabilities of the 8 different possible states for each symbol can be determined.
For a maximum a posteriori probability (MAP) decoder, the decoding process operates by performing a run of trellis iterations over a sequence (e.g. a block) of received symbols and their corresponding sets of state-metrics, updating each successive state-metric in the received sequence based on: the preceding state-metrics in the sequence; the received symbol values; and, implicitly, knowledge of the encoding rule used. With each trellis iteration, the aim is for the state metrics of the respective set to diverge such that one emerges as more likely than the others (i.e. diverge within a given set). In decoders such as Turbo decoders, the whole run may be repeated again one or more times across the sequence to try to get the log likelihood ratios to diverge further to a more definite, reliable solution. That is, each Turbo decoder iteration comprises a whole sweep of trellis iterations over the received block or sequence, whereas an individual trellis iteration is an iteration between two adjacent sets of state-metrics. To distinguish between the overall Turbo iterations and their individual component trellis iterations or such like, an individual iteration between sets of state metrics such as a trellis iteration may be referred to as a “recursion”. For a mathematical description of this process, see the above References 1-7.
A MAP trellis decoder can also be made to accept the received symbols in the reverse order to which they were originally encoded since the previous and next states of the constituent encoder can be derived easily from one another. By running the trellis decoder in both directions the resulting “forward” and “backward” state-metrics can be combined to create a better soft-likelihood estimate of the original symbol values.
On a point of terminology, note that to simplify the arithmetic operations in practical implementations of such a MAP decoder, the state-metrics are typically represented as the logarithms of probabilities while the received symbol values are typically represented as the logarithm of a ratio of the probability of a logic-one to the probability of a logic-zero (a.k.a. log-likelihood-ratio, or LLR):
LLR(Sk)=log [P(Sk=1|rk)/P(Sk=0|rk)],
where P(Sk=1|rk) and P(Sk=0|rk) are the probability that Sk corresponds to the logical value 1 given the received signal rk, and the probability that Sk corresponds to the logical value 0 given the received signal rk, respectively. Soft extrinsics such as Eout may also be represented in a logarithmic form, as discussed above. Hence the name “Log-MAP Decoder” for the type of SISO decoder discussed herein. Further arithmetic simplification can be obtained at the expense of some accuracy to yield the Max-Log-MAP Decoder.
Implementing the MAP SISO decoder requires a relatively large amount of memory since, in order to perform a calculation using both the forward and backward state-metrics for every symbol in the block of encoded data it is first necessary to calculate and store all of the backward state-metrics or all of the forward state-metrics. The remaining set of state-metrics can then be calculated and used immediately by combining them with the stored set.
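The storage requirement can be seen in the following un-windowed Max-Log-MAP sketch, which stores every set of forward state-metrics (`alpha`) and then combines them with the backward state-metrics (`beta`) as soon as each backward set is computed. It is a simplified illustration, not the decoder of the references: the encoder parity taps, the branch-metric scaling, the known all-zero starting state and the unterminated (equal-metric) end state are all assumptions.

```python
NEG = -1.0e9   # stands in for log(0), i.e. an impossible state


def rsc_step(state, s):
    """8-state encoder step; the parity taps (1 + D + D^3) are ASSUMED."""
    d1, d2, d3 = state & 1, (state >> 1) & 1, (state >> 2) & 1
    a = s ^ d2 ^ d3
    return a | (d1 << 1) | (d2 << 2), a ^ d1 ^ d3


def max_log_map(ls, lp, la):
    """Return the output LLR per bit from systematic (ls), parity (lp)
    and a priori (la) LLRs, where a positive LLR favours logic-one."""
    K = len(ls)

    def gamma(k, u, p):
        # Branch metric for input bit u and parity bit p at stage k.
        return 0.5 * ((2 * u - 1) * (ls[k] + la[k]) + (2 * p - 1) * lp[k])

    # Forward recursions: all K+1 sets of 8 state-metrics are stored.
    alpha = [[NEG] * 8 for _ in range(K + 1)]
    alpha[0][0] = 0.0                        # encoder assumed to start in state 0
    for k in range(K):
        for st in range(8):
            for u in (0, 1):
                ns, p = rsc_step(st, u)
                cand = alpha[k][st] + gamma(k, u, p)
                alpha[k + 1][ns] = max(alpha[k + 1][ns], cand)

    # Backward recursions: each set is used immediately, then discarded.
    beta = [0.0] * 8                         # end state unknown: all equal
    llr = [0.0] * K
    for k in range(K - 1, -1, -1):
        best = [NEG, NEG]                    # best path metric for u=0 and u=1
        new_beta = [NEG] * 8
        for st in range(8):
            for u in (0, 1):
                ns, p = rsc_step(st, u)
                g = gamma(k, u, p)
                best[u] = max(best[u], alpha[k][st] + g + beta[ns])
                new_beta[st] = max(new_beta[st], g + beta[ns])
        llr[k] = best[1] - best[0]
        beta = new_beta
    return llr
```

The `alpha` table holds 8 state-metrics for every symbol in the block, which is the memory cost discussed above; only one set of `beta` values is live at any time.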
FIG. 3 schematically illustrates a block of forward and backward trellis recursions (i.e. forward and backward trellis iterations), for an 8-state convolutional code. Each vertical line of dots in FIG. 3 represents a set of the 8 possible states, each set corresponding to a respective symbol in the received sequence. Each Turbo iteration then comprises a forward and reverse run of trellis recursions over the block (for each of the interleaved and non-interleaved versions of the data).
The memory must contain 8 state-metrics for every symbol in the block. If a larger “general purpose” memory is used, then it should be noted that 8 state-metrics must be loaded and stored for every symbol in the block for each MAP iteration.
To avoid this cost, Viterbi invented the windowed MAP decoder (see References 4 and 6 above). Viterbi made the significant observation that it is not necessary to start a trellis decoder at the very beginning or end of a block of symbols in order to obtain a similarly accurate set of state-metrics for a particular symbol somewhere in the middle of the block. Instead—it is possible to start “cold” at any point sufficiently distant from the point of interest with some arbitrary set of state-metrics. If the distance is sufficiently large, then the initial set of state-metric values is irrelevant by the time the trellis decoder arrives at the point of interest. Viterbi suggests that a distance of 32 for a 16-state constituent code is often more than sufficient, observing that this amounts to more than 6 constraint lengths.
The idea of windowing, with a window of length L, uses this observation in order to avoid storing more than one window length's worth of state-metrics, i.e., more than L sets of state metrics corresponding to the L trellis stages of the window. Instead, in order to calculate L sets of metrics it is necessary only to “warm-up” over a sufficient distance (without storing any state-metrics) prior to calculating and using the following L sets of state-metrics. This warm-up phase requires additional calculations (trellis iterations, or “recursions”) and thus, in order to amortize the extra cost of these calculations, the duration of the warm-up phase should not greatly exceed the window size (for example, using a window size of 1 would remove the requirement for any memory but would require an inordinate number of warm-up recursions).
FIG. 4 provides a schematic illustration of a windowed Log-MAP decoding process for use in a trellis decoder such as 117a or 117b. Time increases vertically down the page. At step 1a, the decoder performs a run of L trellis recursions (i.e. trellis iterations) in the forwards direction for the first window beginning at index 0 within the block and running to index L-1, storing all of the corresponding L sets of 8 state-metrics (i.e. 8 by L state-metrics). At step 1b, the decoder performs W warm-up trellis recursions in the reverse direction, from index L+W-1 within the block backwards to index L. This provides a warmed-up starting set of state-metrics for starting reverse trellis recursions for the first window backwards from index L-1 to index 0. At step 2a, the decoder performs a run of L trellis recursions in the forwards direction for the second window continuing onwards from the recursions of the first window, starting at index L and running to index 2L-1, and storing the corresponding next L sets of 8 state-metrics. At step 2b, the decoder performs W warm-up trellis recursions in the reverse direction, from index 2L+W-1 within the block to index 2L. This provides a warmed-up starting set of state-metrics for starting reverse trellis recursions for the second window backwards from index 2L-1 to index L. At step 2c, the decoder performs a run of L trellis recursions in the reverse direction for the first window, starting backwards from index L-1 running to index 0, beginning from the warmed up set of state-metrics as determined for the first window in step 1b, and storing the corresponding preceding L sets of 8 state-metrics. As a result of step 2c, a soft extrinsic value Eout is output for each symbol in the first window, ready for a subsequent Turbo iteration or output to the hard decision module 125 if sufficient Turbo iterations have already been performed. The process goes on in this manner for the second, third, fourth window, etc. until the whole block has been covered. Each Turbo iteration comprises a whole pass or sweep of component trellis iterations, i.e. trellis recursions, over the block.
As long as the required warm-up phase duration is modest, the window size L can be made similarly small and independent of the block size. Thus the memory for temporary state-metric storage can be small and built very locally to the trellis computation hardware, minimizing the infrastructure needed and the power consumed to access it.
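The order of recursion runs in the windowed process can be listed with a short Python sketch. The function name and the tuple layout are hypothetical, and the handling of the end of the block is an assumption: the text does not say what happens when fewer than W symbols remain beyond a window, so here the warm-up is truncated at the block edge and omitted for the last window.

```python
def window_schedule(block_len, L, W):
    """List the recursion runs of a windowed decoder in the order of FIG. 4.

    Each entry is (step, kind, first_index, last_index). ASSUMPTION: near
    the end of the block the warm-up is truncated, or skipped entirely.
    """
    n_windows = (block_len + L - 1) // L
    steps = []
    for n in range(1, n_windows + 2):
        if n <= n_windows:
            lo = (n - 1) * L
            hi = min(n * L, block_len) - 1
            steps.append((f"{n}a", "forward", lo, hi))          # metrics stored
            wu_start = min(hi + W, block_len - 1)
            if wu_start > hi:
                steps.append((f"{n}b", "warmup_reverse", wu_start, hi + 1))
        if n >= 2:
            lo = (n - 2) * L
            hi = min((n - 1) * L, block_len) - 1
            steps.append((f"{n}c", "reverse", hi, lo))          # outputs Eout
    return steps


schedule = window_schedule(block_len=12, L=4, W=3)
```

For a block of 12 symbols with L=4 and W=3, the first entries are step 1a forward over indices 0 to 3, step 1b warm-up from index 6 back to 4, step 2a forward over 4 to 7, step 2b warm-up from 10 back to 8, and step 2c reverse from 3 back to 0.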
It is advantageous to be able to vary the coding rate in order to adapt to different channel conditions. The coding rate ‘R’ is defined as the ratio between the number of input (uncoded) bits and the number of encoded bits. Typically, a turbo encoder will generate 1 or 2 parity bits from each of the two constituent encoders for each input systematic bit. So, for example the 3GPP standard turbo code has a rate of R=⅓ (with 1 parity-bit per constituent encoder per input bit).
While such a code may be necessary to provide a sufficient level of error correction under low signal to noise ratio (SNR) conditions, it is excessive and wasteful when channel conditions are better. In order to raise the code rate R under such conditions, a technique called puncturing is used whereby a defined set of parity bits are removed (i.e. not transmitted). The MAP decoders in the receiver 101 cope with the missing parity bits by assigning them a log-likelihood-ratio of 0 (meaning “equally likely to be either a logic-one or a logic-zero”). For example see Reference 3.
In the 3GPP standard, this technique can be taken to extremes, with rates as high as R=0.97 (approximately 64 out of 65 parity bits removed), and with turbo decoding still giving a significant advantage over not encoding the data at all.
Unfortunately, the windowing algorithm can perform very poorly under such conditions since Viterbi's rule of thumb of requiring several constraint lengths to warm-up assumes implicitly that no or limited puncturing has taken place.
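The relationship between the fraction of parity bits kept and the resulting rate can be checked with a few lines of Python. The function name is hypothetical; the formula simply counts one systematic bit plus the surviving share of the two parity bits per information bit.

```python
from fractions import Fraction


def punctured_rate(keep_fraction):
    """Coding rate when only keep_fraction of the parity bits is transmitted.

    Per information bit: 1 systematic bit plus 2 * keep_fraction parity bits.
    """
    return 1 / (1 + 2 * keep_fraction)


mother_rate = punctured_rate(Fraction(1))        # no puncturing: R = 1/3
high_rate = punctured_rate(Fraction(1, 65))      # 64 of every 65 parity bits removed
```

Keeping one parity bit in 65 gives R=65/67, which is approximately 0.97.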
It can be shown that for any trellis recursion where the parity bit LLR is equal to zero (punctured), then the state-metrics within a set cannot diverge from one another in value (and therefore cannot converge towards a solution). At best, they are merely re-ordered according to the sign of the systematic bit, even if the systematic bit LLR value is very large. At worst, a low received systematic bit LLR value can reduce the existing divergence.
A set of warm-up recursions normally begins with all state-metrics set to the same value (all states equally likely) in the hope that after some modest number of recursions, the state-metrics will have converged to the same values that they would have taken had iterations started from a known initial state at one end of the block (this will generally mean that the state metrics as a group diverge from the initial common value, e.g. with sufficient parity bits and in the absence of errors one state metric should emerge as much larger than the others).
This can never happen if the parity bits are so heavily punctured that there are no un-punctured parity bits throughout the entire duration of a warm-up phase (the state-metrics will remain in their equal, initialized state throughout). This means that although the windowed MAP decoder has the great advantage of not requiring a large amount of temporary storage, it can perform poorly compared with an un-windowed decoder when puncturing is used to raise the code rate.
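This behaviour can be reproduced numerically with a small Python sketch of a reverse warm-up. It uses the Max-Log approximation and an assumed set of parity taps for the 8-state encoder, so it illustrates the effect rather than proving the general Log-MAP statement; all names are hypothetical.

```python
def rsc_step(state, s):
    """8-state encoder step; the parity taps (1 + D + D^3) are ASSUMED."""
    d1, d2, d3 = state & 1, (state >> 1) & 1, (state >> 2) & 1
    a = s ^ d2 ^ d3
    return a | (d1 << 1) | (d2 << 2), a ^ d1 ^ d3


def warmup_reverse(ls, lp):
    """Run reverse Max-Log recursions over the LLRs, starting from equal
    state-metrics, and return the final set of 8 state-metrics."""
    beta = [0.0] * 8                         # all states equally likely
    for k in range(len(ls) - 1, -1, -1):
        new_beta = []
        for st in range(8):
            cands = []
            for u in (0, 1):
                ns, p = rsc_step(st, u)
                g = 0.5 * ((2 * u - 1) * ls[k] + (2 * p - 1) * lp[k])
                cands.append(g + beta[ns])
            new_beta.append(max(cands))
        beta = new_beta
    return beta


def spread(metrics):
    """Divergence within a set: largest minus smallest state-metric."""
    return max(metrics) - min(metrics)


ls = [2.0, -5.0, 9.0, -1.5]
all_punctured = spread(warmup_reverse(ls, [0.0] * 4))   # parity LLRs all zero
one_parity = spread(warmup_reverse([2.0], [3.0]))       # one un-punctured stage
```

With every parity LLR punctured to zero, the spread stays at exactly zero however large the systematic LLRs are, whereas a single stage with a non-zero parity LLR already separates the state-metrics.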