Turbo codes provide forward error correction for many types of communication systems such as wireless applications, satellites, and hard disk drives. Turbo decoding achieves an error performance close to the Shannon limit. The performance is achieved through decoding on multiple iterations. Each iteration results in additional performance and additional computational delay. Making the turbo decoder as small and as simple as possible is very important for VLSI implementations.
Turbo encoding is accomplished by concatenating convolutional codes. FIG. 1A illustrates an example of a prior art rate 1/3 parallel-concatenated turbo encoder. The notation rate 1/3 refers to the configuration of FIG. 1A in which a single input bit stream xk is converted by the encoder into a 3-component bit stream. Input data stream 100 passes unmodified to multiplexer input 106. Two recursive systematic convolutional (RSC) encoders 102 and 103 function in parallel to transform their input bit streams. The bit stream produced by RSC encoder 102 forms multiplexer input 107, and the bit stream produced by RSC encoder 103 forms multiplexer input 108. Block 101 is an interleaver (I) which randomly re-arranges the information bits to decorrelate the noise for the decoder. RSC encoder 102 generates a p1k bit stream and RSC encoder 103 generates a p2k bit stream. Under control of a turbo controller function, multiplexer 104 reassembles the separate bit streams xk 106, p1k 107 and p2k 108 into the resulting output bit stream xk/p1k/p2k 111.
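The structure of FIG. 1A can be illustrated with a minimal sketch. The text does not give the generator polynomials or the interleaver pattern, so the sketch assumes an 8-state RSC code with feedback 1 + D^2 + D^3 and feedforward 1 + D + D^3, and takes the interleaver as a caller-supplied permutation `perm`. The function names are hypothetical, and tail bits are omitted.

```python
def rsc_parity(bits):
    """Parity stream of one 8-state RSC encoder, starting in the zero state.

    Assumed polynomials (not taken from the text):
    feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    """
    a = b = c = 0  # three memory registers (D^1, D^2, D^3)
    parity = []
    for x in bits:
        fb = x ^ b ^ c             # recursive feedback bit
        parity.append(fb ^ a ^ c)  # feedforward taps 1 + D + D^3
        a, b, c = fb, a, b         # shift the registers
    return parity


def turbo_encode(bits, perm):
    """Multiplex x_k, p1_k and p2_k into a single rate 1/3 stream.

    perm is a hypothetical interleaver: position i of the interleaved
    stream carries bits[perm[i]].
    """
    p1 = rsc_parity(bits)
    p2 = rsc_parity([bits[i] for i in perm])  # second encoder sees interleaved bits
    out = []
    for x, y, z in zip(bits, p1, p2):
        out.extend((x, y, z))  # output order x_k / p1_k / p2_k
    return out
```

Each input bit yields three output bits, which is what the rate 1/3 notation expresses.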
FIG. 1B illustrates an example of the RSC encoder function which is a part of the blocks 102 or 103. Input data stream 120 passes unmodified to become output x0 131. After transformation by the RSC encoder the resulting bit streams 131, 132 and 133 in prescribed combinations form multiplexer inputs 107 and 108 of FIG. 1A. The precise combinations are determined by the class of turbo encoder being implemented, 1/2, 1/3, or 1/4 for example. The action of the circuit of FIG. 1B is depicted by a corresponding trellis diagram which is illustrated in FIG. 4 and will be described in the text below.
This transmitted output bit stream 111 of FIG. 1A can be corrupted by transmission through a noisy environment. The function of the decoder at the receiving end is to reconstruct the original bit stream by tracing through multiple passes or iterations through the turbo trellis function.
FIG. 2 illustrates the functional block diagram of a prior art turbo decoder. A single pass through the loop of FIG. 2 is one iteration through the turbo decoder. This iterative decoder generates soft decisions from two maximum-a-posteriori (MAP) blocks 202 and 203. In each iteration MAP block 202 generates extrinsic information W0,k 206 and MAP block 203 generates extrinsic information W1,k 207. First MAP block 202 receives the non-interleaved data xk 200 and data p1k 201 as inputs. Second MAP decoder 203 receives data p2k 211 and interleaved xk data 210 from the interleaver block 208.
FIG. 3 illustrates the functional block diagram of a prior art MAP block. The MAP block of FIG. 3 includes circuit functions similar to those illustrated in FIG. 2. The MAP block calculates three vectors: beta state metrics, alpha state metrics and extrinsics. Both alpha block 302 and beta block 303 calculate state metrics. It is useful to define the function gamma as:

$$\Gamma_k = f(X_k, P_k, W_k) \qquad [1]$$

where: Xk is the systematic data; Pk is the parity data; and Wk is the extrinsics data.
Input 300 to the alpha state metrics block 302 and input 301 to beta state metrics block 303 are referred to as a-priori inputs. The beta state metrics are generated by beta state metrics block 303. These beta metrics are generated in reverse order and stored in the beta state random access memory (RAM) 304. Next, alpha state metrics are generated by alpha state metrics block 302. The alpha state metrics are not stored because the extrinsic block 305 uses this data as soon as it is generated. The beta state metrics are read from beta RAM 304 in a forward order at the same time as the alpha state metrics are generated. Extrinsic block 305 uses both the alpha and beta state metrics in a forward order to generate the extrinsics Wn,j 306.
The variables for the MAP algorithm are usually represented by the natural logarithm of probabilities. This allows for simplification of very large scale integration (VLSI) implementation. The recursive equations for the alpha and beta state metrics are as follows:

$$A_{k,s} = \ln\left[\sum_{S} \exp\left\{A_{k-1} + \Gamma_k\right\}\right] \qquad [2]$$

$$B_{k,s} = \ln\left[\sum_{S} \exp\left\{B_{k-1} + \Gamma_k\right\}\right] \qquad [3]$$

where: s is the set of states in the trellis; and Γk is as stated in equation [1] above.
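The log-domain recursion of equation [2] can be sketched as follows. This is an illustration only: the data layout (`gamma` as a dictionary of branch metrics, `predecessors` as a map of incoming transitions) and the function names are assumptions, not part of the described decoder. The beta recursion of equation [3] has the same form and is run over the trellis in the reverse direction.

```python
import math


def log_sum_exp(values):
    """Return ln(sum(exp(v))) stably by factoring out the maximum."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def alpha_step(prev_alpha, gamma, predecessors):
    """One forward step of the alpha recursion in the log domain.

    prev_alpha   -- list of A[k-1] values, one per state
    gamma        -- dict mapping (prev_state, state) to the branch metric
    predecessors -- dict mapping each state to the states that lead into it
    """
    return [
        log_sum_exp([prev_alpha[sp] + gamma[(sp, s)] for sp in predecessors[s]])
        for s in range(len(prev_alpha))
    ]
```

Factoring out the maximum before exponentiating keeps the sum from overflowing when the metrics are large.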
To more clearly understand the operation of the decoder it is helpful to review the operations of the encoder. The data input to the encoder is in the form of blocks of ‘n’ information bits (n=5114=frame size) and the encoding proceeds from the zero state of the trellis. After n cycles through the trellis the encoder ends at some unknown state.
In a decoder without sliding windows, processing a block involves n×s×d bits of storage and n cycles through the trellis. For a frame size n of 5114, a number of trellis states s of 8 and a data precision d of 8 bits, n×s×d=327,296 bits. With sliding windows, the processing of each window involves r+p cycles and r×s×d bits, where: r is the size of the reliability portion of the sliding window; and p is the prolog size. This requires r iterations through the trellis. Consider the example where r=128. Then for the sliding windows case, processing involves r×s×d=8192 bits and r+p=j cycles where: j=n/r. Clearly, the decoder memory size requirements are greatly reduced through the use of sliding windows at a cost of more cycles.
During encoding a number of tail bits t are appended to the encoder data stream to force the encoder back to the zero state. For a constraint length k code, where t=k−1, there are systematic tail bits for each RSC encoder. Consider the example of an eight state code where k=4 and t=3. The alpha state metric block will process the received data from 0 to n+2 and the beta state metric block will process the data from n+2 to 0.
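The storage figures in this example can be checked with a short calculation. The function name is hypothetical; the values are those given in the text.

```python
def state_metric_bits(stages, states, precision_bits):
    """Bits needed to hold one state metric per state for each trellis stage."""
    return stages * states * precision_bits


n, s, d, r = 5114, 8, 8, 128                  # values from the example
full_block_bits = state_metric_bits(n, s, d)  # no sliding windows
window_bits = state_metric_bits(r, s, d)      # one sliding window
```

With these values `full_block_bits` is 327,296 and `window_bits` is 8,192, matching the figures in the text.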
In FIG. 3, both the alpha state 302 and beta state 303 metric blocks calculate state metrics. Both start at a known location in the trellis, the zero state. The encoder starts the block of n information bits, where n is the frame size of 5114, at the zero state. After n cycles through the trellis, the encoder ends at some unknown state.
The beta state metrics are generated first by block 303. These beta metrics are generated in reverse order and stored in beta state metric RAM 304. Next, the alpha state metrics are generated by block 302. The alpha state metrics are not stored because the extrinsic block uses this data as soon as it is generated. The beta state metrics are read from the memory in a forward order at the same time as the alpha state metrics are generated. The extrinsic block 305 uses both the alpha and beta state metrics in a forward order to generate the extrinsics Wn,j 306.
FIG. 4 illustrates a trellis diagram for an 8-state encoder depicting the possible state transitions from each possible state Sk,x=ABC. For example, for state Sk,4, ABC=001. These states are represented in FIG. 1B by the state of the three registers A 121, B 122 and C 123, respectively. In the decoder, the generation of the alpha state metrics requires processing the data in a forward direction through this trellis and the generation of the beta state metrics requires processing the data in a reverse direction through this trellis. Initial states in the trellis for forward traversal are labeled Sk,x and next states are labeled Sk+1,x. Conversely, initial states in the trellis for reverse direction traversal are labeled Sk+1,x and next states are labeled Sk,x. The nomenclature X/DEF of 403 and 404 of FIG. 4 refers to the next bit ‘X’ inserted at the input Xk 120 of FIG. 1B, followed by the forward slash, followed by the next three bits D, E and F generated respectively at the nodes 131, 132, 133 of FIG. 1B.
Turbo decoder processing is an iterative process requiring multiple cycles until a low bit-error ratio (BER) solution is obtained. Because the state of the trellis at the start of processing is unknown the probability of the occurrence of all the states in the trellis is initialized to a uniform constant. For each pass through the trellis, the probability of occurrence of a given state will increase or decrease as convergence to the original transmitted data proceeds. After processing through the trellis a number of times a set of states corresponding to the original transmitted data becomes dominant and the state metrics become reliable.
FIG. 5 illustrates a diagram in which the block of size n is broken into several smaller pieces. Each piece is called a sliding window (sw) and is composed of two parts. These two parts are the reliability (r) section 501 and the prolog (p) section 502. Normally the encoded block starts the trellis in the 0 state and ends in the 0 state. S is the number of states where S=2^v and v is the number of encoder memory registers. The tail bits are labeled t 503 in FIG. 5, where v=t tail bits are appended to the encoded block to force this condition.
Decoder processing is an iterative process requiring multiple cycles until a low bit-error ratio solution is obtained. The sliding windows in general start at some random unknown state. The exception is the sliding window that ends with the tail bits. Due to the fact that the initial state is unknown it is necessary to add additional computations through the trellis to achieve good starting results. Because the state of the trellis at the start of the prolog is unknown except that the last beta sliding window is terminated with the tail bits, the probability of the occurrence of all the states in the trellis is initialized to a uniform constant. For each pass through the trellis, the probability of occurrence of a given state will increase or decrease during convergence to the original transmitted data. After processing through the trellis a number of times equal to p, the prolog size, a set of states corresponding to the original transmitted data becomes dominant and the state metrics become reliable. Recommended sizes for p are 4 to 6 times the constraint length of the trellis.
For example, if n=4096, r=128, S=16 and the code rate is 1/3, then there are 32 sliding windows and p=30. For punctured codes such as code rate 1/2, the prolog must grow for equivalent performance. In this example, the prolog would grow from 30 to 48. This solution reduces the memory from 64 k bytes to 2 k bytes at a cost of increasing the number of trellis stages from 4096 to (128+48)×31+(128+4)=5588.
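The arithmetic of this example can be reproduced with a small sketch. The function name is hypothetical. The sketch assumes that r divides n evenly, that each state metric occupies 8 bits (which is what the 64 k byte and 2 k byte figures imply), and that the last window is extended only by the tail bits.

```python
def sliding_window_cost(n, r, prolog, tail, states, precision_bits=8):
    """Return (windows, trellis_stages, full_bytes, window_bytes).

    All windows except the last process r + prolog stages; the last
    window processes r + tail stages because it ends with the tail bits.
    """
    windows = n // r  # assumes r divides n evenly
    stages = (windows - 1) * (r + prolog) + (r + tail)
    full_bytes = n * states * precision_bits // 8
    window_bytes = r * states * precision_bits // 8
    return windows, stages, full_bytes, window_bytes


result = sliding_window_cost(n=4096, r=128, prolog=48, tail=4, states=16)
```

For the values in the text, `result` is `(32, 5588, 65536, 2048)`: 32 windows, 5588 trellis stages, and a memory reduction from 64 k bytes to 2 k bytes.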
Prolog reduction techniques are directed to reducing the number of passes required through the trellis function to achieve an acceptable bit error ratio (BER). For voice data an acceptable BER might be one error per 1,000 bits, but for data transmission an acceptable BER is more likely in the range of one error per 1,000,000 bits. The crux of the problem is how the system should best initialize the states as it proceeds through the successive states in the trellis.
Some proposed initialization guidelines are:
1. Setting all zeros as the starting state. This requires no memory and operates with prolog p=48.
2. Saving all states αS and βS setting the prolog between the limits p=0 through p=48. This requires a very large memory.
FIG. 6 illustrates initialization of all states of the proceeding beta sliding window. For the first iteration of swA 601, the states are initialized with a uniform distribution. Sliding window swA 601 is processed. For the first iteration of swB 602, the states are initialized with a uniform distribution and swB 602 is processed. The final values of the states of swB 602 are stored into memory. This procedure is repeated for the remaining sliding windows. During the second iteration, swA 601 is initialized with the stored values of swB 602, and swA 601 is processed. During the second iteration, swB 602 is initialized with the stored values of swC 603 and swB 602 is processed. This sequence of initialization continues for each iteration.
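The initialization sequence described for FIG. 6 can be summarized as a schedule. The sketch below is illustrative only: the function name and the string labels are hypothetical, windows are indexed from 0 (swA = 0, swB = 1, and so on), and the treatment of the last window as tail-terminated is an assumption drawn from the discussion of tail bits rather than from the description of FIG. 6 itself.

```python
def beta_init_schedule(num_windows, iterations):
    """List, per iteration, the source of each window's initial beta states."""
    schedule = []
    for it in range(iterations):
        row = []
        for w in range(num_windows):
            if w == num_windows - 1:
                row.append("tail")              # assumed: known state from the tail bits
            elif it == 0:
                row.append("uniform")           # first iteration: uniform distribution
            else:
                row.append(f"stored[{w + 1}]")  # next window's states from the prior iteration
        schedule.append(row)
    return schedule
```

For three windows and two iterations the schedule is `[["uniform", "uniform", "tail"], ["stored[1]", "stored[2]", "tail"]]`, so in the second iteration swA starts from the stored states of swB and swB starts from those of swC.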
Static state initialization works and gives good results but it does have difficulties. Because the first iteration is initialized with a uniform distribution and there is no prolog, the first iteration results are sub-optimum. It will take more iterations using this technique to achieve the same bit error rate as with a prolog section. Another difficulty is convergence. If the channel noise is too high, then the sliding window initializations might be incorrectly set leading to non-convergence.
Saving all states requires a large amount of memory. For example, if S=256, n=4096, w=128 and the fixed-point size is 8 bits, then (4096÷128)×256×8×2 MAPs×2=262,144 bits (approximately 262K bits) must be saved.
Saving the states with optimum probability, both alpha and beta, h=arg(maxΣBk,s), and storing value h lowers the memory size required. For S=256, it takes 8 bits to store h. Using this static value for initialization is attractive for VLSI implementation due to the smaller memory requirements.
For the above example only (4096÷128)×8×2 MAPs×2=1024 bits are required. The initialization of the state metrics is done by setting the starting state metric to the highest value and the other states to a lower value. For example, if h=3, then s[3]=16, s[0]=s[1]=s[2]=s[4]=s[5]= . . . =s[255]=0.
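A minimal sketch of this scheme follows. The function names are hypothetical. The values 16 and 0 are those of the example above, and the final factor of 2 in the memory formula is assumed to account for the alpha and beta directions.

```python
def best_state(metrics):
    """Index h of the state with the largest metric."""
    return max(range(len(metrics)), key=metrics.__getitem__)


def init_from_best_state(h, num_states, high=16, low=0):
    """Rebuild initial state metrics from the stored index h alone."""
    metrics = [low] * num_states
    metrics[h] = high
    return metrics


def stored_bits(n, window, bits_per_window, maps=2, directions=2):
    """Bits stored across all windows, MAP blocks and directions (assumed: alpha and beta)."""
    return (n // window) * bits_per_window * maps * directions


all_states_bits = stored_bits(4096, 128, 256 * 8)  # every state metric saved
index_only_bits = stored_bits(4096, 128, 8)        # only the 8-bit index h saved
```

With S=256, `all_states_bits` is 262,144 and `index_only_bits` is 1,024, a reduction by a factor of 256.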
These first two approaches use either a full prolog section or no prolog section. Using a full prolog section requires the most computational overhead per MAP decode, but it gives the best bit error rate (BER). Using no prolog section requires the fewest operations per MAP decode, but it gives the worst bit error rate.