Data signals, in particular those transmitted over a typically hostile RF interface, are susceptible to errors caused by interference. Various methods of error correction coding have been developed in order to minimize the adverse effects that a hostile interface has on the integrity of communicated data. This is also referred to as lowering the Bit Error Rate (BER), which is generally defined as the ratio of incorrectly received information bits to the total number of received information bits. Error correction coding generally involves representing digital data in ways designed to be robust with respect to bit errors. Error correction coding enables a communication system to recover original data from a signal that has been corrupted. Typically, the greater the expected BER of a particular communication link, the greater the complexity of the error correction coding necessary to recover the original data. In general, the greater the complexity of the error correction coding, the greater the inefficiency of the data communication. The greater inefficiency results from a reduction of the ratio of information bits to total bits communicated as the complexity of the error correction coding increases. The additional information introduced into the original body of data by error correction coding consumes spectrum bandwidth and processor cycles on both the transmitting and receiving ends of the communication.
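As a minimal illustration of the two ratios discussed above, the following sketch computes a BER and a code rate. The function names and the sample bit sequences are hypothetical and chosen only for illustration.

```python
def bit_error_rate(sent, received):
    """Ratio of incorrectly received information bits to total received information bits."""
    if len(sent) != len(received):
        raise ValueError("sequences must have equal length")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


def code_rate(info_bits, total_bits):
    """Ratio of information bits to total bits communicated."""
    return info_bits / total_bits


# Hypothetical example: two of eight information bits are flipped by the channel.
sent = [0, 1, 1, 0, 1, 0, 0, 1]
received = [0, 1, 0, 0, 1, 0, 1, 1]
print(bit_error_rate(sent, received))
print(code_rate(1, 3))
```

A lower code rate means more redundancy per information bit, which is the inefficiency described above.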
In cases where the expected BER of a particular communication link is substantially higher than the acceptable BER, a concatenated set of error correcting codes may be applied to the data in order to lower the BER to acceptable levels. Concatenated error correction coding refers to sequences of coding in which at least two encoding steps are performed on a data stream. Concatenated coding may be performed in series, where encoded data is subjected to further encoding, or in parallel, where the original data is subjected to different encoding schemes to form intermediate codes which are then further processed and combined into a serial stream.
Parallel and serial concatenated codes are sometimes decoded using iterative decoding algorithms. One commonly employed method of iterative decoding utilizes a single decoder processor where the decoder output metrics are fed back to the input of the decoder processor. Decoding is performed in an iterative fashion until the desired number of iterations has been performed. In order for the decoder processor to decode the encoded input data at the same rate as the input data is arriving, the decoder processor must process the encoded data at a rate faster than the rate of the incoming data by a factor at least equal to the number of iterations necessary. With this method of iterative decoding, the speed of the decoder processor becomes a significant limiting factor in the system design.
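The rate requirement above reduces to a simple multiplication. The following sketch uses hypothetical figures (a 2 Mbit/s input stream and 8 iterations) that do not come from the text.

```python
def required_decoder_rate(input_rate_bps, iterations):
    """Minimum processing rate of a single decoder that is reused for every iteration."""
    if iterations < 1:
        raise ValueError("at least one iteration is required")
    return input_rate_bps * iterations


# Hypothetical figures: 2 Mbit/s of incoming data, 8 decoding iterations.
print(required_decoder_rate(2_000_000, 8))
```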
Turbo codes are examples of parallel concatenated coding and are used as a technique of error correction in practical digital communications. The essence of the decoding technique of turbo codes is to produce soft decision outputs, i.e. different numerical values which describe the different reliability levels of the decoded symbols, which can be fed back to the start of the decoding process to improve the reliabilities of the symbols. This is known as the iterative decoding technique. Turbo decoding has been shown to perform close to the theoretical limit (Shannon limit) of error correction performance after 18 iterations; see C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes,” in Proc. IEEE Int. Conf. Commun., Geneva, Switzerland, 1993, pp. 1064–1070. Turbo decoding is computationally complex: it takes up a large amount of computation time and consumes a large amount of memory.
A turbo encoder is shown in FIG. 1 and comprises a pair of parallel-concatenated convolutional encoders (12, 13) separated by an interleaver (11), where the interleaver shuffles (interleaves) its input sequence in a predetermined order. The encoder accepts an input binary {0,1} sequence of a specified code block of size N symbols, and produces three types of encoded output for each symbol when the coding rate is ⅓.
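The structure of FIG. 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the encoder of any particular standard: each constituent encoder is taken to be a recursive systematic convolutional encoder with feedback polynomial 1 + D² + D³ and feedforward polynomial 1 + D + D³, the interleaver is an arbitrary permutation supplied by the caller, and trellis termination is omitted. The names `rsc_parity`, `turbo_encode` and `perm` are hypothetical.

```python
def rsc_parity(bits):
    """Parity stream of an assumed 8-state recursive systematic encoder."""
    s1 = s2 = s3 = 0                 # shift register contents (D, D^2, D^3)
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3              # feedback taps: 1 + D^2 + D^3 (assumed)
        parity.append(a ^ s1 ^ s3)   # feedforward taps: 1 + D + D^3 (assumed)
        s1, s2, s3 = a, s1, s2       # shift the register
    return parity


def turbo_encode(bits, perm):
    """Return the three rate-1/3 output streams (x, y, z) for one code block."""
    x = list(bits)                               # systematic bits
    y = rsc_parity(bits)                         # parity from the first encoder
    z = rsc_parity([bits[i] for i in perm])      # parity from the interleaved input
    return x, y, z


# Hypothetical 4-bit block with a simple reversing interleaver.
x, y, z = turbo_encode([1, 0, 0, 0], perm=[3, 2, 1, 0])
print(x, y, z)
```

Each input symbol yields one bit on each of the three streams, which is where the ⅓ coding rate comes from.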
Referring to FIG. 2, a turbo decoder receives the encoded signals and uses all three types of signals when the coding rate is ⅓ to reproduce the original bit sequence of the turbo encoder input. Two MAP decoders 21 and 24, associated with the convolutional encoders 12 and 13 respectively, perform the decoding calculations. In addition to an interleaver 22 that mirrors the interleaver 11 of the encoding side, the turbo decoder also includes a deinterleaver 23 to reconstruct the correct arrangement of the bit sequence to be fed back from decoder 24 to decoder 21. The decoded bits after the final iteration are hard decisions, i.e. an output binary sequence {0,1}, obtained from decoder 24.
A MAP decoder uses the BCJR algorithm (see L. R. Bahl et al., “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate,” IEEE Transactions on Information Theory, March 1974, pages 284–287) to compute the soft outputs, or likelihoods. Using the received signals x, y and z, the algorithm computes three types of probabilities: α, β and λ.
In a sense, α represents the likelihood probability of a symbol changing from a state m′ to another state m as the time interval progresses from t to t+1. The β probability, on the other hand, corresponds to the likelihood probability of a symbol changing from a state m to m′ from the time interval t to t-1. α and β are also known as the forward and backward probabilities. The initial values for α and β are known because the states at the start and the end of the block L are set to zero in the turbo encoder. The λ probability fuses α and β together to obtain one measure of likelihood for each symbol. λ is then used to compute the output of the turbo decoder, which is either the soft decisions (feedback) or the hard decisions ({0,1} bits).
These three probabilities must be computed sequentially and normalized for each symbol. The computation sequence of α, β, and λ is shown briefly in FIG. 3. The state transitions in the β and α computations are shown in FIG. 4. The eight states correspond to the eight states of the constituent encoders in the turbo encoder with ⅓ coding rate. There are two eight-state constituent encoders, 12 and 13, in the turbo encoder, which follow the polynomials g0(D) = 1 + D² + D³ and g1(D) = 1 + D + D³, respectively. Next[m][0] and Next[m][1] refer to the next state of state m for input “0” and “1”, respectively. prev[m][0] and prev[m][1] refer to the previous state of state m for input “0” and “1”, respectively. The branch metric computation of α and β follows the state transitions in FIG. 4.
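The next-state and previous-state tables can be derived mechanically from the encoder polynomials. The sketch below assumes that 1 + D² + D³ is the feedback polynomial and 1 + D + D³ the feedforward polynomial, and that a state is numbered by packing the three register bits into an integer; both are assumptions, so the resulting numbering need not match FIG. 4. The names `step`, `next_state` and `prev_state` are hypothetical.

```python
NUM_STATES = 8


def step(m, u):
    """Return (next state, parity bit) for state m and input bit u."""
    s1, s2, s3 = (m >> 2) & 1, (m >> 1) & 1, m & 1
    a = u ^ s2 ^ s3                       # feedback taps (assumed 1 + D^2 + D^3)
    parity = a ^ s1 ^ s3                  # feedforward taps (assumed 1 + D + D^3)
    return (a << 2) | (s1 << 1) | s2, parity


# next_state[m][u]: state reached from m when the input bit is u.
next_state = [[step(m, u)[0] for u in (0, 1)] for m in range(NUM_STATES)]

# prev_state[m][u]: state from which m is reached when the input bit is u.
prev_state = [[None, None] for _ in range(NUM_STATES)]
for m in range(NUM_STATES):
    for u in (0, 1):
        prev_state[next_state[m][u]][u] = m

for m in range(NUM_STATES):
    print(m, next_state[m], prev_state[m])
```

With these taps every state has exactly one predecessor per input bit, so the previous-state table is fully populated and the backward recursion can walk the same trellis in reverse.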
As can be seen in the reference [Bahl et al.], α and β are independent of each other, but λ is dependent on both α and β. A complete algorithm requires the α and β probabilities for all L symbols in order to calculate λ.
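The dependency described above can be made concrete with a small log-domain sketch: β is computed and stored for the whole block first, after which α and λ are computed together in a forward pass. This is a sketch under several assumptions rather than the decoder of the figures: it uses the max-log approximation, omits a priori (feedback) information, maps bit b to the signal level 2b − 1, and uses an assumed trellis with feedback taps 1 + D² + D³ and feedforward taps 1 + D + D³. All names are hypothetical.

```python
NEG_INF = float("-inf")
STATES = range(8)


def step(m, u):
    """Assumed 8-state trellis: return (next state, parity bit)."""
    s1, s2, s3 = (m >> 2) & 1, (m >> 1) & 1, m & 1
    a = u ^ s2 ^ s3
    return (a << 2) | (s1 << 1) | s2, a ^ s1 ^ s3


def normalize(metrics):
    """Subtract the maximum so that the largest state metric is zero."""
    top = max(metrics)
    return [v - top for v in metrics]


def gamma(x_t, y_t, u, p):
    """Branch metric for input bit u and parity bit p (bit b sent as 2b - 1)."""
    return 0.5 * ((2 * u - 1) * x_t + (2 * p - 1) * y_t)


def max_log_map(x, y, terminated=False):
    """Return one soft output per symbol; a positive value favours bit 1."""
    n = len(x)
    # Backward pass: beta for every symbol is stored before lambda is needed.
    beta = [None] * (n + 1)
    beta[n] = [0.0 if (m == 0 or not terminated) else NEG_INF for m in STATES]
    for t in range(n - 1, -1, -1):
        row = []
        for m in STATES:
            cands = []
            for u in (0, 1):
                nxt, p = step(m, u)
                cands.append(gamma(x[t], y[t], u, p) + beta[t + 1][nxt])
            row.append(max(cands))
        beta[t] = normalize(row)
    # Forward pass: alpha and lambda share the sum alpha + gamma.
    alpha = [0.0] + [NEG_INF] * 7          # the encoder starts in state zero
    soft = []
    for t in range(n):
        best = [NEG_INF, NEG_INF]
        new_alpha = [NEG_INF] * 8
        for m in STATES:
            for u in (0, 1):
                nxt, p = step(m, u)
                a_g = alpha[m] + gamma(x[t], y[t], u, p)
                new_alpha[nxt] = max(new_alpha[nxt], a_g)
                best[u] = max(best[u], a_g + beta[t + 1][nxt])
        soft.append(best[1] - best[0])
        alpha = normalize(new_alpha)
    return soft


# Noise-free round trip on a hypothetical 8-bit block without termination.
bits, state, x, y = [1, 0, 1, 1, 0, 0, 1, 0], 0, [], []
for b in bits:
    state, p = step(state, b)
    x.append(2.0 * b - 1.0)
    y.append(2.0 * p - 1.0)
print([1 if v > 0 else 0 for v in max_log_map(x, y)])
```

Storing β for the whole block before the forward pass is what drives the memory requirement, and computing α and λ from the same partial sum mirrors the sharing of adders between the two computations.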
According to the general SubLogMAP algorithm of the turbo decoder, the turbo decoding procedure is (a) to calculate the β values, (b) to calculate the α values and (c) finally to calculate λ, as shown in FIG. 3. FIG. 5 shows the overall structure for the calculation of β: the β computation and normalization are performed, and the results are fed back to the β computation paths as well as written into the β memory, which is read out in the next phase (α and λ computation). The details and critical paths of the β computation and normalization paths are shown in FIG. 6. FIG. 10 shows the LSI architecture for the calculation of α. The details and critical paths of the α computation are shown in FIG. 12. A key operation in the β and α calculations is normalization, which reduces the amount of hardware. Normalization is needed to avoid overflow in the β and α computations. As shown in FIGS. 6 and 12, the normalization process consists of (1) selecting the maximum value from the eight state values and (2) subtracting this maximum value from each of the eight state values, the results of which are fed back to the next computation. Because only the maximum values of the likelihood probabilities are of interest, this normalization is very effective in reducing the bit widths of the β and α computation data paths and of the various memories. Because the total calculation (β and α value computation and normalization) is composed of fairly long critical paths, as shown in FIG. 6 for the β calculation and FIG. 12 for the α calculation, the computation speed is reduced. FIG. 11 shows the overall structure for the calculation of λ. The details and critical paths of the λ computation are also shown in FIG. 12 (shaded operations in the figure). As can be seen in FIG. 12, many adders can be shared between the α and λ computations. There is also a long critical path from the α value computation to the LLR (log-likelihood ratio) computation in the calculation of λ.
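The two-step normalization described above (maximum selection followed by subtraction) can be sketched as follows. The eight state values are hypothetical; the point of the example is that the differences between states, which are all that later comparisons depend on, are unchanged while the magnitudes stay bounded.

```python
def normalize(state_values):
    """Step 1: select the maximum; step 2: subtract it from every state value."""
    top = max(state_values)
    return [v - top for v in state_values]


# Hypothetical metrics for the eight trellis states at one time step.
metrics = [12, 7, 15, 9, 3, 14, 8, 10]
normalized = normalize(metrics)
print(normalized)

# The spread between any two states is preserved by the subtraction.
assert metrics[0] - metrics[4] == normalized[0] - normalized[4]
```

After normalization the largest value is always zero and the rest are non-positive, so the data path only needs enough bits to hold the spread between states rather than a sum that grows with the block length.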
The timing sequence of the β computation is shown, as an example, in the upper half (Beta Computation Stages Before Pipelining) of FIG. 9. As can be seen in FIG. 9, each β computation and its normalization are performed sequentially in one clock cycle, and the normalized β is fed back at the beginning of the next clock cycle. Similarly, each α computation and its normalization, or each λ computation and its LLR selection, are performed sequentially in one clock cycle period. As shown in FIG. 6 and FIG. 12, the long critical path limits the operating clock frequency. The calculation speed of this architecture is limited by the resulting longer clock cycle period, in which the values of α and β are calculated.
As mentioned previously, the speed of a turbo decoder implementation is limited by the clock speed. The clock speed is determined by the cycle period in which the values of α and β are calculated. This is the critical path of the LSI implementation (also called the maximum delay time of the calculation). Two main operations are performed in one clock cycle period, (1) value calculation for α, β, and λ and (2) normalization for α and β and LLR selection for λ, which lengthens the critical path as shown in FIG. 6, FIG. 9 and FIG. 12. Such sequential operations slow down the clock speed. Recently, as turbo coding is applied to real systems for higher data rate transmission, faster turbo decoding is needed while keeping the buffer storage requirement as small as possible.