Turbo-codes were introduced in 1993 and are part of the current communication standards due to their outstanding forward error correction performance. Turbo-codes include concatenated component codes that work on the same block of information bits, separated by interleavers. The component codes are decoded individually. Key to the performance of turbo-codes is the iterative exchange of information between the component decoders.
The information exchanged iteratively represents the likelihood that the received bit k was sent as either $d_k = 0$ or $d_k = 1$. The decision is represented by the sign of the Log-Likelihood-Ratio, $\mathrm{LLR}(d_k) = \log \frac{P(d_k = 1)}{P(d_k = 0)}$. The confidence in this decision is based upon its magnitude. From now on the information exchanged will simply be referred to as LLR.
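As a minimal illustration of the sign and magnitude interpretation, the following Python sketch computes an LLR from an assumed probability P(d_k = 1). The function names `llr` and `hard_decision` are hypothetical. The natural logarithm and the mapping of a zero LLR to 0 are assumptions made for the example, since the text does not fix either.

```python
import math


def llr(p_one: float) -> float:
    """Return log(P(d_k=1) / P(d_k=0)) for a bit with P(d_k=1) = p_one.

    Assumes 0 < p_one < 1 and the natural logarithm (the base is an assumption).
    """
    return math.log(p_one / (1.0 - p_one))


def hard_decision(value: float) -> int:
    """Decide on the bit from the sign of the LLR (a zero LLR is mapped to 0 here)."""
    return 1 if value > 0 else 0


for p in (0.9, 0.55, 0.2):
    value = llr(p)
    # The magnitude abs(value) expresses the confidence in the decision.
    print(f"P(d_k=1)={p}: LLR={value:+.3f}, "
          f"decision={hard_decision(value)}, confidence={abs(value):.3f}")
```

A probability close to 0.5 yields an LLR close to zero, that is, a decision with little confidence.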
Interleaving scrambles the processing order to break up neighborhood relations, which is essential for the performance of turbo-codes. The LLR produced at position k, denoted as $\mathrm{LLR}(d_k)$, is written to position interleaved(k) in the RAM: $\mathrm{LLR}_{\mathrm{prod}}(d_k) \rightarrow \mathrm{LLR}_{\mathrm{RAM}}(d_{\mathrm{interleaved}(k)})$.
The interleaver and de-interleaver tables contain one-to-one mappings of source addresses to target addresses. TABLE 1 shows an example for reordering six LLRs to perform interleaving.
TABLE 1
Address  Interleaved  Address  Deinterleaved
1        3            1        6
2        6            2        4
3        5            3        1
4        2            4        5
5        4            5        3
6        1            6        2
De-interleaving brings them into the original sequence again. A 3GPP compliant table, for example, would contain up to 5114 entries.
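The relation between the two tables can be sketched in Python as follows. The interleaver entries are those of TABLE 1. The helper names `invert` and `permute` and the placeholder strings standing in for LLR values are hypothetical and serve only as illustration.

```python
# Interleaver table from TABLE 1 (1-based): address k -> interleaved(k).
INTERLEAVED = {1: 3, 2: 6, 3: 5, 4: 2, 5: 4, 6: 1}


def invert(table):
    """Build the de-interleaver as the inverse of a one-to-one mapping."""
    inverse = {target: source for source, target in table.items()}
    assert len(inverse) == len(table), "mapping is not one-to-one"
    return inverse


def permute(values, table):
    """Write the value found at position k to position table[k] (1-based)."""
    out = [None] * len(values)
    for k, value in enumerate(values, start=1):
        out[table[k] - 1] = value
    return out


DEINTERLEAVED = invert(INTERLEAVED)

llrs = ["LLR1", "LLR2", "LLR3", "LLR4", "LLR5", "LLR6"]  # placeholder values
stored = permute(llrs, INTERLEAVED)        # interleaving
restored = permute(stored, DEINTERLEAVED)  # de-interleaving

print(DEINTERLEAVED)
print(stored)
print(restored == llrs)
```

Under these assumptions, inverting the interleaver column reproduces the de-interleaver column of TABLE 1, and applying both permutations in turn restores the original order.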
One LLR has to be read for every LLR produced. If only one LLR is produced per time-step, interleaving can be performed on the fly through indirect addressing. However, high-throughput applications require parallel architectures that produce more than one LLR per time-step. Thus, multiple LLRs have to be read and written concurrently. The number of LLRs read and written may be denoted as N.
In 0.20 μm technology, a single producer can achieve a maximum throughput of about 7 Mbit/s assuming 10 iterations. For 100 Mbit/s, a reasonable assumption for future communication systems, N=16 producers would be necessary, which would require 16-port RAMs. However, the use of N-port RAMs to solve access conflicts is, in general, not practicable.
Read access conflicts can be avoided by using N individual memories. Write access conflicts cannot be avoided as easily. The positions where the produced LLRs have to be stored depend on the interleaver. For arbitrary interleavers, the target memory, that is, the RAM each LLR has to go to, is not known at design time. At each time-step and for each RAM, the number of LLRs to be stored can vary from 0 to N. The resulting concurrent accesses to the same single-port memory are the real bottleneck in high-throughput turbo-decoding.
The problem is best illustrated by taking the interleaver table of TABLE 1 for two concurrently produced LLRs and assigning its addresses to two individual RAMs. TABLE 2 shows the interleaver table entries together with the associated RAMs and relative addresses. From now on, only the interleaver is mentioned. Of course, the same concepts apply to the deinterleaver as well.
TABLE 2
source RAM  relative Address  Address  Interleaved  target RAM  relative Address
1           1                 1        3            1           3
1           2                 2        6            2           3
1           3                 3        5            2           2
2           1                 4        2            1           2
2           2                 5        4            2           1
2           3                 6        1            1           1
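The assignment of absolute addresses to RAMs and relative addresses can be sketched as follows. The sketch assumes that consecutive addresses are split into equally sized blocks, one block per RAM, which matches the entries of TABLE 2. The names `locate` and `table2_rows` are hypothetical.

```python
# Interleaver table from TABLE 1 (1-based): address k -> interleaved(k).
INTERLEAVED = {1: 3, 2: 6, 3: 5, 4: 2, 5: 4, 6: 1}
N_RAMS = 2
WORDS_PER_RAM = len(INTERLEAVED) // N_RAMS  # 3 entries per RAM in this example


def locate(address):
    """Split a 1-based absolute address into (RAM number, relative address).

    Assumes a block-wise partition: addresses 1..3 in RAM 1, 4..6 in RAM 2.
    """
    ram, offset = divmod(address - 1, WORDS_PER_RAM)
    return ram + 1, offset + 1


def table2_rows():
    """Return rows (source RAM, rel. addr., address, interleaved, target RAM, rel. addr.)."""
    rows = []
    for address in sorted(INTERLEAVED):
        source_ram, source_rel = locate(address)
        target = INTERLEAVED[address]
        target_ram, target_rel = locate(target)
        rows.append((source_ram, source_rel, address, target, target_ram, target_rel))
    return rows


for row in table2_rows():
    print(*row)
```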
The number of write accesses can be determined from the interleaver tables and the producing scheme. Assuming that the two LLRs are produced in order of ascending relative addresses (i.e., in the first time-step at the absolute addresses 1 and 4) and interleaving is performed according to TABLE 2, TABLE 3 shows the resulting write accesses.
TABLE 3
Time-step  Write Accesses to RAM 1  Write Accesses to RAM 2
1          2                        0
2          0                        2
3          1                        1
In the first time-step, for example, one LLR is read from source RAM1 (Addr. 1) and written to target RAM1 (Addr. 3). The other one is read concurrently from source RAM2 (Addr. 1) and written to target RAM1 (Addr. 2), which results in two concurrent write accesses for target RAM1.
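The write access counts can be reproduced with a short Python sketch. It assumes the six-entry interleaver of the example, two RAMs holding three consecutive addresses each, and one producer per source RAM that steps through ascending relative addresses. The names `ram_of` and `write_accesses` are hypothetical.

```python
from collections import Counter

# Interleaver table of the example (1-based): address k -> interleaved(k).
INTERLEAVED = {1: 3, 2: 6, 3: 5, 4: 2, 5: 4, 6: 1}
N_RAMS = 2
WORDS_PER_RAM = 3


def ram_of(address):
    """Return the 1-based RAM number that holds a 1-based absolute address."""
    return (address - 1) // WORDS_PER_RAM + 1


def write_accesses():
    """Count, for each time-step, the concurrent writes hitting each target RAM."""
    table = []
    for step in range(1, WORDS_PER_RAM + 1):
        # Producer p reads relative address `step` of source RAM p.
        sources = [(p - 1) * WORDS_PER_RAM + step for p in range(1, N_RAMS + 1)]
        hits = Counter(ram_of(INTERLEAVED[s]) for s in sources)
        table.append([hits[ram] for ram in range(1, N_RAMS + 1)])
    return table


for step, counts in enumerate(write_accesses(), start=1):
    print(step, counts)
```

Any entry greater than 1 marks a time-step in which a single-port target RAM would have to accept more than one write.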
In Giulietti et al., “Parallel Turbo Coding Interleavers: Avoiding Collisions In Accesses To Storage Elements,” IEEE Electronics Letters, Vol. 38, No. 5, February 2002, a dedicated interleaving scheme for each given architecture and block-size is derived to circumvent the potential access problems. However, this does not allow for pre-defined interleavers, as for example in a standard like 3GPP, nor for arbitrary block-lengths or degrees of parallelization.
In Thul et al., “Enabling High Speed Turbo Decoding Through Concurrent Interleaving” in ISCAS'02, May 2002, Vol. 1, pp. 897-900, a tree-based concurrent interleaving architecture, named TIBB (Tree Interleaver Bottleneck Breaker), is disclosed. The producers implement a Maximum-A-Posteriori (MAP) algorithm. All N MAP producers are connected to N-input buffers via the LLR distributor block. The drawback of this approach lies in the high connectivity of the LLR distributor as well as the need for N-input buffers, whose combinatorial logic complexity increases exponentially with increasing values of N. Although a two-level buffer implementation was proposed to reduce this complexity, the area overhead of the TIBB approach still grows exponentially with the number of producers.
Another approach, based on local buffer cells interconnected via a ring network (called RIBB), was proposed in Thul et al., “Optimized Concurrent Interleaving For High Speed Turbo Decoding”, in ICECS'02, Croatia, September 2002, pp. 1099-1102. Each cell contains its own LLR distributor and either writes LLRs to the local RAM or routes them to its neighboring cells if the destination is not the local RAM. For single-ring or double-ring architectures, this leads to local cells having a reasonable complexity and to efficient architectures for up to 8 producers. Above 8 producers, the size and the number of local buffers become prohibitive, and the network requires more than two rings to avoid high latency penalties.
Thul et al., “Concurrent Interleaving Architectures For High-Throughput Channel Coding” in ICASSP'03, Vol. 2, pp. 613-616, April 2003, is an extension of “Optimized Concurrent Interleaving For High Speed Turbo Decoding”. Thul et al. propose a heuristic approach to generate a random graph routing network that fulfills the requirements of the interleaving for a large number of producers. However, the underlying implementation complexity is still high.