Turbo codes, which were introduced in 1993, are used in today's communication standards due to their outstanding forward error correction performance. They consist of concatenated component codes that operate on the same block of information bits and are separated by interleavers. The component codes are decoded individually.
One key to the performance of turbo codes is the iterative exchange of information between the component decoders. The information exchanged iteratively represents the likelihood that the received bit k was sent as either d_k = 0 or d_k = 1. The decision is represented by the sign of the log-likelihood ratio $\mathrm{LLR}(d_k) = \log \frac{P(d_k = 1)}{P(d_k = 0)}$, and the confidence in this decision is represented by its magnitude.
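By way of illustration only, the relationship between the sign and the magnitude of the log-likelihood ratio can be sketched as follows. The function names `llr` and `hard_decision` and the example probabilities are assumptions made for this sketch; the natural logarithm is also an assumption, since the text does not specify a base.

```python
import math

def llr(p_one: float) -> float:
    """Return log(P(d_k = 1) / P(d_k = 0)) for a bit with P(d_k = 1) = p_one."""
    return math.log(p_one / (1.0 - p_one))

def hard_decision(value: float) -> int:
    """The sign of the LLR gives the decision; a tie at 0 is mapped to 1 arbitrarily."""
    return 1 if value >= 0 else 0

# Illustrative probabilities only (not taken from the text).
for p in (0.9, 0.6, 0.1):
    value = llr(p)
    print(f"P(d_k=1)={p}: LLR={value:+.3f}, "
          f"decision={hard_decision(value)}, confidence={abs(value):.3f}")
```

A probability close to 0.5 yields an LLR close to zero, i.e. a decision with low confidence, whereas probabilities near 0 or 1 yield large magnitudes.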
Interleaving involves scrambling the processing order to break up neighborhood relations, and it is important for the performance of turbo codes. The LLR produced at position k, denoted as LLR(d_k), is written to position interleaved(k) in the RAM: $\mathrm{LLR}_{\mathrm{prod}}(d_k) \rightarrow \mathrm{LLR}_{\mathrm{RAM}}(d_{\mathrm{interleaved}(k)})$.
The interleaver and deinterleaver tables include one-to-one mappings of source addresses to target addresses. Table 1 shows an example for reordering six LLRs to perform interleaving.
TABLE 1: Interleaver/Deinterleaver Table for six LLRs.

Address   Interleaved   Address   Deinterleaved
1         3             1         6
2         6             2         4
3         5             3         1
4         2             4         5
5         4             5         3
6         1             6         2
Deinterleaving brings these items into the original sequence again (a 3GPP compliant table, for example, would include up to 5114 entries). One LLR has to be read for every LLR produced. If only one LLR is produced per time step, interleaving may be performed on the fly through indirect addressing. However, high-throughput applications require parallel architectures that produce more than one LLR per time step. Thus, multiple LLRs have to be read and written concurrently. The number of LLRs read and written will be denoted herein as N.
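The indirect addressing described above can be sketched with the mapping of Table 1, assuming a single producer that writes one LLR per time step. The helper name `write_through_table` and the LLR values are hypothetical and serve only to show that deinterleaving restores the original order.

```python
# Table 1, with 1-based addresses as in the text.
INTERLEAVED = {1: 3, 2: 6, 3: 5, 4: 2, 5: 4, 6: 1}
DEINTERLEAVED = {1: 6, 2: 4, 3: 1, 4: 5, 5: 3, 6: 2}

def write_through_table(produced, table):
    """Store the value produced at position k at address table[k]."""
    ram = {}
    for k, value in enumerate(produced, start=1):
        ram[table[k]] = value  # indirect addressing: the table supplies the target
    # Return the RAM contents in ascending address order.
    return [ram[address] for address in sorted(ram)]

llrs = [0.5, -1.2, 2.0, -0.3, 1.7, -2.4]  # made-up LLR values
stored = write_through_table(llrs, INTERLEAVED)
restored = write_through_table(stored, DEINTERLEAVED)

print(stored)
print(restored == llrs)
```

Because only one value is written per step, no two writes can collide; the difficulty discussed in this section arises only once several LLRs are produced per time step.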
In a 0.20 μm technology, a single producer may achieve a maximum throughput of about 7 Mbit/s assuming 10 iterations. For 100 Mbit/s, a reasonable assumption for future communication systems, N=16 producers would be necessary, requiring 16-port RAMs. Yet, the use of N-port RAMs to solve access conflicts is, in general, not feasible.
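The arithmetic behind the producer count can be sketched as follows. The text does not state why 16 producers are chosen rather than the bare minimum; rounding up to the next power of two is merely one plausible reading and is an assumption of this sketch, as is the function name `producers_needed`.

```python
import math

def producers_needed(target_mbit_s: float, per_producer_mbit_s: float) -> int:
    """Smallest number of producers whose combined throughput reaches the target."""
    return math.ceil(target_mbit_s / per_producer_mbit_s)

minimum = producers_needed(100, 7)          # figures taken from the text
# Assumption: round up to the next power of two for a regular architecture.
rounded = 1 << (minimum - 1).bit_length()

print(minimum, rounded)
```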
Read access conflicts may be avoided by using N individual memories. Write access conflicts may not be avoided that easily, because the positions where the produced LLRs have to be stored depend on the interleaver. For arbitrary interleavers, the target memory, i.e., the RAM to which each LLR has to go, is not known at design time. At each time step and for each RAM, the number of LLRs to be stored may vary from 0 to N. The resulting concurrent accesses to the same single-port memory are thus a significant bottleneck in high-throughput turbo decoding.
The problem is perhaps best illustrated by taking the interleaver table of Table 1 for two concurrently produced LLRs and assigning its addresses to two individual RAMs. Table 2 shows the interleaver table entries together with the associated RAMs and relative addresses. It should be noted that only the interleaver will be mentioned hereafter, but the same concepts apply to the deinterleaver as well.
TABLE 2: Interleaver Table with associated RAMs.

     Source   Relative                           Target   Relative
     RAM      Address    Address   Interleaved   RAM      Address
=>   1        1          1         3             1        3
     1        2          2         6             2        3
     1        3          3         5             2        2
=>   2        1          4         2             1        2
     2        2          5         4             2        1
     2        3          6         1             1        1
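The assignment of absolute addresses to RAMs and relative addresses in Table 2 can be reproduced with a short sketch. It assumes that the six addresses are split into contiguous blocks of three per RAM, which matches the table entries; the names `split_address` and `WORDS_PER_RAM` are introduced here for illustration only.

```python
INTERLEAVED = {1: 3, 2: 6, 3: 5, 4: 2, 5: 4, 6: 1}  # Table 1
WORDS_PER_RAM = 3  # six LLRs split over two RAMs

def split_address(address, words_per_ram=WORDS_PER_RAM):
    """Map a 1-based absolute address to (RAM number, relative address), both 1-based."""
    ram, offset = divmod(address - 1, words_per_ram)
    return ram + 1, offset + 1

rows = []
for address in sorted(INTERLEAVED):
    source_ram, source_rel = split_address(address)
    target = INTERLEAVED[address]
    target_ram, target_rel = split_address(target)
    # Column order as in Table 2.
    rows.append((source_ram, source_rel, address, target, target_ram, target_rel))

for row in rows:
    print(*row)
```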
The number of write accesses may be determined from the interleaver tables and the producing scheme. Assuming that the two LLRs are produced in order of ascending relative addresses (i.e., in the first time step at the absolute addresses 1 and 4) and that interleaving is performed according to Table 2, Table 3 shows the resulting write accesses.
TABLE 3: Write Accesses to LLR RAMs.

Time step   Write Accesses to RAM 1   Write Accesses to RAM 2
1           2                         0
2           0                         2
3           1                         1
In the first time step, for example, one LLR is read from source RAM1 (Address 1) and written to target RAM1 (Address 3). The other one is read concurrently from source RAM2 (Address 1) and written to target RAM1 (Address 2), which results in two concurrent write accesses for target RAM1.
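The write access counts of Table 3 can be derived mechanically from the interleaver table and the producing scheme. The sketch below assumes one producer per source RAM, all reading the same relative address in a given time step, and contiguous blocks of three addresses per RAM; the function names are hypothetical.

```python
from collections import Counter

INTERLEAVED = {1: 3, 2: 6, 3: 5, 4: 2, 5: 4, 6: 1}  # Table 1
NUM_RAMS = 2
WORDS_PER_RAM = 3

def ram_of(address):
    """RAM number (1-based) that holds a 1-based absolute address."""
    return (address - 1) // WORDS_PER_RAM + 1

def write_accesses_per_step():
    """For each time step, count the writes that target each RAM."""
    steps = []
    for relative in range(1, WORDS_PER_RAM + 1):
        # Absolute source addresses read in this step, one per source RAM.
        sources = [(ram - 1) * WORDS_PER_RAM + relative
                   for ram in range(1, NUM_RAMS + 1)]
        counts = Counter(ram_of(INTERLEAVED[source]) for source in sources)
        steps.append([counts[ram] for ram in range(1, NUM_RAMS + 1)])
    return steps

accesses = write_accesses_per_step()
for step, counts in enumerate(accesses, start=1):
    print(step, *counts)
```

Any entry greater than 1 marks a time step in which a single-port RAM would have to accept more than one write, which is the conflict described above.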
In A. Giulietti, L. Van Der Perre, M. Strum, "Parallel turbo coding interleavers: avoiding collisions in accesses to storage elements," IEEE Electronics Letters, Vol. 38, No. 5, February 2002, a dedicated interleaving scheme is derived for each given architecture and block size, circumventing the arising access problems. This approach does not, however, allow for pre-defined interleavers, as required, for example, by a standard like 3GPP, nor for arbitrary block lengths or degrees of parallelization. Other prior art approaches use one fixed interleaver implemented through wired connections between component decoders.