In typical turbo decoders, a parallel decoding process may be used in which a non-permuted code block containing non-permuted symbols and a permuted code block containing permuted symbols are each divided into multiple sub-blocks, stored in different memory banks, and decoded in parallel using multiple processors. The non-permuted code block may be decoded in this manner because the non-permuted symbols are stored in the memory banks in a linear order, so that each processor can read its sub-block from a separate memory bank.
However, problems may arise when decoding the permuted code block because the permuted symbols are generated on-the-fly: the corresponding non-permuted symbols are read from the memory banks and rearranged into the permuted positions as required. For example, a collision may arise when two or more symbols must be read from the same memory bank at the same time, resulting in delay and reduced throughput. The number of collisions, and the resulting reduction in throughput, is typically high enough that the throughput requirements of the turbo decoder cannot be met. Typical solutions include using specially designed hardware (e.g., contention-free interleavers that must be part of the standard) or a greater number of memory banks and processors, both of which have the undesirable effects of increasing the complexity and cost of the turbo decoder.
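The collision problem described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the block length of 8 symbols, the two memory banks, the function name `count_collisions`, and the even/odd permutation, which is a toy pattern and not the interleaver of any particular standard. The sketch assumes that processor p handles the p-th sub-block in decoding order and that a symbol at a given address is stored in the bank obtained by dividing the address by the sub-block length.

```python
def count_collisions(read_order, num_banks):
    """Count serialized reads caused by bank collisions.

    read_order[i] is the memory address of the i-th symbol in decoding
    order. Processor p decodes positions p*W .. p*W + W - 1, where W is
    the sub-block length, and address a is assumed to live in bank a // W.
    """
    sub_block_len = len(read_order) // num_banks
    collisions = 0
    for step in range(sub_block_len):
        # Bank touched by each processor at this time step.
        banks = [
            read_order[p * sub_block_len + step] // sub_block_len
            for p in range(num_banks)
        ]
        # Every read beyond the first to a given bank has to wait.
        collisions += len(banks) - len(set(banks))
    return collisions


# Non-permuted block: symbols are read in linear order.
linear_order = list(range(8))

# Hypothetical permutation: even addresses first, then odd addresses.
permuted_order = [0, 2, 4, 6, 1, 3, 5, 7]

print(count_collisions(linear_order, num_banks=2))    # 0
print(count_collisions(permuted_order, num_banks=2))  # 4
```

Under these assumptions the linear order never sends two processors to the same bank, while the toy permutation does so at every time step, so half of its reads would have to be serialized.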