In transmitting communication signals, in a wireless communication network for example, it is often necessary to encode the data to be transmitted. The data is then decoded when it is received. Encoded data is often interleaved according to a predetermined algorithm. To allow different vendors to provide equipment that operates in the same communication network, standards are often developed which define an encoding scheme and, in particular, the requirements for interleaving data to enable the encoding. For example, the 3rd Generation Partnership Project (3GPP) was formed to produce Technical Specifications and Technical Reports for a 3rd Generation Mobile System based on radio access technologies.
One standard developed by 3GPP is TS 25.212 which relates to multiplexing and channel coding. The 3GPP Turbo channel code, which is a Parallel Concatenated Convolutional Code (PCCC), generates an interleaved data sequence using a Turbo code internal interleaver. The encoder defined in TS 25.212 consists of two 8-state constituent encoders and an internal interleaver. The first constituent encoder operates on the “systematic” (non-interleaved) input sequence, whereas the second operates on an interleaved version of the same sequence. In order to form the interleaved data sequence, the data are typically read from an input buffer in a permuted order according to an interleaved address sequence. The circuitry to generate these addresses is much more complex than the encoder logic, and thus forms the bulk of any Turbo encoder implementation. If this interleaver is not efficient, it may become the critical path and cause system performance to be unnecessarily reduced.
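The parallel concatenation described above can be sketched in a few lines of Python. This is a minimal illustration, not the encoder of TS 25.212 itself: the function names are hypothetical, the generator polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) are assumed here for the 8-state constituent encoder, trellis termination is omitted, and the interleaved address sequence is simply passed in as a list.

```python
def rsc_parity(bits):
    """Parity stream of one 8-state recursive systematic constituent encoder.

    Assumed polynomials: feedback g0(D) = 1 + D^2 + D^3,
    feedforward g1(D) = 1 + D + D^3. Trellis termination is omitted.
    """
    s1 = s2 = s3 = 0  # three delay elements, hence 8 states
    parity = []
    for x in bits:
        fb = x ^ s2 ^ s3             # feedback taps at D^2 and D^3
        parity.append(fb ^ s1 ^ s3)  # feedforward taps at 1, D and D^3
        s1, s2, s3 = fb, s1, s2      # shift the register
    return parity


def pccc_encode(bits, addresses):
    """Return (systematic, parity1, parity2) for one block.

    `addresses` is the interleaved address sequence: the second encoder
    reads the input buffer in this permuted order.
    """
    permuted = [bits[a] for a in addresses]
    return list(bits), rsc_parity(bits), rsc_parity(permuted)
```

The second constituent encoder is identical to the first; only the order in which it reads the input buffer differs, which is why the address generator, rather than the encoder logic, dominates the implementation.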
As described in Section 4.2.3.2.3 of TS 25.212 V3.2.0 Technical Specification (2000-03), Turbo code internal interleaving proceeds in three stages. First, a set of matrix parameters is chosen according to the block size supplied. Then, a number of sequence vectors are initialized according to those parameters. Finally, the systematic data bits are entered into a matrix with padding, permuted according to the pre-computed sequence vectors, and then extracted transposed from the matrix with pruning. This calculation is usually re-cast as the production of an interleaved address sequence, rather than as an operation on a physical binary-valued matrix. The interleaving algorithm makes heavy use of modular arithmetic with a variable modulus, which places a heavy burden on both hardware and software implementations.
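The re-casting of the matrix operation as an address sequence can be illustrated with a short Python sketch. The function name, the argument names and the small permutations in the test values are hypothetical and chosen only for illustration; they are not the parameters or sequence vectors that TS 25.212 derives from the block size. Only the structure (row-wise write with padding, permutation, column-wise read with pruning) follows the description above.

```python
def address_sequence(k, rows, cols, intra_row, inter_row):
    """Interleaved read addresses for a block of k bits.

    The bits are imagined as written row by row into a rows x cols matrix
    (positions >= k are padding). intra_row[r] is a permutation of
    range(cols) applied within original row r, and inter_row is a
    permutation of range(rows) applied to the row order. The matrix is
    read column by column and padding positions are pruned.
    """
    assert rows * cols >= k
    addresses = []
    for c in range(cols):                  # transposed read: column first
        for r in range(rows):
            src_row = inter_row[r]         # row that ends up at position r
            src_col = intra_row[src_row][c]
            addr = src_row * cols + src_col  # row-major write address
            if addr < k:                   # prune the padding
                addresses.append(addr)
    return addresses
```

For example, with k = 10, a 3 x 4 matrix, inter_row = [2, 0, 1] and intra_row = [[1, 0, 3, 2], [0, 1, 2, 3], [3, 2, 1, 0]], the function returns [1, 4, 0, 5, 9, 3, 6, 8, 2, 7]. No bit is ever moved; the two padding positions (addresses 10 and 11) are skipped as they are encountered.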
Conventional approaches to the first stage, parameter determination, implement the equations of TS 25.212 directly. That is, the equations are implemented using hard-wired constant comparators, a lookup table of prime numbers, and multiply and/or divide logic, as well as simpler elements such as adders, subtractors and counters. The branching nature of the parameter calculations makes pipelining and resource sharing difficult for the hardware designer, resulting in large, complex, slow-running designs. Further, the final stage, which is the address sequence generator itself, is described in the TS 25.212 V3.2.0 Technical Specification as a repeated indexing into a 256-element vector that is computed in the middle stage. Direct implementation of this method requires memory such as a Random Access Memory (RAM) to hold the vector, circuitry and time to perform the pre-computation, and extra logic to perform the modulo operation on the index expression.
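A software analogue of this direct approach might look like the following Python sketch. It assumes, as an illustration only, that the stored vector is a base sequence generated by repeated multiplication by a primitive root v modulo a prime p, and that each row indexes it at (j * r) mod (p - 1) for a row-specific multiplier r. The function names and the small values of p, v and r used in the example are hypothetical.

```python
def base_sequence(p, v):
    """Pre-compute s[0] = 1, s[j] = (v * s[j-1]) mod p, for j < p - 1.

    With p up to 257 this vector has up to 256 entries, which is what a
    direct hardware implementation would hold in RAM.
    """
    s = [1]
    for _ in range(p - 2):
        s.append((v * s[-1]) % p)
    return s


def intra_row_pattern(s, p, r):
    """Index the stored vector at (j * r) mod (p - 1) for each column j.

    The modulo on the index expression is the extra logic needed on top
    of the memory lookup. r is assumed to be coprime to p - 1.
    """
    return [s[(j * r) % (p - 1)] for j in range(p - 1)]
```

For p = 7 and v = 3 the base sequence is [1, 3, 2, 6, 4, 5], and a row multiplier of r = 5 yields the pattern [1, 5, 4, 6, 2, 3]. Both the pre-computation loop and the per-access index modulo appear explicitly, which mirrors the memory, time and logic costs noted above.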
One difficult aspect of implementing the 3GPP Turbo code interleaver is performing a modulo operation. More specifically, the requirement to multiply an 8-bit number by a 9-bit number, modulo one of 52 9-bit primes, provides a particular challenge. Various approaches can be taken. For example, the modulo operation could be performed by (i) multiplication followed by repeated subtraction/comparison, (ii) multiplication by repeated modular addition, (iii) multiplication followed by division (keeping the remainder), (iv) multiplication followed by division (keeping the quotient), re-multiplication and subtraction, (v) pre-computation of partial modulo results, then multiplication followed by some additive reduction, or (vi) some custom “modulo multiplier” circuit. The first two options, which will take longer than 5 cycles, are almost certainly ruled out because of latency. The third may be possible with a sophisticated divider, but this would likely require significant circuit area. While the fifth option works, it is rather slow and requires a register file for the pre-computed results. Finally, the sixth option provides no easy answer when the modulus can vary dynamically, and would be costly to implement. Accordingly, the fourth option is the most attractive because the division can then be done using reciprocal multiplication. Although there will be three multiplications in the feedback path, they are relatively simple to implement in hardware.
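The fourth option (multiplication, division by reciprocal multiplication to obtain the quotient, re-multiplication and subtraction) can be sketched in Python as follows. This is an illustrative fixed-point scheme, not a description of any particular circuit: the 17-bit scaling, the single corrective subtraction, the function names and the assumption that the 52 moduli are the primes from 7 to 257 are all choices made for this example.

```python
SHIFT = 17  # a < 2**8 and b < 2**9, so the product a * b is below 2**17

# Assumed modulus set: the 52 primes from 7 to 257 (257 needs 9 bits).
PRIMES = [n for n in range(7, 258)
          if all(n % d for d in range(2, int(n ** 0.5) + 1))]


def reciprocal(p):
    """Fixed-point reciprocal floor(2**SHIFT / p), computed once per modulus."""
    return (1 << SHIFT) // p


def mulmod(a, b, p, recip):
    """Compute (a * b) mod p using three multiplications and no divider."""
    x = a * b                  # multiplication 1: the raw product
    q = (x * recip) >> SHIFT   # multiplication 2: quotient estimate
    r = x - q * p              # multiplication 3, then subtraction
    if r >= p:                 # the estimate may be one too small
        r -= p
    return r
```

Because the reciprocal is rounded down, the quotient estimate is either exact or one too small for products below 2**17, so one conditional subtraction is enough to bring the result into the range 0 to p - 1. The reciprocal depends only on the modulus and can be looked up or computed once per block.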
If any of these methods is used to implement the feedback path in 5 cycles, the circuit will run quite slowly. Alternatively, the 5-cycle requirement can be eliminated, and a faster, pipelined design, with a constraint of 10 cycles, for example, can be implemented. This incurs a performance penalty for small block sizes, but works very well for a block size of 160 or over. Accordingly, there is a need for a method of and circuit for interleaving data in a data decoder which overcomes the deficiencies of conventional circuits described above.