In communication systems, transmitted signals (for example, signals sent wirelessly) may be subjected to fading, jamming, and other impairments that introduce errors into the signal. Coding a signal before transmission helps to overcome the effects of channel noise, fading, and jamming by allowing errors introduced during transmission to be detected and corrected when the signal is decoded at a receiver.
“Turbo codes” have been recognized as a breakthrough in coding schemes and provide powerful resistance to errors generated during transmission. They can be implemented as parallel concatenated convolutional codes (PCCC) or serial concatenated convolutional codes (SCCC). Turbo codes provide high coding gains and bit error rates as low as 10⁻⁷. Because of this outstanding error correction, they are very useful in applications where the signal-to-noise ratio (SNR) is generally low (e.g., wireless communications).
An example of a conventional turbo encoder is shown in FIG. 1. The turbo encoder 100 receives a signal 102 that is passed to a first recursive systematic convolutional (RSC) encoder 104 and, via an interleaver 106, to a second convolutional encoder 108. The two convolutional encoders provide the component codes of a turbo code. The interleaver 106 changes the order of the data stream before it is input to the second convolutional encoder 108. Because one data stream is interleaved, the resulting code has time-variant characteristics that provide the high coding gains obtained from turbo coders. The encoded signal 110 is modulated and transmitted over a communication channel.
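The encoder structure described above can be sketched in a few lines of Python. This is a minimal illustration only, not the encoder of FIG. 1: it assumes both component encoders are memory-2 RSC encoders with octal generators (1, 5/7), omits trellis termination, and takes the interleaver as a plain list of indices. The names `rsc_parity` and `turbo_encode` are hypothetical.

```python
def rsc_parity(bits):
    """Parity stream of an assumed memory-2 RSC encoder, generators (1, 5/7) octal."""
    s1 = s2 = 0                      # shift-register state, initially all zero
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2              # feedback polynomial 1 + D + D^2 (octal 7)
        parity.append(a ^ s2)        # feedforward polynomial 1 + D^2 (octal 5)
        s1, s2 = a, s1               # shift the register
    return parity


def turbo_encode(bits, perm):
    """Rate-1/3 parallel concatenation: (systematic, parity 1, parity 2) per bit."""
    interleaved = [bits[p] for p in perm]   # reorder the input for the second encoder
    p1 = rsc_parity(bits)
    p2 = rsc_parity(interleaved)
    return list(zip(bits, p1, p2))


if __name__ == "__main__":
    data = [1, 0, 0, 0]
    perm = [3, 2, 1, 0]              # toy interleaver: simple reversal
    print(turbo_encode(data, perm))
    # [(1, 1, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1)]
```

The two parity streams differ only because the second encoder sees the input in permuted order, which is the source of the time-variant behavior noted above.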
An example of a conventional turbo decoder is shown in FIG. 2. The turbo decoder 200 receives a demodulated signal 202 from a communication channel. The signal 202 is passed to a first soft-input, soft-output (SISO) decoder 204 and, via an interleaver 206, to a second SISO decoder 208. The second SISO decoder 208 also receives a component of the signal 202. The output of the first SISO decoder 204 is passed via an interleaver 210 to the second SISO decoder 208, and the output of the second SISO decoder 208 is passed via a de-interleaver 212 to the first SISO decoder 204, so as to enable iterative decoding. In operation, an incoming block of data (also called a data frame) is processed once and then recirculated several times to achieve a desired coding gain.
Although turbo codes exhibit high resistance to errors, they are not ideally suited for many practical applications because of their inordinately high latency, which results from the turbo encoder's use of interleavers (which introduce delay) and from the turbo decoder's iterative algorithm, which is computationally complex. Turbo codes usually work with large block sizes (e.g., >5000 bits). The soft inputs for an entire block must be stored in a memory to facilitate the iterative decoding, because they are repeatedly used and updated in each decoding phase. As a result, turbo decoders are memory intensive, which may render them impractical or too expensive for some applications.
In general, the latency of serial turbo decoders may be improved by implementing them in specially designed high-speed hardware. However, the improvement is only incremental and comes at the cost of increased expense, device complexity, and power dissipation (which may be unacceptable in many low-power wireless devices).
An alternative approach to overcoming the high latency of turbo decoding is to use parallel decoding architectures. Parallel decoding can greatly improve throughput and latency. Two basic parallel schemes are available: parallelism may be achieved by decoding multiple received signals at the same time, or by dividing a received signal block into sub-blocks and decoding the sub-blocks in parallel on multiple processors. While parallel decoding may increase throughput and reduce latency, it does not reduce the large memory requirement. In addition, hardware complexity and cost are increased. Therefore, parallel schemes that are memory efficient and hardware (or area) efficient are needed for practical implementation of turbo codes.
One problem with parallel operation is that of memory access. In particular, the presence of an interleaver means that memory must be addressed out-of-order by multiple parallel processors. Memory contentions arise when two or more processors require read or write access to the same memory on the same clock cycle. A certain class of contention-free (CF) interleavers eliminates memory contentions.
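The memory contention problem can be made concrete with a small Python sketch. It rests on an assumed access model that is not spelled out in the text: a block of K soft values is split into equal windows of W = K / P entries, one memory bank per window, and on clock cycle t processor p reads the interleaved address `perm[p * W + t]`. The function name `contention_cycles` is hypothetical.

```python
import random


def contention_cycles(perm, num_procs):
    """Count clock cycles on which two or more processors hit the same bank.

    Assumed model: bank b holds addresses [b * W, (b + 1) * W), and on cycle t
    processor p accesses address perm[p * W + t].
    """
    k = len(perm)
    if k % num_procs:
        raise ValueError("block length must be a multiple of the processor count")
    w = k // num_procs
    cycles = 0
    for t in range(w):
        banks = [perm[p * w + t] // w for p in range(num_procs)]
        if len(set(banks)) < num_procs:   # at least two processors share a bank
            cycles += 1
    return cycles


if __name__ == "__main__":
    print(contention_cycles(list(range(8)), 2))   # 0: in-order access never collides
    print(contention_cycles([0, 2, 1, 3], 2))     # 2: both cycles collide

    shuffled = list(range(64))
    random.Random(0).shuffle(shuffled)            # an arbitrary interleaver
    print(contention_cycles(shuffled, 4))         # typically nonzero
```

Under this model, an interleaver is contention-free for a given processor count exactly when the function returns zero.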
The quadratic permutation polynomial (QPP) turbo interleaver, which has been adopted in the Long Term Evolution (LTE) standard, is a CF interleaver. Due to the high data rates required in LTE systems, the turbo decoder will need to employ parallel decoding using multiple processors. Therefore, there is a need to apply the QPP interleaver to a multi-processor turbo decoder.
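As an illustration, a QPP interleaver maps index i to (f1·i + f2·i²) mod K. The Python sketch below uses K = 40, f1 = 3, f2 = 10, which are the parameters the LTE specification lists for its smallest block size; the windowed access model (K split into P equal windows, one memory bank per window) and the function names are assumptions made for this example.

```python
def qpp_interleaver(k, f1, f2):
    """Return the QPP mapping pi(i) = (f1*i + f2*i^2) mod k for i = 0..k-1."""
    return [(f1 * i + f2 * i * i) % k for i in range(k)]


def is_contention_free(perm, num_procs):
    """True if, on every cycle t, processors p = 0..P-1 reading perm[p*W + t]
    land in P distinct banks of width W = K / P (assumed access model)."""
    w = len(perm) // num_procs
    return all(
        len({perm[p * w + t] // w for p in range(num_procs)}) == num_procs
        for t in range(w)
    )


if __name__ == "__main__":
    K, F1, F2 = 40, 3, 10
    pi = qpp_interleaver(K, F1, F2)
    print(pi[:4])                               # [0, 13, 6, 19]
    print(sorted(pi) == list(range(K)))         # True: the mapping is a permutation
    for procs in (2, 4, 5, 8):                  # processor counts that divide K
        print(procs, is_contention_free(pi, procs))   # True for each
```

The check succeeds for every window size W that divides K because pi(i + W) leaves the same remainder modulo W as pi(i). Parallel processors therefore always address the same offset within different banks.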
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.