Radio frequency (RF) digital data transmissions may be corrupted by a wide variety of interference sources. For example, sources of RF distortion may include RF signals emitted by natural and man-made RF sources, as well as multipath distortion created by the transmitted signal itself as portions of the transmitted signal reflect off physical objects along a transmission path. Such RF signals create background noise from which the original RF transmission must be extracted, and may interfere constructively or destructively with the original signal. The impact of such RF distortion on a digital data transmission embedded within an RF signal may be severe, especially when a received RF signal is weak, i.e., when the received signal has a low signal-to-noise ratio.
Turbo coding of an outgoing digital data stream is one technique that may be used to mitigate the effect of RF distortion on a digital data transmission embedded within an RF signal. For example, emerging communications standards, e.g., 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) and LTE advanced standards, require that compliant transceivers apply turbo coding to an outgoing data packet prior to transmission.
A turbo encoder, included in a transmitting device, may include two recursive systematic convolutional (RSC) encoders. The first RSC encoder may take as input a data block containing an ordered set of bits in the original data block bit order. The second RSC encoder may take as input bits from the same data block after the data block has been passed through a turbo interleaver, π. The turbo interleaver, π, is a key component in the turbo code design. It is responsible for scrambling the input block in a pseudo-random fashion, thus providing an interleaved data block with good weight distribution and, hence, characteristics that support error correction.
The turbo encoder output mandated by 3GPP LTE standards includes three subblocks: a first subblock includes systematic bits, each systematic bit corresponding to a bit in the original data block received by the turbo encoder; a second subblock includes parity bits generated by the first RSC encoder engine within the turbo encoder, which processes data bits in original order; and a third subblock includes parity bits generated by the second RSC encoder engine within the turbo encoder, which processes data bits in an interleaved order.
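The encoder structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a compliant implementation: the function names `rsc_parity` and `turbo_encode` are hypothetical, the RSC polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) and the interleaver parameters are assumed for illustration, and trellis termination is omitted.

```python
def rsc_parity(bits):
    """Parity stream of one RSC encoder; the systematic stream is the input itself.

    Assumed polynomials: feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    Trellis termination is omitted for brevity.
    """
    s1 = s2 = s3 = 0  # shift register, s1 holds the most recent feedback bit
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3             # recursive feedback bit
        parity.append(a ^ s1 ^ s3)  # feedforward taps
        s1, s2, s3 = a, s1, s2      # shift the register
    return parity


def turbo_encode(block, interleave):
    """Return the three subblocks: systematic bits, parity 1, parity 2."""
    systematic = list(block)
    parity1 = rsc_parity(systematic)  # original bit order
    interleaved = [block[interleave(i)] for i in range(len(block))]
    parity2 = rsc_parity(interleaved)  # interleaved bit order
    return systematic, parity1, parity2


# Illustrative use: a 40-bit block and an assumed quadratic interleaver.
K = 40
block = [1] + [0] * (K - 1)
pi = lambda i: (3 * i + 10 * i * i) % K  # assumed parameters, for illustration
systematic, parity1, parity2 = turbo_encode(block, pi)
```

The three returned lists correspond to the three subblocks described above; only the second parity stream depends on the interleaver.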
A conventional turbo decoder, included within a receiving device, may include two RSC decoders, each corresponding to one of the two RSC encoders of the turbo encoder addressed above. The first RSC decoder may take as input the systematic bits and the parity bits produced by the first RSC encoder. The second RSC decoder may take as input the systematic bits in an interleaved order, as determined by a decoder interleaver that uses the same turbo interleaver, π, and the parity bits produced by the second RSC encoder. In each iteration of the decoding process, each RSC decoder may output an improved estimate, e.g., extrinsic data in the form of a log-likelihood ratio (LLR), of the actual bit value represented by each systematic bit. Once the estimates generated by the two RSC decoders converge, or once a predetermined number of decoding cycles has been performed, the final improved estimates may be interpreted and transmitted from the decoder to a receiver signal processor as an output stream of decoded bit estimates.
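As a rough illustration of how final LLR estimates might be interpreted as decoded bits, the following Python sketch applies a sign-based hard decision and a simple agreement test. The names `hard_decisions` and `converged` are hypothetical, the sign convention (a positive LLR favors bit value 1) is an assumption, and the convergence test shown (both decoders yielding the same hard decisions, with both LLR lists already in original bit order) is only one possible stopping rule.

```python
def hard_decisions(llrs):
    """Map each LLR to a bit estimate (assumed convention: positive LLR -> 1)."""
    return [1 if llr > 0 else 0 for llr in llrs]


def converged(llrs_a, llrs_b):
    """Assumed stopping rule: both decoders agree on every hard decision.

    Both inputs are assumed to be in the original (de-interleaved) bit order.
    """
    return hard_decisions(llrs_a) == hard_decisions(llrs_b)


# Illustrative values only.
decoder1_llrs = [2.5, -0.7, 4.1, -3.3]
decoder2_llrs = [1.9, -1.2, 0.4, -2.8]
if converged(decoder1_llrs, decoder2_llrs):
    decoded_bits = hard_decisions(decoder1_llrs)
```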
The data bit rates required by emerging communications standards, such as 3GPP LTE, may exceed 100 Mbit/sec. For example, the multiplexing and channel coding standard adopted as part of the 3GPP LTE standards, e.g., 3GPP Technical Specification (TS) 36.212, allows data packets that may be one of 188 different sizes, ranging from 40 bits to 6144 bits. Turbo decoder designs configured to support such high data rates typically include dedicated hardware that supports parallel processing. Such designs may include multiple Bahl, Cocke, Jelinek and Raviv (BCJR) decoders, or BCJR engines, operating in parallel to process systematic bits contained in a common memory to produce iteratively improved bit estimates, as described above.
A quadratic permutation polynomial (QPP) interleaver scheme was defined by the 3GPP LTE standard to allow a hardware architecture to use a common memory shared by a number of BCJR processors without memory access conflicts.
The turbo-decoding algorithm consists of multiple iterations, each of which consists of a non-interleaved half-iteration, followed by an interleaved half-iteration. Each half-iteration includes a beta scan, in which the systematic bits are processed in a reverse order, i.e., from last to first, followed by an alpha scan, in which the systematic bits are processed in a forward order, i.e., from first to last. According to the QPP interleaving approach, a data packet received by a receiver may be stored as a two-dimensional array, with a number of rows, w, and a number of columns, b. For example, a QPP turbo interleaver, π, may multiplex rows, w, in a pseudo-random manner for each half-iteration, and may multiplex columns between BCJR engines in a pseudo-random manner that changes every scan cycle.
According to the QPP approach, the respective BCJR engines are synchronized, each BCJR engine processing only data stored at one column address, λ, within the row identified by row address Ψ during each scan cycle. The QPP function guarantees that multiple BCJR engines may access, during each scan cycle, the row/column data that each requires, free of memory access conflicts. However, the straightforward approach to implementing a QPP row and column address generator, or QPP interleaver, capable of serving all BCJR engines requires a large amount of logic that consumes silicon area. This is because, for each BCJR processor and each scan cycle, the QPP function requires calculation of a new row/column address based on Equation 1, presented below.
I(x+b*w) = [K1*(x+b*w) + K2*(x+b*w)*(x+b*w)] % K    (Eq. 1)
where K1, K2, and K can be large integers.
Such a straightforward implementation, e.g., in hardware on an integrated circuit chip, would require at least four multipliers and one divider for each BCJR processor, resulting in a QPP turbo decoder integrated circuit with a large footprint, increased power consumption, increased heat generation and slower response times.
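The direct evaluation of Equation 1, and the conflict-free access it provides, can be illustrated with a short Python sketch. Several points here are assumptions rather than statements from the text above: b is read as the engine (column) index and w as the number of rows, an interleaved index p is assumed to be stored at row p % w and column p // w, the function names are hypothetical, and the parameters K = 40, K1 = 3, K2 = 10 with four engines are chosen only for illustration.

```python
def qpp(i, k1, k2, K):
    """Direct evaluation of Eq. 1 for the linear index i = x + b*w."""
    return (k1 * i + k2 * i * i) % K


def engine_addresses(x, w, n_engines, k1, k2, K):
    """Return the (row, column) address used by each engine at scan step x.

    Assumed layout: interleaved index p lives at row p % w, column p // w.
    """
    addresses = []
    for b in range(n_engines):
        p = qpp(x + b * w, k1, k2, K)  # four multiplies and one modulo per engine
        addresses.append((p % w, p // w))
    return addresses


# Illustrative parameters (assumed): 40-bit block, four engines.
K, K1, K2, N_ENGINES = 40, 3, 10, 4
W = K // N_ENGINES  # rows per column

for x in range(W):
    addrs = engine_addresses(x, W, N_ENGINES, K1, K2, K)
    rows = {row for row, _ in addrs}
    cols = sorted(col for _, col in addrs)
    assert len(rows) == 1                  # all engines share one row address
    assert cols == list(range(N_ENGINES))  # each engine gets a distinct column
```

For these parameters, every scan step yields a single shared row address and a permutation of the column addresses, which is the conflict-free behavior described above. The sketch also shows the cost: computing `x + b * w` and then `k1 * i + k2 * i * i` takes four multiplications, and the `% K` corresponds to the divider, repeated for every engine on every scan cycle.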