1. Field of the Invention
This invention relates generally to data processing, and more particularly to the processing of algorithms in software that benefit from the efficient implementation of forward and backward butterfly operations used, for example, in Maximum a posteriori (MAP) decoding. Such exemplary MAP decoding is used in the processing of parallel concatenated codes (Turbo codes) and serial concatenated codes.
2. Description of Related Technology
Parallel and serial concatenated codes are formed from a data sequence that is concatenated with a sequence of output bits from two or more constituent encoders, e.g., convolutional encoders. Turbo codes correspond to a specific type of parallel concatenated code. However, within this application, it is to be understood that where applicable, discussions referring to “Turbo codes” can be extended more generally to both parallel and serial concatenated codes. Embodiments involving parallel concatenated codes and more specifically Turbo codes are developed herein by way of example only.
The use of Turbo codes for transmission of data over a noisy channel was first introduced in C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes”, Proc. of 1993 Int. Conf. Comm., pp. 1064–1070. This reference is referred to as the “Berrou reference” hereinafter. Turbo codes provide bit error rates near Shannon's theoretical limit but add significant complexity to the receiver's decoder. Turbo codes are used for forward error correction in several important communication standards such as, inter alia, third-generation partnership project (hereafter, 3GPP) cellular communications standards. Consequently, much effort has been applied to developing efficient Turbo decoder implementations.
MAP (maximum a posteriori) based decoders are widely used within Turbo decoder implementations and require significant data processing. A MAP decoder determines a sequence that minimizes the symbol error rate, as opposed to finding the maximum-likelihood sequence determined by the more common Viterbi algorithm. The MAP decoder algorithm is described in Bahl, L. R. et al., “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate,” IEEE Transactions on Information Theory, March 1974, pp. 284–287, hereinafter called the “Bahl reference.” The MAP decoder described in the Bahl reference is often called the “BCJR algorithm” in recognition of its authors. While the MAP decoder is more costly than the Viterbi algorithm, it provides an information sequence known as an extrinsic sequence that is needed by Turbo decoders. Two MAP decoders configured in a feedback configuration are employed within a Turbo decoder. The processing associated with MAP decoders accounts for the bulk of the computational load in Turbo decoding.
Most practical implementations perform computations using logarithmic representations of probability information and are known as Log-MAP decoders. A decoder known as the Max-Log-MAP decoder uses a mathematical approximation to simplify the calculations involved and to thereby reduce the overall complexity of the decoder. Max-Log-MAP decoders are discussed in “Efficient Software Implementation of the Max-Log-MAP Turbo decoder on the StarCore SC140 DSP”, A. Chass, A. Gubeskys, and G. Kutz, ICSPAT 2000 and Motorola Application Note, hereinafter referred to as the “Chass reference.” The Max-Log-MAP decoder performance is slightly reduced compared to the Log-MAP but is more commonly implemented due to its decreased computational complexity. The Max-Log-MAP decoder performance can be improved by the addition of a correction term. A Max-Log-MAP decoder that makes use of this correction is known as a Max*-Log-MAP decoder. Max*-Log-MAP decoders are discussed in Michel, H. and Wehn, N. “Turbo-Decoder Quantization for UMTS” IEEE Communications Letters, Vol. 5, Number 2, February 2001, hereinafter called the “Michel reference.” The exemplary embodiment of the invention performs efficient Max*-Log-MAP decoding in software using a customized processor designed to efficiently implement operations involved in various MAP decoding algorithms. Most of the computational operations required to perform MAP based decoding involve forward (alpha) metric updates, backward (beta) metric updates and the Log Likelihood Ratio (hereafter, LLR) calculations.
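By way of illustration only, the relationship between the Max-Log-MAP approximation and the corrected Max* operation can be sketched as follows. The sketch assumes the correction term is the exact Jacobian logarithm, log(1 + e^(-|a - b|)); practical decoders commonly approximate this term, for example with a small lookup table, and the form used in any particular embodiment is not specified here. The function names max_log and max_star are hypothetical.

```python
import math


def max_log(a: float, b: float) -> float:
    """Max-Log-MAP approximation: log(e^a + e^b) is replaced by max(a, b)."""
    return max(a, b)


def max_star(a: float, b: float) -> float:
    """Max* operation: max(a, b) plus a correction term.

    Assumption: the correction is the exact Jacobian logarithm
    log(1 + exp(-|a - b|)), so max_star(a, b) == log(e^a + e^b).
    """
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


if __name__ == "__main__":
    a, b = 1.0, 2.0
    exact = math.log(math.exp(a) + math.exp(b))
    print("exact   :", exact)
    print("max_log :", max_log(a, b))   # drops the correction term
    print("max_star:", max_star(a, b))  # matches the exact value
```

The correction term is largest (log 2) when the two inputs are equal and decays toward zero as they move apart, which is why the uncorrected Max-Log-MAP decoder loses only a small amount of performance.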
FIG. 1 illustrates a prior art block diagram of a rate ⅓ Turbo encoder 100 as used in a transmitting device. An input data sequence u(k) 101 (typically binary valued) is directly coupled to an output coupling 103 to produce a systematic data subsequence x(k) (i.e., x(k)=u(k)). The input sequence u(k) is also coupled to a first convolutional encoder 105 to produce a first parity information subsequence y1(k) 107. The input sequence u(k) is also coupled to a pseudo random interleaver 109 whose output is coupled to a second convolutional encoder 111 to produce a second parity information subsequence y2(k) 113. The output of the rate ⅓ Turbo encoder 100 is a sequence containing the three subsequences x(k), y1(k), and y2(k).
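The encoder structure of FIG. 1 can be sketched in software as shown below. This is a minimal sketch under stated assumptions: the constituent encoders are taken to be 8-state recursive systematic convolutional encoders with feedback polynomial 1 + D^2 + D^3 and feedforward polynomial 1 + D + D^3 (the polynomials used in the 3GPP Turbo code), the interleaver is supplied as an explicit permutation list, and trellis termination is omitted. None of these particulars is stated in the description of FIG. 1, and the names rsc_parity and turbo_encode are hypothetical.

```python
def rsc_parity(bits):
    """Parity output of one recursive systematic convolutional encoder.

    Assumed polynomials (not taken from the description of FIG. 1):
    feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    """
    s1 = s2 = s3 = 0                 # shift register contents, s1 is newest
    parity = []
    for u in bits:
        fb = u ^ s2 ^ s3             # feedback taps at D^2 and D^3
        parity.append(fb ^ s1 ^ s3)  # feedforward taps at 1, D and D^3
        s1, s2, s3 = fb, s1, s2      # shift the register
    return parity


def turbo_encode(u, perm):
    """Rate 1/3 Turbo encoder: returns (x, y1, y2) for input block u.

    perm is an assumed interleaver permutation: the interleaved sequence
    is [u[perm[0]], u[perm[1]], ...]. Trellis termination is omitted.
    """
    x = list(u)                                 # systematic subsequence x(k) = u(k)
    y1 = rsc_parity(u)                          # first parity subsequence
    y2 = rsc_parity([u[p] for p in perm])       # second parity, from interleaved input
    return x, y1, y2


if __name__ == "__main__":
    x, y1, y2 = turbo_encode([1, 0, 0, 0], [3, 2, 1, 0])
    print(x, y1, y2)
```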
The Turbo encoder of FIG. 1 involves relatively simple logic processing and is usually implemented using Finite State Machine (FSM) controlled hardware. The encoded data stream is transmitted over a noisy channel and is received at a receiving device as an error-prone data stream comprising error-prone systematic and parity information subsequences. A Turbo decoder is used to operate on the received error-prone subsequences in order to produce an error-corrected estimate of the input data sequence u(k).
In many embodiments a rate ½ Turbo encoder is used instead of the aforementioned rate ⅓ Turbo encoder. The rate ½ Turbo encoder discards every other element of the subsequences y1(k) 107 and y2(k) 113, so that the encoder's output sequence contains one parity bit for each systematic bit. This process of decimating the parity sequences is known to those skilled in the art as “puncturing.”
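The puncturing described above can be illustrated with a short sketch. The alternating pattern chosen here (y1 kept at even bit indices, y2 kept at odd bit indices) and the multiplexing order of the output are assumptions made for illustration; the text above requires only that every other element of each parity subsequence be discarded. The function name puncture_to_rate_half is hypothetical.

```python
def puncture_to_rate_half(x, y1, y2):
    """Multiplex x with alternately punctured parity bits.

    Assumed pattern: keep y1(k) for even k and y2(k) for odd k, so that
    exactly one parity bit accompanies each systematic bit.
    """
    out = []
    for k, xk in enumerate(x):
        out.append(xk)                               # systematic bit
        out.append(y1[k] if k % 2 == 0 else y2[k])   # surviving parity bit
    return out


if __name__ == "__main__":
    x = [1, 0, 1, 1]
    y1 = [1, 1, 0, 0]
    y2 = [0, 1, 1, 0]
    coded = puncture_to_rate_half(x, y1, y2)
    print(coded, "rate =", len(x) / len(coded))
```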
A Turbo decoder 200 designed according to the most commonly employed Turbo decoding scheme is shown in FIG. 2. At the Turbo decoder 200, the input data subsequences correspond to error-prone versions of the transmitted subsequences. This is because the Turbo decoder generally only has access to the transmitted information after it has been received through a noisy channel. The received error-prone data subsequence x(k) 202, and the received error-prone first parity subsequence y1(k) 204 are coupled into a first Soft Input Soft Output (SISO) MAP decoder 206. Also coupled into the first MAP decoder 206 is a feedback sequence involving a priori log likelihood information, λin(k), output from a deinterleaver 208. The output from the first SISO MAP decoder 206, λout(k) 207, is coupled to an interleaver 210 which generates a set of a priori information that is coupled to a second SISO MAP decoder 212. The second SISO MAP decoder 212 also takes as input an error-prone parity data subsequence y2(k) 214 and the error-prone systematic data x(k) 202 after passing through an interleaver 216. As is known in the art, the deinterleaver 208, and the interleavers 210 and 216 use the same interleaving function as used in the encoder 100. The output of the second SISO MAP decoder 212 is a second log likelihood data output sequence, λout(k) 213. The sequence λout(k) 213, like the other data sequences, includes a corresponding element for each bit index k into the input data block. The number k preferably ranges from 0 to N−1, so that there are N elements in each data block. After the data block is operated upon via several iterations through the decoder 200, a hard decision output data element 218 can be produced with low Bit Error Rate (BER).
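The data flow of FIG. 2 can be summarized by the following structural sketch. It is not a working decoder: the SISO MAP decoder is passed in as a caller-supplied function, siso, whose signature (systematic input, parity input, a priori input, returning one log likelihood value per bit) is an assumption. The interleaver is represented as a permutation list, a positive log likelihood value is assumed to indicate a 1 bit, and the hard decision is assumed to be taken from the deinterleaved output of the second decoder. All names are hypothetical.

```python
def interleave(seq, perm):
    """Element i of the result is seq[perm[i]]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Inverse of interleave() for the same permutation."""
    out = [0.0] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out


def turbo_decode(x, y1, y2, perm, siso, iterations=4):
    """Iterate two SISO decoders in the feedback arrangement of FIG. 2.

    siso(systematic, parity, a_priori) is an assumed interface that
    returns a list of log likelihood values, one per bit index k.
    """
    lam_in = [0.0] * len(x)           # no a priori information at the start
    x_int = interleave(x, perm)       # systematic data for the second decoder
    for _ in range(iterations):
        lam_out1 = siso(x, y1, lam_in)                        # first SISO decoder
        lam_out2 = siso(x_int, y2, interleave(lam_out1, perm))  # second SISO decoder
        lam_in = deinterleave(lam_out2, perm)                 # feedback path
    return [1 if lam > 0 else 0 for lam in lam_in]            # hard decision


if __name__ == "__main__":
    # Placeholder SISO function used only to exercise the data flow.
    stub = lambda xs, ys, la: [xv + 0.5 * a for xv, a in zip(xs, la)]
    print(turbo_decode([1.0, -1.0, 1.0, -1.0], [0.0] * 4, [0.0] * 4,
                       [2, 0, 3, 1], stub))
```

Using the same permutation for both interleavers and inverting it in the deinterleaver keeps every log likelihood value aligned with its bit index k on each pass around the loop.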
A summary of the calculations involved in a SISO MAP decoder for a version of the popular Max*-Log-MAP algorithm is provided in the detailed description of the invention. Also refer to the Berrou, Michel and Chass references for further details regarding the Turbo decoder and its implementation. The Turbo decoder of FIG. 2 is well known to involve a significant computational load. When turbo decoding is performed using logarithmic values, the computational load involves accessing the many data values required, data selection, add-compare-select operations, correction factor computations and nontrivial pointer arithmetic.
The combination of computational complexity and the need for power efficient solutions has led to prior art solutions involving one or more processors coupled to a hardware Turbo decoder. An exemplary prior art communications device 300 is shown in FIG. 3. The communications device 300 may represent, for example, a cellular phone, a wireless basestation, a modem or any other communications device that applies error correction processing to a received signal. The communications device 300 includes a Turbo decoder hardware module 302 and a private memory 304 coupled thereto. The Turbo decoder 302 is coupled to receive information from a communication interface 306. The communication interface 306 generally corresponds to a receiver that provides a demodulated bit stream received from a communication channel 308. The communication channel 308 may be a wireless, wireline, optical, or other type of communication channel.
The Turbo decoder 302 is coupled to a digital signal processor (DSP) 310. The DSP 310 typically is coupled to a private memory 312, for example, on-board memory associated with the DSP 310. The communication device 300 also typically includes a microcontroller 314. While the DSP 310 handles physical layer processing tasks, the microcontroller 314 typically handles link layer and other upper layer processing. In this exemplary prior art system, the DSP 310, the microcontroller 314, and the Turbo decoder 302 are coupled together via a system bus 316. Also coupled to the system bus 316 are a memory module 318, a memory module 320, and an input/output device 322. In some systems, the memories 318 and 320 are merged into a single memory module.
In operation, a communication signal is received from the communication channel 308. The communication signal is then converted by the interface circuitry 306 into a digital data sequence. The received digital data sequence consists of error-prone systematic and parity data. The microcontroller 314 is typically used to write this information to the memory 318. The Turbo decoder 302 then reads a block of the data sequence from the memory 318 and performs Turbo decoding to convert the error-prone data block into an error-corrected data sequence. At the end of the iterative decode process the data is written by the Turbo decoder into the memory 320.
In some embodiments, the DSP 310 performs signal conditioning such as equalization prior to sending the data block to the Turbo decoder. The DSP 310 may also perform baseband processing such as Viterbi Algorithm decoding and speech codec functions. The decoded data from the Turbo decoder will typically be further processed by the microcontroller 314 with its associated memory subsystem 320 before being passed to the data Input/Output logic 322 of the system.
Prior art systems use a dedicated hardware Turbo decoder 302 because it is generally costly and inefficient to implement such a high complexity algorithm in software on a general purpose DSP. For example, each SISO MAP decoder involves branch metric calculations (gamma metrics), a forward recursion through the trellis (alpha metric calculations), a backward recursion through the trellis (beta metric calculations), a soft output calculation and an extrinsic information (LLR) calculation. The Chass reference reports a DSP software implementation of the decoder, but the implementation results in a costly and power-consuming solution. This is because general purpose DSPs require many instruction cycles to implement all of the aforementioned operations and the supporting pointer arithmetic to control memory accessing.
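To illustrate why these recursions are costly on a general purpose processor, a single step of the forward (alpha) recursion can be sketched as an add-compare-select over the trellis branches. The sketch below is a generic illustration and not the implementation of the Chass reference or of any embodiment: the trellis is represented as a list of (from_state, to_state, gamma) tuples, a representation assumed here for clarity, and the function names are hypothetical. The backward (beta) recursion has the same form with the branch directions reversed.

```python
import math


def max_star(a, b):
    """max(a, b) plus the correction term log(1 + exp(-|a - b|))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def forward_step(alpha, branches, combine=max):
    """One forward (alpha) metric update over all trellis branches.

    branches: list of (from_state, to_state, gamma) tuples (assumed layout).
    combine:  max for Max-Log-MAP, max_star for Max*-Log-MAP.
    """
    unset = float("-inf")
    nxt = [unset] * len(alpha)
    for s_from, s_to, gamma in branches:
        cand = alpha[s_from] + gamma               # add
        if nxt[s_to] == unset:
            nxt[s_to] = cand                       # first branch into this state
        else:
            nxt[s_to] = combine(nxt[s_to], cand)   # compare-select (plus correction)
    return nxt


if __name__ == "__main__":
    # A single two-state butterfly: states 0 and 1 both feed states 0 and 1.
    butterfly = [(0, 0, 0.5), (1, 0, -0.5), (0, 1, -0.5), (1, 1, 0.5)]
    print(forward_step([0.0, -1.0], butterfly))            # Max-Log-MAP
    print(forward_step([0.0, -1.0], butterfly, max_star))  # Max*-Log-MAP
```

Each butterfly requires loading two state metrics and the branch metrics, two additions per destination state, a comparison, a selection, and (for Max*) a correction lookup, repeated for every state at every bit index in both directions.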
While prior art Turbo decoding solutions have been proposed, they have some limiting problems that need to be overcome. For example, hardware decoders lack flexibility. A change in a standard, a new standard, or any other change in a specification or requirements is difficult to handle when the controlling algorithms are not software programmable. Also, hardware decoders lack advanced programmable features. Because of this limitation, hardware decoders tend not to have certain features that would be easy to add to a software programmable decoder. Another problem is that hardware decoders consume gates and memory that will not be reused by other functions. The silicon area consumed by a hardware decoder will not be used for other functions, whereas the silicon area used to support a software decoder in a DSP can be reused for functions such as speech and audio decompression/decoding and speech recognition. As discussed above, DSP software based implementations are inefficient. Implementing a Turbo decoder in DSP software is overly costly in both instructions per second and power consumption. Hence there is a trade-off in the prior art between efficient but fixed hardware decoders and inefficient but flexible software decoders.
Based on the foregoing, there is a need for an improved decoding architecture that provides efficiency similar to that of a hardware decoder while still providing the flexibility of a software-implemented decoder. It would be desirable for such a decoder to be reprogrammable and thereby able to deal with new requirements and/or to accommodate a new standard. There is also a need for an improved decoder architecture that could be readily programmed to support advanced features. It would be desirable to have a decoder architecture that could be reused for other functions such as speech and audio encoding/decoding and speech recognition. It would also be desirable to have a programmable and reusable decoder architecture that is tightly coupled to a processor such as a DSP and allows Turbo decoding to be performed using far fewer processor cycles and/or much less power than prior art DSP software-based approaches. There is a need to eliminate the trade-off in the prior art between efficiency and programmability of Turbo decoding structures.