Over the past two decades, wireless communication has been revolutionized by channel codes that benefit from iterative decoding algorithms. For example, the Long Term Evolution (LTE) [1] and WiMAX [2] cellular telephony standards employ turbo codes [3], which comprise a concatenation of two convolutional codes. Conventionally, the Logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm [4] is employed for the iterative decoding of the Markov chains that are imposed upon the encoded bits by these convolutional codes. Meanwhile, the WiFi standard for Wireless Local Area Networks (WLANs) [5] has adopted Low Density Parity Check (LDPC) codes [6], which may operate on the basis of the min-sum algorithm [7]. Owing to their strong error correction capability, these sophisticated channel codes have facilitated reliable communication at transmission throughputs that closely approach the capacity of the wireless channel. However, the achievable transmission throughput is limited by the processing throughput of the iterative decoding algorithm, if real-time operation is required. Furthermore, the iterative decoding algorithm's processing latency imposes a limit upon the end-to-end latency. This is particularly relevant, since multi-gigabit transmission throughputs and ultra-low end-to-end latencies can be expected to be targets for next-generation wireless communication standards [8]. Therefore, there is a demand for iterative decoding algorithms having improved processing throughputs and lower processing latencies. Owing to the inherent parallelism of the min-sum algorithm, it may be operated in a fully-parallel manner, facilitating LDPC decoders having processing throughputs of up to 16.2 Gbit/s [9]. By contrast, the processing throughput of state-of-the-art turbo decoders [10] is limited to 2.15 Gbit/s. This may be attributed to the inherently serial nature of the Log-BCJR algorithm, which is imposed by the data dependencies of its forward and backward recursions [4]. More specifically, the turbo-encoded bits generated by each of typically two convolutional encoders must be processed serially, spread over numerous consecutive time periods, which are clock cycles in a practical integrated circuit implementation. Furthermore, the Log-BCJR algorithm is typically applied to the two convolutional codes alternately, until a sufficient number of decoding iterations have been performed. As a result, thousands of time periods are required to complete the iterative decoding process of the state-of-the-art turbo decoder.
Accordingly, providing an alternative to the Log-BCJR decoder, which has fewer data dependencies and which enables fully parallel processing represents a technical problem.