Transmitted signals become corrupted by noise due to a variety of factors, for example background noise, noise introduced through transmitter and receiver components, noise introduced through atmospheric transmission conditions, and interference introduced from other transmitters operating on interfering carrier frequencies. One approach to solving this problem is to encode the signals with redundant bits of information according to known encoding algorithms prior to transmission and then reconstruct the signal bit stream using a decoder and error correction logic.
On receipt of the coded signals, the decoder logic reconstructs the parts of a signal that have been irretrievably corrupted due to noise or interference, and further reconstructs the original signal from the redundant information contained in the coding. One such forward error correction system comprises a convolutional coder and a Viterbi decoder. A more recent approach to error correction has evolved around a set of concatenated recursive codes known as “turbo codes”.
Due to their outstanding coding gain, turbo codes are widely used for channel coding applications, particularly in wireless networks. Turbo decoding is accomplished using an iterative algorithm. Because of its iterative nature, however, turbo decoding may require a large number of iterations under certain noisy channel conditions. Furthermore, defining the stop condition for these iterations can be complicated in the absence of an explicit error detection mechanism incorporated into turbo codes.
Turbo encoding requires multiple encoding steps in which the outputs of different constituent encoders are concatenated. Concatenated error correction coding is a sequence of coding in which at least two encoding steps are performed on a data stream. Concatenated coding may be performed in series (i.e., the output of the first encoding is further encoded in a serial fashion) or in parallel (where the original data is subjected to different encoding schemes, run in parallel). The parallel concatenated data are then further processed and combined into a serial stream. The concatenated, encoded output is transmitted to a receiving processor where it is decoded. During transmission, the encoded data block is subject to interference and error introduction, hence the requirement for an error recovery coding scheme.
A parallel concatenated turbo coding scheme starts with a block of data, comprising multiple data frames, that is encoded with a particular coding scheme resulting in systematic bits and parity bits. Additionally, the original block of data may be rearranged with a permuter. The bits are permuted (re-ordered) so that interference (noise) does not affect adjacent bits in their normal order. This scheme of spreading normally adjacent bits enhances the ability to recover from interference distortions.
The permuted bits are then encoded with the same encoding scheme as that applied to the original data, resulting in systematic bits (which may be discarded) and parity bits. The two sets of encoded data are then further processed and merged (interleaved) into a serial bit stream. The complexity of parallel concatenated coding depends on the chosen encoding scheme, and can become significant.
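The burst-spreading effect of permutation described above can be sketched in Python. The sketch assumes a simple row/column block permutation, which is an illustrative stand-in and not the turbo interleaver of any particular system; the function names are hypothetical.

```python
def block_permutation(rows, cols):
    """Index order obtained by writing data row by row and reading it
    column by column (an assumed, illustrative permutation)."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def permute(bits, perm):
    """Re-order bits so that position p of the output holds bits[perm[p]]."""
    return [bits[i] for i in perm]


def inverse_permute(bits, perm):
    """Undo permute(): put each received bit back at its original index."""
    out = [None] * len(perm)
    for pos, original_index in enumerate(perm):
        out[original_index] = bits[pos]
    return out


def burst_in_original_order(perm, start, length):
    """Original indices hit by a burst covering `length` consecutive
    positions of the permuted stream, beginning at position `start`."""
    return [perm[p] for p in range(start, start + length)]
```

With a 4 by 4 layout, `burst_in_original_order(block_permutation(4, 4), 4, 3)` returns `[1, 5, 9]`: a burst covering three consecutive permuted positions touches bits that are four positions apart in their normal order.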
Typical turbo codes employed in communications systems are based on a parallel concatenated constituent coding (PCCC) scheme. A typical turbo encoder with rate ⅓ is illustrated in FIG. 1. In this scheme two systematic convolutional encoders, outer encoder 10 and inner encoder 20, are concatenated in parallel via turbo interleaver 30. In this example a convolutional encoder of constraint length 4 is used as a constituent encoder. Systematic information from the outer encoder in totality is represented as xt0(k) 12. The outer encoder also generates informational bits, yt0(k) 14. Output from the inner encoder is represented as xt1(k′), where k′ is an interleaved index, and yt1(k) (information bits) 24.
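A rate ⅓ parallel concatenated encoder of this kind can be sketched as follows. This is a minimal sketch under stated assumptions, not the encoder of FIG. 1: the text gives only the constraint length, so the recursive systematic form and the generator polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3, as used in 3GPP turbo codes) are assumed; tail bits are omitted; and the interleaver is passed in as an arbitrary permutation. The function names are hypothetical.

```python
def rsc_parity(bits):
    """Parity stream of a constraint-length-4 recursive systematic
    convolutional encoder (three memory elements).
    Assumed polynomials: feedback 1 + D^2 + D^3, feedforward 1 + D + D^3."""
    s1 = s2 = s3 = 0  # shift register contents, s1 is the newest
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3             # feedback taps at D^2 and D^3
        parity.append(a ^ s1 ^ s3)  # feedforward taps at 1, D and D^3
        s1, s2, s3 = a, s1, s2      # shift the register
    return parity


def turbo_encode(bits, perm):
    """Rate 1/3 parallel concatenation without tail bits (assumption).
    Emits xt0(k), yt0(k), yt1(k) for each k as one serial stream."""
    xt0 = list(bits)                            # systematic bits
    yt0 = rsc_parity(bits)                      # outer encoder, normal order
    yt1 = rsc_parity([bits[i] for i in perm])   # inner encoder, interleaved order
    out = []
    for k in range(len(bits)):
        out.extend((xt0[k], yt0[k], yt1[k]))
    return out
```

Each input bit produces three output bits, which gives the rate of ⅓. The systematic output of the inner encoder, xt1(k′), is discarded here, consistent with the statement above that those bits may be discarded.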
Turbo decoding is accomplished by employing two constituent decoders, the outer decoder and the inner decoder, that generate log-likelihood ratios (LLR) called “extrinsic information” and by feeding back the extrinsic information from one decoder to the other decoder iteratively. A functional block diagram of a turbo decoder is illustrated in FIG. 2 where x(k) 212, y0(k) 214, and y1(k) 224 represent received samples of the encoder outputs, xt0(k) 12, yt0(k) 14, and yt1(k) 24, respectively (see FIG. 1). As illustrated in FIG. 2, the outer decoder takes received samples, x(k) 212 and y0(k) 214, and extrinsic information, e(k) 216, generated by the inner decoder 220, where k denotes the symbol index. Similarly, the inner decoder takes received samples, x(k′) 222 and y1(k) 224, and extrinsic information, e(k′) 226, generated by the outer decoder 210, where k′ denotes the interleaved symbol index. Each time a constituent decoder is run, the extrinsic information is updated for the other decoder, and the decoder performance improves iteratively. One iteration is completed when a single pass of decoding has been performed for both the outer decoder 210 and the inner decoder 220. In this implementation, one pass of decoding requires memory accesses to N symbol data in either normal or interleaved order. That is, each pass of decoding requires at least N memory access clocks. The iterative process ends when a termination state is reached (for example, an acceptable BER has been obtained) or when successive iterations produce little or no improvement in the BER (the process has “converged”). At this point the sample, which is a gray scale value, is passed through hard decision logic 228, where values <0.5 are given a value of 0 and values >0.5 are given a value of 1, resulting in the value of XH(k) 230.
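The final hard decision step can be sketched as below. Two points are assumptions rather than statements from the text: the LLR is taken to be ln(P(bit = 1) / P(bit = 0)), so that it maps to a value between 0 and 1 through the logistic function, and a value of exactly 0.5, which the text leaves unspecified, is mapped to 0. The function names are hypothetical.

```python
import math


def llr_to_probability(llr):
    """Map an LLR to P(bit = 1), assuming LLR = ln(P(1) / P(0)).
    Written in two branches so large magnitudes do not overflow."""
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)


def hard_decision(soft_values, threshold=0.5):
    """Values above the threshold become 1, values below become 0.
    A value exactly at the threshold becomes 0 (assumption)."""
    return [1 if v > threshold else 0 for v in soft_values]
```

Under the assumed sign convention, thresholding the probability at 0.5 is equivalent to testing whether the LLR is positive, so an implementation may skip the conversion entirely.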
In order for the decoder processor to decode the encoded input data at the same rate as the input data is arriving, the component decoder processor must process the encoded data at a rate at least as fast as the rate of the incoming data. With iterative decoding, the speed of the decoder processor becomes a significantly limiting factor in the system design. Schemes to accelerate the decoding process include accelerating the decoder itself, accelerating the convergence rate, and accelerating the recognition of the decoding terminating event. The present invention is concerned with accelerating the convergence rate as well as the recognition of the decoding terminating event.
U.S. Pat. No. 6,182,261 to Haller et al., entitled “Effective Iterative Decoding”, describes decoding that uses multiple processors to decode turbo code in parallel and logic that accelerates recognition of a termination condition. The termination recognition schemes are applied to each processor individually. The logic includes: performing CRC and terminating when the check sum variance gets below some threshold; terminating when the probability that the estimated bit values are accurate has reached a threshold level; terminating when a bit error rate (BER) level is sufficiently low; and several other schemes. The parallel decoding process continues iteratively until the termination logic determines completion of the packet's decoding.
Other termination recognition logic described in Haller includes: stopping when a maximum number of iterations has been performed; stopping when output data has consumed some level of storage capacity; and stopping when the results for the current iteration match the results for the last prior iteration or match the results for the last prior two iterations. The latter logic, which is performed only after a minimum number of iterations has been performed, recognizes a termination condition if successive iterations yield the same result, thus concluding that a convergence of the iteration results has been achieved. This type of method is referred to as accelerated convergence rate (ACR) logic.
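The iteration-count and result-matching stop rules summarized above can be sketched as follows. This is an illustrative reading of the description, not code from the Haller patent; the storage-capacity rule is omitted, and the function name, parameter names, and the representation of per-iteration results as lists of hard-decision bits are all assumptions.

```python
def should_stop(history, min_iterations, max_iterations, lookback=1):
    """Decide whether iterative decoding may stop.

    history        -- hard-decision results, one list of bits per
                      completed iteration, oldest first
    min_iterations -- matching is only tested once this many
                      iterations have completed
    max_iterations -- unconditional stop once this many have completed
    lookback       -- number of prior iterations (1 or 2 in the text)
                      that must equal the current result
    """
    completed = len(history)
    if completed >= max_iterations:
        return True
    if completed < min_iterations or completed <= lookback:
        return False
    current = history[-1]
    return all(history[-1 - j] == current for j in range(1, lookback + 1))
```

A decoder loop would append the hard decisions of each completed iteration to `history` and call `should_stop` before starting the next one. As the text notes for this class of rule, matching results suggest convergence but do not prove that the result is error-free.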
U.S. Pat. No. 6,298,463 to Bingeman et al., entitled “Parallel Concatenated Convolutional Coding”, describes an encoding and decoding scheme that processes a group of N bits (as opposed to coding single bits). One embodiment of Bingeman uses a parity bit generator that adds a parity bit to the group of bits encoded, thus allowing the decoder to check parity on small groupings of bits. The preferred embodiment sets N at 2 or 3 bits. Acceleration of termination recognition occurs through use of parity bit checking of a small grouping of bits. This scheme is different from performing CRC, which is a check for the sum of the bits for an entire data frame.
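The idea of checking parity on small groups of bits can be illustrated with a short sketch. It is not taken from the Bingeman patent: even parity, the placement of the parity bit directly after its group, and the function names are all assumptions made for illustration.

```python
def add_group_parity(bits, n):
    """Append one even-parity bit after every group of n data bits
    (even parity and this layout are assumptions)."""
    out = []
    for i in range(0, len(bits), n):
        group = bits[i:i + n]
        out.extend(group)
        out.append(sum(group) % 2)  # makes the group's total parity even
    return out


def failed_groups(coded, n):
    """Indices of the groups whose parity check fails."""
    step = n + 1  # n data bits plus one parity bit
    return [i // step
            for i in range(0, len(coded), step)
            if sum(coded[i:i + step]) % 2 != 0]
```

Because each check covers only N + 1 bits, a failure localizes an error to a specific group, whereas a frame-level check only reports that the frame as a whole is in error.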
U.S. Pat. No. 6,252,917 to Freeman, entitled “Statistically Multiplexed Turbo Code Decoder”, describes a system where a plurality of decoders report the progress of bit error rate convergence of their respective signals being decoded to a centralized scheduler. The scheduler can then analyze each decoder for a terminating condition where the bit error rate has ceased to improve sufficiently to justify continued iterations. Effectively, this patent centralizes some of the decision logic to a scheduler as opposed to a processor that performs the decoding, but does not accelerate recognition of some terminating condition.
U.S. Pat. No. 6,014,411 to Wang (Wang '411), entitled “Repetitive Turbo Coding Communication Method”, describes a method for reducing the error rate of turbo coding (encoding and decoding). Two approaches are described: 1) backward recursive decoding from all possible end node states so as to eliminate tail bit errors; and 2) the use of bit partitioning during encoding by repetitive coding so as to increase Hamming distance and thus increase coding gain. The first aspect provides a turbo decoding method for backward recursion starting from all possible states of a bit as represented on a trellis. The method is applied when the tail bits of a turbo encoded data frame are flushed and parity bits for the flushed bits are added to the encoded data block. In typical turbo code implementations, tail bits or flush bits of a turbo code block are generated separately at the end of data bit encoding in such a way that the code trellis terminates at the all-zero state. Normally, data interleaving for turbo coding is limited to data bits, and tail bits for the second encoder are generated independently of tail bits of the first encoder. The method proposed in the Wang '411 patent (and the related patents discussed below) performs data interleaving over the data and tail bits of the first encoder and does not force the final state of the code trellis of the second encoder to the all-zero state.
Two additional patents to Wang, U.S. Pat. Nos. 6,028,897 (Wang '897) and 6,044,116 (Wang '116), are related to Wang '411. The Wang '897 patent abstract describes a backward recursive method to determine an ending state and further states that the tail bits are not flushed. The '411 patent abstract includes a discussion of repetitive encoding and addresses appending parity bits for the tail bit sequence.
The Wang '116 patent abstract describes a non-flushing method. Further, Wang '116 claims repetitive encoding but not a repetitive systematic data sequence, nor does it describe a sequence of parity tail bits or the flushing of tail bits.
The Haller patent and the Wang patents, as noted, are examples of logic that attempts to accelerate the convergence rate. The Haller patent does this by recognizing that if the last two (or three) decode iterations yield the same result, it is likely (but not certain) that convergence has occurred. The Wang patents focus on finding a concluding node state on a trellis that represents the actual bit state. This accelerates convergence analysis if the final condition is known before the iteration process starts, thus avoiding analysis that leads down an incorrect path.
While various embodiments of ACR logic have been disclosed in the prior art, these embodiments of ACR logic are not fully optimized. What is desired is ACR logic that accelerates the convergence rate by using knowledge of error-free data frames to identify each node in a trellis (a “bound node”) that is entered by a data bit from such an error-free frame (a “verified bit”), thus avoiding going down incorrect paths from interior node points. It is further desirable to be able to use ACR logic when less than a full data frame is included in the encoded code block.