Transmission of digital data is inherently prone to interference, which may introduce errors into the transmitted data. Error detection schemes have been suggested to determine as reliably as possible whether errors have been introduced into the transmitted data. For example, it is common to transmit the data in packets, and add to each packet a CRC (cyclic redundancy check) field, for example of a length of 16 bits, which carries a checksum of the data of the packet. When a receiver receives the data, it calculates the same checksum on the received data and verifies whether the result of its calculation is identical to the checksum in the CRC field.
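By way of illustration only, the append-and-verify procedure described above may be sketched as follows. The sketch assumes the CRC-CCITT polynomial as computed by Python's standard `binascii.crc_hqx` function, an initial register value of 0xFFFF, and a big-endian 16-bit field; none of these choices is specified in the text, and the names `add_crc` and `check_crc` are hypothetical.

```python
import binascii

CRC_INIT = 0xFFFF  # assumed initial register value (not specified in the text)

def add_crc(payload: bytes) -> bytes:
    """Transmitter side: append a 16-bit CRC field (big-endian) to the packet data."""
    crc = binascii.crc_hqx(payload, CRC_INIT)
    return payload + crc.to_bytes(2, "big")

def check_crc(packet: bytes) -> bool:
    """Receiver side: recompute the checksum and compare it with the CRC field."""
    payload, field = packet[:-2], packet[-2:]
    return binascii.crc_hqx(payload, CRC_INIT) == int.from_bytes(field, "big")

packet = add_crc(b"hello, receiver")
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]  # one bit flipped by interference

print(check_crc(packet))     # True
print(check_crc(corrupted))  # False
```

A scheme of this kind only detects that an error has occurred; it does not indicate which bits are in error.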
Data signals, in particular those transmitted over a typically hostile RF interface, are susceptible to errors caused by interference. Various methods of error correction coding have been developed in order to minimize the adverse effects that a hostile interface has on the integrity of communicated data. This is also referred to as lowering the Bit Error Rate (BER), which is generally defined as the ratio of incorrectly received information bits to the total number of received information bits. Error correction coding generally involves representing digital data in ways designed to be robust with respect to bit errors. Error correction coding enables a communication system to recover original data from a signal that has been corrupted. Typically, the greater the expected BER of a particular communication link, the greater the complexity of the error correction coding necessary to recover the original data. In general, the greater the complexity of the error correction coding, the greater the inefficiency of the data communication. The greater inefficiency results from a reduction of the ratio of information bits to total bits communicated as the complexity of the error correction coding increases. The increased number of bits introduced into the original body of data by error correction coding consumes spectrum bandwidth and processor cycles on both the transmitting and receiving ends of the communication.
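The trade-off between robustness and efficiency may be illustrated with a deliberately simple code. The following sketch uses a three-fold repetition code with majority-vote decoding, which is an assumption chosen for brevity and is not a coding method discussed in this document; the error positions are likewise hypothetical. The channel error rate is measured on the coded bits, and the BER is measured on the recovered information bits.

```python
def bit_error_rate(sent, received):
    """Ratio of incorrectly received bits to the total number of received bits."""
    if len(sent) != len(received) or not received:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(s != r for s, r in zip(sent, received)) / len(received)

def repeat3_encode(bits):
    """Send every information bit three times (code rate 1/3)."""
    return [b for b in bits for _ in range(3)]

def repeat3_decode(coded):
    """Majority vote over each group of three received bits."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0 for i in range(0, len(coded), 3)]

info = [1, 0, 1, 1, 0, 0, 1, 0]
coded = repeat3_encode(info)

error_positions = {1, 9, 17}  # hypothetical bit errors introduced by the channel
noisy = [b ^ 1 if i in error_positions else b for i, b in enumerate(coded)]

channel_error_rate = bit_error_rate(coded, noisy)        # 3 of 24 coded bits
decoded_ber = bit_error_rate(info, repeat3_decode(noisy))
code_rate = len(info) / len(coded)                       # information bits / total bits

print(channel_error_rate, decoded_ber, code_rate)
```

In this example all three channel errors are corrected, but only one of every three transmitted bits carries information, which is the inefficiency referred to above.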
Turbo coding plays an important role in communications systems because of its outstanding coding gain with relatively manageable decoding complexity. Typical turbo codes employed in communications systems are based on a parallel concatenated constituent coding (PCCC) scheme. An example of a turbo encoder with rate 1/3 is illustrated in FIG. 1. In this scheme two systematic convolutional encoders, an outer encoder 10 and an inner encoder 20, are concatenated in parallel via a turbo interleaver 30. In this example a convolutional encoder of constraint length 4 is used as each constituent encoder. The systematic information from the outer encoder is represented as xt0(k) 12. The outer encoder also generates parity bits, yt0(k) 14. The output from the inner encoder is represented as xt1(k′), where k′ is an interleaved index, and yt1(k) 24 (parity bits).
Concatenated error correction coding is a sequence of coding in which at least two encoding steps are performed on a data stream. Concatenated coding may be performed in series (i.e., the first encoding is further encoded in a serial fashion) or in parallel. Parallel encoding subjects the original data to different encoding schemes resulting in intermediate codes that are then further processed and combined into a serial stream.
A parallel concatenated turbo coding scheme starts with a block of data that is encoded with a particular coding method resulting in systematic bits and parity bits. Additionally, the original block of data may be rearranged with a permuter. The bits are permuted (re-ordered) so that interference (noise) does not affect adjacent bits in their normal order. This scheme of spreading normally adjacent bits enhances the ability to recover from interference distortions.
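The effect of permuting the bits may be sketched as follows. The example assumes a simple row/column block permuter (bits written by rows and read by columns); the text does not specify a particular permutation, and the function names and the burst position are hypothetical.

```python
def block_permutation(rows, cols):
    """perm[j] is the original index of the bit placed in permuted position j.

    Bits are written into a rows x cols array by rows and read out by columns.
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]

def permute(bits, perm):
    return [bits[i] for i in perm]

def unpermute(bits, perm):
    out = [None] * len(bits)
    for j, i in enumerate(perm):
        out[i] = bits[j]
    return out

perm = block_permutation(3, 4)
data = list(range(12))          # use the indices themselves as stand-in "bits"
sent = permute(data, perm)

burst = [3, 4, 5]               # three adjacent permuted positions hit by noise
hit_original = sorted(perm[j] for j in burst)

print(perm)          # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(hit_original)  # [1, 5, 9]
```

The three adjacent positions affected by the burst correspond to original positions that are four bits apart, so no two normally adjacent bits are corrupted together.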
The permuted bits are then encoded with the same method as that applied to the original data, resulting in systematic bits (which may be discarded) and parity bits. The two sets of encoded data are then further processed and merged (interleaved) into a serial bit stream. The complexity of parallel concatenated coding depends on the chosen encoding scheme and can be considerable.
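A minimal sketch of such a rate 1/3 parallel concatenated encoder is given below. It assumes a recursive systematic constituent encoder of constraint length 4 with feedback polynomial 1 + D^2 + D^3 and feedforward polynomial 1 + D + D^3; these polynomials, the example permutation, and the omission of trellis termination are assumptions made for illustration and are not taken from the text.

```python
def rsc_parity(bits):
    """Parity output of a recursive systematic convolutional encoder.

    Constraint length 4 (three memory cells). Assumed polynomials:
    feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    """
    s1 = s2 = s3 = 0
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3             # feedback term
        parity.append(a ^ s1 ^ s3)  # feedforward taps
        s1, s2, s3 = a, s1, s2      # shift the register
    return parity

def turbo_encode(bits, perm):
    """Return the serial stream x(0), y0(0), y1(0), x(1), y0(1), y1(1), ..."""
    y0 = rsc_parity(bits)                      # outer encoder, normal order
    y1 = rsc_parity([bits[i] for i in perm])   # inner encoder, permuted order
    stream = []                                # its systematic bits are discarded
    for x, p0, p1 in zip(bits, y0, y1):
        stream.extend((x, p0, p1))
    return stream

info = [1, 0, 1, 1, 0, 0, 1, 0]
perm = [3, 7, 0, 5, 2, 6, 1, 4]   # hypothetical permutation of the 8 positions
coded = turbo_encode(info, perm)

print(len(info), len(coded))       # 8 24
```

Each information bit produces one systematic bit and two parity bits, which gives the coding rate of 1/3.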
The amount of processing necessary to decode such convolutionally encoded data can be considerable. Parallel and serial concatenated codes are sometimes decoded using iterative decoding algorithms. One commonly employed method of iterative decoding utilizes a single decoder processor where the decoder output metrics are fed back to the input of the decoder processor. Decoding is performed in an iterative fashion until a terminating condition has been reached. A primary example is a turbo decoder.
Turbo decoding is accomplished by employing two constituent decoders. The outer decoder and inner decoder generate log-likelihood ratios (LLR) called extrinsic information. The extrinsic information is fed back from one decoder to the other iteratively. A functional block diagram of a turbo decoder is illustrated in FIG. 2, where x(k) 212, y0(k) 214, and y1(k) 224 represent received samples of the encoder outputs xt0(k) 12, yt0(k) 14, and yt1(k) 24, respectively. As illustrated in FIG. 2, the outer decoder 210 takes as input the received samples x(k) 212 and y0(k) 214 and the extrinsic information e(k) 216 generated by the inner decoder 220, where k denotes the symbol index. Similarly, the inner decoder 220 takes as input the received samples x(k′) 222 and y1(k) 224 and the extrinsic information e(k′) 226 generated by the outer decoder 210, where k′ denotes the interleaved symbol index. Each time a constituent decoder is run, the extrinsic information is updated for the other decoder, and decoder performance improves with each iteration. One iteration is completed when a single pass of decoding has been performed by both the outer decoder 210 and the inner decoder 220. In this implementation, one pass of decoding requires memory accesses to N symbol data in either normal or interleaved order. That is, each pass of decoding requires at least N memory access clocks. The output of the decoder is passed through hard-decision logic 228. Input values to the hard-decision logic are grey-scale values, i.e., values somewhere between 0 and 1. The hard-decision logic converts each value to 0 if it is less than 0.5 and to 1 otherwise, and provides the result as output 230.
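The hard-decision step may be sketched as follows. The function name and the sample values are hypothetical, and mapping a value of exactly 0.5 to 1 is an assumption.

```python
def hard_decision(soft_values, threshold=0.5):
    """Map each soft value in [0, 1] to 0 if it is below the threshold, else 1."""
    return [0 if v < threshold else 1 for v in soft_values]

soft = [0.08, 0.93, 0.51, 0.49, 0.5]   # hypothetical decoder outputs
print(hard_decision(soft))             # [0, 1, 1, 0, 1]
```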
A typical implementation of a turbo decoder is illustrated in FIG. 3, where the turbo decoding logic is shared between the two decoding modes, outer decoding mode 310 and inner decoding mode 312. Switches 336 provide the data path switching between the two decoding modes. In this implementation, three major memory blocks are illustrated, associated with the input 338, output 350, and control data 372 of the turbo decoding logic. These are the input sample memory 338, the extrinsic information memory 350, and the interleaver address memory 372. In the preferred embodiment, the extrinsic information memory 350 is implemented as a dual port memory 352, and the path metric memory used during the decoding process is internal to the decoding logic. The interleaver address memory associates the interleaver address with an address counter 374.
In order for the decoder processor to decode the encoded input data at the same rate as the input data is arriving, the component decoder processor 340 must process the encoded data at a rate faster than the rate of the incoming data by a factor at least equal to the number of iterations necessary. With this method of iterative decoding, the speed of the decoder processor becomes a significantly limiting factor in the system design. Schemes to accelerate the decoding process include accelerating the decoder and accelerating the recognition of the decoding terminating event.
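The resulting speed requirement may be estimated with a short calculation. The sketch below assumes that each decoding pass costs exactly N memory access clocks, that one iteration consists of two passes (outer and inner), and that there is no other overhead; the block size, iteration count, and block rate are hypothetical figures.

```python
def min_decoder_clock_hz(block_symbols, iterations, blocks_per_second,
                         passes_per_iteration=2):
    """Lower bound on the decoder clock needed to keep up with the input.

    Assumes one memory access clock per symbol per pass and no other overhead.
    """
    clocks_per_block = block_symbols * passes_per_iteration * iterations
    return clocks_per_block * blocks_per_second

n = 5000           # hypothetical symbols per code block
iterations = 8     # hypothetical number of iterations
block_rate = 400   # hypothetical blocks arriving per second

symbol_rate = n * block_rate
clock = min_decoder_clock_hz(n, iterations, block_rate)

print(symbol_rate)           # 2000000
print(clock)                 # 32000000
print(clock // symbol_rate)  # 16
```

Under these assumptions the single shared decoder must run at 16 times the incoming symbol rate, that is, at twice the number of iterations.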
In order to improve processing times, parallel decoding schemes have been devised. One scheme is to use multiple processors to decode in parallel. U.S. Pat. No. 6,292,918 to Sindhushayana et al. entitled “Effective Iterative Decoding” (the '918 patent), describes a decoder that uses multiple processors to decode turbo code in parallel. In this approach, two decoders—an inner decoder and an outer decoder—work on two code blocks. The underlying concept is that the inner decoder processes a first code block while the outer decoder processes a second code block. Upon completion of current phases of decoding these two decoders exchange outputs and repeat the decoding process so that each code block goes through both phases of decoding, outer decoding and inner decoding. Efficiency is based on the theory that both the inner and outer decoders are fully utilized, i.e. the outer decoder does not wait until the inner decoder completes its decoding and vice versa.
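The alternating use of the two decoders may be pictured with the following simplified schedule. It is a sketch of the general idea only and is not taken from the '918 patent; the block labels and the function name are hypothetical, and every phase is assumed to take the same amount of time.

```python
def two_block_schedule(phases, blocks=("A", "B")):
    """Return (phase, block at outer decoder, block at inner decoder) tuples.

    The two decoders swap code blocks after every phase, so neither is idle.
    """
    schedule = []
    for p in range(phases):
        outer_block = blocks[p % 2]
        inner_block = blocks[(p + 1) % 2]
        schedule.append((p, outer_block, inner_block))
    return schedule

for phase, outer_block, inner_block in two_block_schedule(4):
    print(phase, "outer:", outer_block, "inner:", inner_block)
```

In every phase both decoders are occupied with different code blocks, and over any two consecutive phases each code block passes through both outer and inner decoding.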
U.S. Pat. No. 6,182,261 to Haller et al. entitled “Effective Iterative Decoding” (the '261 patent), a continuation-in-part of the application for the '918 patent, discloses a decoder scheme in which the inner and outer decoding processes can be performed independently. That is, as soon as a decoder has finished processing a first block, the same decoder is able to decode a second block without having to wait for the other decoder to complete decoding the first block.
U.S. Pat. No. 6,304,995 to Smith et al. entitled “Pipelined Architecture to Decode Parallel and Serial Concatenated Codes” (the '995 patent) describes a scheme for processing concatenated encoded data in a cascading fashion. This scheme allows a plurality of processors to decode in parallel, thus accelerating the decoding process in a manner similar to the multiple-processor parallel decoding of the '918 patent.
The '918, '261 and '995 patents each disclose the use of multiple processors to decode in a parallel fashion. Further, each of the cited patents decodes full blocks of encoded data at each decoder. The schemes provide either for complementary decoders to process a single data block in different modes (i.e., “inner” and “outer” decoding modes) in parallel, or for independent decoders to process two different data blocks in parallel. Since the disclosed decoders operate in parallel, the number of clock cycles used to address memory is the same for each processor/decoder. The theoretical increase in decoding speed is premised on obtaining two decoding cycles for each clock cycle. This approach, however, requires two physical decoders and two sets of memories for two code blocks.
What is desired is a turbo decoding process without the complexity associated with using discrete decoders for parallel processing but that achieves a decoding rate that is equal to, or better than, such discrete decoder parallel processing systems.