Forward error correcting (FEC) channel codes have been used to effectively improve wireless communication systems. Systems using FEC codes can tolerate more channel errors. Therefore, such coded systems may operate with lower transmit power, transmit over longer distances, tolerate more interference, use smaller antennas, and transmit at a higher data rate.
FEC codes include convolutional codes, turbo codes, and trellis-coded modulation (TCM). These codes have been widely adopted in multiple access technologies, ranging from 2G/3G wireless systems, satellite communication systems, and IEEE 802.11/15/16 to local area networks (LAN), personal area networks (PAN), wide area networks (WAN), digital subscriber lines (DSL), cable/hybrid fiber coaxial (HFC) networks, and power line multiple access. These codes are also adapted to dynamically handle data rates from a few kilobits (small packets) to hundreds of megabits (large packets).
FIG. 1 is a block diagram illustrating a prior art uplink process in which a base station 200 communicates with several users at a user side 100 at the same time. The base station 200 of this architecture typically uses commercially available digital signal processor (DSP), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), and application-specific signal processor (ASSP) devices that can handle multiple users and/or multiple data streams at the same time. FIG. 1 illustrates three (3) users. A first user user1 has two applications using two separate data streams data1 and data2, while second and third users user2 and user3 each have one application using one data stream, data3 and data4, respectively. At the user side 100, each data stream of each user is encoded by a separate encoder 10. The first and second encoded data streams of the first user user1 are individually modulated by respective modulators 12, and the modulated first and second data streams are then combined by a combiner 14. The combined data stream is transmitted to the base station 200 through a channel (e.g., over the air). The data stream of each of the second and third users user2 and user3 is individually modulated by a modulator 12 and individually transmitted through a channel to the base station 200. The order of processing may change depending on the specific technology (e.g., 3GPP2 follows the order illustrated in FIG. 1, while in 3GPP the order of “combine” and “modulate” may be swapped).
At the base station 200, the combined data stream for the first user user1 is demodulated by demodulators 16 to produce soft symbols. The modulated data stream for each of the second and third users user2 and user3 is also demodulated by respective demodulators 16, to produce soft symbols. The soft symbols are then stored in a queue 18, and scheduled by a control function (not shown) to time-share a decoder 20.
Given the architecture shown in FIG. 1, it is generally beneficial to design a decoder 20 that can handle a wide range of data rates. Such decoders 20 are designed for the worst-case scenario, namely a code with the largest block size, the highest constraint length, and the lowest rate. However, such a conventional decoder 20 is inefficient in handling smaller code blocks.
To better understand this efficiency issue, well-known turbo and convolutional encoding and decoding will be described. FIG. 2 illustrates a prior art 3GPP rate-1/3 turbo code encoder. The turbo encoder of FIG. 2 consists of first and second eight-state, rate-1/2 systematic convolutional encoders 30-1, 30-2. A data stream Xk is input to the first constituent encoder 30-1, which produces a parity one bit Zk for each input bit. The code block size of the data stream may be, for example, between 40 and 5114 bits. The data stream Xk is also input to an internal interleaver 40 to produce an interleaved data stream X′k, which is input to the second constituent encoder 30-2. The second constituent encoder 30-2 produces a parity two bit Z′k for each interleaved bit.
The first and second constituent encoders 30-1, 30-2 have the same structure. Accordingly, for the sake of brevity, the structure of only one of them will be described. As shown, each of the constituent encoders 30-1, 30-2 includes a switch 50, first to third registers 60-1 to 60-3 connected in series, and first to fourth adders 62-1 to 62-4. The first adder 62-1 adds the outputs of the second and third registers 60-2 and 60-3. The second adder 62-2 adds the output of the first adder 62-1 to the data stream selectively input by the switch 50. The switch 50 selects either the data stream or the output of the first adder 62-1 as its input. The third adder 62-3 adds the output of the second adder 62-2 and the output of the first register 60-1. The fourth adder 62-4 adds the outputs of the third adder 62-3 and the third register 60-3 to generate the parity bit Zk. The output of the switch 50 supplies the data stream Xk. In the second constituent encoder 30-2, the parity two bits Z′k and the interleaved data stream X′k are supplied in the same manner.
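As a hedged illustration of the register and adder network just described, the following Python sketch models one constituent encoder at the bit level during normal data encoding. The function name `rsc_encode` and the variables `r1` to `r3` are hypothetical; the adders are assumed to be modulo-2 (XOR) additions, the registers start at zero, and trellis termination is deliberately left out.

```python
from typing import List, Tuple


def rsc_encode(bits: List[int]) -> Tuple[List[int], List[int]]:
    """Encode `bits` with one 8-state constituent encoder (hypothetical helper).

    r1, r2, r3 model registers 60-1, 60-2, 60-3; every adder is an XOR.
    Returns (systematic bits X_k, parity bits Z_k). No tail bits are produced.
    """
    r1 = r2 = r3 = 0                  # registers start in the all-zero state
    systematic, parity = [], []
    for x in bits:
        fb = r2 ^ r3                  # first adder 62-1: registers 60-2 and 60-3
        a = x ^ fb                    # second adder 62-2: data bit plus feedback
        t = a ^ r1                    # third adder 62-3: adds register 60-1
        z = t ^ r3                    # fourth adder 62-4: adds register 60-3 -> Z_k
        systematic.append(x)          # switch 50 passes the data bit through as X_k
        parity.append(z)
        r1, r2, r3 = a, r1, r2        # shift the register chain by one position
    return systematic, parity
```

Feeding a single 1 followed by zeros shows the recursive (infinite impulse response) behavior: the parity output keeps toggling long after the input has gone quiet, which is what distinguishes this recursive code from a feedforward convolutional code.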
The transfer function of the 8-state constituent code for parallel concatenated convolutional code (PCCC) is:
$$G(D) = \left[\,1,\ \frac{g_1(D)}{g_0(D)}\,\right],$$
where $g_0(D) = 1 + D^2 + D^3$ and $g_1(D) = 1 + D + D^3$.
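The short derivation below may help connect this transfer function with the register and adder structure of FIG. 2. It assumes, as notation introduced only for this derivation, that $a_k$ denotes the bit entering register 60-1 (the output of the second adder 62-2), so that registers 60-1 to 60-3 hold $a_{k-1}$, $a_{k-2}$, $a_{k-3}$, the registers start at zero, and all additions are modulo 2.

```latex
\begin{aligned}
a_k &= x_k \oplus a_{k-2} \oplus a_{k-3}
  &&\Longleftrightarrow\quad A(D)\,(1 + D^2 + D^3) = A(D)\,g_0(D) = X(D),\\
z_k &= a_k \oplus a_{k-1} \oplus a_{k-3}
  &&\Longleftrightarrow\quad Z(D) = (1 + D + D^3)\,A(D) = g_1(D)\,A(D),\\
\Rightarrow\quad Z(D) &= \frac{g_1(D)}{g_0(D)}\,X(D).
\end{aligned}
```

The systematic output is $X(D)$ itself, which accounts for the first entry $1$ of $G(D)$. For an impulse input $X(D) = 1$, the parity sequence begins $1, 1, 1, 1, 0, 0, 1, 0, \ldots$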
The initial values of the shift registers 60-1 to 60-3 of the first and second constituent encoders 30-1, 30-2 are all zeros when encoding the data streams Xk, X′k. The output from the encoders is:
X1, Z1, Z′1, X2, Z2, Z′2, . . . , Xk, Zk, Z′k,
where X1, X2, . . . , Xk are the systematic bits sk used for decoding, k is the number of bits in the input data stream block, and Z1, Z2, . . . , Zk and Z′1, Z′2, . . . , Z′k are the parity one bits pk and parity two bits p′k, respectively, used in the decoding process. As described above, the bits output from the internal interleaver 40 are denoted by X′1, X′2, . . . , X′k, and these bits are input to the second constituent encoder 30-2. These interleaved bits need not be transmitted, because the decoder can regenerate them by interleaving the received systematic bits.
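The output multiplexing above can be sketched as follows. This is a hedged illustration only: `perm` is a hypothetical stand-in for the internal interleaver 40 (the actual 3GPP interleaver is defined in the specification and is not reproduced here), the helper names `rsc_parity` and `turbo_encode` are assumptions, and tail bits are omitted.

```python
from typing import List, Sequence


def rsc_parity(bits: Sequence[int]) -> List[int]:
    """Parity bits of the 8-state code g1(D)/g0(D), registers starting at zero."""
    r1 = r2 = r3 = 0
    out = []
    for x in bits:
        a = x ^ r2 ^ r3            # feedback polynomial g0(D) = 1 + D^2 + D^3
        out.append(a ^ r1 ^ r3)    # feedforward polynomial g1(D) = 1 + D + D^3
        r1, r2, r3 = a, r1, r2
    return out


def turbo_encode(x: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Rate-1/3 output X1, Z1, Z'1, X2, Z2, Z'2, ... (tail bits omitted).

    `perm` is a hypothetical interleaver: X'_k = x[perm[k]].
    """
    x_int = [x[i] for i in perm]   # interleaved stream X'_k (not transmitted)
    z = rsc_parity(x)              # parity one bits Z_k from encoder 30-1
    z_int = rsc_parity(x_int)      # parity two bits Z'_k from encoder 30-2
    out: List[int] = []
    for xk, zk, zpk in zip(x, z, z_int):
        out.extend([xk, zk, zpk])  # three output bits per input bit -> rate 1/3
    return out
```

For example, `turbo_encode([1, 0, 0, 0], [3, 2, 1, 0])` uses a simple reversal as the toy interleaver and yields twelve bits, three per input bit, in the X, Z, Z′ order listed above.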
Each constituent code is terminated at the end of the input data block by taking the tail symbols from the shift register feedback after all data stream bits are encoded. The tail symbols are padded after the encoded data stream bits. After code termination, the constituent encoder is in the all-zero state.
The first three tail symbols are used to terminate the first constituent encoder 30-1 while the second constituent encoder 30-2 is disabled; for example, the switch 50 in the first constituent encoder 30-1 is in the low position. The last three tail symbols are used to terminate the second constituent encoder 30-2 while the first constituent encoder 30-1 is disabled; for example, the switch 50 in the second constituent encoder 30-2 is in the low position.
The transmitted bits for constituent code termination are:
Xk+1, Zk+1, Xk+2, Zk+2, Xk+3, Zk+3, X′k+1, Z′k+1, X′k+2, Z′k+2, X′k+3, Z′k+3 
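The termination procedure and the tail ordering above can be sketched as follows, as a hedged illustration. It assumes that the low switch position feeds the output of the first adder 62-1 back as the encoder input, so that a zero is shifted into register 60-1 on each tail step; the helper names `encode`, `terminate`, and `tail_sequence`, and the toy interleaver `perm`, are hypothetical.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, int, int]  # contents of registers 60-1, 60-2, 60-3


def encode(bits: Sequence[int], state: State = (0, 0, 0)) -> Tuple[List[int], State]:
    """Run one constituent encoder over `bits`; return parity bits and final state."""
    r1, r2, r3 = state
    parity = []
    for x in bits:
        a = x ^ r2 ^ r3
        parity.append(a ^ r1 ^ r3)
        r1, r2, r3 = a, r1, r2
    return parity, (r1, r2, r3)


def terminate(state: State) -> Tuple[List[Tuple[int, int]], State]:
    """Three tail steps with the switch in the low position.

    The input equals the feedback (output of adder 62-1), so the value
    shifted into register 60-1 is always 0 and the state empties in 3 steps.
    Returns the (X, Z) tail pairs and the final state.
    """
    r1, r2, r3 = state
    tail = []
    for _ in range(3):
        x = r2 ^ r3          # tail systematic bit taken from the feedback
        a = x ^ r2 ^ r3      # always 0
        z = a ^ r1 ^ r3      # tail parity bit
        tail.append((x, z))
        r1, r2, r3 = a, r1, r2
    return tail, (r1, r2, r3)


def tail_sequence(x: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Tail bits ordered Xk+1, Zk+1, ..., Zk+3, X'k+1, Z'k+1, ..., Z'k+3."""
    _, s1 = encode(x)                        # first encoder 30-1
    _, s2 = encode([x[i] for i in perm])     # second encoder 30-2 (toy interleaver)
    t1, _ = terminate(s1)                    # first encoder terminated first
    t2, _ = terminate(s2)                    # then the second encoder
    return [b for pair in t1 + t2 for b in pair]
```

Whatever state the data leaves behind, three tail steps always return the encoder to the all-zero state, which is why exactly three tail symbols per constituent encoder suffice for this eight-state (three-register) code.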
Well-known decoding of a turbo code may use an iterative method with a structure as illustrated in FIG. 3. The decoding works along each individual constituent code and decodes the constituent codes individually. As shown, the inputs to a first constituent decoder 70-1 may include the systematic bits sk, the parity one bits pk from the first constituent encoder 30-1, and deinterleaved extrinsic information Extrinsic 2,k from a deinterleaver 95 (discussed below). The first constituent decoder 70-1 generates a first log-likelihood ratio LLR1,k based on the systematic bits sk, the parity one bits pk, and the second extrinsic information Extrinsic 2,k. As is known, the first constituent decoder 70-1 also generates first extrinsic information Extrinsic 1,k. The inputs to the second constituent decoder 70-2 may include the interleaved systematic bits s′k, the parity two bits p′k from the second constituent encoder 30-2, and the interleaved first extrinsic information Extrinsic′ 1,k. A first interleaver 90-1 interleaves the systematic bits sk to generate the interleaved systematic bits s′k, and a second interleaver 90-2 interleaves the first extrinsic information Extrinsic 1,k to generate the interleaved extrinsic information Extrinsic′ 1,k. The second constituent decoder 70-2 generates a second log-likelihood ratio LLR2,k based on the interleaved systematic bits s′k, the parity two bits p′k, and the interleaved first extrinsic information Extrinsic′ 1,k. The second constituent decoder 70-2 also generates second interleaved extrinsic information Extrinsic′ 2,k, which the deinterleaver 95 deinterleaves to generate the second extrinsic information Extrinsic 2,k.
The iterative process may be stopped at the end of the second constituent decoder 70-2 after a number of iterations, once the incremental decoding gain of further iterations becomes insignificant. Here, one iteration means the execution of the first constituent decoder 70-1 followed by the second constituent decoder 70-2. The average number of iterations is generally between 6 and 12. The final decision on each systematic bit is made on the sign of the deinterleaved LLR2,k output. In the decoding process, the LLRs of the tail systematic symbols are not calculated in either constituent decoder, and no decisions are made on these tail systematic symbols. This is because the tail symbols of one constituent code are not recognizable by the other constituent code, so the other constituent decoder cannot take advantage of the LLRs of the tail systematic symbols from this constituent decoder.
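The iterative structure of FIG. 3 can be sketched in Python as follows. This is a hedged, non-authoritative illustration: the constituent decoders are implemented here with the max-log-MAP algorithm, a common simplification of the BCJR/log-MAP algorithm that is swapped in for illustration (the text above does not specify the algorithm used by decoders 70-1, 70-2). The interleaver `perm`, the helper names, and the default of 8 iterations are assumptions, LLRs use the convention log(P(bit=0)/P(bit=1)), and tail bits are ignored, consistent with the text above.

```python
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def build_trellis() -> Tuple[Dict, Dict]:
    """Next-state and parity tables of the 8-state code (g0 = 1+D^2+D^3, g1 = 1+D+D^3).

    State = r1*4 + r2*2 + r3, where r1..r3 are registers 60-1..60-3.
    """
    nxt, par = {}, {}
    for s in range(8):
        r1, r2, r3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
        for u in (0, 1):
            a = u ^ r2 ^ r3                      # feedback adder output
            nxt[s, u] = (a << 2) | (r1 << 1) | r2
            par[s, u] = a ^ r1 ^ r3              # parity bit
    return nxt, par


NXT, PAR = build_trellis()


def rsc_parity(bits: Sequence[int]) -> List[int]:
    """Parity bits of one constituent encoder, starting in state 0."""
    s, out = 0, []
    for u in bits:
        out.append(PAR[s, u])
        s = NXT[s, u]
    return out


def siso(ls, lp, la) -> Tuple[List[float], List[float]]:
    """Max-log-MAP constituent decoder: returns (a-posteriori LLR, extrinsic LLR).

    Starts in state 0; the end state is left open (tail bits ignored).
    """
    n = len(ls)

    def gamma(k, s, u):
        # BPSK signs (+1 for bit 0, -1 for bit 1) correlated with the input LLRs.
        return 0.5 * ((1 - 2 * u) * (ls[k] + la[k]) + (1 - 2 * PAR[s, u]) * lp[k])

    alpha = [[NEG_INF] * 8 for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for k in range(n):                                   # forward recursion
        for s in range(8):
            for u in (0, 1):
                ns = NXT[s, u]
                alpha[k + 1][ns] = max(alpha[k + 1][ns], alpha[k][s] + gamma(k, s, u))
    beta = [[0.0] * 8 for _ in range(n + 1)]             # open end: all states equal
    for k in range(n - 1, -1, -1):                       # backward recursion
        for s in range(8):
            beta[k][s] = max(gamma(k, s, u) + beta[k + 1][NXT[s, u]] for u in (0, 1))
    llr, ext = [], []
    for k in range(n):
        best = [NEG_INF, NEG_INF]                        # best path metric for u = 0, 1
        for s in range(8):
            for u in (0, 1):
                m = alpha[k][s] + gamma(k, s, u) + beta[k + 1][NXT[s, u]]
                best[u] = max(best[u], m)
        llr.append(best[0] - best[1])
        ext.append(llr[-1] - ls[k] - la[k])              # strip systematic and a priori parts
    return llr, ext


def turbo_decode(ls, lp1, lp2, perm, iterations=8) -> List[int]:
    """Iterative decoding as in FIG. 3; interleaving means y_int[j] = y[perm[j]]."""
    n = len(ls)
    ls_int = [ls[i] for i in perm]                       # interleaver 90-1
    ext2 = [0.0] * n                                     # deinterleaved Extrinsic 2,k
    llr2 = [0.0] * n
    for _ in range(iterations):
        _, ext1 = siso(ls, lp1, ext2)                    # decoder 70-1
        ext1_int = [ext1[i] for i in perm]               # interleaver 90-2
        llr2_int, ext2_int = siso(ls_int, lp2, ext1_int)  # decoder 70-2
        for j, i in enumerate(perm):                     # deinterleaver 95
            ext2[i] = ext2_int[j]
            llr2[i] = llr2_int[j]
    return [0 if v >= 0 else 1 for v in llr2]            # decide on the sign of LLR2
```

Each call to `siso` passes on only the extrinsic part of its output, so neither decoder is fed back its own systematic and a priori information; this separation is what makes the exchange in FIG. 3 iterative rather than a simple repetition.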
As explained above, a decoder has to be designed for the worst-case scenario. In other words, a decoder has to be designed to handle (1) a code block size of, for example, 5114 bits within (2) a fixed amount of processing time. To handle the worst-case code block size, the decoder typically reserves enough storage in the interleaver/deinterleaver for a code block size of, for example, 5114 bits. To meet the fixed processing time, a code block may be divided into smaller overlapping blocks that are processed simultaneously.
FIG. 4 illustrates breaking a code block into overlapping sub-blocks for parallel processing to meet this processing time requirement. For example, a code block of length 8K is divided into 6 overlapping sub-blocks, each of length 3K. The 6 sub-blocks can be processed in parallel, and the results, which correspond to the darker portions of the sub-blocks in FIG. 4, are reassembled after a metric calculation to give back the 8K block metrics. Since each sub-block processed in parallel has a length of 3K instead of 8K, the processing time is shortened roughly by a factor of 8/3 ≈ 2.67. The more sub-blocks the data can be broken into (down to a limit imposed by the code constraint length to guarantee performance), the faster the code block can be decoded. For the 3GPP turbo code, if each sub-block length is as low as 128, there may be more than 100 sub-blocks. On the other hand, the more sub-blocks that need to be processed simultaneously, the more complex the hardware (real estate, clock frequency, control logic, etc.). Therefore, a balance must be struck between decoding time and hardware complexity. A realistic number of sub-blocks is generally under 16.
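The partitioning of FIG. 4 can be sketched as follows, as a hedged illustration. The helper names are hypothetical, and the sketch assumes equal-length cores (the darker portions that are kept) extended by a fixed overlap on each side as warm-up for the metric recursions; with 8K = 8192 and 6 sub-blocks, an overlap of 853 positions (chosen here only to reproduce the 3K window length) gives interior windows of exactly 3072 = 3K.

```python
import math
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (win_start, win_end, core_start, core_end), half-open


def split_overlapping(n: int, p: int, overlap: int) -> List[Window]:
    """Split a length-n block into up to p overlapping sub-blocks.

    Only each core region is kept after the metric calculation; the overlap
    on either side (clipped at the block edges) is processed but discarded.
    """
    core = math.ceil(n / p)
    windows = []
    for i in range(p):
        cs, ce = i * core, min((i + 1) * core, n)
        if cs >= n:
            break
        windows.append((max(0, cs - overlap), min(n, ce + overlap), cs, ce))
    return windows


def speedup(n: int, windows: List[Window]) -> float:
    """Rough speed-up: full block length over the longest window run in parallel."""
    return n / max(we - ws for ws, we, _, _ in windows)
```

With `split_overlapping(8192, 6, 853)`, the longest window is 3072 positions and `speedup` returns 8192/3072 = 8/3, matching the factor quoted above; the cores tile the block exactly, so the kept results reassemble into the full 8K block metrics.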
Accordingly, when the code block size is on the order of a single sub-block size, the parallel processing is utilized only marginally: one (or a few) of the parallel processing engines do all the work while the rest sit idle.
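The underutilization can be quantified with a small hedged sketch. It assumes a hypothetical decoder with 8 parallel engines, a minimum sub-block length of 128 (the figure mentioned above for the 3GPP turbo code), and at most one sub-block per engine; the helper names are illustrative only.

```python
def engines_used(k: int, engines: int = 8, min_sub_block: int = 128) -> int:
    """Engines that receive work when a k-bit block is split into sub-blocks
    no shorter than min_sub_block, with at most one sub-block per engine."""
    return max(1, min(engines, k // min_sub_block))


def utilization(k: int, engines: int = 8, min_sub_block: int = 128) -> float:
    """Fraction of the parallel engines kept busy by a k-bit code block."""
    return engines_used(k, engines, min_sub_block) / engines
```

Under these assumptions, a 40-bit block (the smallest 3GPP size) keeps one of eight engines busy (12.5% utilization), a 500-bit block keeps three busy, and only blocks of at least 1024 bits use all eight.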