Convolutional codes provide forward error correction for second- and third-generation wireless communication systems. Viterbi decoders are commonly used to decode convolutionally coded information. Viterbi decoding consists of two main stages: the state metric function and the traceback function. State metric units based on a cascade architecture provide flexible computation when multiple constraint lengths and frame sizes are processed. Unfortunately, this flexibility causes other difficulties when the cascade block contains a number of add-compare-select (ACS) units that is not an integer modulus of the cascade architecture.
Convolutional coding is a bit-level encoding technique, unlike block-level techniques such as Reed-Solomon coding. One of the chief advantages of convolutional codes over block codes is that convolutional codes may be decoded after an arbitrary length of data, while block codes introduce latency by requiring reception of an entire data block before decoding. Thus convolutional codes do not require block synchronization.
Convolutional codes are decoded by using the familiar trellis diagram to find the most likely sequence of codes. The Viterbi algorithm (VA) simplifies the decoding task by limiting the number of sequences examined. The most likely path to each state is retained for each new symbol.
Most digital signal processors (DSPs) used in Viterbi decoding incorporate a special hardware unit, called an add-compare-select-store unit, to accelerate the Viterbi metric-update computation. Such an add-compare-select-store unit with dual accumulators and a splittable ALU performs a Viterbi butterfly computation in four cycles.
The error-correction capability of a convolutional encoder relies on the fact that the current code symbol outputs depend on past information bit values. Each coded bit is generated by convolving the input bit with previous uncoded bits. FIG. 1 illustrates an example of this process. The information bits 100 are input to a shift register with taps at various points 101, 102, 103 and 104. The tap values are combined through Boolean XORs 105 and 106, each of which generates a high output when an odd number of its inputs are high (for two inputs, when one and only one input is high). The output of XOR 105 produces code symbol output 107, and the output of XOR 106 produces code symbol output 108.
Error correction depends on the number of past samples used to form the code symbols. The number of input bits used in the encoding process is the constraint length k, calculated as the number of unit delays plus one in the code generation circuit, such as that of FIG. 1.
FIG. 1 includes four delays, so the constraint length k is five. The constraint length represents the total span of values used and is determined independently of the number of taps used to form the code words. The constraint length determines many system properties; most importantly, it sets the number of possible delay states, $2^{k-1}$.
Another major factor influencing error correction is the coding rate, the ratio of input data bits to transmitted bits. In the circuit of FIG. 1, two bits are transmitted for each input bit, for a coding rate of 1/2. A circuit with a coding rate of 1/3 includes one more XOR, producing one more output for every input bit. Although any coding rate is possible, rate 1/n systems are the most widely used because of the efficiency of the decoding process.
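As a rough illustration of the shift-register-and-XOR structure described above, the following Python sketch implements a rate 1/n feedforward convolutional encoder. The tap masks are assumptions for illustration only: the numeric tap positions of FIG. 1 are not given, so the example uses the common k = 3 generators 111 and 101 (octal 7 and 5) rather than the k = 5 circuit of FIG. 1. The function name `conv_encode` is hypothetical.

```python
def conv_encode(bits, k, gens):
    """Rate 1/len(gens) feedforward convolutional encoder (illustrative sketch).

    The k-bit register holds the current input bit in position k-1 and the
    k-1 delayed bits below it (k = number of unit delays + 1). Each generator
    in `gens` is a hypothetical k-bit tap mask; its code symbol is the XOR
    (parity) of the tapped register bits.
    """
    state = 0  # contents of the k-1 delay elements, initially cleared
    out = []
    for b in bits:
        reg = (b << (k - 1)) | state
        for g in gens:
            out.append(bin(reg & g).count("1") & 1)  # XOR of the tapped bits
        state = reg >> 1  # shift: the newest bit becomes the MSB of the state
    return out


# Hypothetical k = 3, rate 1/2 generators 111 and 101 (octal 7 and 5)
print(conv_encode([1, 0, 1, 1], 3, (0b111, 0b101)))  # [1, 1, 1, 0, 0, 0, 0, 1]
```

Passing a third tap mask in `gens` adds one more XOR output per input bit and turns the same sketch into a rate 1/3 encoder.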
Convolutionally encoded data is decoded through knowledge of the possible state transitions, created from the dependence of the current symbol on past information bit data. The familiar trellis diagram having an appropriate number of delay states represents the allowable state transitions for a set of coding parameters.
FIG. 2 illustrates a simple example trellis diagram for a constraint length k=3, rate 1/2 encoder. The delay states represent the state of the encoder (the actual bits in the encoder shift register at nodes 101 through 104), while the path states represent the symbols output from the encoder (one pair of symbols from the pair of outputs 107 and 108). Each column of delay states indicates one symbol interval (for example, the distance between 201 and 202).
The number of delay states is determined by the constraint length. In this example, the constraint length is three, and the number of possible states is $2^{k-1} = 2^2 = 4$. Knowledge of the delay states is very useful in data decoding, but the path states are the actual encoded and transmitted values. In the example of FIG. 2, the delay states are labeled 201, 202, 203 and 204.
The number of bits representing the path states (210 and 211) is a function of the coding rate. In this example, two output bits are generated for every input bit, resulting in 2-bit path states. A rate 1/3 (or 2/3) encoder has 3-bit path states, a rate 1/4 has 4-bit path states, and so forth. Since path states represent the actual transmitted values, they correspond to points on a constellation diagram that describes the specific magnitude and phase values used by the modulator.
The decoding process estimates the delay state sequence, based on received data symbols, to reconstruct a path through the trellis. The delay states 201 through 204 directly represent the data being encoded, since the states correspond to bits in the encoder shift register. Path states 210 and 211 represent the path bits intermediate to the delay states.
In the example of FIG. 2, the most significant bit (MSB) of the delay state corresponds to the most recent input, and the least significant bit (LSB) corresponds to the previous input. Each input shifts the delay state value one bit to the right, with the new bit shifting into the MSB position. For example, if the current delay state is 00 and a 1 is input, the next delay state is 10; a 0 input produces a next delay state of 00.
Systems of all constraint lengths use similar state mapping. The correspondence between data values and states allows straightforward data reconstruction once the path through the trellis is determined.
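The state mapping just described can be captured in two small functions. This sketch assumes the convention above (the newest input bit enters the MSB of a (k−1)-bit delay state and the oldest bit leaves through the LSB); the names `next_state` and `predecessors` are illustrative, not taken from the source. Each state has exactly two predecessors, differing only in their LSB, which reflects the two paths per state of a rate 1/n code.

```python
def next_state(state, bit, k):
    """Delay state after `bit` is input: new bit enters the MSB, LSB is dropped."""
    return (bit << (k - 2)) | (state >> 1)


def predecessors(state, k):
    """The two delay states that can precede `state`; they differ only in the LSB."""
    mask = (1 << (k - 1)) - 1  # a delay state has k-1 bits
    base = (state << 1) & mask
    return base, base | 1


k = 3
print(1 << (k - 1))                            # 4 delay states
print(format(next_state(0b00, 1, k), "02b"))   # 10
print(format(next_state(0b00, 0, k), "02b"))   # 00
print(predecessors(0b10, k))                   # (0, 1)
```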
FIG. 3 is a high-level block diagram illustrating convolutional encoder 301, transmission path 302, and Viterbi decoder 303. Convolutional encoder 301 (such as the example illustrated in FIG. 1) produces a stream p(x) 304 of f × R symbol elements transmitted through transmission path 302, where f is the frame length under consideration and R is the number of bits per symbol. Transmission path 302 introduces errors e(x) 311, with the resulting stream r(x) 305 having f × R corrupted symbol elements. Viterbi decoder 303 receives this input stream and passes the symbols to branch metrics unit 308 for comparison with known branch metrics stored in decoder RAM 315. The branch metrics unit output 306 is a stream of metrics to be processed by state metric update unit 309 to identify the most likely path through the trellis for stream 305. Traceback unit 310 completes processing by identifying the total path through the trellis and producing output 312. This output is the decoder output i(x) for the frame f.
The Viterbi algorithm (VA) minimizes the number of data-symbol sequences represented by trellis paths. As a maximum-likelihood decoder, the VA identifies the code sequence with the highest probability of matching the transmitted sequence, given the received sequence.
The VA is implemented by three-stage decoder unit 303. Decoder unit 303 is driven by decoder control unit 314 and stores data in decoder RAM 315. The datapath of decoder unit 303 includes branch metrics unit 308, state metric update unit 309 and traceback unit 310. In state metric update unit 309, probabilities are accumulated for all states based on the current input symbol. The traceback routine reconstructs the data once a unique path through the trellis is identified.
FIG. 4 illustrates, in flow chart form, the major steps of the VA. They are summarized in the following brief pseudo-code sequence:

For each frame:
{
    401: Initialize metrics
    For each symbol:
    {
        400: Metric update, or add-compare-select (ACS)
        For each delay state:
        {
            402: Calculate local distance of input to each possible path
            403: Accumulate total distance for each path
            404: Select and save minimum distance
            405: Save indication of path taken
            406: Complete metric update
        }
    }
    410: Traceback
    411: Initialize traceback
    For each bit in a frame (or for the minimum number of bits):
    {
        412: Calculate position in transition data of the current state
        413: Read selected bit corresponding to state
        414: Update state value with new bit
    }
    415: Reverse output bit ordering
    416: Complete traceback
}
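A compact Python rendering of the FIG. 4 steps may help make the flow concrete. This is a sketch under stated assumptions, not the decoder described in this document: it uses hard-decision (0/1) inputs, Hamming distance as the local distance of block 402, minimum-distance selection, a rate 1/n feedforward code with hypothetical generator tap masks, and the FIG. 2 state convention (newest bit in the MSB). Comments tie each step to the numbered blocks; all function names are hypothetical.

```python
INF = float("inf")


def parity(x):
    """XOR of all bits of x (1 if an odd number of bits are set)."""
    return bin(x).count("1") & 1


def conv_encode(bits, k, gens):
    """Feedforward rate 1/len(gens) encoder, used here only to make test data."""
    state, out = 0, []
    for b in bits:
        reg = (b << (k - 1)) | state  # current bit above the k-1 delay bits
        out.extend(parity(reg & g) for g in gens)
        state = reg >> 1  # newest bit becomes the MSB of the delay state
    return out


def viterbi_decode(received, k, gens, final_state=None):
    """Hard-decision Viterbi decoding of one frame (minimum Hamming distance)."""
    n = len(gens)
    num_states = 1 << (k - 1)
    mask = num_states - 1
    metrics = [0] + [INF] * (num_states - 1)  # 401: encoder assumed to start in state 0
    decisions = []
    for i in range(0, len(received), n):  # for each symbol
        symbol = received[i:i + n]
        new_metrics, dec = [INF] * num_states, [0] * num_states
        for s in range(num_states):  # 400: ACS for each delay state
            bit = s >> (k - 2)  # input bit that leads into state s
            for d in (0, 1):  # the two predecessors differ only in their LSB
                p = ((s << 1) & mask) | d
                reg = (bit << (k - 1)) | p
                local = sum(parity(reg & g) != r for g, r in zip(gens, symbol))  # 402
                total = metrics[p] + local  # 403: accumulate
                if total < new_metrics[s]:  # 404: select the minimum
                    new_metrics[s] = total
                    dec[s] = d  # 405: save indication of path taken
        metrics = new_metrics
        decisions.append(dec)
    # 410/411: known final state if given, otherwise best accumulated distance
    if final_state is None:
        final_state = min(range(num_states), key=lambda st: metrics[st])
    state, bits = final_state, []
    for dec in reversed(decisions):  # 412-414
        bits.append((state >> (k - 2)) & 1)  # MSB = decoded input bit
        state = ((state << 1) & mask) | dec[state]
    bits.reverse()  # 415: traceback ran from the end of the frame
    return bits


msg = [1, 0, 1, 1, 0, 0, 1, 0]
gens = (0b111, 0b101)  # hypothetical k = 3 generators (octal 7, 5)
tx = conv_encode(msg + [0, 0], 3, gens)  # two zero tail bits return to state 0
rx = list(tx)
rx[5] ^= 1  # flip one channel bit
print(viterbi_decode(rx, 3, gens, final_state=0)[:len(msg)] == msg)  # True
```

In this sketch, a frame terminated with k − 1 zero tail bits is traced back from state 0 (`final_state=0`); otherwise traceback starts from the state with the smallest accumulated distance.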
Although one delay state is entered for each symbol transmitted, the VA calculates the most likely previous delay state for all possible states, since the actual encoder state is not known until a number of symbols are received. Each delay state is linked to the previous delay states by a subset of all possible paths. For rate 1/n encoders, there are only two paths from each delay state. This considerably limits the calculations.
The process of FIG. 4 begins by initializing the metrics (block 401) and then performs the metric update (block 400) for each symbol. The metric for each path is estimated by combining the current input value r(x) 305 with the accumulated metrics of previous states stored in decoder RAM 315. Each path has an associated symbol or constellation point. The local distance from the current input to that symbol is calculated in block 402. For a better estimate of data validity, the local distance is added in block 403 to the accumulated distance of the state to which the path points.
Because each delay state has two or more possible input paths, the accumulated distance is calculated for each input path. The path with the minimum accumulated distance is selected as the survivor path and saved in block 404. This selection of the most probable sequence is key to VA efficiency. By discarding most paths, the number of possible paths stored is minimized.
An indication of the path and the previous delay state is stored in block 405 to enable reconstruction of the state sequence from a later point. The minimum accumulated distance is stored for use in the next symbol period. This completes the metric update of block 406 that is repeated for each state. The metric update is also called the add-compare-select (ACS) operation: accumulation of distance data; comparison of input paths; and selection of the maximum likelihood path.
In the metric update, data is stored for each symbol interval indicating the path to the previous state. A value of 1 in any bit position indicates that the previous state is the lower path, and a 0 indicates the previous state is the upper path. Each prior state is constructed by shifting the transition value into the LSB of the state. This is repeated for each symbol interval until the entire sequence of states is reconstructed. Since these delay states directly represent the actual outputs, it is a simple matter to reconstruct the original data from the sequence of states. In most cases, the output bits must be reverse ordered, since the traceback works from the end to the beginning.
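The decision-bit convention above (1 selects the lower predecessor, the bit is shifted into the LSB of the state, and the output bits are reversed at the end) can be sketched as a standalone traceback. Purely as an assumption for illustration, each symbol interval's decisions are packed into one integer word whose bit s holds the decision for delay state s, which mirrors steps 412 and 413 of FIG. 4 (locate the state's position in the transition data and read that bit). Following the FIG. 2 mapping, the MSB of each visited state is taken as the decoded bit. The name `traceback` is hypothetical.

```python
def traceback(decision_words, final_state, k):
    """Reconstruct input bits from packed per-interval decision words.

    decision_words[t] has bit s set when the survivor into delay state s at
    interval t came from the lower predecessor (the one whose LSB is 1).
    """
    mask = (1 << (k - 1)) - 1
    state, bits = final_state, []
    for word in reversed(decision_words):  # work from the end of the frame
        bits.append((state >> (k - 2)) & 1)  # MSB of the state = decoded bit
        d = (word >> state) & 1  # read the decision bit for this state
        state = ((state << 1) & mask) | d  # shift it into the LSB: prior state
    bits.reverse()  # traceback produced the bits last-to-first
    return bits


# Build decision words for a known input path (k = 3) and trace it back.
k = 3
inputs = [1, 1, 0, 1, 0, 0]
state, words = 0, []
for b in inputs:
    nxt = (b << (k - 2)) | (state >> 1)
    words.append((state & 1) << nxt)  # only the survivor's bit is filled in
    state = nxt
print(traceback(words, state, k) == inputs)  # True
```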
FIG. 5 illustrates a prior art state metric unit designed using a cascade architecture. The cascade unit is designed to support trellis sizes from 16 to 256 states, that is, constraint lengths k from 5 to 9. This unit performs four add-compare-select (ACS) operations 501, 503, 505 and 507, and three transpose operations (Tn×m) 502, 504 and 506. Each block receives two state metric inputs, for example inputs 508 and 509 to transpose block 502, and generates two state metrics, for example outputs 510 and 511 from transpose block 502. Each ACS unit calculates the state metrics for one trellis delay stage. Therefore, the four ACS units of FIG. 5 calculate the state metrics for four consecutive trellis delay stages.
This architecture supports radix-16 trellises. For trellis sizes of 16 and 256 states, the architecture can be fully pipelined. For other trellis sizes, the units are not 100% utilized, and holes are introduced in the pipeline by turning various blocks OFF. The activation of each unit is shown in Table 1. The ON label indicates that the functional block is performing as desired; the OFF label indicates that the functional block is only passing data. The pipelining remains constant and is not affected by the activation level of the blocks.
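TABLE 1

| Number of states | Pass number | ACS1 | T1x4 | ACS2 | T1x2 | ACS3 | T1x1 | ACS4 |
|---|---|---|---|---|---|---|---|---|
| 256 | 1 | ON | ON | ON | ON | ON | ON | ON |
| 256 | 2 | ON | ON | ON | ON | ON | ON | ON |
| 128 | 1 | ON | ON | ON | ON | ON | ON | ON |
| 128 | 2 | OFF | OFF | ON | ON | ON | ON | ON |
| 64 | 1 | ON | ON | ON | ON | ON | ON | ON |
| 64 | 2 | OFF | OFF | OFF | OFF | ON | ON | ON |
| 32 | 1 | ON | ON | ON | ON | ON | ON | ON |
| 32 | 2 | OFF | OFF | OFF | OFF | OFF | OFF | ON |
| 16 | 1 | ON | ON | ON | ON | ON | ON | ON |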
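The ON/OFF pattern in Table 1 appears consistent with a simple rule, inferred here from the table rather than stated in the text: a trellis of $2^{k-1}$ states needs k − 1 ACS stages, the four ACS units handle up to four stages per pass, a partial pass enables only the trailing ACS units (ACS4 always runs), and each transpose is ON exactly when the ACS unit ahead of it is ON. The sketch below regenerates Table 1 from that assumed rule; `activation` is a hypothetical name.

```python
ACS_COUNT = 4
UNIT_NAMES = ["ACS1", "T1x4", "ACS2", "T1x2", "ACS3", "T1x1", "ACS4"]


def activation(num_states):
    """Per-pass ON/OFF pattern for the cascade, under the inferred rule above."""
    stages = num_states.bit_length() - 1  # k - 1 trellis stages for 2**(k-1) states
    rows, remaining, pass_no = [], stages, 1
    while remaining > 0:
        active = min(ACS_COUNT, remaining)  # ACS stages used in this pass
        row = []
        for i in range(ACS_COUNT):
            on = "ON" if i >= ACS_COUNT - active else "OFF"  # trailing units run
            row.append(on)  # ACS unit i+1
            if i < ACS_COUNT - 1:
                row.append(on)  # transpose that follows ACS unit i+1
        rows.append((pass_no, row))
        remaining -= active
        pass_no += 1
    return rows


if __name__ == "__main__":
    print("states pass " + " ".join(UNIT_NAMES))
    for n in (256, 128, 64, 32, 16):
        for p, row in activation(n):
            print(n, p, " ".join(row))
```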
FIG. 6 pictorially illustrates the combinations of butterfly calculations performed by the ACS units. The ACS butterfly equations for computing state metrics are:

$$S_I = \max(S_A + BM,\; S_B - BM) \qquad (1)$$

$$S_J = \max(S_A - BM,\; S_B + BM) \qquad (2)$$

where $S_I$ and $S_J$ are the respective output metrics, $S_A$ and $S_B$ are the respective input metrics, and $BM$ is the branch metric specific to a particular butterfly.
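The ACS will also generate two decision bits, one for each equation:

$$D_I = \begin{cases} 0 & \text{if } (S_A + BM) > (S_B - BM) \\ 1 & \text{otherwise} \end{cases} \qquad (3)$$

$$D_J = \begin{cases} 0 & \text{if } (S_A - BM) > (S_B + BM) \\ 1 & \text{otherwise} \end{cases} \qquad (4)$$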
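Equations (1) through (4) translate directly into a small function. In this sketch, metrics are plain integers and larger is better, as the max in equations (1) and (2) implies; a tie yields a decision bit of 1, because equations (3) and (4) use a strict comparison. The name `acs_butterfly` is hypothetical.

```python
def acs_butterfly(sa, sb, bm):
    """One ACS butterfly per equations (1)-(4); larger metrics are better."""
    i_upper, i_lower = sa + bm, sb - bm  # candidates for S_I
    j_upper, j_lower = sa - bm, sb + bm  # candidates for S_J
    si = max(i_upper, i_lower)  # equation (1)
    sj = max(j_upper, j_lower)  # equation (2)
    di = 0 if i_upper > i_lower else 1  # equation (3)
    dj = 0 if j_upper > j_lower else 1  # equation (4)
    return si, sj, di, dj


print(acs_butterfly(10, 4, 3))  # (13, 7, 0, 1)
```

Here the S_J candidates tie at 7, so D_J = 1 under the strict ">" of equation (4).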
FIG. 7 illustrates a block diagram of a transpose 1 by 4 unit 7 for the state metric unit. Blocks 701, 702, 703 and 704 are delay elements. Delay elements 703 and 704 are required for timing. Two states SI and SJ enter this block and two states SK and SL exit during every clock cycle. The block performs a 1 by 4 transpose of the states. The crossbar block 706 controls the flow of the states. If control input 786 is low, then the states are allowed to pass directly to the other side. Conversely if control input 786 is high, then the states cross over from the bottom rail to the top rail. Crossbar block 706 has a three stage pipeline. States 0 and 1 enter the block during the first cycle; states 8 and 9 enter the block during the second cycle. States 0 and 8 are output after two cycles; states 1 and 9 are output after the third cycle. FIG. 8 illustrates examples of the transpose operations performed by crossbar block 706 using matrix equations.
The output of the cascade block of FIG. 5 is a vector of state metrics that are output two states at a time. These two states are t1 and b1. Table 2 shows the order of the states at the input of ACS1 block 501 and at the outputs of the other blocks of FIG. 5 for a constraint length of 5. For each entry in Table 2, t1 is the first listed integer and b1 is the second listed integer. There are 16 states for k=5 and the states are broken down into an 8 by 2 matrix. The first column illustrates the state metric indices for the inputs to ACS1 501 and is labeled with an I. The other columns illustrate the state metric indices for the outputs of all the other units and are labeled with an O. Similar Tables can be generated for constraint lengths of 6 through 9.
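TABLE 2

| ACS1_I (501) | ACS1_O (501) | T1x4_O (502) | ACS2_O (503) | T1x2_O (504) | ACS3_O (505) | T1x1_O (506) | ACS4_O (507) |
|---|---|---|---|---|---|---|---|
| 0, 8 | 0, 1 | 0, 8 | 0, 1 | 0, 8 | 0, 1 | 0, 8 | 0, 1 |
| 1, 9 | 2, 3 | 2, 10 | 4, 5 | 4, 12 | 8, 9 | 1, 9 | 2, 3 |
| 2, 10 | 4, 5 | 4, 12 | 8, 9 | 1, 9 | 2, 3 | 2, 10 | 4, 5 |
| 3, 11 | 6, 7 | 6, 14 | 12, 13 | 5, 13 | 10, 11 | 3, 11 | 6, 7 |
| 4, 12 | 8, 9 | 1, 9 | 2, 3 | 2, 10 | 4, 5 | 4, 12 | 8, 9 |
| 5, 13 | 10, 11 | 3, 11 | 6, 7 | 6, 14 | 12, 13 | 5, 13 | 10, 11 |
| 6, 14 | 12, 13 | 5, 13 | 10, 11 | 3, 11 | 6, 7 | 6, 14 | 12, 13 |
| 7, 15 | 14, 15 | 7, 15 | 14, 15 | 7, 15 | 14, 15 | 7, 15 | 14, 15 |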
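The orderings in Table 2 follow from two rules that can be read off the table itself (they are inferences from the listed indices, not a description of the crossbar hardware of FIG. 7): a butterfly on states (s, s + 8) produces states (2s, 2s + 1), and a T1xm transpose, within each block of 2m consecutive pairs, pairs the top outputs of pairs i and i + m and then their bottom outputs. The sketch below regenerates every column of Table 2 under those assumed rules; it covers only the 16-state (k = 5) case, and the function names are hypothetical.

```python
def acs_out(pairs, num_states):
    """Butterfly on (s, s + N/2) yields states (2s mod N, 2s + 1 mod N)."""
    return [((2 * t) % num_states, (2 * t + 1) % num_states) for t, _ in pairs]


def transpose(pairs, m):
    """T1xm: in each block of 2m pairs, pair tops of i and i+m, then bottoms."""
    out = []
    for g in range(0, len(pairs), 2 * m):
        blk = pairs[g:g + 2 * m]
        out += [(blk[i][0], blk[i + m][0]) for i in range(m)]
        out += [(blk[i][1], blk[i + m][1]) for i in range(m)]
    return out


def table2(num_states=16):
    """State index pairs (t1, b1) at each point of the FIG. 5 cascade."""
    cols = {}
    x = [(i, i + num_states // 2) for i in range(num_states // 2)]
    cols["ACS1_I"] = x
    stages = [("ACS1", "T1x4", 4), ("ACS2", "T1x2", 2), ("ACS3", "T1x1", 1), ("ACS4", None, 0)]
    for acs, tname, m in stages:
        x = acs_out(x, num_states)
        cols[acs + "_O"] = x
        if tname is not None:
            x = transpose(x, m)
            cols[tname + "_O"] = x
    return cols


if __name__ == "__main__":
    cols = table2()
    print("  ".join(cols))
    for row in zip(*cols.values()):
        print("  ".join(f"{a},{b}" for a, b in row))
```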
The actual decoding of symbols into the original data is accomplished by tracing the maximum likelihood path backwards through the trellis. Generally, a longer sequence results in a more accurate reconstruction of the trellis. After a number of symbols equal to about four or five times the constraint length, little accuracy is gained by additional inputs.
The traceback function starts from a final state that is either known or estimated to be correct. After a number of symbols equal to four or five times the constraint length, the state with the minimum accumulated distance can be used to initiate the final traceback. A more exact method is to wait until an entire frame of data is received before beginning traceback. In this case, tail bits are added to force the trellis to the zero state, providing a known starting point for traceback.