The present invention relates generally to digital communication systems, and more particularly to Viterbi decoders and other convolutional decoders for use in such systems.
Channel coding is a conventional technique commonly used to increase the robustness of a digital communication system. The principle underlying channel coding is to introduce redundancy and memory into the transmitted bit stream so as to facilitate error detection and correction at the decoder. Two general classes of channel codes are block codes and trellis codes. Block codes operate on a block-by-block basis, such that output code words depend only on the current input block message. Trellis codes, in contrast, may be viewed as mapping one arbitrarily long bit stream into another, with no assumed block structure. A commonly-used linear class of trellis codes is known as convolutional codes. In such codes, output code words result from the convolution of an input message stream with the impulse response of an encoder which includes a v-stage shift register. A given n-bit code word is generated as a function of m input bits and v bits stored in the shift register. The constraint length K of the encoder is defined as m+v, and the rate of the code is given by m/n, where n > m.
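The encoder described above can be sketched as follows for the case m=1, n=2 and v=3 (so K=4). This is a minimal illustration only: the function name conv_encode and the generator taps 0o15 and 0o17 are assumptions chosen for the example and are not taken from the text.

```python
def conv_encode(bits, generators=(0o15, 0o17), v=3):
    """Rate 1/n convolutional encoder with m = 1 and a v-stage shift register.

    `generators` holds one K-bit tap mask per output bit (K = 1 + v).
    The tap values used by default are illustrative assumptions.
    """
    state = 0                      # contents of the v-stage shift register
    mask = (1 << v) - 1            # keeps only the v stored bits
    out = []
    for b in bits:
        reg = (state << 1) | b     # K-bit window: stored bits plus the new input bit
        for g in generators:
            # Each code bit is the modulo-2 sum of the tapped register bits.
            out.append(bin(reg & g).count("1") & 1)
        state = reg & mask         # oldest bit shifts out, input bit is retained
    return out


if __name__ == "__main__":
    # A single 1 followed by zeros produces the encoder's impulse response.
    print(conv_encode([1, 0, 0, 0]))
```

Each input bit produces n=2 code bits, so the output is twice as long as the input, consistent with a rate of m/n = 1/2.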
A convolutional encoder operates as a finite state machine with a maximum of N=2^v=2^(K−m) possible states. The m input bits cause a transition from a present state, defined by v bits, to the next state, and the number of output bits, i.e., code bits, produced depends on the rate of the code. The transitions from one state to another when viewed as a function of time result in a graph commonly known as a “trellis.” FIG. 1 shows a trellis diagram for a rate 1/2 convolutional code with a constraint length K=4. This code includes N=2^(K−m) or 8 possible states, each corresponding to a group of v=3 bits and designated by one of eight dots in each of an “old state” and “new state” column. The diagram shows all of the possible transitions between a given one of the old states and the new states that can be reached from the given old state. Since m=1 in this example, the encoding process dictates that there can be only two transitions out of a state and two transitions into a state. In general, for m input bits, there are 2^m transitions out of and into a state. For a code with m=2, there would be four such transitions.
It should be noted that the state assignment shown in FIG. 1 is arbitrary to some degree. The convention adopted in this example is that the input bit shifts into the least significant bit (LSB) of the shift register while the most significant bit (MSB) shifts out of the register. According to this convention, two states differing in the MSB converge onto the same state when an input is shifted into the LSB. For example, the 0 and 4 states both converge to the 0 state when a 0 is shifted into the register. More generally, two states differing by N/2 in their state assignment converge to the same state under the same input condition. In addition, if a 0 is shifted into the LSB of the register, the new state will be an even state, and conversely, a 1 shifted into the LSB leads to an odd state. Since an upshifting operation is equivalent to multiplication by 2, the process can be generalized by the following transitions: an input 0 causes state j to go to state 2j, while an input 1 causes state j to go to 2j+1; similarly, an input 0 causes state j+N/2 to go to 2j, while an input 1 causes state j+N/2 to go to 2j+1. These transitions are illustrated in FIG. 2 for a rate 1/2 code, and the resulting computational structure is commonly known as a “butterfly.”
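The state transitions just described can be checked with a short sketch. The helper names next_state and butterflies are hypothetical and introduced only for illustration; the shift convention (input into the LSB, MSB shifted out) is the one stated above.

```python
def next_state(state, bit, num_states):
    """Shift `bit` into the LSB; the MSB falls off the end of the register."""
    return ((state << 1) | bit) % num_states


def butterflies(num_states):
    """Yield (j, j + N/2, 2j, 2j + 1) for each butterfly of an N-state trellis."""
    half = num_states // 2
    for j in range(half):
        yield (j, j + half, 2 * j, 2 * j + 1)


if __name__ == "__main__":
    N = 8
    for j, J, even, odd in butterflies(N):
        # Both old states reach the even state on input 0 and the odd state on input 1.
        assert next_state(j, 0, N) == even and next_state(J, 0, N) == even
        assert next_state(j, 1, N) == odd and next_state(J, 1, N) == odd
        print(f"old states {j} and {J} -> new states {even} and {odd}")
```

For N=8 this lists four butterflies, the first of which pairs old states 0 and 4 with new states 0 and 1, matching the example in the text.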
The convolutional encoding process can be viewed as tracing a path through the trellis diagram. FIG. 3 shows one such path traced through an 8-state trellis as a function of time. The vertical axis denotes the state numbers in ascending order, and the horizontal axis represents time. Each stage of the trellis represents a period of time T. Typically, the shift register is initialized to start at the 0 state. For each of the transitions shown in FIG. 3, n code bits are generated. Thus, the objective of the corresponding decoding process is to retrace this path through the trellis based on the received code symbols. FIG. 4 shows all of the possible paths for an 8-stage trellis over a period of 7T. At time T, there are 8 possible paths, at time 2T, there are 16, and so on. Thus, the number of possible paths grows exponentially with time. Note that each path is a particular sequence of transitions from one trellis stage to the next. Hence, a “path metric” for a given path is given by the sum of the individual transition metrics, i.e., “branch metrics.” The decoding process therefore generally involves the steps of: (1) computing branch metrics based on the received code symbols; (2) computing path metrics by summing branch metrics; (3) selecting an optimal path after a certain time; and (4) performing a “traceback” operation along the optimal path to extract the corresponding input bits. In Viterbi decoding, the problem of exponential growth in the number of paths is solved by selecting, at each time step, one of two converging paths. As a result, the number of paths under consideration remains constant with time. This elimination of paths at each time step, i.e., at each trellis stage, is referred to as an add-compare-select (ACS) operation.
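The four decoding steps listed above can be sketched in software as follows. This is a simplified hard-decision illustration under stated assumptions, not the hardware decoder discussed in this document: the generator taps 0o15 and 0o17, the Hamming-distance branch metric, the storage of one full predecessor table per stage, and all function names are assumptions made for the example.

```python
def branch_output(state, bit, generators):
    """Code bits emitted when `bit` is shifted into the LSB from `state`."""
    reg = (state << 1) | bit
    return [bin(reg & g).count("1") & 1 for g in generators]


def encode(bits, generators=(0o15, 0o17), v=3):
    """Rate 1/n encoder starting in state 0 (illustrative tap values)."""
    state, out = 0, []
    for b in bits:
        out += branch_output(state, b, generators)
        state = ((state << 1) | b) & ((1 << v) - 1)
    return out


def viterbi_decode(received, generators=(0o15, 0o17), v=3):
    """Hard-decision Viterbi decoding with a minimum-distance path metric."""
    n, num_states = len(generators), 1 << v
    inf = float("inf")
    metrics = [0] + [inf] * (num_states - 1)   # the encoder starts in state 0
    history = []                               # surviving predecessor per state, per stage
    for t in range(0, len(received), n):
        symbol = received[t:t + n]
        new_metrics = [inf] * num_states
        survivor = [0] * num_states
        for old in range(num_states):
            if metrics[old] == inf:
                continue                       # state not yet reachable
            for bit in (0, 1):
                new = ((old << 1) | bit) & (num_states - 1)
                expected = branch_output(old, bit, generators)
                # (1) branch metric: Hamming distance to the received symbol
                bm = sum(r != e for r, e in zip(symbol, expected))
                # (2) add, then compare and select the better converging path
                candidate = metrics[old] + bm
                if candidate < new_metrics[new]:
                    new_metrics[new] = candidate
                    survivor[new] = old
        metrics = new_metrics
        history.append(survivor)
    # (3) select the path with the best final metric
    state = min(range(num_states), key=lambda s: metrics[s])
    # (4) traceback: the LSB of each state on the path is the input bit
    decoded = []
    for survivor in reversed(history):
        decoded.append(state & 1)
        state = survivor[state]
    return decoded[::-1]


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]   # three trailing zeros flush the register
    coded = encode(message)
    coded[5] ^= 1                                  # inject a single channel error
    print(viterbi_decode(coded) == message)
```

Only one surviving path per state is kept at each stage, so the work per stage stays constant even though the number of possible paths grows exponentially.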
FIG. 5 shows the general structure of a conventional Viterbi decoder 10. The decoder 10 includes a branch metric calculator 12, a recursive ACS engine 14, and a traceback unit 16. Soft symbols are applied via an input buffer 18 to the calculator 12. The calculator 12 computes the branch metrics associated with all possible transitions for a given stage of the trellis. Regardless of the number of states in the trellis, the number of unique branch metrics for a rate 1/n convolutional code is given by 2^n. That is because for a rate 1/n code, there are only 2^n unique code n-tuples. While there are 2^m·N branches in the trellis, and with each branch there is associated a particular n-tuple of code bits, there can only be as many unique branch metrics as there are n-tuples. The ACS engine 14 is recursive since the new path metrics depend on the path metrics computed for the previous stage and the branch metrics corresponding to the transitions from the previous stage to the next stage. The output of the ACS engine 14 is supplied to the traceback unit 16, and the resulting output is buffered in output buffer 20. A finite-state-machine controller 22 controls the operation of the various elements of the Viterbi decoder 10.
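The counting argument above can be illustrated by enumerating the branch labels of an 8-state, rate 1/2 trellis. The generator taps 0o15 and 0o17 and the function name branch_labels are assumptions made for this sketch.

```python
def branch_labels(generators=(0o15, 0o17), v=3):
    """Map each (old state, input bit) branch to its n-tuple of code bits."""
    labels = {}
    for state in range(1 << v):
        for bit in (0, 1):
            reg = (state << 1) | bit
            labels[(state, bit)] = tuple(
                bin(reg & g).count("1") & 1 for g in generators
            )
    return labels


if __name__ == "__main__":
    labels = branch_labels()
    print("branches:", len(labels))                    # 2^m * N with m = 1, N = 8
    print("unique n-tuples:", len(set(labels.values())))
```

With these assumed taps there are 16 branches but only 4 distinct code pairs, so a branch metric calculator needs to produce just 4 values per trellis stage.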
FIG. 6A illustrates an exemplary add-compare-select operation in greater detail. Two initial states, j and J, separated by N/2, converge to a state 2j. The accumulated path metric associated with j is given by Γ_j and that associated with J is given by Γ_J. The respective branch metrics λ_j0 and λ_J0, where 0 represents the transition caused by a 0 input, are added to the path metrics Γ_j and Γ_J, respectively, and depending on the branch metric calculation process, either the minimum or maximum metric path is selected. For example, the maximum is chosen when the branch metric is proportional to the inner product between a received symbol and the corresponding code symbol. Conversely, the minimum is chosen when the branch metric is proportional to the Euclidean distance between the received and code symbols. FIG. 6B shows circuitry for implementing this add-compare-select operation, including adders 30, a compare unit 32 and a select unit 34.
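A single add-compare-select operation for two converging states can be sketched as below. The function name acs, its argument names, and the convention that a decision of 0 means the path from state j survived are assumptions made for illustration.

```python
def acs(path_j, path_J, branch_j, branch_J, select_min=True):
    """Add-compare-select for the two paths converging on one new state.

    Returns (new path metric, decision), where decision is 0 if the path
    from state j survives and 1 if the path from state J = j + N/2 survives.
    """
    cand_j = path_j + branch_j          # add
    cand_J = path_J + branch_J
    if select_min:                      # distance-like metrics: smaller is better
        j_wins = cand_j <= cand_J       # compare
    else:                               # correlation-like metrics: larger is better
        j_wins = cand_j >= cand_J
    return (cand_j, 0) if j_wins else (cand_J, 1)   # select


if __name__ == "__main__":
    print(acs(3, 5, 2, 1))                      # distance-like metric
    print(acs(3, 5, 2, 1, select_min=False))    # correlation-like metric
```

The decision bit is what a traceback unit would later use to retrace the surviving path.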
FIGS. 7A, 7B and 7C illustrate various conventional architectures for the ACS engine 14 of FIG. 5. FIG. 7A shows a state-serial architecture which includes an ACS unit 40 and a state metric (i.e., path metric) random access memory (RAM) 42. An ACS engine 14 with this architecture sequences through a trellis stage, retrieving old path metrics from the RAM 42, and writing back the new path metrics to the RAM 42. Although such an architecture is extremely area-efficient, it is also very slow, and can generally only be used in very low data rate applications, such as speech processing. FIG. 7B shows a state-parallel architecture which attempts to update all of the path metrics in a given trellis stage simultaneously. This architecture includes an ACS unit 40-i, i=1, 2, . . . , N, as well as first and second memory units 44-i and 48-i, for each of the N states of the trellis. A routing network 46 is used to supply the appropriate metrics to the various ACS units 40-i as required. While this architecture provides a high throughput, the routing network can take up a very large amount of area. Such architectures are generally not feasible if the constraint length of the convolutional code is large, since the required area increases exponentially with constraint length. FIG. 7C shows a so-called “shuffle-exchange” (SE) architecture which makes use of both spatial and temporal parallelism. The SE architecture of FIG. 7C includes a number of butterfly structures 50 arranged as shown, and each butterfly structure 50 includes a pair of ACS units 40A and 40B. Instead of computing just one trellis stage, the SE architecture can compute a few trellis stages before feeding back the output to the input. However, the SE architecture suffers from the same drawbacks as the state-parallel approach in that it is prohibitive to implement for a code with a large constraint length.
It is therefore apparent that further improvements are needed in Viterbi decoding techniques in order to provide decoders which are more area-efficient and can be implemented with reduced complexity and cost in a wide variety of applications, such as in wireless base station receivers and other applications which utilize codes with large constraint lengths.
The invention provides apparatus and methods for area-efficient implementation of convolutional decoding techniques. An illustrative embodiment for decoding received symbols in a communication system includes a branch metric calculator, an ACS engine and a traceback unit. The branch metric calculator computes branch metrics for transitions in a trellis representative of a convolutional code used to generate the symbols. In accordance with one aspect of the invention, the branch metrics are computed from an offset binary representation of the symbols using an inverse likelihood function, such that a strong match between a given received symbol and a possible codeword of the convolutional code results in a small branch metric, while a weaker match between a given received symbol and a possible codeword results in a larger branch metric. The corresponding path metrics therefore grow at a smaller rate, require less memory, need less word width, and result in infrequent renormalizations. This offset binary technique results in an implementation that is approximately 25% more area-efficient than a corresponding conventional 2's complement implementation.
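One way to picture a branch metric that is small for a strong match is sketched below. This is an assumption-laden illustration rather than the computation used in the invention: it assumes each soft symbol is an unsigned value in the range 0 to 2^width − 1, with 0 meaning a confident 0 and full scale meaning a confident 1, and the function name and the width parameter are hypothetical.

```python
def offset_binary_branch_metric(symbols, code_bits, width=4):
    """Sum of per-symbol distances from the ideal value for each code bit.

    Assumed mapping (not taken from the text): a soft symbol of 0 is a
    confident 0 and a soft symbol of 2**width - 1 is a confident 1.
    """
    full_scale = (1 << width) - 1
    total = 0
    for r, c in zip(symbols, code_bits):
        if not 0 <= r <= full_scale:
            raise ValueError("soft symbol out of range for the given width")
        # Distance from the ideal value: small when the symbol agrees with c.
        total += r if c == 0 else full_scale - r
    return total


if __name__ == "__main__":
    received = [0, 15]                       # confident 0 followed by confident 1
    for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(pair, offset_binary_branch_metric(received, pair))
```

Under these assumptions the best-matching code pair contributes a metric of zero, so the path metric of a correct path grows slowly, which is the qualitative behavior the paragraph above attributes to the offset binary approach.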
The ACS engine processes path metrics generated from the branch metrics so as to determine a selected path through at least a portion of the trellis. In accordance with another aspect of the invention, the ACS engine may utilize a state-serial architecture which computes path metrics for k states of a given stage of the trellis per clock cycle, using branch metrics obtained from k sets of registers in the branch metric calculator. The ACS engine may also include a plurality of distinct memories operating in a “ping-pong” fashion, such that during a given trellis stage, path metrics are read from a first one of the memories and written to a second one of the memories, and during a subsequent trellis stage path metrics are read from the second one of the memories and written to the first one of the memories. The memory configuration remains unchanged in going, for example, from k=2 to k=4. An embodiment of the ACS engine with k=4 uses four distinct memories and performs two butterfly computations per clock cycle. However, for k greater than 4, additional memory may be required. The invention thus provides an optimal memory configuration and a speedup in ACS computations by approximately a factor of two in an embodiment with k=4. For example, with k=4, it can be shown that maximum throughput is obtained for minimum memory area in an implementation in which the number of states N=256, the constraint length K=9, and the traceback length is 64.
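The ping-pong use of path metric memories can be sketched in software with two banks whose read and write roles swap after every trellis stage. This simplified model uses only two banks and processes one butterfly at a time; it does not model the four-memory, k=4 arrangement mentioned above. The function name run_stages and the callable branch metric interface are assumptions made for the example.

```python
def run_stages(initial_metrics, stage_branch_metrics):
    """Update path metrics stage by stage using two ping-pong banks.

    `stage_branch_metrics` is a list with one callable per trellis stage;
    each callable takes (old_state, new_state) and returns a branch metric.
    """
    n = len(initial_metrics)
    banks = [list(initial_metrics), [0] * n]
    read = 0                                    # index of the bank being read
    for bm in stage_branch_metrics:
        src, dst = banks[read], banks[1 - read]
        for j in range(n // 2):                 # one butterfly per iteration
            J = j + n // 2
            for new in (2 * j, 2 * j + 1):
                dst[new] = min(src[j] + bm(j, new), src[J] + bm(J, new))
        read = 1 - read                         # swap roles for the next stage
    return banks[read]


if __name__ == "__main__":
    zero = lambda old, new: 0
    print(run_stages([0, 10, 20, 30], [zero]))
    print(run_stages([0, 10, 20, 30], [zero, zero]))
```

Because each stage reads from one bank and writes to the other, no old path metric is overwritten before both butterflies that need it have consumed it.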
The traceback unit generates a sequence of decoded bits from the selected path. In accordance with yet another aspect of the invention, the traceback unit may be configured to include a staging register and a traceback memory. The staging register receives selected path information from the ACS engine. The contents of the staging register for a given stage of the trellis are loaded into the traceback memory when the staging register becomes full, at a location given by a number of the stage modulo a predetermined traceback length. Traceback is initiated when the traceback memory becomes full. During traceback, the traceback unit generates the decoded bits from a given portion of the traceback memory, and the given portion is subsequently filled with additional selected path information from the staging register. The staging register generally writes non-contiguous data to the traceback memory, and a pair of series-connected multiplexers can be used to extract a relevant bit from a given set of bits in the traceback memory. This traceback aspect of the invention can reduce the amount of traceback memory required in the decoder by 50% or more relative to conventional arrangements.
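The modulo addressing of the traceback memory can be sketched as a circular buffer of per-stage decision vectors. This is a simplified software model under stated assumptions: it stores one decision bit per state per stage, treats the decision bit as the MSB of the surviving predecessor state, and does not model the staging register timing, the multiplexers, or the partial reuse of memory during traceback. The class and method names are hypothetical.

```python
class TracebackMemory:
    """Circular store of decision vectors, addressed by stage number modulo length."""

    def __init__(self, traceback_length, v):
        self.length = traceback_length
        self.v = v                               # number of state bits
        self.mem = [None] * traceback_length
        self.stage = 0                           # number of stages stored so far

    def store(self, decisions):
        """Write one stage's decision bits (one per state) at stage mod length."""
        self.mem[self.stage % self.length] = list(decisions)
        self.stage += 1

    def is_full(self):
        return self.stage >= self.length

    def traceback(self, state):
        """Walk back `length` stages from `state`; return the decoded input bits."""
        if not self.is_full():
            raise RuntimeError("traceback memory is not yet full")
        bits = []
        for t in range(self.stage - 1, self.stage - 1 - self.length, -1):
            decision = self.mem[t % self.length][state]
            bits.append(state & 1)               # LSB of the state is the input bit
            # Undo the shift: the decision bit restores the MSB that was shifted out.
            state = (state >> 1) | (decision << (self.v - 1))
        return bits[::-1]


if __name__ == "__main__":
    v, inputs = 3, [1, 0, 1, 1, 0, 1]
    tb, state = TracebackMemory(traceback_length=4, v=v), 0
    for b in inputs:
        new = ((state << 1) | b) & ((1 << v) - 1)
        decisions = [0] * (1 << v)
        decisions[new] = state >> (v - 1)        # MSB of the old state
        tb.store(decisions)
        state = new
    print(tb.traceback(state))                   # the last four input bits
```

In this example six stages are written into a four-entry memory, so the two oldest entries are overwritten, and traceback recovers the most recent four input bits.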
The invention is particularly well suited for use in applications such as a very large scale integrated (VLSI) implementation of an area-efficient Viterbi decoder for an IS-95 (North American Narrowband CDMA) base station receiver, although it can provide similar advantages in numerous other applications.