The present invention relates generally to information coding and decoding techniques, and more particularly to decoder algorithms and architectures for use in decoding encoded information signals.
Coding is widely used to decrease bit and packet error probabilities in the transmission of digital information. In many applications, convolutional codes are used for this purpose. A convolutional code is defined by a state machine with state transitions determined by the input bits to be encoded. The two most common decoding algorithms for convolutional codes or codes based on convolutional codes are the Viterbi algorithm and the maximum a-posteriori probability (MAP) algorithm.
The Viterbi algorithm is described in A. J. Viterbi, xe2x80x9cError bounds for convolutional codes and an asymptotically optimum decoding algorithm,xe2x80x9d IEEE Trans. Inform. Theory, Vol. IT-13, pp. 260-269, April 1967, which is incorporated by reference herein. The algorithm decodes received bits or symbols by finding the most likely path through a time-expanded state transition diagram called a trellis.
FIG. 1 shows a conventional block-serial Viterbi decoder 100. The decoder 100 includes S branch metric units (BMUs) 102, individually denoted 102-j, where j=0, 1, . . . Sxe2x88x921. A given one of the BMUs 102-j has an input which receives a bit or symbol, and an output which is coupled to an input of a corresponding add-compare-select unit (ACSU) 104-j. The BMU 102-j and ACSU 104-j are also denoted BMUj and ACSUj, respectively, in the figure. The outputs of the set of ACSUs 104 are applied to a survivor memory unit (SMU) 106 which generates decoded bits from the received bits or symbols. The operation of the Viterbi decoder 100 is described below. This description assumes that a given block being decoded comprises N received bits or symbols.
The decoder 100 is initialized by initializing the path metric of state 0 to be zero, PM0=0, and that of all other states to infinity, PM1 . . . Sxe2x88x921=∞. The decoding algorithm then performs the following recursive iteration which includes an outer loop which iterates on every received bit or symbol and an inner loop which iterates on every state in the trellis:
For every received bit or symbol, Rxi, i=0, 1, . . . Nxe2x88x921, in the block:
For each state in the trellis:
1. Calculate a branch metric for each branch from the current state to a possible next state. The branch metric for a given branch of the trellis is a measure of the likelihood of the transition from the current state to the state the branch connects to given the received symbol or bit Rxi. The branch metric calculation is performed by the BMUs 102.
2. Perform a comparison to find the minimum path metric entering each state, the path metric being formed as the sum of a previous state path metric and the associated branch metric for the transition. This minimum now becomes the state path metric for the next iteration. The comparison is performed in the ACSUs 104.
3. Store the decision of which branch won the comparison for each state into the SMU 106.
As noted above, this iterative process is performed once for every bit or symbol in the received block of N bits or symbols to be decoded in order to complete one decoding iteration which produces a single updated estimate of the received sequence for each bit or symbol. To maximize throughput, a state parallel architecture can be utilized in which the inner loop operations are performed in parallel for the multiple states, as illustrated in FIG. 1. However, the outer loop operations cannot be done in parallel because updated path metrics for each state from one iteration are required as input for the next iteration.
The decoding algorithm is finalized by identifying the state having the minimum final path metric after all the bits or symbols of the received block have been processed. This state is referred to as the winning state. From the winning state, the decoding decision that is stored there is traced back to the state that preceded it. This trace-back process is continued until the state corresponding to the start of the block is reached. The path defined by these decisions identifies the most likely transmitted sequence of bits or symbols, and is processed to yield the decoder output.
Unlike the Viterbi algorithm, the above-noted MAP algorithm uses a forward and reverse pass across the trellis to find the maximum a-posteriori probability of each bit or symbol in the transmitted sequence. The MAP algorithm is described in greater detail in L. R. Bahl, J. Cocke, F. Jelinek and J. Raviv, xe2x80x9cOptimal decoding of linear codes for minimizing symbol error rate,xe2x80x9d IEEE Trans. Inform. Theory, Vol. IT-20, pp. 284-287, March 1974, which is incorporated by reference herein.
The Viterbi and MAP algorithms are both based on sequential processing, with operations being performed for each bit or symbol in a received block depending on the result of the ACS calculations for a previous bit or symbol. This dependency prevents pipelining of the ACS operation above the bit or symbol rate and hence limits the speed at which a convolutional code can be decoded, thereby limiting the throughput of the decoder.
For applications requiring greater error correcting capability, serial or parallel concatenation of two or more convolutional codes are often used. An example of a concatenated code is the so-called Turbo code described in, e.g., C. Berrou, A. Glavieux, and P. Thitimajshima, xe2x80x9cNear Shannon limit error-correcting coding: Turbo codes,xe2x80x9d Proc. IEEE Int. Conf. Comm., Geneva Switzerland, 1993, pp. 1064-1070, which is incorporated by reference herein. The decoding of concatenated codes requires the result of decoding one code trellis as input for decoding the next code trellis, and so on for subsequent code trellises. Such iterative serial processing requires that either a Viterbi or MAP decoder trace sequentially through each trellis multiple times, with subsequent iterations waiting on the result of prior decoding results. When implemented in hardware, the sequential nature of the decoding algorithms necessitates a bit or symbol serial architecture. This multiplies the latency of the Viterbi or MAP decoder by the number of iterations to be performed and the number of constituent codes, thereby resulting in a substantial decoding latency for concatenated codes.
As previously noted, the recursive dependency of the Viterbi and MAP algorithms also makes it impossible to pipeline the calculations in order to improve the decoder throughput. Although it is possible to run the decoder at a higher clock speed in order to improve throughput and minimize latency, such an approach increases the power dissipation of the decoder. The power and latency problems associated with decoding concatenated codes often limit their use for practical applications. Furthermore, even when such codes are used, the number of iterations may be restricted, thus sacrificing coding gain to meet latency requirements. If it were possible to perform each decoding iteration for such codes with lower latency the decoder performance could be improved by increasing the number of iterations performed.
It is also known in the art that it is possible to formulate a block-serial decoding process of a specific class of codes, i.e., compound codes, using probability dependency graphs. A compound code refers generally to a code resulting from a combination of multiple codes. See, e.g., F. R. Kschischang and B. J. Frey, xe2x80x9cIterative decoding of compound codes by probability propagation in graphical models,xe2x80x9d IEEE Journal on Selected Areas in Comm., Vol. 16, No. 2, pp. 219-230, February 1998, which is incorporated by reference herein. However, such techniques have not heretofore been applied to block-parallel decoding of general codes, including convolutional codes, Turbo codes or other concatenated codes, etc., where a general code includes any code for which a parity check matrix can be determined.
As is apparent from the foregoing, a need exists for improved decoding algorithms and corresponding decoder architectures which provide improved throughput and reduced latency, without increasing power dissipation.
The invention provides block-parallel decoding algorithms and corresponding decoder architectures for performing iterative decoding of a wide variety of different codes, including, for example, convolutional codes, Turbo codes or other concatenated codes which are based on convolutional codes, block codes, etc.
Advantageously, the invention reformulates the decoding problem for general codes into a block-parallel form and eliminates the serial dependencies inherent in the conventional block-serial decoding algorithms and decoder architectures described above, thereby substantially reducing latency and increasing throughput.
In accordance with the invention, a block-parallel decoding algorithm and corresponding decoder architecture utilizes a set of interconnected processing nodes configured in the form of a probability dependency graph. The probability dependency graph is characterized at least in part by the code used to encode the blocks of bits or symbols, and the processing nodes implement a block-parallel decoding process for the blocks of bits or symbols to be decoded.
In an illustrative embodiment of the invention, the probability dependency graph is in the form of a bipartite probability dependency graph which includes a set of N variable nodes and a set of T check nodes, with one of the N variable nodes being associated with each of N bits or symbols of a given block to be decoded. A single iteration of the block-parallel decoding process produces within the variable nodes an updated estimate for every bit or symbol in the given block. The block-parallel decoding process in this case is initiated by sending to each check node connected to a variable node a function f(Rxi) where Rxi is the received bit or symbol, i=0, 1, . . . Nxe2x88x921, and f(.) is a function determined at least in part by the code, such as a threshold function, a linear scaling function, or an approximate exponential function.
The block-parallel decoding process includes calculating in parallel at each of the check nodes a function g(.) of the input from the variable nodes connected to that check node, and sending to the connected variable nodes the result of these check node calculations. The function g(.) may include, e.g., a parity check function, a hyperbolic tangent function in the log domain, or a maximum function in the log domain. For example, the function g(.) may include a parity check function and additionally a function to update an indication of the reliability with which the current inputs satisfy the parity check constraints. The additional reliability update function may be a hyperbolic tangent function in the log domain.
The block-parallel decoding process further includes calculating in parallel at each of the variable nodes a function h(.) of the input from the check nodes connected to that variable node, and sending to the connected check nodes the result of these variable node calculations. The function h(.) may be, e.g., a majority function or an averaging function.
The block-parallel decoding process is terminated when all check nodes satisfy a constraint of the code or after a maximum number of iterations. Upon termination, the values at the variable nodes determine the estimated, i.e., decoded, transmitted bit or symbol and possibly an associated a-posteriori probability or reliability.
In another illustrative embodiment of the invention, the probability dependency graph is in the form of a directional probability dependency graph with multiple levels of nodes including an initial level, at least one intermediate level and a final level, arranged such that calculations can be performed at each level without sequential intra-block dependencies. The initial level of nodes includes an input node for each bit or symbol in a given block to be decoded. The nodes in each intermediate level receive inputs only from nodes of a previous level and deliver outputs only to nodes of a subsequent level. The outputs of the nodes of the final level give an estimate of the transmitted bits or symbols for the given block. The block-parallel decoding process is initialized in this embodiment by providing as input to each node in the initial level a function f(Rxi) where Rxi is the received bit or symbol, i=0, 1, . . . Nxe2x88x921, and f(.) is a function determined at least in part by the code, such as a threshold function, a linear scaling function, or an approximate exponential function. Each intermediate level of nodes then calculates a function g(.) of incoming values from the previous level of nodes, and passes the result of the calculation to the next level of nodes, such that each of the levels produces an updated estimate for every bit or symbol in the given block.
In accordance with another aspect of the invention, the set of interconnected processing nodes may be implemented using one or more programmable computation units. For example, a different computation unit may be used to implement each node, or to implement a given set of multiple nodes. Such computation units may comprise reconfigurable hardware or other devices suitable for implementing different block-parallel decoding processes for different codes.