Binary transmission of data through a noisy channel has given rise to various approaches to minimize the errors that can result from such transmission. For example, various forward error correction (FEC) techniques have been developed for encoding the signals prior to transmitting them through the channel, with compatible decoding techniques at the receiving end for reducing the overall error rate. One FEC technique to improve the capacity of a channel includes adding some carefully designed redundant information to the data being transmitted through the channel. The process of adding this redundant information is known as channel coding. Convolution and block coding are currently two major forms of channel coding. Convolution coding typically operates on serial data, one or a few bits at a time. Block codes operate on relatively large (typically, up to a couple of hundred bytes) message blocks. There are a variety of useful convolution and block codes, and a variety of algorithms for decoding the received coded information sequences to recover the original data. Convolution coding with compatible decoding is an FEC technique that is particularly suited to a channel in which the transmitted signal is corrupted mainly by additive white Gaussian noise (AWGN), such as the wireless channels used by, for example, cell phones and radios.
Convolution codes are usually described using two parameters: the code rate and the constraint length. The code rate, k/n, is expressed as a ratio of the number of bits into the convolutional encoder (k) to the number of channel symbols output by the convolutional encoder (n) in a given encoder cycle. The constraint length parameter, K, denotes the “length” of the convolutional encoder, i.e., how many k-bit stages are available to feed the combinatorial logic that produces the output symbols. Closely related to K is the parameter m, which indicates how many encoder cycles an input bit is retained and used for encoding after it first appears at the input to the convolutional encoder. The m parameter can be thought of as the memory length of the encoder.
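By way of illustration only, the parameters above can be made concrete with a minimal sketch of a non-recursive convolutional encoder. The function name conv_encode, the choice k=1, the generator polynomials 7 and 5 (octal), and the bit ordering of the shift register are assumptions made for this example and are not taken from the description above. With k=1 the memory length is m=K-1.

```python
def conv_encode(bits, generators=(0b111, 0b101), K=3):
    """Rate 1/n non-recursive convolutional encoder (k = 1, n = len(generators)).

    Assumed convention: the newest input bit occupies the most significant
    position of a K-bit register; the m = K - 1 older bits form the state.
    """
    state = 0
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state          # K bits feeding the XOR logic
        for g in generators:
            # each output symbol is the parity of the register bits tapped by g
            out.append(bin(reg & g).count("1") & 1)
        state = reg >> 1                      # drop the oldest bit
    return out


if __name__ == "__main__":
    # four input bits produce eight channel symbols at rate 1/2
    print(conv_encode([1, 0, 1, 1]))
```

In this sketch each input bit influences the output for K encoder cycles: the cycle in which it arrives and the m following cycles during which it remains in the state.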
Viterbi decoding is one of two types of decoding algorithms used with convolutional encoding, the other type being sequential decoding. Sequential decoding has the advantage that it can perform very well with long-constraint-length convolution codes, but it has a variable decoding time. Viterbi decoding has the advantage that it has a fixed decoding time. It is well suited to hardware decoder implementation, but its computational requirements grow exponentially as a function of the constraint length, so it is usually limited in practice to constraint lengths of K=9 or less. Viterbi decoding algorithms are typically used for decoding trellis-coded modulation, the technique of squeezing high ratios of bits-per-second through bandwidth-limited channels. The Viterbi algorithm was originally conceived as an error-correction scheme for noisy digital communication links. However, it is now also used in information theory, speech recognition, keyword spotting, computational linguistics, and bioinformatics, as well as other applications.
Viterbi decoding determines the path with the minimum path metric through the trellis, with the path metric being defined as the sum of the branch metrics along the path. This is done in a stepwise manner by processing a set of state metrics forward in time, stage by stage over the trellis.
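The stage-by-stage search for the minimum path metric can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any particular decoder: it assumes a rate 1/n non-recursive code with generators 7 and 5 (octal), hard-decision inputs with Hamming distance as the branch metric, an encoder that starts in state 0, and survivor paths stored in full rather than traced back. The names branch_output and viterbi_decode are hypothetical.

```python
def branch_output(state, bit, generators, K):
    """Expected channel symbols and next state for one trellis branch."""
    reg = (bit << (K - 1)) | state
    symbols = [bin(reg & g).count("1") & 1 for g in generators]
    return symbols, reg >> 1


def viterbi_decode(received, generators=(0b111, 0b101), K=3):
    """Return (decoded bits, path metric) of the minimum-metric trellis path."""
    n = len(generators)
    n_states = 1 << (K - 1)
    INF = float("inf")
    metrics = [0] + [INF] * (n_states - 1)    # encoder assumed to start in state 0
    paths = [[] for _ in range(n_states)]
    for t in range(0, len(received), n):
        symbol = received[t:t + n]
        new_metrics = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metrics[s] == INF:
                continue                      # state not yet reachable
            for bit in (0, 1):
                expected, nxt = branch_output(s, bit, generators, K)
                # branch metric: Hamming distance to the received symbols
                bm = sum(e != r for e, r in zip(expected, symbol))
                cand = metrics[s] + bm        # path metric = sum of branch metrics
                if cand < new_metrics[nxt]:   # keep only the survivor per state
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[s] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(n_states), key=lambda s: metrics[s])
    return paths[best], metrics[best]


if __name__ == "__main__":
    # codeword for input 1 0 1 1 0 0 with one channel error at position 2
    print(viterbi_decode([1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]))
```

The single flipped symbol in the usage example is corrected because every competing path accumulates a larger metric than the transmitted path.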
The complexity of Viterbi algorithms lies in the computation of 2^(K-1) path metrics for a constraint length K decoder at each time stage. A processor that implements a Viterbi decoder typically includes three major blocks: the branch metrics calculation unit (BMU), the add-compare-select unit (ACS), and the survivor path decoding unit. The branch metrics unit typically performs the calculation of distances of sampled signals from targets, which are Euclidean in the case of AWGN. New branch metrics are computed for each incoming sample, at every clock cycle.
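As a hedged illustration of the branch metrics calculation, the sketch below computes the squared Euclidean distance between one received sample and each possible target. The function name branch_metrics and the antipodal mapping of bit 0 to +1.0 and bit 1 to -1.0 are assumptions chosen for the example; an actual BMU may use a different mapping or a simplified distance.

```python
def branch_metrics(sample, n=2):
    """Squared Euclidean distance from an n-symbol sample to all 2**n targets.

    Assumed mapping: code bit 0 -> +1.0, code bit 1 -> -1.0.
    Returns a dict keyed by the n-bit branch label (first symbol in the MSB).
    """
    metrics = {}
    for label in range(1 << n):
        bits = [(label >> (n - 1 - i)) & 1 for i in range(n)]
        target = [1.0 - 2.0 * b for b in bits]
        metrics[label] = sum((r - t) ** 2 for r, t in zip(sample, target))
    return metrics


if __name__ == "__main__":
    # a noisy sample close to the target for label 0b01
    print(branch_metrics([0.9, -1.2]))
```

For a rate 1/n code only 2^n distinct branch metrics exist per sample, regardless of the number of states, which is why the BMU is usually small compared with the ACS network.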
Similarly, a new value of the state metrics has to be computed at each time instant. In other words, the state metrics have to be updated every clock cycle. Because this update is recursive, with each new state metric depending on the state metrics of the previous stage, common pipelining approaches are not applicable for increasing the throughput of the system. Hence the ACS unit is the module that consumes the most power and area (when implemented on a chip).
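A single add-compare-select operation can be sketched as below. The function name acs, its argument order, and the tie-breaking rule in favor of the first predecessor are assumptions made for illustration.

```python
def acs(metric_a, branch_a, metric_b, branch_b):
    """Add-compare-select for one trellis state with two predecessors.

    Returns (new state metric, decision bit), where the decision bit is 0
    if predecessor a survives and 1 if predecessor b survives.
    """
    cand_a = metric_a + branch_a      # add
    cand_b = metric_b + branch_b
    if cand_a <= cand_b:              # compare (ties go to a, by assumption)
        return cand_a, 0              # select
    return cand_b, 1


if __name__ == "__main__":
    print(acs(3, 2, 4, 0))
```

The returned metric becomes an input to the ACS operations of the next stage, which is the feedback loop that limits how far the update can be pipelined.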
The survivor management unit (SMU), or trace back block or mechanism, is responsible for tracing back through the trellis using the survivor bits to reproduce the original input bits. In tracing back, the shortest path through the trellis must be traced. The selected minimum metric path from the ACS output identifies, for each state, its predecessor on the surviving path. In theory, decoding of the shortest path would require the processing of the entire input sequence. However, in practice, the survivor paths merge after some number of iterations. From the point at which they merge, the decoding is unique. The trellis depth at which all the survivor paths merge with high probability is referred to as the survivor path length.
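The trace back described above can be sketched as follows, assuming one decision bit per state per stage. The function name trace_back, the layout of the decisions table, and the state convention (newest input bit in the most significant state bit, decision bit equal to the bit shifted out of the predecessor) are assumptions for this example only.

```python
def trace_back(decisions, end_state, K):
    """Recover input bits by following survivor decisions backwards.

    decisions[t][s] is the decision bit stored for state s at stage t.
    Assumed convention: a state holds the K - 1 most recent input bits,
    newest in the MSB, so the predecessor of s is ((s << 1) & mask) | decision.
    """
    mask = (1 << (K - 1)) - 1
    state = end_state
    bits = []
    for stage in reversed(decisions):
        bits.append(state >> (K - 2))                 # input bit that led into this state
        state = ((state << 1) & mask) | stage[state]  # step to the surviving predecessor
    bits.reverse()
    return bits


if __name__ == "__main__":
    # decisions for a K=3 trellis whose survivor visits states 2, 1, 2, 3
    decisions = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
    print(trace_back(decisions, end_state=3, K=3))
```

In a practical decoder the loop would run over a window of roughly the survivor path length rather than over the entire sequence, and only the oldest bits of each window would be released.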
The Viterbi algorithm is therefore effective in achieving noise tolerance, but the cost is an exponential growth in memory, computational resources, and power consumption. Various approaches have been suggested to address this issue, including an adaptive Viterbi algorithm (e.g., Tessier, R. et al., “A Reconfigurable, Power-Efficient Adaptive Viterbi Decoder”, publication date unknown), and a dynamically reconfigurable adaptive Viterbi decoder (e.g., S. Swaminathan et al., “A Dynamically Reconfigurable Adaptive Viterbi Decoder”; and Chadha, K. et al., “A Reconfigurable Viterbi Decoder Architecture” IEEE Publication Number 0-7803-7147-X/01, pp. 66-71 (2001), and FPGA '02, Feb. 24-26, 2002, Monterey Calif., ACM 1-58113-452-5/02/0002, pp. 227-236 (2002), Liang et al., “A Dynamically-Reconfigurable, Power-Efficient Turbo Decoder”, Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'04) (exact date of publication unknown), and Chadha, K., “A Reconfigurable Decoder Architecture for Wireless LAN and Cellular Systems”, Master's Thesis, Rice University, April, 2001); Zhu et al., “Reconfigurable Viterbi Decoding Using a New ACS Pipelining Technique”, Proceedings of the Application-Specific Systems Architectures and Processors (ASAP'03) (exact date of publication unknown), Yeh et al., “RACER: A Reconfigurable Constraint-Length 14 Viterbi Decoder”, 0-8186-7548-9/96, pp. 60-69 (1996), and Zhan et al., “Domain Specific Reconfigurable Fabric Targeting Viterbi Algorithm” ICFPT 2004 0-7803-8652-3/04 IEEE (2004), pp. 363-366.
Shift register convolution decoders of the Viterbi decoding algorithm for both recursive and non-recursive systematic codes are critical computational blocks, e.g., in modems and other communication devices. They are used, for example, in channel decoders, ML (Maximum Likelihood) equalizers, ML decoders of space-time codes for MIMO RF antennas, and ML filtering, to name a few. An interesting feature of shift register convolution decoders of Viterbi decoding algorithms for recursive systematic and non-recursive systematic codes is that they can be operated in a parallel, serial or hybrid (serial-parallel) form by using an adjustable reconfigurable network of ACS blocks, BMU generators with adjustable reconfigurable connections to the ACS network, and adjustable reconfigurable trace-back mechanisms for recursive systematic and non-recursive systematic forms. Hence, the same hardware can be employed for both recursive systematic and non-recursive systematic codes with various constraint lengths K and generator polynomials. FIG. 1 illustrates an example of a non-recursive systematic Viterbi decoder with constraint length K=7, while FIG. 2 illustrates an example of a recursive systematic Viterbi coder with constraint length K=4 as used inside WCDMA turbo codes.
In a standard implementation, to provide function-specific reconfigurability it is first necessary to analyze the computational structure. Typically, the Viterbi decoder has a shuffle-exchange interconnect structure of ACS blocks, which varies with the constraint length K, with the type of code (recursive systematic and/or non-recursive systematic) and, for recursive codes, with the chosen feedback generator polynomial. Furthermore, the connections of the outputs bearing the BMU values within the ACS network depend on the code generator polynomials and on the number of distinct polynomials, i.e., the code rate; for example, a rate ½ code has two generator polynomials, while a rate ¼ code has four generator polynomials. It is thus difficult to provide flexibility in the most energy-efficient fully parallel implementations, which are typically constrained by the code rate for which they are designed.
In a fully parallel implementation the signal flow graph is directly mapped onto hardware. In general, the constraint length K code decoder requires 2^(K-1) ACS butterfly units. For instance, for a non-recursive systematic code with constraint length K=5, i.e., a 16-state Viterbi decoder, there is a total of 16 ACS butterflies at each stage, and they are interconnected in a manner as shown in FIG. 3 (in the figure, time advances from left to right). This maximum parallel architecture has the potential for high performance and a low power consumption implementation; however, it bears a high cost of large silicon area, especially for large constraint length decoders.
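The shuffle-exchange interconnect between trellis stages can be sketched for a non-recursive code as follows. The function name butterflies and the state numbering (newest input bit in the most significant state bit) are assumptions for this example; other numbering conventions give a mirrored but equivalent pattern. Note that this sketch groups the per-state ACS operations in pairs, so for K=5 it lists 8 two-state groups covering the 16 per-state ACS operations of each stage.

```python
def butterflies(K):
    """List the shuffle-exchange groups of a 2**(K-1)-state trellis stage.

    Each entry is ((pred0, pred1), (succ0, succ1)): both predecessor states
    feed both successor states, so each group needs two ACS operations.
    Assumed convention: next_state = (bit << (K - 2)) | (state >> 1).
    """
    n_states = 1 << (K - 1)
    half = n_states >> 1
    return [((2 * j, 2 * j + 1), (j, j + half)) for j in range(half)]


if __name__ == "__main__":
    for preds, succs in butterflies(5):
        print(preds, "->", succs)
```

Because the pattern depends only on K under this convention, a reconfigurable interconnect for non-recursive codes would mainly need to select among the patterns for the supported constraint lengths.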
When the implementation is done in fixed-point arithmetic, the scaling and overflow handling are crucial for correct behavior of the decoder. The ACS butterfly operation at each stage of the decoder uses an “addition in 2's complement” so that if the state metrics have enough bits (one bit more than the number of bits needed for representing the maximum difference between metrics for a given constraint length K), the properties of 2's complement addition will accomplish the rescaling without additional hardware. Thus, for a reconfigurable decoder design, it is enough to have a bit representation for the metrics corresponding to the maximum difference that is required. Designing a reconfigurable implementation therefore depends on: (1) the biggest constraint length K that the reconfigurable implementation of the shift register convolution decoder is able to handle, and (2) the maximum expected BMU generator rate. Such a design will result in the needed rescaling for all the constraint lengths equal to or less than the maximum expected constraint length K and rate.
The referenced prior art on reconfigurable architectures is not completely satisfactory because the referenced decoders are each reconfigurable to or adapted to process only one type of Viterbi algorithm, thus limiting the application of the decoders. Further limitations result because such designs fix the level of parallelism, i.e., do not allow flexibility in the parallel-serial structure, even though the level of parallelism can vary depending on the Viterbi algorithm decoded. Finally, where simpler codes are implemented (e.g., with K=3), current designs simply switch off unused hardware.
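The rescaling-free behavior of 2's complement state metrics discussed above for fixed-point ACS operations can be illustrated with the following sketch. The function names wrap_add and mod_less_equal and the 4-bit width are assumptions for the example; the comparison is valid only while the true difference between any two metrics stays below 2^(width-1).

```python
def wrap_add(metric, branch, width):
    """Add a branch metric to a state metric, wrapping to `width` bits."""
    return (metric + branch) & ((1 << width) - 1)


def mod_less_equal(a, b, width):
    """Compare two wrapped metrics as if they had never overflowed.

    Interprets (a - b) as a signed two's complement number of `width` bits.
    Correct provided the true metrics differ by less than 2**(width - 1).
    """
    diff = (a - b) & ((1 << width) - 1)
    return diff == 0 or diff >= (1 << (width - 1))   # zero or negative


if __name__ == "__main__":
    width = 4
    a = wrap_add(13, 1, width)   # true value 14, stored as 14
    b = wrap_add(13, 4, width)   # true value 17, wraps to 1
    print(a, b, mod_less_equal(a, b, width))
```

In the usage example the stored value 1 is correctly treated as larger than the stored value 14, because the signed difference rather than the raw magnitude is compared; no explicit subtraction of a common offset from the metrics is needed.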