A. Field of the Invention
The present invention relates to signal processing devices and methods. More specifically, it is directed to a method and device for performing maximum likelihood estimation. The invention may be used to perform signal decoding or equalization, typically in the context of data transmission systems or data storage systems utilizing communication channels.
B. Description of the Related Art
Error correction codes are widely used in communication systems to improve their performance by adding redundancy to the information to be transmitted. When received at the other end of the communication channel, the redundancy of the encoded information provides a mechanism to correct errors that may have occurred as a result of noise and distortion within the channel.
One class of codes is known as convolutional codes. Convolutional codes are formed by convolving an input symbol stream, typically made up of an integer number of binary bits k, with a generator polynomial that defines the code. The code is typically analyzed in terms of the state transitions of the encoder that occur in response to the input bits. The time sequence of allowable state transitions may then be diagrammed in the form of a trellis, as shown in FIG. 1. In FIG. 1, the states are shown on the left side as (00), (01), (10), and (11), and the time index in terms of n is shown along the bottom. Each state transition from a first state to a possible next state is a branch of the trellis, and has an associated output symbol. It is well understood in the art that the sequence of possible state transitions through the trellis is referred to as a “path” through the trellis. The output symbols are then transmitted over a communication channel.
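By way of illustration only, the encoding operation described above may be sketched as follows. The sketch assumes a rate-1/2, four-state code with octal generators (7, 5); this particular code, and the name `conv_encode`, are assumptions chosen for the example and are not taken from FIG. 1.

```python
def conv_encode(bits, generators=(0b111, 0b101)):
    """Encode a bit sequence with a four-state, rate-1/2 convolutional code.

    The generators (7, 5 in octal) are an assumed example code.
    The state holds the two most recent input bits.
    """
    state = 0  # encoder assumed to start in state (00)
    symbols = []
    for b in bits:
        # Shift register contents: newest bit in the most significant position.
        reg = (b << 2) | state
        # Each output bit is the parity of the register taps selected by a generator.
        symbols.append(tuple(bin(reg & g).count("1") % 2 for g in generators))
        # State transition: the oldest bit is shifted out.
        state = reg >> 1
    return symbols


print(conv_encode([1, 0, 1, 1]))
```

Each input bit causes one state transition and produces one two-bit output symbol, corresponding to one branch of the trellis.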
At the receiver of a communication system, the encoded output symbols are analyzed to determine the most likely path through the trellis that generated the symbols, and the input bits may then be determined. One common and well-known algorithm to determine the trellis path is the Viterbi algorithm, initially proposed by A. J. Viterbi in “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Transactions on Information Theory, 13:260-269 (1967).
The Viterbi algorithm operates by examining the possible paths to each node at a given time index based on the received symbol, then selecting the most likely path to each node. In the trellis of FIG. 1, each state or node has two paths entering it. The selection is based on a branch metric. That is, for each branch of the trellis, an associated branch metric is determined that provides a measure of how close the received symbol was to the symbol associated with that particular trellis transition. The branch metric is added to the accumulated total metric for the starting node. At each node of the next state, the summed metrics of the paths entering a given state are compared, and the path having the smallest cumulative error is selected. The determination of surviving paths is thus referred to as an “add-compare-select” process, and is well known in the art. The surviving paths are then extended to the next time index, and the most likely path to each node is again selected in the same manner. If the number of errors is within the code's error correcting ability, the surviving paths will eventually merge at the earliest time indices, thereby determining the most likely path.
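By way of illustration only, the add-compare-select process described above may be sketched as follows. The sketch assumes hard-decision symbols, a Hamming-distance branch metric, an encoder that starts in state (00), and the same assumed (7, 5) four-state code; the names `branch` and `viterbi_decode` are hypothetical. For simplicity it stores the full surviving path for each state rather than performing traceback.

```python
GENERATORS = (0b111, 0b101)  # assumed example code (7, 5 in octal)
INF = float("inf")


def branch(state, bit):
    """Return (next_state, output_symbol) for one trellis branch."""
    reg = (bit << 2) | state
    symbol = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
    return reg >> 1, symbol


def viterbi_decode(received):
    """Return (decoded_bits, path_metric) for a list of received 2-bit symbols."""
    metrics = [0, INF, INF, INF]  # start state assumed to be (00)
    paths = [[], [], [], []]      # surviving input-bit sequence per state
    for r in received:
        new_metrics = [INF] * 4
        new_paths = [None] * 4
        for s in range(4):
            if metrics[s] == INF:
                continue  # state not yet reachable
            for bit in (0, 1):
                nxt, sym = branch(s, bit)
                bm = sum(a != b for a, b in zip(r, sym))  # branch metric
                cand = metrics[s] + bm                    # add
                if cand < new_metrics[nxt]:               # compare
                    new_metrics[nxt] = cand               # select
                    new_paths[nxt] = paths[s] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best], metrics[best]


# Symbols for input bits 1, 0, 1, 1 with one bit error in the second symbol.
print(viterbi_decode([(1, 1), (1, 1), (0, 0), (0, 1)]))
```

In this example the single channel error is corrected, and the accumulated metric of the selected path equals the number of bit errors.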
There are many variations to the Viterbi algorithm, and many ways to implement the algorithms in a VLSI architecture. One variation is described in the article by M. Bóo, F. Argüello, J. D. Bruguera, R. Doallo and E. L. Zapata, entitled “High-Performance VLSI Architecture for the Viterbi Algorithm,” IEEE Transactions on Communications, Vol. 45, No. 2, pp. 168-176 (1997).
One technique of implementing the Viterbi algorithm using a parallel structure is to consider blocks of state transitions over a period of time (say N state transitions) and collapse each block into a single transition. That is, each block is analyzed independently, and the best paths from each initial state to each of the possible end states may be determined for each block. For example, with N = 4 in a four-state trellis having two branches leaving each state, each of the four initial states ((00), (01), (10), (11)) at time n−2 has four possible paths to each possible terminating state at time n+2. For each initial state, the best path to each terminating state is identified, and the metrics of these best paths serve as the branch metrics of the new compacted trellis. FIG. 1B provides an example of a collapsed trellis.
Each of the blocks of state transitions can be operated on in parallel, each one serving to reduce N transitions to a single transition in the collapsed trellis. The original trellis is thereby collapsed by a factor of N, providing a significantly reduced trellis. The reduced or collapsed trellis may then be traversed using the standard Viterbi process of determining surviving paths to each node by the add-compare-select process described above. There are numerous algorithms in the prior art that utilize various optimization techniques to provide particularly efficient structures to collapse a trellis code.
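By way of illustration only, the collapsing of a block of N transitions into a single transition may be sketched as a min-plus matrix product, in which entry (i, j) of the result holds the metric of the best path from initial state i to terminating state j. The sketch assumes a Hamming-distance branch metric and an assumed (7, 5) four-state code; the names `step_matrix`, `min_plus`, and `collapse` are hypothetical, and this formulation is one possible way to express the technique rather than any particular prior art structure.

```python
INF = float("inf")
GENERATORS = (0b111, 0b101)  # assumed example code (7, 5 in octal)


def step_matrix(received):
    """Branch metrics for one time index.

    m[i][j] is the Hamming distance for the branch from state i to state j,
    or INF where the trellis has no such branch.
    """
    m = [[INF] * 4 for _ in range(4)]
    for state in range(4):
        for bit in (0, 1):
            reg = (bit << 2) | state
            symbol = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
            m[state][reg >> 1] = sum(a != b for a, b in zip(received, symbol))
    return m


def min_plus(a, b):
    """Combine two blocks: best metric from state i to state j through any middle state k."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def collapse(step_matrices):
    """Collapse a block of consecutive transitions into a single transition."""
    result = step_matrices[0]
    for m in step_matrices[1:]:
        result = min_plus(result, m)
    return result


received = [(1, 1), (1, 0), (0, 0), (0, 1)]  # error-free symbols for bits 1, 0, 1, 1
block = collapse([step_matrix(r) for r in received])
print(block[0])  # best metrics from state (00) to each terminating state
```

Because the min-plus product is associative, separate blocks may be collapsed independently and then combined, which is what allows the blocks to be processed in parallel.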
One significant disadvantage of the existing parallel Viterbi structures is that the parallel structure still provides a trellis that must be navigated using the add-compare-select process at each clock cycle. Even though the clock rate of the collapsed-trellis decoder may be reduced by using parallel circuits to collapse the trellis, numerous add-compare-select operations may still have to be performed, resulting in a long critical path. To counteract this, a highly parallelized structure is required in order to operate at high data rates. This of course increases the size, complexity and cost of the decoder and imposes an even greater decoding delay, which may be unacceptable. Even in a highly parallel structure, the duration of the critical path may be such that the decoder circuit is incapable of operating at very high signal processing rates.
Decoders as discussed above may be used in numerous applications. One such application is trellis-coded modulation, where a signal constellation is partitioned into subsets of signal points. The redundant bits of the convolutional codes are used to select the subset, and the uncoded bits are used to select the point within the subset. Viterbi decoding may then be used to determine the most likely sequence of signal subsets as part of the demodulation.
Another suitable use of the decoder is for channel equalization. In this application, the channel impulse response of a communication channel is modeled as a convolutional encoder that generates output “symbols” in response to the input bits. By estimating the impulse response of the channel, the expected output symbols of the convolutional coder (i.e., the channel) may be determined. Then, the received symbols, which are noisy estimates of the channel symbols, may be used to decode the symbol sequence using sequence estimation techniques such as the Viterbi algorithm. This is generally referred to as Viterbi equalization.
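By way of illustration only, Viterbi equalization as described above may be sketched as follows. The sketch assumes that the channel impulse response has already been estimated and is given as the tap values `taps`, that the transmitted symbols are binary (0 or 1), that the channel is idle (all zeros) before the first symbol, and that a squared-error branch metric is used. The tap values, sample values, and the name `mlse_equalize` are assumptions for the example.

```python
def mlse_equalize(samples, taps=(1.0, 0.5)):
    """Return (detected_bits, path_metric) for received samples of an ISI channel.

    taps[i] is the assumed channel response to the bit sent i symbol periods ago.
    """
    inf = float("inf")
    memory = len(taps) - 1
    n_states = 1 << memory                     # state = the previous `memory` bits
    metrics = [0.0] + [inf] * (n_states - 1)   # channel assumed idle at the start
    paths = [[] for _ in range(n_states)]
    for y in samples:
        new_metrics = [inf] * n_states
        new_paths = [[] for _ in range(n_states)]
        for s in range(n_states):
            if metrics[s] == inf:
                continue
            for bit in (0, 1):
                reg = (s << 1) | bit  # bit i of reg is the input from i periods ago
                # Expected channel output for this branch, from the estimated taps.
                expected = sum(t * ((reg >> i) & 1) for i, t in enumerate(taps))
                cand = metrics[s] + (y - expected) ** 2   # add
                nxt = reg & (n_states - 1)
                if cand < new_metrics[nxt]:               # compare
                    new_metrics[nxt] = cand               # select
                    new_paths[nxt] = paths[s] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(n_states), key=lambda s: metrics[s])
    return paths[best], metrics[best]


# Noisy samples for bits 1, 0, 1, 1, 0 sent through the assumed channel (1.0, 0.5).
bits, metric = mlse_equalize([1.1, 0.4, 0.9, 1.6, 0.45])
print(bits)
```

Here the channel memory plays the role of the encoder state, and the expected output for each branch replaces the code symbol used in the decoding case.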
The decoder described herein may be used in any of the scenarios described above and in any scenario where the Viterbi algorithm or a variation of the algorithm may be employed. One such example is with typical prior art communications systems, such as the one shown in FIG. 2A. It consists of an optical transmitter 10 including an optical modulator 12 that converts the electrical data signal from pulse generator 14 into an optical signal 16 that is propagated through the optical fiber 20. At the receiving end, the optical signal from the fiber 20 is typically amplified by an optical amplifier 22, and the signal is then incident on an optical-to-electrical signal converter such as a photosensitive PIN diode 24. The output of the photodiode 24 is a weak current signal that is converted to a voltage signal and is amplified by the trans-impedance amplifier 26 (TIA). The clock-data recovery unit 28 (CDR) then recovers the clock information from the signal, samples it at the appropriate instant, and thresholds it to determine the transmitted data bits, which are provided at the binary data output 30.
A prior art CDR device is shown in FIG. 2B, and consists primarily of a clock recovery unit 50 and a sample and threshold unit 52. The clock recovery unit 50 determines the thresholding instant within a symbol period, and the threshold unit 52 compares the signal value at that instant to a threshold value and resolves the transmitted bit to be a logical zero or one as a result of the comparison. The recovered clock may also be made available to other components via line 32.
The signal link from the transmitter to the receiver is not ideal, especially at high data rates; several impairments degrade the quality of the signal as it traverses the link. These impairments lead to erroneous decisions by the CDR 28 and hence increase the bit-error-rate (BER) of the link and can lead to link outages.
The sources of impairments in an optical link are many, and include linear effects, such as signal attenuation, reflections, and dispersion as well as nonlinear effects, such as self- and cross-phase modulation and four-wave mixing, among others. Optical amplifiers are employed at the transmitter and receiver ends as well as at intermediate points along the optical link to restore the loss of signal power. Several schemes are being proposed for dispersion compensation. These include employing dispersion compensation fibers (DCF) to mitigate the effect of chromatic dispersion and optical compensators to combat polarization mode dispersion.
These proposed schemes are difficult and expensive to implement, and may require manual adjustment or replacement of compensation fibers as the communication fiber or network link characteristics change over time. While Viterbi-type equalization may be used to provide improved performance, present implementations are not practical at high data rates due to the complexities associated with decoding.
Prior techniques of parallelizing aspects of the algorithmic processing have proven to be deficient in performance due to increased decoding delay or their inability to operate at sufficiently high speeds. Specifically, prior art implementations must perform an add-compare-select operation at each clock cycle, thereby making the critical path too long to operate at high rates. Thus, there exists a need for improved decoding structures and methods to alleviate problems associated with high speed decoding.