I. Field of the Invention
This invention generally relates to applications of the Viterbi algorithm. More particularly, the present invention relates to an improved system and method of performing a high-rate Add-Compare-Select (ACS) butterfly operation in an implementation of the Viterbi algorithm.
II. Description of Related Art
The Viterbi algorithm was first introduced in 1967 as a method for decoding convolutionally-encoded signals. Since its introduction, the algorithm has gained wide acceptance in the fields of data communications, data recording, and digital signal processing. The algorithm has been used successfully to combat a variety of digital estimation problems, including the reduction of recording errors in storage media, the removal of intersymbol interference, and the enhancement of character and text recognition.
As such, the Viterbi algorithm has become the foremost method for the error-correction decoding of convolutionally-encoded data. For such applications, the Viterbi algorithm determines, based on a series of observations, the path with the smallest error metric that traverses a trellis representing all possible encoder states. This shortest path corresponds to the most likely sequence generated by a convolutional encoder.
FIG. 1A illustrates a typical convolutional encoder. The convolutional encoder 100 comprises an 8-bit tapped shift register 110 and a pair of exclusive-OR summers 120 that transform a sequence of input data bits U(D) 105 into a sequence of output code symbols C.sub.0 (D), C.sub.1 (D) 125. In particular, FIG. 1A demonstrates the example of a rate 1/2 code, which generates two output code symbols C.sub.0 (D), C.sub.1 (D) 125 for each input data bit U(D) 105. It is to be noted that the specific code rate and configuration of the convolutional encoder 100 shown is merely illustrative and in no way limits the operation or scope of the various embodiments of the invention. As such, different code rates, such as 1/3, for example, could be used in conjunction with the embodiments of the invention.
Encoder 100 generates each output code symbol C.sub.0 (D), C.sub.1 (D) 125 by shifting and exclusive-OR summing the input bit stream U(D) 105 according to the particular shift-register configuration specified by generator code polynomials G.sub.0 (D), G.sub.1 (D). In this case, FIG. 1A depicts the shift-register interconnections that provide the rate 1/2 generator code polynomial G.sub.0 (D)=1⊕D.sup.2 ⊕D.sup.4 ⊕D.sup.7. The coefficients of polynomial G.sub.0 (D) are convolved with input data sequence U(D) 105 to generate output convolutional code symbol C.sub.0 (D) 125. Similarly, FIG. 1A shows the rate 1/2 generator code polynomial G.sub.1 (D)=1⊕D.sup.2 ⊕D.sup.5, whose coefficients are convolved with input data sequence U(D) 105 to generate output convolutional code symbol C.sub.1 (D) 125. The constraint length K of the encoder 100 is one more than the number of delay elements in shift register 110; for encoder 100, constraint length K equals 9. For each data bit 105 inputted into encoder 100, the output code symbols C.sub.0 (D), C.sub.1 (D) 125 depend on the inputted bit as well as the previous K-1 input bits. Therefore, the encoder 100 produces output code symbols C.sub.0 (D), C.sub.1 (D) 125 that are capable of spanning 2.sup.K-1 possible encoder states.
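The shifting and exclusive-OR summing described above can be sketched as follows. This is a minimal illustration, with tap positions taken directly from the exponents of G.sub.0 (D) and G.sub.1 (D) as listed; the function name and interface are hypothetical, not part of the described hardware.

```python
# Sketch of the rate 1/2 convolutional encoding described above.
# Tap sets follow the exponents of G0(D) and G1(D) as listed in the text.
G0_TAPS = (0, 2, 4, 7)   # G0(D) = 1 + D^2 + D^4 + D^7
G1_TAPS = (0, 2, 5)      # G1(D) = 1 + D^2 + D^5

K = 9  # constraint length: 8 delay elements plus the current input bit

def encode(bits):
    """Convolve the input bit stream with each generator polynomial."""
    shift = [0] * K                      # shift[0] = current bit, shift[d] = bit delayed by d
    symbols = []
    for u in bits:
        shift = [u] + shift[:-1]         # shift the register, new bit enters at D^0
        c0 = 0
        for t in G0_TAPS:
            c0 ^= shift[t]               # exclusive-OR summer for C0(D)
        c1 = 0
        for t in G1_TAPS:
            c1 ^= shift[t]               # exclusive-OR summer for C1(D)
        symbols.append((c0, c1))         # two output code symbols per input bit
    return symbols
```

Feeding in a single 1 followed by zeros exposes the generator coefficients directly in the output stream, since the impulse response of the encoder is the pair of generator polynomials.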
In a typical communication system, the output code symbols C.sub.0 (D), C.sub.1 (D) 125 are subsequently modulated and transmitted over a noisy channel (not shown). A decoder eventually receives the noisy convolutionally-encoded data stream and employs the Viterbi algorithm, which exploits the properties of convolutional codes to ultimately determine the input data sequence U(D) 105.
One advantage of convolutional codes is their highly repetitive structure, which provides for a symmetrical code tree. Theoretically, a convolutional code is capable of generating an infinite sequence of code symbols. However, because of this symmetry, the number of states that need to be evaluated in locating the most probable path leading to the inputted data sequence U(D) 105 is reduced to 2.sup.K-1 (in this case, 256) states. Moreover, in decoding such a symmetrical code, only the most probable (i.e., surviving) local path into each of the 256 possible encoder states is of interest--all other paths may be discarded from further consideration. This is because the most probable global path through a state must necessarily follow the surviving local path through that state.
The Viterbi decoder relies on these code properties to function as a finite state machine having a limited set of state transitions. The decoder hypothesizes each of the 2.sup.K-1 possible encoder states and determines the probability that the encoder transitioned from each of those states to the next set of 2.sup.K-1 possible encoder states, based on the observations obtained from the received noisy convolutionally-encoded data stream.
The transition probabilities are represented by quantities, referred to as metrics, which are proportional to the negative logarithm of the probability values. Clearly, the smaller the metric, the higher the probability of occurrence. There are two types of metrics: state metrics and branch metrics. The state metric, also called a path metric, represents the relative probability that the transmitted set of code symbols passed through a particular state. The branch metric represents the conditional probability that the transition from a particular source state to a particular target state was transmitted (assuming that the source state was correct).
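The negative-logarithm relationship above can be made concrete with a common hard-decision example. Assuming a binary symmetric channel (an illustrative channel model, not one specified in the text), the negative log-likelihood of a received symbol pair reduces, up to an affine scaling, to the Hamming distance between the received and hypothesized pairs; the function names are hypothetical.

```python
import math

def branch_metric_hamming(received, hypothesized):
    """Hard-decision branch metric: count of mismatched code symbols."""
    return sum(r != h for r, h in zip(received, hypothesized))

def branch_metric_neglog(received, hypothesized, p=0.1):
    """Exact negative log-likelihood for a binary symmetric channel
    with crossover probability p; smaller metric = more probable."""
    d = branch_metric_hamming(received, hypothesized)
    n = len(received)
    return -(d * math.log(p) + (n - d) * math.log(1.0 - p))
```

Because the logarithm is monotonic, comparing Hamming distances and comparing exact negative log-likelihoods select the same winning transition, which is why practical decoders can work with the simpler distance measure.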
The Viterbi algorithm may be summarized as follows. Time is divided into d samples, and n possible states S.sub.i.sup.k exist at each time sample k (where i is an integer from 1 to n and k is an integer from 1 to d). For k>1, each state may be reached by a path from any one of p precursor states S.sub.j.sup.k-1 (where j is an integer from 1 to p). For each state, the path with the minimum metric among these p possible paths is identified and stored, along with the value of that metric:
Initialization: for the starting time sample (k=1), the metric stored at each state S.sub.i.sup.1 is initialized. In the case where the starting state is known, the metric of that state may be set to zero while the metrics of the other states S.sub.i.sup.1 are set to a large number. This scheme forces later iterations of the algorithm to choose only paths originating from the desired starting state.

Iteration: for each time sample (k=2 to d), all of the states S.sub.i.sup.k are visited. At each state S.sub.i.sup.k, the metric for each path j leading to that state is calculated as the sum of (a) the metric of the precursor state S.sub.j.sup.k-1 and (b) the metric bm.sub.j.sup.k of the branch leading from state S.sub.j.sup.k-1 to state S.sub.i.sup.k. Of the p paths leading to each state S.sub.i.sup.k, the path with the lowest metric (i.e., the survivor path) is selected and stored at that state, and the metric for that path is also stored as the metric sm.sub.i.sup.k for that state.

Chainback: when all of the states for the last time sample have been visited, the state S.sub.i.sup.d having the lowest state metric is identified. The survivor path for this state is read from storage, and the corresponding state for time sample d-1 is thereby identified. The survivor path for this latter state is read from storage, and the chainback process is repeated until all of the states comprising the path leading to state S.sub.i.sup.d (i.e., the most likely path through the state-time matrix) have been identified.
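The three steps above can be sketched for a generic trellis. This is an illustrative software rendering, not the hardware organization of the invention; the branch metrics are assumed to be given as a table bm[k][i][j] (cost of the branch from state j at time k-1 to state i at time k), and the interface is hypothetical.

```python
INF = float("inf")

def viterbi(n_states, n_steps, bm, start_state=0):
    # Initialization: the known starting state gets metric 0,
    # all other states get "a large number".
    sm = [INF] * n_states
    sm[start_state] = 0.0
    survivors = []                       # survivors[k][i] = best precursor of state i
    # Iteration: visit every state at every time sample.
    for k in range(n_steps):
        new_sm = [INF] * n_states
        prev = [0] * n_states
        for i in range(n_states):
            for j in range(n_states):
                m = sm[j] + bm[k][i][j]  # add: precursor metric + branch metric
                if m < new_sm[i]:        # compare / select the survivor path
                    new_sm[i] = m
                    prev[i] = j
        survivors.append(prev)
        sm = new_sm
    # Chainback: start from the lowest final metric and trace precursors.
    state = min(range(n_states), key=lambda i: sm[i])
    path = [state]
    for prev in reversed(survivors):
        state = prev[state]
        path.append(state)
    path.reverse()
    return path, min(sm)
```

A hardware decoder restricts the inner loops to the two precursors of each state (the butterfly structure discussed below), but the add-compare-select logic is the same.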
Thus, at any time k, the Viterbi algorithm calculates the metrics of the paths leading to states S.sub.n.sup.k, determines the survivor paths (one for each of the n states S.sub.n.sup.k), and stores the n survivor paths as well as their respective metrics. This is equivalent to storing, for every target state considered, the source state which leads to it. As such, any implementation of the Viterbi algorithm requires the use of an Add-Compare-Select (ACS) unit 150, as illustrated in FIG. 1B, to perform these operations. The ACS unit 150 is responsible for calculating the state metric values and also characterizes the relationships between the source and target states by virtue of ACS butterfly operations. FIG. 2 depicts a single ACS butterfly operation 155.
The butterfly operation 155 includes the only possible state transitions that could have occurred for two particular source states in encoder 100. This is partly due to the fact that, at any given time, the state of encoder 100 is the encoder's previous state right-shifted by 1 bit. The next (right-shifted) information bit determines which transition is made from a source state and will appear as the most significant bit (MSB) of the target state. As such, there are only two possible target states that a source state can transition to. Thus, as evidenced by FIG. 2, encoder 100 can only transition from source state "x0" to target state "0x" or "1x" and from source state "x1" to target state "0x" or "1x", depending on the value of the inputted data bit U(D). It is to be noted that notation "x0" and "x1" indicates that the least significant bit (LSB) of the source state is "0" or "1", respectively, while the upper bits are represented by "x"; and notation "0x" and "1x" indicates that the MSB of the target state is "0" or "1", respectively, while the lower bits are represented by "x". The term "x" represents the same value (e.g., a 7-bit value) whether it is included in the source state or the target state.
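The state relationships above reduce to simple bit arithmetic for a K=9 (256-state) trellis: the next state is the previous state right-shifted by one bit, with the input bit entering as the MSB. A sketch, with hypothetical function names:

```python
K = 9
N_STATES = 1 << (K - 1)          # 256 encoder states
MSB = 1 << (K - 2)               # bit position the input bit lands in (128)

def next_state(source, u):
    """Target state reached from `source` when input bit `u` is shifted in."""
    return (source >> 1) | (u * MSB)

def butterfly(x):
    """For a 7-bit value x, the two source and two target states of one butterfly."""
    src0 = (x << 1) | 0          # source "x0": LSB is 0
    src1 = (x << 1) | 1          # source "x1": LSB is 1
    tgt0 = x                     # target "0x": MSB is 0
    tgt1 = x | MSB               # target "1x": MSB is 1
    return (src0, src1), (tgt0, tgt1)
```

Both source states of a butterfly share the same two target states, which is exactly why the trellis decomposes into 128 independent butterflies per symbol set.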
FIG. 2 also reveals that each transition from a source state to a target state generates a hypothesized set of code symbols H.sub.0 (D), H.sub.1 (D) or H.sub.0 '(D), H.sub.1 '(D). In fact, when encoder 100 operates along the parallel branches of the ACS butterfly 155 (e.g., transitions from "x0" to "0x" or from "x1" to "1x"), code symbols H.sub.0 (D), H.sub.1 (D) are generated for both parallel branches. This feature is due in part to the repetitive nature of convolutional codes in general, as well as the use of generator code polynomials having their MSBs and LSBs set to unity (i.e., for both G.sub.0 (D) and G.sub.1 (D), g.sub.0 and g.sub.8 are equal to 1). In like fashion, code symbols H.sub.0 '(D), H.sub.1 '(D) are generated when encoder 100 operates along either of the diagonal branches of the ACS butterfly 155 (e.g., transitions from "x0" to "1x" or from "x1" to "0x").
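The parallel-branch property above can be checked mechanically: when every generator polynomial has both its first and last coefficient set to 1 (g.sub.0 = g.sub.8 = 1 for K=9), the two bit positions that differ between a butterfly's two branches cancel in the exclusive-OR sum. The tap sets below are hypothetical examples with that property, chosen for illustration rather than taken from the figure.

```python
K = 9
G0_TAPS = (0, 2, 4, 8)           # g0 = g8 = 1 (illustrative polynomial)
G1_TAPS = (0, 2, 5, 8)           # g0 = g8 = 1 (illustrative polynomial)

def branch_symbols(source, u):
    """Code symbols hypothesized for the transition (source state, input bit u)."""
    # window[0] is the new input bit; window[8] is the oldest bit (LSB of source).
    window = [u] + [(source >> (K - 2 - d)) & 1 for d in range(K - 1)]
    c0 = 0
    for t in G0_TAPS:
        c0 ^= window[t]
    c1 = 0
    for t in G1_TAPS:
        c1 ^= window[t]
    return (c0, c1)
```

The two parallel branches ("x0" with input 0, "x1" with input 1) differ only in the newest and oldest bits of the window; since both tap sets include positions 0 and 8, those two flips cancel and the branches hypothesize identical symbols. The same argument pairs up the two diagonal branches.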
As stated above, the ACS unit 150 calculates the target state metrics tm.sub.0x, tm.sub.1x. The ACS 150 logic stores the source state metrics sm.sub.x0, sm.sub.x1, which relate to the probability that a received set of code symbols leads to source states "x0" and "x1". Returning to FIG. 1B, upon receiving a set of code symbols, the Branch Metric Unit 140 computes the branch metric values bm.sub.ij, bm.sub.ij '. ACS 150 "adds" the branch metric bm.sub.ij, bm.sub.ij ' corresponding to each of the two transitions leading to a particular target state to the corresponding source state metric sm.sub.x0, sm.sub.x1. The branch metrics bm.sub.ij, bm.sub.ij ' represent the conditional probability that the transition from a particular source state to a particular target state occurred. Branch metric bm.sub.ij indicates how closely the received code symbols match the ACS 150 hypothesized code symbols H.sub.0 (D), H.sub.1 (D), and branch metric bm.sub.ij ' indicates how closely the received code symbols match H.sub.0 '(D), H.sub.1 '(D). The value of each branch metric is dependent only upon the distance between the received symbol pair and the corresponding hypothesized symbol pair.
For each of the two target states, the ACS 150 compares the sums of the source state metric and branch metric pairs leading to that target state. The most likely transition into each target state, represented by the smallest metric sum, is then "selected" by ACS 150 and assigned to that target state as the target state metric tm.sub.0x, tm.sub.1x.
As stated above, the ACS 150 logic adds the branch metric bm.sub.ij, bm.sub.ij ' to the source state metric sm.sub.x0, sm.sub.x1 for each of the two transitions leading to a target state and decides that the most likely path into that target state came from the transition that yields the smaller metric sum. The smaller metric sum is then selected and becomes the new target state metric tm.sub.0x, tm.sub.1x. The ACS 150 also stores the state metrics (i.e., the costs associated with the most likely path leading to each target state) into the state RAM 145. As indicated by FIG. 1B, the selection of the smallest metric sum results in the storing of a one-bit quantity, referred to as a decision bit, in the path memory of a chainback memory unit 160. The decision bit, which is given by the LSB of the winning source state, identifies which of the two transitions was selected.
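The add, compare, and select steps described above can be sketched for a single butterfly. The function name and return convention are hypothetical; the decision bit follows the text's convention of being the LSB of the winning source state.

```python
def acs_butterfly(sm_x0, sm_x1, bm, bm_alt):
    """
    sm_x0, sm_x1 : metrics of source states "x0" and "x1"
    bm           : branch metric for the parallel branches (x0->0x, x1->1x)
    bm_alt       : branch metric for the diagonal branches (x0->1x, x1->0x)
    Returns (tm_0x, d0, tm_1x, d1), where each decision bit is the LSB of
    the winning source state (0 for "x0", 1 for "x1").
    """
    # Target "0x": reached from "x0" via a parallel branch, or "x1" via a diagonal one.
    a, b = sm_x0 + bm, sm_x1 + bm_alt            # add
    tm_0x, d0 = (a, 0) if a <= b else (b, 1)     # compare, select
    # Target "1x": reached from "x0" via a diagonal branch, or "x1" via a parallel one.
    c, d = sm_x0 + bm_alt, sm_x1 + bm            # add
    tm_1x, d1 = (c, 0) if c <= d else (d, 1)     # compare, select
    return tm_0x, d0, tm_1x, d1
```

Only two branch metric values are needed per butterfly because, as noted above, the parallel branches share one hypothesized symbol pair and the diagonal branches share the other.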
The chainback memory unit 160 stores the decision bit corresponding to the most likely transition into each target state. For encoder 100 having a constraint length K=9, there will be 2.sup.K-1 or 256 decision bits generated, corresponding to each of the 256 possible states of encoder 100. Once a matrix of all such information for a predetermined number of states is generated and stored, the chainback unit 160 starts at the state with the greatest likelihood of heading the correct path (i.e., the state among all those corresponding to the most recent time unit having the lowest cost). The chainback unit 160 then chains backward in time by reading through the last P.times.256 (i.e., P.times.2.sup.K-1) decision bits to select P bits, where P is the effective chainback depth of the path memory. Since the decision bits represent the most likely set of bits hypothesized to have been passed through the encoder 100, they are the best data that can be outputted by the decoder. As a result, the further back in the decision history the chain goes, the greater the likelihood that the selected path merges with the correct path. Thus, the higher the chainback depth P, the better the performance but the higher the pipeline and storage delays. The chainback depth P is, therefore, generally set between 3 and 10 times the encoder 100 constraint length K. For a K=9 encoder, the chainback depth P is typically set at 64.
An ACS processing cycle defines the period in which the ACS unit 150 calculates new target state metrics tm.sub.0x, tm.sub.1x for a predetermined number of received code symbols. For a rate 1/2 convolutional code, each pair of received code symbols requires one process cycle for metric calculations. The length of the process cycle equals the number of clock cycles required to perform the ACS butterfly operations for all encoder states for two sets of received symbols. For example, a Viterbi decoder having a single ACS butterfly, as depicted in FIG. 2, would generally require 128 clock cycles per received code symbol to perform the operations for all 256 states of encoder 100. To improve processing speed, ACS butterfly array architectures deploying multiple ACS butterflies can be used to reduce the number of clock cycles in one processing cycle.
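The cycle counts quoted above and below follow from simple arithmetic; the variable names here are illustrative only.

```python
# Back-of-the-envelope cycle counts for the K = 9 (256-state) trellis.
n_states = 2 ** (9 - 1)                  # 256 encoder states
butterflies_per_symbol = n_states // 2   # each butterfly covers 2 source states

# Single-butterfly decoder: one butterfly operation per clock cycle.
cycles_single = butterflies_per_symbol            # 128 cycles per symbol set

# 8x1 array: 8 butterfly operations per clock cycle.
cycles_8x1_per_symbol = butterflies_per_symbol // 8   # 16 cycles per symbol set
cycles_8x1_two_symbols = 2 * cycles_8x1_per_symbol    # 32 cycles for two symbol sets
```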
An example of such an architecture is the 8.times.1 ACS butterfly array 300, illustrated in FIG. 3. Array 300 achieves an 8.times. improvement in processing speed by implementing 8 ACS butterfly units 155 in parallel. For a set of received code symbols, the 8.times.1 butterfly array 300 uses all 8 butterfly units 155 to read 16 of the source states and calculate the 16 corresponding target state metrics tm.sub.0X, tm.sub.1X within a single clock cycle. As stated above, each butterfly unit 155 receives the state metric for each of its source states and branch metrics bm.sub.ij, bm.sub.ij ' for each of the four possible transitions. The branch metric bm.sub.ij, bm.sub.ij ' is dependent only upon the value of the received code symbol pair and the hypothesized symbol pair H.sub.0 (D), H.sub.1 (D) or H.sub.0 '(D), H.sub.1 '(D), and is a measurement of the distance between the two. The "X" included as part of the source and target states in FIG. 3 represents a four-bit place-holder (i.e., X=[X0, X1, X2, X3]) which steps through 16 clock cycles by counting from 0 through 15. Thus, for two sets of received code symbols, the 8.times.1 butterfly array 300 computes the target state metrics tm.sub.0X, tm.sub.1X for all 256 possible states of encoder 100 in 32 clock cycles (i.e., 16 clock cycles for each set of received code symbols).
A drawback of the 8.times.1 butterfly array architecture 300 is that for each set of received code symbols, it needs to read 16 source state metrics and must simultaneously generate the required branch metrics for each of the 16 clock cycles. Thus the 8.times.1 butterfly array 300 requires an immense memory bandwidth to accommodate such operations.
Another example of the array architectures is the 4.times.2 ACS butterfly array 400, illustrated in FIG. 4. The 4.times.2 ACS butterfly array 400 boasts the same speed improvement as the 8.times.1 butterfly array 300, but does so by implementing 2 sets of 4 ACS butterfly 155 units in parallel. Butterfly array 400 mitigates the memory bandwidth issue by temporarily storing the intermediate target state metrics tm.sub.0X, tm.sub.1X. For example, within a single clock cycle, the first stage of array 400 reads the 8 source states and calculates the 8 corresponding target state metrics tm.sub.0X, tm.sub.1X. However, butterfly array 400 does not immediately store the intermediate target state metrics tm.sub.0x, tm.sub.1x. Instead, while still within the clock cycle, butterfly array 400 rearranges the intermediate target states to feed into the second stage, as source states, and subsequently calculates the 8 corresponding target state metrics tm.sub.0X, tm.sub.1X for the next set of received code symbols. Thus, much like the 8.times.1 butterfly array 300, butterfly array 400 is capable of computing the target state metrics tm.sub.0X, tm.sub.1X for two sets of received code symbols over a span of 32 clock cycles.
The 4.times.2 ACS butterfly array 400 has the distinct advantage of reducing the ACS 150 state memory bandwidth, since the intermediate target state metrics (i.e., first stage target metrics tm.sub.0x, tm.sub.1x) do not need to be read from, or written to, the ACS 150 state memory. Rather, the intermediate target state values flow combinatorially into the next stage, avoiding delays and minimizing bandwidth requirements.
However, the 4.times.2 ACS butterfly array 400 is not without its limitations. For example, the advantage of reducing the state memory bandwidth rests squarely on the fact that array 400 performs 2 stages of ACS 150 calculations within a single clock cycle. The resulting two-stage critical path can significantly limit the maximum achievable clock speed.
Moreover, for either the 8.times.1 ACS butterfly array 300 or the 4.times.2 ACS butterfly array 400, there exist performance issues with respect to the chainback operation. As stated above, the chainback unit 160 is responsible for storing the decision bits generated by the ACS array and for chaining back through the stored decision bits to generate the decoded decision bits.
For an encoder having a constraint length K=9 (e.g., encoder 100), the ACS array in the decoder will generate 2.sup.K-1 or 256 decision bits for each set of received code symbols (i.e., 1 decision bit for each of the 256 possible encoder states) and the chainback memory unit 160 will typically contain a chainback path memory depth of P=64 blocks.
After 32 process cycles, each of which computes the target state metrics for two sets of received symbols, the chainback unit 160 begins with the most recent process cycle (e.g., the rightmost memory block B.sub.0 of the 64 path memory blocks), as shown in FIG. 5A. The chainback unit 160 identifies, from the 256 decision bits within chainback memory block B.sub.0, the decision bit corresponding to the state with the lowest metric value R.sub.0. This state is defined as the best state BS.sub.0 and has an 8-bit address, as shown in FIG. 5B. The chainback unit 160 reads the best state decision bit value and then introduces the value into the BS.sub.0 address by left-shifting it into the BS.sub.0 least significant bit (i.e., bs.sub.0), as shown in FIG. 5. FIG. 5B further illustrates that the values of the other bits in the BS.sub.0 address (i.e., bs.sub.6, bs.sub.5, bs.sub.4, bs.sub.3, bs.sub.2, bs.sub.1) are also left-shifted, resulting in the loss of the BS.sub.0 most significant bit (i.e., bs.sub.7) and the formation of a new address BS.sub.1. As depicted in FIG. 5A, BS.sub.1 is the address of the best state value R.sub.1 in chainback memory block B.sub.1. The chainback unit 160 then reads the decision bit value corresponding to the BS.sub.1 address and left-shifts that value into the BS.sub.1 address to generate the next address BS.sub.2, which corresponds to the best state of chainback memory block B.sub.2.
This read and left-shift operation is repeated until all chainback memory blocks (i.e., P=64 blocks) have been processed. Generally, the chainback operation performs as many reads as the defined chainback length P, so that in this case, for example, 64 reads are performed to trace back the desired path and generate the decoded decision bits. This many reads, however, may compromise the efficiency and performance of the decoding process.
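The read-and-left-shift traversal above can be sketched as follows. The memory layout and function signature are hypothetical, and, following the text, the decoded output is taken to be the sequence of decision bits read along the traced path.

```python
K = 9
ADDR_MASK = (1 << (K - 1)) - 1     # 8-bit state addresses (0..255)

def chainback(blocks, best_state):
    """
    blocks     : list of P path memory blocks, most recent first; each block
                 is a 256-entry list of decision bits (hypothetical layout).
    best_state : address of the lowest-metric state in the most recent block.
    Returns the decoded bits, oldest first.
    """
    addr = best_state
    decoded = []
    for block in blocks:
        bit = block[addr]                          # read the decision bit
        decoded.append(bit)
        addr = ((addr << 1) | bit) & ADDR_MASK     # left-shift it into the LSB;
                                                   # the old MSB falls off
    decoded.reverse()                              # chainback runs newest to oldest
    return decoded
```

One read per block, P reads in total; this serial dependency between reads is exactly the performance concern raised above.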
What is needed, therefore, is a system and method that is capable of efficiently performing high-rate ACS butterfly operations in a Viterbi algorithm implementation.