1. Field of the Invention
This invention generally relates to applications of the Viterbi algorithm. More particularly, the present invention relates to a novel method and apparatus for storing and retrieving state metrics in order to enhance the performance of high-rate Add-Compare-Select (ACS) butterfly operations in Viterbi implementations.
2. Description of Related Art
The Viterbi algorithm was first introduced in 1967 as a method for decoding convolutionally encoded signals. Since its introduction, the algorithm has gained wide acceptance in the fields of data communications, data recording, and digital signal processing. The algorithm has been used successfully in a variety of digital estimation applications, including the reduction of recording errors in storage media, the removal of intersymbol interference, and the enhancement of character and text recognition.
As such, the Viterbi algorithm has become the foremost method for the error-correction decoding of convolutionally encoded data. For such applications, the Viterbi algorithm determines, based on a series of observations, the path with the smallest error metric that traverses a trellis typifying all possible encoder states. The sequence of states along this “shortest path” corresponds to the sequence most likely generated by the convolutional encoder.
FIG. 1A illustrates a typical convolutional encoder. This convolutional encoder 100 comprises an 8-bit tapped shift register 110 and a pair of exclusive OR-type summers 120 that transform a sequence of bits from an input bit stream U(D) 105 into a paired sequence 125 of output code symbols C0(D), C1(D). In particular, FIG. 1A demonstrates the example of a rate ½ code which generates a set of two output coding symbols C0(D), C1(D) 125 for each bit inputted from input bit stream U(D) 105. It is to be noted that the specific code rate and configuration of the convolutional encoder 100 shown are merely illustrative and in no way limit the operation or scope of the various embodiments of the invention. As such, different code rates, such as ⅓ or ¾, for example, may be used in conjunction with embodiments of the invention as described below.
Encoder 100 generates each output code symbol pair C0(D), C1(D) of sequence 125 by shifting and exclusive-OR summing the input bit stream U(D) 105 according to the particular shift-register configuration specified by generator code polynomials G0(D), G1(D). In this case, FIG. 1A depicts a configuration corresponding to the rate ½ generator code polynomial G0(D) = 1 ⊕ D^2 ⊕ D^4 ⊕ D^7. The coefficients of polynomial G0(D) are convolved with input bit stream U(D) 105 to generate output convolutional code symbol C0(D) of sequence 125. Similarly, FIG. 1A also shows a configuration that corresponds to the rate ½ generator code polynomial G1(D) = 1 ⊕ D^2 ⊕ D^5, whose coefficients are convolved with input bit stream U(D) 105 to generate output convolutional code symbol C1(D) of sequence 125.
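The shift-and-XOR operation described above can be sketched in Python as follows. This is a minimal illustration rather than the circuit of FIG. 1A: the function name `conv_encode` and the tap tuples are hypothetical, the tap positions are simply the exponents of the two polynomials quoted above, and the register is assumed to start in the all-zero state.

```python
def conv_encode(bits, taps0=(0, 2, 4, 7), taps1=(0, 2, 5)):
    """Rate-1/2 convolutional encoder sketch: one (c0, c1) pair per input bit.

    taps0 and taps1 list the delays (exponents of D) that are XOR-summed
    to form C0 and C1. They are assumptions taken from the polynomials
    quoted in the text.
    """
    max_delay = max(max(taps0), max(taps1))
    # history[d] holds the input bit delayed by d positions (0 = current bit).
    history = [0] * (max_delay + 1)
    out = []
    for u in bits:
        history = [u] + history[:-1]  # shift the new bit into the register
        c0 = 0
        for d in taps0:
            c0 ^= history[d]
        c1 = 0
        for d in taps1:
            c1 ^= history[d]
        out.append((c0, c1))
    return out


if __name__ == "__main__":
    # Feeding a single 1 followed by zeros reads out the tap pattern itself.
    print(conv_encode([1, 0, 0, 0, 0, 0, 0, 0]))
```

Encoding an impulse is a convenient sanity check, because the output pairs then trace out the coefficients of the two polynomials directly.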
The constraint length K of encoder 100 is one more than the number of delay elements in shift register 110. For encoder 100, for example, constraint length K equals 9. For each data bit of input bit stream U(D) 105 inputted into encoder 100, the output code symbol pair C0(D), C1(D) of sequence 125 may depend on the inputted bit as well as the previous K−1 input bits. Therefore, encoder 100 produces output code symbol pairs that are capable of spanning 2^(K−1) possible encoder states.
In a typical communication system, the output code symbol pairs C0(D), C1(D) of sequence 125 are subsequently modulated and transmitted over a noisy channel (not shown). A decoder eventually receives the noisy convolutionally encoded data stream and employs the Viterbi algorithm, which exploits the properties of convolutional codes to ultimately determine the input bit stream U(D) 105.
One advantage of convolutional codes is their highly repetitive structure, which provides for a symmetrical code tree. Such symmetry reduces the number of states that need to be evaluated in locating the most probable path. Moreover, in decoding such a symmetrical code, only the most probable local path leading into each of the 256 possible encoder states is of interest. All other paths may be discarded from further consideration, because the most probable global path through a state must necessarily include the most probable local path through that state. (Note that in some applications of the Viterbi algorithm, the decision as to which local path is most probable may be deferred until information relating to subsequent states is available.)
The Viterbi decoder relies on these code properties to function as a finite state machine having a limited set of state transitions. The decoder hypothesizes each of the 2^(K−1) possible encoder states and determines the probability that the encoder transitioned from each of those states to each of the next set of 2^(K−1) possible encoder states. In this case, the transition probability is based on observations which are obtained from the received noisy convolutionally encoded data stream.
The probability of each state transition is expressed by a quantity, referred to as a metric, which represents a distance (e.g., in code space) between that state transition and what was actually observed at that point in the input data stream. This distance may be expressed as, for example, a Hamming distance, a Euclidean distance, or a negative logarithm of a probability value, depending on the particular application. Clearly, the smaller the metric, the higher the probability of occurrence. There are two types of metrics: state metrics and branch metrics. The state metric represents the relative probability that the transmitted set of code symbols passed through a particular state. The branch metric represents the conditional probability that the transition from a particular source state to a particular target state was transmitted (assuming that the source state was correct).
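Two of the distance measures mentioned above can be illustrated with a short Python sketch. The function names are hypothetical, and the second function uses the squared Euclidean distance, which is an assumption made here for simplicity (it orders candidates the same way as the Euclidean distance for a single comparison). In both cases a smaller value indicates a closer match.

```python
def hamming_metric(received_bits, hypothesized_bits):
    """Count the positions where hard-decision bits disagree."""
    return sum(r != h for r, h in zip(received_bits, hypothesized_bits))


def squared_euclidean_metric(received_samples, hypothesized_levels):
    """Sum of squared differences between soft samples and ideal levels."""
    return sum((r - h) ** 2 for r, h in zip(received_samples, hypothesized_levels))


if __name__ == "__main__":
    # One of the two received symbols disagrees with the hypothesis.
    print(hamming_metric((1, 0), (1, 1)))
    # Soft samples compared against antipodal levels +1 / -1 (an assumption).
    print(squared_euclidean_metric((1, -1), (1, 1)))
```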
The Viterbi algorithm has been implemented efficiently by employing an Add-Compare-Select (ACS) unit 150, as illustrated in FIG. 1B. The ACS unit 150 calculates the target state metric values and also characterizes the relationships between the source and target states by virtue of ACS butterfly operations. FIG. 2 depicts a single ACS butterfly operation 155, which evaluates the only possible state transitions that could have occurred for two particular adjacent source states in encoder 100. This limitation is partly due to the fact that, at any given time, the state of encoder 100 is the encoder's previous state right-shifted by 1 bit. The next (right-shifted) information bit determines which transition is made from a source state and will appear as the most significant bit (MSB) of the target state. For a binary data stream, there are only two possible target states that a source state can transition to. Thus, as evidenced by FIG. 2, encoder 100 can only transition from source state “x0” to target state “0x” or “1x” and from source state “x1” to target state “0x” or “1x”, depending on the value of the inputted data bit of bit stream U(D) 105. In this figure, and elsewhere, notations “x0” and “x1” indicate that the least significant bit (LSB) of the source state is “0” and “1”, respectively, while the upper bits are represented by “x”; and notations “0x” and “1x” indicate that the MSB of the target states are “0” or “1”, respectively, while the lower bits are represented by “x”. The term “x” represents the same value (e.g., a 7-bit value) whether it is included in the number of a source state or of a target state.
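The source and target numbering of a butterfly can be made concrete with a small Python sketch. The helper names `next_state` and `butterfly_states` are hypothetical; the sketch assumes only what the paragraph above states, namely that the state is right-shifted by one bit and the new information bit enters as the MSB of a (K−1)-bit state.

```python
def next_state(state, bit, K=9):
    """Right-shift the (K-1)-bit state and insert the new bit as the MSB."""
    return (bit << (K - 2)) | (state >> 1)


def butterfly_states(x, K=9):
    """Return ((x0, x1), (0x, 1x)) for the (K-2)-bit place-holder x."""
    msb = 1 << (K - 2)
    sources = ((x << 1) | 0, (x << 1) | 1)  # LSB is 0 or 1, upper bits are x
    targets = (x, msb | x)                  # MSB is 0 or 1, lower bits are x
    return sources, targets


if __name__ == "__main__":
    sources, targets = butterfly_states(0b0000001)
    print([format(s, "08b") for s in sources])
    print([format(t, "08b") for t in targets])
```

Both source states of a butterfly reach the same two target states, which is why the four transitions can be evaluated together.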
FIG. 2 also reveals that each pair of transitions from the source states to the target states generates a hypothesized pair of code symbols H0(D), H1(D) or H̄0(D), H̄1(D). In fact, when the most likely transitions are along the parallel branches of the ACS butterfly 155 (e.g., transitions from “x0” to “0x” and from “x1” to “1x”), the pair H0(D), H1(D) is generated. This feature is due in part to the repetitive nature of convolutional codes in general, as well as to the use of generator code polynomials having their MSBs and LSBs set to unity (i.e., for both G0(D) and G1(D), factors g0 and g7 are equal to 1). In like fashion, code symbols H̄0(D), H̄1(D) are generated when the most likely transitions are along the diagonal branches of the ACS butterfly 155 (e.g., transitions from “x0” to “1x” and from “x1” to “0x”).
As stated above, the ACS unit 150 calculates the target state metrics tm0x, tm1x. The ACS 150 logic receives the source state metrics smx0, smx1 which relate to the probability that a received set of n code symbols leads to source states “x0” and “x1”, respectively. Returning to FIG. 1B, upon receiving a set of code symbols, the branch metric unit 140 computes the branch metric values bmi,j, bmī,j̄. The branch metrics bmi,j, bmī,j̄ represent the conditional probability that the transition from a particular source state to a particular target state occurred. Specifically, for a rate 1/n convolutional code, branch metric bmi,j indicates how closely the set of n received code symbols matches the set of hypothesized code symbols H0(D), H1(D), and branch metric bmī,j̄ indicates how closely the set of n received code symbols matches the hypothesized set H̄0(D), H̄1(D). ACS 150 “adds” the branch metric bmi,j, bmī,j̄ corresponding to each of the two transitions leading to a particular target state to the corresponding source state metric smx0, smx1.
For each of the two target states, ACS 150 then “compares” the sums of the source state metric and branch metric pairs leading to that target state. The most likely transition into each target state, represented by the smallest metric sum, is then “selected” by ACS 150 and assigned to that target state as the target state metric tm0x, tm1x.
As stated above, the ACS 150 logic adds the branch metric bmi,j, bmī,j̄ to the source state metric smx0, smx1 for each of the two transitions leading to a target state and decides that the most likely path into that target state came from the transition that yields the smaller metric sum. The smaller metric sum then becomes the new target state metric tm0x, tm1x. The ACS 150 also stores the target state metrics (i.e., the costs associated with the most likely path leading to each target state) into the state random-access memory (RAM) 145. As indicated by FIG. 1B, the selection of the smallest metric sum results in the storing of the LSB of the winning source state metric, referred to as a decision bit, in the path memory of a chainback memory unit 160.
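The add, compare, and select steps of one butterfly can be summarized in a brief Python sketch. The function name and argument names are hypothetical. The sketch assumes that the parallel branches (x0 to 0x, x1 to 1x) use the branch metric for the hypothesized symbols and the diagonal branches use the metric for the complemented symbols, as described for FIG. 2, and it breaks ties in favor of source state x0, which is an arbitrary choice not taken from the text.

```python
def acs_butterfly(sm_x0, sm_x1, bm, bm_bar):
    """Return ((tm_0x, decision_0x), (tm_1x, decision_1x)).

    sm_x0, sm_x1: source state metrics for states x0 and x1.
    bm:     branch metric for the parallel branches.
    bm_bar: branch metric for the diagonal branches.
    Each decision bit is the LSB of the winning source state.
    """
    # Add: two candidate sums lead into each target state.
    into_0x = (sm_x0 + bm, sm_x1 + bm_bar)
    into_1x = (sm_x0 + bm_bar, sm_x1 + bm)
    # Compare and select: keep the smaller sum (ties go to source x0).
    d0 = 0 if into_0x[0] <= into_0x[1] else 1
    d1 = 0 if into_1x[0] <= into_1x[1] else 1
    return (into_0x[d0], d0), (into_1x[d1], d1)


if __name__ == "__main__":
    print(acs_butterfly(sm_x0=3, sm_x1=1, bm=0, bm_bar=2))
```

In a full decoder the two returned metrics would be written to the state RAM and the two decision bits to the chainback memory.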
To facilitate the description of the related art, we will define a process cycle as the number of clock cycles required to calculate new target state metrics for two complete (and consecutive) levels of (K−1) encoder states by performing ACS butterfly operations upon two consecutive sets of n received symbols. For example, a Viterbi decoder having a single ACS butterfly 155, as depicted in FIG. 2, would generally produce two target states per clock cycle and would thus require 128 clock cycles per received code symbol to perform the ACS operations for all 256 states of encoder 100 (i.e., one complete level). Therefore, a process cycle for such a decoder would be defined as 256 clock cycles. To improve processing speed, ACS butterfly array architectures that employ multiple ACS butterflies 155 may be used to reduce the number of clock cycles per process cycle.
One example of such an architecture is the 2×2 ACS butterfly array 300, depicted by FIG. 3, which operates on two consecutive received code symbols in order to improve processing speed. As described above, ACS butterfly 155 reads two source states (e.g., states x0 and x1) and calculates two target states (e.g., states 0x and 1x). For encoder 100, having constraint length K=9, x represents a 7-bit place-holder (i.e., x=[x6, x5, x4, x3, x2, x1, x0]) which is incremented from 0 through 127 over a period of 128 clock cycles. If x=0000000, for example, the 2×2 ACS butterfly array 300 reads a block of four source states 0000 0000, 0000 0001, 0000 0010, and 0000 0011 (i.e., 00(H), 01(H), 02(H), and 03(H), where the label (H) indicates a hexadecimal number) from memory (e.g., state RAM 145). The 2×2 ACS butterfly array 300 then calculates the corresponding block of four target states 0000 0000, 1000 0000, 0100 0000, and 1100 0000 (i.e., 00(H), 80(H), 40(H), and C0(H)) and writes them into memory. Because at least some of the target states in the output block (e.g., 00(H), 80(H), 40(H), C0(H)) may represent different encoder states than the source states in the input block (e.g., 00(H), 01(H), 02(H), 03(H)), the output block of target states is stored to a different memory location (e.g., within state RAM 145). In this manner, butterfly array 300 may complete one process cycle (for K=9) in 64 clock cycles.
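The relationship between the input block and the output block of a two-stage array can be checked with a short Python sketch. The function name `two_stage_targets` is hypothetical, and the sketch models only the state numbering (two successive right shifts with a new MSB each time), not the metric arithmetic of array 300.

```python
def two_stage_targets(source_block, K=9):
    """Return the sorted set of states reachable from source_block
    after two consecutive trellis levels."""
    msb_shift = K - 2
    targets = set()
    for s in source_block:
        for b1 in (0, 1):                      # first received symbol set
            mid = (b1 << msb_shift) | (s >> 1)
            for b2 in (0, 1):                  # second received symbol set
                targets.add((b2 << msb_shift) | (mid >> 1))
    return sorted(targets)


if __name__ == "__main__":
    block = [0x00, 0x01, 0x02, 0x03]
    print([format(t, "02X") for t in two_stage_targets(block)])
```

For the source block 00(H) through 03(H), the sketch yields the four target states 00(H), 40(H), 80(H), and C0(H), only one of which coincides with a state number in the input block.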
Another example is the 4×2 ACS butterfly array 400, illustrated in FIGS. 4A, 4B and proposed in U.S. patent application Ser. No. 09/422,920, filed Oct. 21, 1999, entitled “High-Speed ACS for Viterbi Decoder Implementations” assigned to the assignee of the present application and herein incorporated by reference. The 4×2 ACS butterfly array 400 boasts an 8× improvement in processing speed by virtue of implementing two sets of four ACS butterfly 155 units in parallel. During each clock cycle, the first stage of array 400 reads a block of eight source states and calculates the corresponding block of eight intermediate target state metrics for a set of n received code symbols. The parameter X, which is included as part of the source and target states in FIG. 4A, represents a four-bit place-holder (i.e., X=[X3, X2, X1, X0]) that is incremented from 0 to 15 over a period of sixteen clock cycles. The intermediate target states are rearranged to feed into the second stage of the array 400 (as source states) and the corresponding block of eight target state metrics are calculated for the subsequent set of n received code symbols. Thus, butterfly array 400 is capable of computing a complete set of target state metrics for two sets of n received code symbols (i.e., one process cycle) for K=9 in only 32 clock cycles.
Still another example architecture is the modified 8×1 ACS butterfly array 500, illustrated in FIGS. 5A, 5B and also described in U.S. patent application Ser. No. 09/422,920 incorporated by reference above. Array 500 also proffers an 8× improvement in processing speed by virtue of implementing eight ACS butterfly units 155 in parallel. For one set of n received code symbols, 8×1 butterfly array 500 uses all eight butterfly units 155 to read a block of 16 source states and calculate the corresponding block of 16 target state metrics (as identified by the 4-bit counter X) in one clock cycle. During the next clock cycle, butterfly array 500 uses the calculated target states as source states for the subsequent set of n received code symbols. Thus, for two received code symbols, 8×1 butterfly array 500 computes the target state metrics for all 256 possible states of encoder 100 (i.e., one process cycle) in 32 clock cycles.
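The clock-cycle counts quoted for the single butterfly and for the array architectures follow from simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes that every butterfly unit performs one butterfly operation on every clock cycle with no stalls.

```python
def clocks_per_process_cycle(constraint_length, num_butterflies, levels=2):
    """Clock cycles needed for `levels` complete trellis levels.

    Each level has 2**(K-1) states, i.e. 2**(K-2) butterfly operations.
    Assumes full utilization of all butterfly units (an idealization).
    """
    butterflies_per_level = 2 ** (constraint_length - 2)
    return (levels * butterflies_per_level) // num_butterflies


if __name__ == "__main__":
    for name, units in (("single", 1), ("2x2", 4), ("4x2", 8), ("8x1", 8)):
        print(name, clocks_per_process_cycle(9, units))
```

For K=9 this gives 256, 64, 32, and 32 clock cycles, matching the figures stated for the single butterfly, array 300, array 400, and array 500, respectively.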
Generally, ACS architectures such as those described above require the state RAM to be divided into two separate areas. The source states are read from one area, while the target states are written to the second area. When the source states have been consumed (e.g. after each process cycle), the roles of the memory areas are switched (i.e. the target states previously written are read and the source states previously read are overwritten). This double buffering operation continues until all 256 target states have been calculated. Although double buffering may be wasteful (of chip area, for example), the technique has been used because at least some of the target states generated in the output block may correspond to different encoder states than the source states which are consumed to produce them.
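The reason a naive single-buffer scheme fails can be illustrated with a short Python sketch. It assumes, purely for illustration, that each state metric is stored at the address equal to its state number and that butterflies are processed in order of increasing x; the function name `in_place_hazards` is hypothetical. The sketch counts how many target-state writes would land on an address whose source state has not yet been read.

```python
def in_place_hazards(K=9):
    """Count writes that would overwrite a not-yet-read source state when
    targets are written to addresses equal to their own state numbers."""
    half = 1 << (K - 2)
    unread = set(range(1 << (K - 1)))  # addresses still holding live sources
    hazards = 0
    for x in range(half):
        unread -= {2 * x, 2 * x + 1}   # butterfly x consumes states x0 and x1
        for target in (x, half | x):   # it produces states 0x and 1x
            if target in unread:
                hazards += 1
    return hazards


if __name__ == "__main__":
    print(in_place_hazards(9))
```

Under these assumptions, almost every write to a “1x” target would destroy a source state that is still needed, which is the problem double buffering sidesteps at the cost of a second memory area.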
To obviate the need for double buffering, a technique has been proposed in U.S. patent application Ser. No. 09/129,021, filed on Aug. 4, 1998 and entitled “Single RAM Add-Compare-Select Unit for Serial Viterbi Decoder” assigned to the assignee of the present application and herein incorporated by reference. The proposed technique attempts to overcome double buffering by providing a novel addressing scheme for determining the address from which to read each source state. In this technique, a modulo-N ACS cycle counter is incremented every time a complete set of 2^(K−1) source states is processed (N being dependent on the particular architecture). In order to derive the memory read address for each source state, the addressing scheme rotates the number of the source state to be processed, where the number of bits to rotate is indicated by the value of the modulo-N ACS cycle counter and the number of each source state is in the range of 0 to 2^(K−1)−1. For example, during the first ACS cycle, no rotation is performed. During the second ACS cycle, the read address for each source state is obtained by rotating the state number by 1. During the third ACS cycle, the state number is rotated by 2 to obtain the read address for each source state. In each subsequent ACS cycle, the counter is successively incremented by 1 until the ACS cycle count reaches N−1, at which point the pattern repeats.
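A rotation-based read address of this kind can be exercised with a small Python simulation. Several details are assumptions made for this sketch and are not taken from the referenced application: the rotation is taken to be a left rotation over K−1 bits, N is taken to be K−1, a single butterfly is processed per step, and target 0x is written into the slot just vacated by source x0 (target 1x into the slot of x1). The memory holds state numbers rather than metrics so that the bookkeeping can be checked directly.

```python
def rotl(value, shift, width):
    """Rotate a width-bit value left by shift bits."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask


def simulate_in_place(K=9, cycles=20):
    """Return True if every source state is found at its rotated address
    in every ACS cycle while targets overwrite the slots just read."""
    width = K - 1
    half = 1 << (width - 1)
    memory = list(range(1 << width))  # memory[addr] = state number stored there
    for cycle in range(cycles):
        for x in range(half):
            a0 = rotl(2 * x, cycle, width)      # read address of source x0
            a1 = rotl(2 * x + 1, cycle, width)  # read address of source x1
            if memory[a0] != 2 * x or memory[a1] != 2 * x + 1:
                return False
            memory[a0] = x         # target 0x reuses the slot of source x0
            memory[a1] = half | x  # target 1x reuses the slot of source x1
    return True


if __name__ == "__main__":
    print(simulate_in_place())
```

Under these assumptions the rotation amount returns to zero after K−1 cycles, which is consistent with the repeating pattern described above.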
For parallelized ACS butterfly structures, it is desirable to maximize the number of states that are read out of (and written into) memory at one time. Moreover, to optimize memory utilization it is also desirable to write the calculated target states into the same memory locations that correspond to the consumed source states (i.e., the read addresses). However, to do so efficiently has required delaying the write operation for a number of clock cycles until a convenient number of contiguous target states are calculated. Thus, the calculated target states are temporarily stored in pipeline registers until a contiguous number of target states are calculated, at which time the contiguous target states are written into memory locations that have sequential addresses. It is to be noted that although such implementations may avoid double buffering, the number of pipeline registers they require increases as the number of target states calculated increases. Increasing pipeline registers increases circuit complexity, increases circuit area, and makes inefficient use of power. The number of pipeline registers needed is further aggravated by ACS butterfly architectures that employ parallel ACS butterflies 155 (such as the 2×2 and 4×2 configurations described above) to improve performance.
What is needed, therefore, is a method and apparatus capable of locating and storing states within a single memory in a manner that enhances the performance of high-rate ACS butterfly operations.
Methods and apparatuses consistent with the principles of the present invention address the need identified above by providing a method and apparatus that are capable of locating and storing states within a single memory in a manner that enhances the performance of high-rate ACS butterfly operations.
As such, in an exemplary embodiment, the present invention includes the application of an addressing scheme to determine the address locations of source state metrics during a process cycle. The source state metrics are read from those address locations during the process cycle and applied to an add-compare-select butterfly operation of a Viterbi algorithm implementation to generate target state metrics. The method then stores the target state metrics into the address location or locations previously occupied by the source state metrics which were read. The method further provides an addressing scheme that determines the address locations of the source state metrics based on a process cycle counter that is incremented and rotated in accordance with the process cycle. The method also provides an addressing scheme that employs a predetermined function to determine the address locations of the source state metrics.