Receivers can capture data more efficiently if the data has been encoded to allow forward error correction. The Viterbi decoder uses the Viterbi algorithm to decode a bitstream that has been encoded using forward error correction based on a convolutional code. The Viterbi algorithm is highly resource-consuming, but it does provide maximum likelihood decoding.
Viterbi decoders employ Trellis decoding to estimate the most likely sequence of events that leads to a particular state. U.S. patent application Ser. No. 12/496,538 filed Feb. 1, 2009 entitled “METHOD AND APPARATUS FOR CODING RELATING TO FORWARD LOOP” describes faster decoding in Viterbi decoders by allowing 2 bits of the Trellis decoding to be performed using DSP instructions called R4ACS, or Radix-4 Add Compare Select (RACS4), and Radix-4 Add Compare Decision (RACD). This invention deals with the implementation of this class of DSP instructions.
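For orientation, the add-compare-select step that such instructions accelerate can be sketched in Python as follows. This is a generic illustration only, not the instruction semantics of the cited application: the function names, the (path metric, branch metric) pair layout, and the convention that a smaller metric is better are all assumptions made for the example.

```python
def acs_radix2(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select for one trellis state with two incoming branches.

    pm_* is the path metric of a predecessor state and bm_* is the branch
    metric of the transition from it. Assumption: smaller metric wins.
    Returns (surviving metric, decision bit); decision 0 selects branch a.
    """
    cand_a = pm_a + bm_a  # add
    cand_b = pm_b + bm_b
    if cand_a <= cand_b:  # compare
        return cand_a, 0  # select
    return cand_b, 1


def acs_radix4(candidates):
    """Radix-4 variant: four incoming paths, covering two trellis stages.

    `candidates` is a list of four (path_metric, branch_metric) pairs
    (hypothetical layout). Returns (surviving metric, 2-bit decision index).
    """
    sums = [pm + bm for pm, bm in candidates]
    best = min(range(4), key=lambda k: sums[k])
    return sums[best], best
```

Selecting among four candidates at once resolves two decision bits per step, which is the sense in which a radix-4 step advances the trellis by 2 bits.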
Turbo codes are a type of forward error correction code with powerful capabilities. These codes are becoming widely used in many applications such as wireless handsets, wireless base stations, hard disk drives, wireless LANs, satellites, and digital television. A brief overview of Turbo decoders is given below.
A functional block diagram of a turbo decoder is shown in FIG. 1. This iterative decoder generates soft decisions from a maximum-a-posteriori (MAP) block using the probabilities represented by a-posteriori feedback terms A0 110 and A1 109. Each iteration requires the execution of two MAP decodes to generate two sets of extrinsic information. The first MAP decoder 102 uses the non-interleaved data as its input and the second MAP decoder 103 uses the interleaved data from the interleaver block 101.
The MAP decoders 102 and 103 compute the extrinsic information as:
$$W_n = \log \frac{\Pr(x_n = 1 \mid R_1^n)}{\Pr(x_n = 0 \mid R_1^n)} \qquad (1)$$

where $R_1^n = (R_0, R_1, \ldots, R_{n-1})$ denotes the received symbols. The MAP decoders also compute the a posteriori probabilities:
$$\Pr(x_n = i \mid R_1^n) = \frac{1}{\Pr(R_1^n)} \sum \Pr(x_n = i,\; S_n = m',\; S_{n-1} = m) \qquad (2)$$

Here $S_n$ refers to the state at time n in the trellis of the constituent convolutional code.
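The terms in the summation can be expressed in the form

$$\Pr(x_n = i,\; S_n = m',\; S_{n-1} = m) = \alpha_{n-1}(m)\,\gamma_n^i(m, m')\,\beta_n(m') \qquad (3)$$

where the quantity

$$\gamma_n^i(m, m') = \Pr(S_n = m',\; x_n = i,\; R_n \mid S_{n-1} = m) \qquad (4)$$

is called the branch metric, and

$$\alpha_n(m') = \Pr(S_n = m',\; R_1^n) \qquad (5)$$

is called the forward (or alpha) state metric, and

$$\beta_n(m') = \Pr(R_{n+1}^{n} \mid S_n = m') \qquad (6)$$

is called the backward (or beta) state metric.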
The branch metric depends upon the systematic, parity, and extrinsic symbols. The extrinsic symbols for a given MAP decoder are provided by the other MAP decoder at inputs 109 and 110. The alpha and beta state metrics are computed recursively by forward and backward recursions given by

$$\alpha_n(m') = \sum_{m,i} \alpha_{n-1}(m)\,\gamma_n^i(m, m') \qquad (7)$$

and

$$\beta_{n-1}(m) = \sum_{m',i} \beta_n(m')\,\gamma_n^i(m, m') \qquad (8)$$

The slicer 107 completes the re-assembling of the output bit stream $x_0 \ldots x_{n-1}$ 108.
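The forward and backward recursions, and the log ratio of equation (1) built from the product in equation (3), can be illustrated with a minimal Python sketch. Several things here are assumptions made for the example and are not taken from the decoder described: the nested-list layout `gamma[n][i][m][mp]`, the uniform initialization of the final beta metrics (chosen because the end state is unknown), the probability-domain arithmetic, and the two-state toy trellis produced by the hypothetical helper `toy_gamma`.

```python
import math


def forward_backward(gamma, num_states):
    """Run the alpha and beta recursions over a block of branch metrics.

    gamma[n][i][m][mp] holds the branch metric for input bit i on the
    transition m -> mp taken at step n + 1 (assumed layout).
    """
    steps = len(gamma)
    alpha = [[0.0] * num_states for _ in range(steps + 1)]
    alpha[0][0] = 1.0  # the encoder starts in the zero state
    for n in range(1, steps + 1):
        for mp in range(num_states):
            alpha[n][mp] = sum(
                alpha[n - 1][m] * gamma[n - 1][i][m][mp]
                for m in range(num_states) for i in (0, 1))

    beta = [[0.0] * num_states for _ in range(steps + 1)]
    # Assumption: unknown end state, so every final state is equally likely.
    beta[steps] = [1.0 / num_states] * num_states
    for n in range(steps, 0, -1):
        for m in range(num_states):
            beta[n - 1][m] = sum(
                beta[n][mp] * gamma[n - 1][i][m][mp]
                for mp in range(num_states) for i in (0, 1))
    return alpha, beta


def log_likelihood_ratio(alpha, beta, gamma, n):
    """Log ratio for bit n (1-based); the common 1/Pr(R) factor cancels."""
    num_states = len(alpha[0])

    def joint(i):
        return sum(
            alpha[n - 1][m] * gamma[n - 1][i][m][mp] * beta[n][mp]
            for m in range(num_states) for mp in range(num_states))

    return math.log(joint(1) / joint(0))


def toy_gamma(p_one, steps):
    """Hypothetical 2-state trellis: input bit i moves state m to m XOR i."""
    step = [[[(p_one if i else 1.0 - p_one) if mp == (m ^ i) else 0.0
              for mp in range(2)]
             for m in range(2)]
            for i in (0, 1)]
    return [step for _ in range(steps)]


gamma = toy_gamma(0.9, 4)
alpha, beta = forward_backward(gamma, 2)
llr = log_likelihood_ratio(alpha, beta, gamma, 2)  # close to log(0.9 / 0.1)
```

In this toy case every step favors bit 1 with probability 0.9, so the ratio reduces to log(9) at every position. The sketch stays in the probability domain only to mirror the equations as written.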
The block diagram of the MAP decoder is shown in FIG. 2. The subscripts r and f represent the direction, reverse and forward, respectively, of the sequence of the data inputs for the recursive blocks beta and alpha. Input bit streams 210-212 and 213-215 are labeled as parameters Xn,r, Pn,r, An,r and Xn,f, Pn,f, An,f respectively. Feedback streams are labeled αn,f and βn,r.
Both the alpha state metric block 202 and beta state metric block 203 calculate state metrics. Both start at a known location in the trellis, the zero state. The encoder starts the block of n information bits (frame size n=5114) at the zero state and after n cycles through the trellis ends at some unknown state.
The mapping of this task of computing the branch metrics and adding to the previous state metrics, to a class of DSP instructions (T4MAX/T2MAX) is outside the scope of this invention. The current invention deals with the efficient implementation of this class of DSP instructions.
One of the main sources of latency in computer arithmetic is the propagation of carries in the computation of a sum of two or more numbers. This is a well-studied area, which is not explored here except to note that the best algorithms for addition require a number of logic levels equal to:

$$\text{levels} = 2 + \log_2(\text{width}) \qquad (9)$$

where width is the number of bits representing the numbers to be added.
FIG. 3 illustrates the three-to-two carry save circuit 302, otherwise known as the 3:2 CSA circuit, which takes three inputs 301 (a, b and c) and produces two outputs 303 (S and C0). This circuit has the property that when S and C0 are added together, they produce the same result as adding a+b+c. This process is often referred to as compressing the three numbers down to two numbers. The 3:2 CSA is sometimes referred to as a 3:2 compressor.
The three inputs can be any three bits, while the two outputs are the sum S and carry C0 resulting from the addition of these three bits. These are computed based on the following logical equations:

$$S = a \oplus b \oplus c \qquad (10)$$

$$C0 = (a \cdot b) + (b \cdot c) + (c \cdot a) \qquad (11)$$
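The sum and carry equations can be checked with a short Python sketch that applies them bitwise across whole words. The function name `csa_3to2` and the use of unbounded non-negative Python integers are assumptions for illustration. Each carry bit carries the weight of the next higher bit position, so the carry word is shifted left by one before it is added to the sum word.

```python
def csa_3to2(a, b, c):
    """3:2 carry-save compression of three non-negative integers.

    Every bit position is handled independently, so no carry propagates
    inside this step. Returns (sum_word, carry_word) such that
    sum_word + carry_word == a + b + c.
    """
    sum_word = a ^ b ^ c                     # per-bit sum: a XOR b XOR c
    carry_bits = (a & b) | (b & c) | (c & a)  # per-bit majority function
    return sum_word, carry_bits << 1         # carries weigh one position higher


s, carry = csa_3to2(0b1011, 0b0110, 0b1101)
total = s + carry  # a single carry-propagating addition finishes the job
```

Only the final addition of the two output words needs a carry-propagating adder.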
The main advantage of using the 3:2 circuit is that equations (10) and (11) can typically be computed with a logic depth of no greater than 2. Thus it allows for faster computation of the sum of three numbers by preventing the carry from propagating. Therefore, given three numbers which need to be added together, rather than sequentially computing a+b=x, and then x+c, with a resulting delay

$$\text{delay} = 2 \cdot (2 + \log_2(\text{width})) \qquad (12A)$$

one can process a+b+c through a 3:2 CSA compressor followed by an adder to achieve a total delay of:

$$\text{delay} = 4 + \log_2(\text{width}) \qquad (12B)$$

The savings in the number of logic level delays become even more pronounced when the width of the operands involved is large.
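As a worked illustration of the two delay expressions, the following Python sketch evaluates them for a given operand width. The function names are hypothetical, and the model counts logic levels only, using the adder depth of equation (9) and a depth of 2 for the 3:2 stage. It ignores wiring and fan-out effects.

```python
import math


def adder_levels(width):
    """Logic levels of a fast adder, per equation (9)."""
    return 2 + math.log2(width)


def sequential_delay(width):
    """Two back-to-back additions (a + b, then + c), per equation (12A)."""
    return 2 * adder_levels(width)


def csa_delay(width, csa_depth=2):
    """One 3:2 CSA stage followed by a single adder, per equation (12B)."""
    return csa_depth + adder_levels(width)


saving_32 = sequential_delay(32) - csa_delay(32)  # 14 levels versus 9
saving_64 = sequential_delay(64) - csa_delay(64)  # 16 levels versus 10
```

Under this model the saving equals log2(width), so it grows by one logic level each time the operand width doubles.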