Error-Correcting Codes
A fundamental problem in the field of data storage and communication is the development of practical decoding methods for error-correcting codes (ECC).
One very important class of error-correcting codes is the class of linear block error-correcting codes. Unless specified otherwise, any reference to a “code” herein refers to a linear block error-correcting code.
The basic idea behind these codes is to encode a block of k information symbols using a block of N symbols, where N>k. The additional N−k symbols are used to correct corrupted signals when they are received over a noisy channel or retrieved from faulty storage media.
A block of N symbols that satisfies all the constraints of the code is called a “code-word,” and the corresponding block of k information symbols is called an “information block.” The symbols are assumed to be drawn from a q-ary alphabet.
An important special case is when q=2. In this case, the code is called a “binary” code. In the examples given herein, binary codes are assumed, although the generalization of the decoding methods described herein to q-ary codes with q>2 is usually possible. Binary codes are the most important codes used in practice.
In conventional “channel coding” with a linear block error-correcting code, a source produces an information block of k symbols u[a]. The information block is passed to an encoder of the error-correcting code. The encoder produces a code-word x[n] containing N symbols. The code-word is then transmitted through a channel, where the code-word is possibly corrupted into a signal y[n]. The corrupted signal y[n] is then passed to a decoder, which attempts to output a reconstruction z[n] of the code-word x[n].
Code Parameters
A binary linear block code is defined by a set of 2^k possible code-words having a block length N. The parameter k is sometimes called the “dimension” of the code. Codes are normally much more effective when N and k are large. However, as the size of the parameters N and k increases, so does the difficulty of decoding corrupted messages.
The Hamming distance between two code-words is defined as the number of positions in which the two words differ. The distance d of a code is defined as the minimum Hamming distance over all pairs of distinct code-words in the code. Codes with a larger value of d have a better error-correcting capability. Codes with parameters N and k are referred to as [N,k] codes. If the distance d is also known, then the codes are referred to as [N, k, d] codes.
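The definitions above can be illustrated with a short Python sketch. The two small codes used here (a [3,1] repetition code and a [3,2] single-parity-check code) and the function names are chosen only for illustration and are not taken from the text.

```python
from itertools import product


def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords):
    """Minimum Hamming distance over all pairs of distinct code-words."""
    return min(
        hamming_distance(a, b)
        for i, a in enumerate(codewords)
        for b in codewords[i + 1:]
    )


# Illustrative [3,1] repetition code: 2^1 = 2 code-words.
repetition = [(0, 0, 0), (1, 1, 1)]

# Illustrative [3,2] single-parity-check code: 2^2 = 4 code-words,
# namely all 3-bit words with an even number of ones.
parity = [c for c in product((0, 1), repeat=3) if sum(c) % 2 == 0]

d_repetition = minimum_distance(repetition)  # a [3,1,3] code
d_parity = minimum_distance(parity)          # a [3,2,2] code
```

Exhaustive pairwise comparison is only practical for very small codes, since the number of code-words grows as 2^k.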
Code Parity Check Matrix Representations
A linear code can be represented by a parity check matrix. The parity check matrix representing a binary [N,k] code is a matrix of zeros and ones, with M rows and N columns. The N columns of the parity check matrix correspond to the N symbols of the code. The number of linearly independent rows in the matrix is N−k.
Each row of the parity check matrix represents a parity check constraint. The symbols involved in the constraint represented by a particular row correspond to the columns that have a non-zero symbol in that row. The parity check constraint forces the weighted sum modulo-2 of those symbols to be equal to zero. For example, for a binary code, the parity check matrix
$$H = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{bmatrix}$$
represents the three constraints
x[1]+x[2]+x[3]+x[5]=0,
x[2]+x[3]+x[4]+x[6]=0, and
x[3]+x[4]+x[5]+x[7]=0,
where x[n] is the value of the nth bit, and the addition of binary symbols is done using the rules of modulo-2 arithmetic, such that 0+0=1+1=0, and 0+1=1+0=1.
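As a minimal sketch, the three constraints of the example matrix can be checked in Python by computing one modulo-2 sum per row. The function names `syndrome` and `is_codeword` are illustrative, and the code uses 0-based positions whereas the text numbers bits from 1.

```python
# Example parity check matrix from the text (3 rows, 7 columns).
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]


def syndrome(H, x):
    """Modulo-2 sum of the bits selected by each row of H."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]


def is_codeword(H, x):
    """A word is a code-word when every parity check sums to zero."""
    return not any(syndrome(H, x))


word = [1, 0, 0, 0, 1, 0, 1]       # satisfies all three constraints
corrupted = [1, 0, 1, 0, 1, 0, 1]  # same word with the third bit flipped
```

For the corrupted word, all three checks fail because the third bit participates in every constraint.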
Graphical Model for a Code
The parity check matrix of a linear code is often represented using a graphical model, also called a “Tanner graph” in the art. A Tanner graph is a bi-partite graph with two kinds of nodes: “variable nodes” corresponding to codeword symbols, and “constraint nodes” corresponding to parity constraints. Thus, there is one variable node for each column of the parity check matrix representing the code, and one constraint node for each row of the parity check matrix. A variable node is connected to a constraint node in the Tanner graph if the corresponding symbol is in a constraint equation. Thus, there is a line connecting a variable node and a constraint node for each non-zero entry in the parity check matrix.
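One possible way to hold a Tanner graph in software is as two adjacency lists derived from the parity check matrix. The sketch below is illustrative; the function name `tanner_graph` and the 0-based node indices are assumptions.

```python
def tanner_graph(H):
    """Return (check_to_vars, var_to_checks) adjacency lists for H.

    check_to_vars[m] lists the variable nodes connected to constraint node m;
    var_to_checks[n] lists the constraint nodes connected to variable node n.
    """
    num_vars = len(H[0])
    check_to_vars = [[n for n, h in enumerate(row) if h] for row in H]
    var_to_checks = [
        [m for m, row in enumerate(H) if row[n]] for n in range(num_vars)
    ]
    return check_to_vars, var_to_checks


# Example parity check matrix with three constraints on seven symbols.
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]
check_to_vars, var_to_checks = tanner_graph(H)
num_edges = sum(len(vs) for vs in check_to_vars)  # one edge per non-zero entry
```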
Other graphical models for codes exist, which can be easily converted into Tanner graphs. For example, G. David Forney has popularized the “factor graph” representation in which variables are represented by the lines in the graph, and code symbols are represented as an equality constraint between the variables, see G. D. Forney, “Codes on Graphs: Normal Realizations,” IEEE Transactions on Information Theory, February 2001, vol. 47, pp. 520-548. Herein, we describe decoding methods in terms of Tanner graphs, but it is well understood how to convert these decoding methods to work with other graphical models.
Error-Correcting Code Decoders
The task of a decoder for an error-correcting code is to accept the received signal after the transmitted code-word has been corrupted in a channel, and try to reconstruct the transmitted code-word. The optimal decoder, in terms of minimizing the number of code-word decoding failures, outputs the most likely code-word given the received signal. The optimal decoder is known as a “maximum likelihood” decoder. Even a maximum likelihood (ML) decoder will sometimes make a decoding error and output a code-word that is not the transmitted code-word if the noise in the channel is sufficiently great.
Iterative Decoders
In practice, maximum likelihood decoders can only be constructed for special classes of error-correcting codes. There has been a great deal of interest in non-optimal, approximate decoders based on iterative methods. One of these iterative decoding methods is called “belief propagation” (BP). Although he did not call it by that name, R. Gallager first described a BP decoding method for low-density parity check (LDPC) codes in 1963.
Turbo Codes
In 1993, similar iterative methods were shown to perform very well for a new class of codes known as “turbo-codes.” The success of turbo-codes was partially responsible for greatly renewed interest in LDPC codes and iterative decoding methods. There has been a considerable amount of recent work to improve the performance of iterative decoding methods for both turbo-codes and LDPC codes, and other related codes such as “turbo product codes” and “repeat-accumulate codes.” For example a special issue of the IEEE Communications Magazine was devoted to this work in August 2003. For an overview, see C. Berrou, “The Ten-Year-Old Turbo Codes are entering into Service,” IEEE Communications Magazine, vol. 41, pp. 110-117, August 2003 and T. Richardson and R. Urbanke, “The Renaissance of Gallager's Low-Density Parity Check Codes,” IEEE Communications Magazine, vol. 41, pp. 126-131, August 2003.
Many turbo-codes and LDPC codes are constructed using random constructions. For example, Gallager's original binary LDPC codes are defined in terms of a parity check matrix, which consists only of 0's and 1's, where a small number of 1's are placed randomly within the matrix according to a pre-defined probability distribution. However, iterative decoders have also been successfully applied to codes that are defined by regular constructions, like codes defined by finite geometries, see Y. Kou, S. Lin, and M. Fossorier, “Low Density Parity Check Codes Based on Finite Geometries: A Rediscovery and More,” IEEE Transactions on Information Theory, vol. 47, pp. 2711-2736, November, 2001. In general, iterative decoders work well for codes with a parity check matrix that has a relatively small number of non-zero entries, whether that parity check matrix has a random or regular construction.
In a system with a decoder of an LDPC code based on BP, the system processes the received symbols iteratively to improve the reliability of each symbol based on the constraints, as given by the parity check matrix that specifies the code.
In a first iteration, the BP decoder only uses channel evidence as input, and generates soft output messages from each symbol to the parity check constraints involving that symbol. The step of sending messages from the symbols to the constraints is sometimes called the “vertical” step. Then, the messages from the symbols are processed at the neighboring constraints to feed back new messages to the symbols. This step is sometimes called the “horizontal” step. The decoding iteration process continues to alternate between vertical and horizontal steps until a certain termination condition is satisfied. At that point, hard decisions are made for each symbol based on the output reliability measures for symbols from the last decoding iteration.
The messages of the BP decoder can be visualized using the Tanner graph, described above. Vertical step messages go from variable nodes to constraint nodes, while horizontal step messages go from constraint nodes to variable nodes.
The precise form of the message update rules, and the meaning of the messages, varies according to the particular variant of the BP method that is used. Two particularly popular message-update rules are the “sum-product” rules and the “min-sum” rules. These prior-art message update rules are very well known, and approximations to these message update rules also have proven to work well in practice. Further details are described in U.S. Pat. No. 7,373,585, “Combined-replica group-shuffled iterative decoding for error-correcting codes,” issued to Yedidia, et al. on May 13, 2008, incorporated herein by reference.
In some variants of the BP method, the messages represent the log-likelihood ratio of a bit being a 0 versus a 1. For more background material on the BP method and its application to error-correcting codes, see F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor Graphs and the Sum-Product Algorithm,” IEEE Transactions on Information Theory, vol. 47, pp. 498-519, February 2001.
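The alternation of horizontal and vertical steps can be sketched with the min-sum rules. The following Python is a simplified illustration rather than a reference implementation. It assumes that messages are log-likelihood ratios with positive values favoring a 0 bit, that every check involves at least two symbols, and that decoding stops when all checks are satisfied or after `max_iters` iterations; the function name and parameters are hypothetical.

```python
def min_sum_decode(H, llr, max_iters=20):
    """Flooding min-sum BP sketch; llr[n] > 0 is assumed to favor bit 0."""
    M, N = len(H), len(H[0])
    checks = [[n for n in range(N) if H[m][n]] for m in range(M)]
    # First iteration: variable-to-check messages carry channel evidence only.
    v = {(m, n): llr[n] for m in range(M) for n in checks[m]}
    hard = [0 if l >= 0 else 1 for l in llr]
    for _ in range(max_iters):
        # Horizontal step: each check sends, to each neighbor, the product of
        # the signs and the minimum magnitude of the other incoming messages.
        u = {}
        for m in range(M):
            for n in checks[m]:
                others = [v[m, k] for k in checks[m] if k != n]
                sign = 1
                for x in others:
                    if x < 0:
                        sign = -sign
                u[m, n] = sign * min(abs(x) for x in others)
        # Vertical step: each symbol combines channel evidence with the
        # check messages, excluding the message from the destination check.
        total = [
            llr[n] + sum(u[m, n] for m in range(M) if H[m][n])
            for n in range(N)
        ]
        for (m, n) in v:
            v[m, n] = total[n] - u[m, n]
        # Hard decisions and termination test.
        hard = [0 if t >= 0 else 1 for t in total]
        if all(sum(hard[n] for n in checks[m]) % 2 == 0 for m in range(M)):
            break
    return hard


H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]
# All-zero code-word sent; the third symbol is received unreliably and wrong.
decoded = min_sum_decode(H, [3, 3, -1, 3, 3, 3, 3])
```

In this small example the three checks that contain the third symbol each send it a message favoring 0, which outweighs the weak erroneous channel value.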
Quantized Belief Propagation
In practice, a popular way to implement BP decoders is to quantize the messages to some small number of possible values. For example, a decoder may only use messages that have the values +1 and −1. Quantized BP (QBP) decoders tend to become more powerful as the number of possible messages increases, thereby better approximating un-quantized BP decoders. On the other hand, as the number of possible messages increases, the cost and complexity of implementing the quantized BP decoder tends to increase as well.
Another QBP decoder is known in the art as “Algorithm-E,” see Richardson et al., “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inform. Theory, vol. 47, pp. 599-618, February 2001. Because this decoder is of particular interest, its functionality for decoding an LDPC code on the binary symmetric channel (BSC) is described in detail here.
The Algorithm-E decoder decodes an input sequence of symbols received from a BSC. The input sequence of symbols is represented by the vector y. Binary phase shift keying (BPSK) is used so that a 0 symbol at position n maps to y_n=1, and a 1 symbol at position n maps to y_n=−1.
Let H=[H_{mn}] be an M by N parity check matrix of the LDPC code. The set of variable nodes that participate in check j is denoted by N(j)={k: H_{jk}=1}, and the set of checks in which the variable k participates is denoted as Q(k)={j: H_{jk}=1}. N(j)\k is the set N(j) with variable k excluded, and Q(k)\j is the set Q(k) with check j excluded.
The Algorithm-E decoder quantizes BP messages to −1, 0, or +1 values. Messages and beliefs associated with the ith iteration are denoted as:
u_{mn}^{(i)}: the message passed from check node m to variable node n;
v_{mn}^{(i)}: the message passed from variable node n to check node m; and
v_n^{(i)}: the belief of variable node n.
The steps of the “Algorithm-E” decoder are:
Step 1: For 1≤m≤M, and each n∈N(m), process
$$u_{mn}^{(i)} = \prod_{n' \in N(m) \setminus n} v_{mn'}^{(i-1)}$$
Step 2: For 1≤n≤N, and each m∈Q(n), process
$$v_{mn}^{(i)} = \mathrm{sgn}\left( w^{(i)} \cdot y_n + \sum_{m' \in Q(n) \setminus m} u_{m'n}^{(i)} \right),$$
where w^{(i)} is an appropriately selected weight, and
$$v_n^{(i)} = \mathrm{sgn}\left( w^{(i)} \cdot y_n + \sum_{m' \in Q(n)} u_{m'n}^{(i)} \right).$$
Here sgn(x)=1 if x>0, sgn(x)=−1 if x<0, and sgn(0)=0.
Step 3: Construct a vector ĉ^{(i)}=[ĉ_n^{(i)}], such that ĉ_n^{(i)}=1 if v_n^{(i)}<0, ĉ_n^{(i)}=0 if v_n^{(i)}>0, and ĉ_n^{(i)} is selected randomly if v_n^{(i)}=0. If Hĉ^{(i)}=0, or the maximum number of iterations I_max is reached, stop the decoding iteration and output ĉ^{(i)} as the decoded codeword. Otherwise, set i:=i+1 and go to Step 1.
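The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the initial variable-to-check messages are assumed to equal the channel values y_n, the weight is assumed constant at 2 unless a different schedule is supplied, and the names `algorithm_e`, `weight`, `max_iters` (standing for I_max) and `rng` are hypothetical. The usage example applies the decoder to a redundant parity check matrix formed from all seven cyclic shifts of the first row of the example matrix given earlier, which describes the same code with M=7 rows.

```python
import random


def sgn(x):
    """Sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)


def algorithm_e(H, y, weight=lambda i: 2, max_iters=20, rng=None):
    """Sketch of Algorithm-E for BPSK values y[n] in {+1, -1}."""
    rng = rng or random.Random(0)
    M, N = len(H), len(H[0])
    N_of = [[n for n in range(N) if H[m][n]] for m in range(M)]  # N(m)
    Q_of = [[m for m in range(M) if H[m][n]] for n in range(N)]  # Q(n)
    # Assumed initialization: variable-to-check messages equal channel values.
    v = {(m, n): y[n] for m in range(M) for n in N_of[m]}
    c_hat = [0 if s > 0 else 1 for s in y]
    for i in range(1, max_iters + 1):
        # Step 1: check-to-variable messages (product over the other symbols).
        u = {}
        for m in range(M):
            for n in N_of[m]:
                prod = 1
                for k in N_of[m]:
                    if k != n:
                        prod *= v[m, k]
                u[m, n] = prod
        # Step 2: variable-to-check messages and beliefs.
        w = weight(i)
        belief = []
        for n in range(N):
            total = w * y[n] + sum(u[m, n] for m in Q_of[n])
            belief.append(sgn(total))
            for m in Q_of[n]:
                v[m, n] = sgn(total - u[m, n])
        # Step 3: hard decisions, with a random choice when the belief is 0.
        c_hat = [
            1 if b < 0 else 0 if b > 0 else rng.randint(0, 1) for b in belief
        ]
        if all(sum(c_hat[n] for n in N_of[m]) % 2 == 0 for m in range(M)):
            break
    return c_hat


# Redundant 7-by-7 matrix: all cyclic shifts of the row 1 1 1 0 1 0 0.
row = [1, 1, 1, 0, 1, 0, 0]
H7 = [row[-s:] + row[:-s] for s in range(7)]
# All-zero code-word sent; the third symbol is flipped by the channel.
decoded = algorithm_e(H7, [1, 1, -1, 1, 1, 1, 1])
```

With the three-row matrix, several beliefs are exactly zero after the first iteration and the result then depends on the random choices in Step 3. The redundant matrix gives every symbol four checks, which avoids such ties in this example.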
Bit-Flipping Decoders
Bit-flipping (BF) decoders are iterative decoders that work similarly to BP decoders. These decoders are somewhat simpler. Bit-flipping decoders for LDPC codes also have a long history, and were also suggested by Gallager in the early 1960's when he introduced LDPC codes. In a bit-flipping decoder, each code-word bit is initially assigned to be a 0 or a 1 based on the channel output. Then, at each iteration, the syndrome for each parity check is computed. The syndrome for a parity check is 0 if the parity check is satisfied, and 1 if it is unsatisfied. Then, for each bit, the syndromes of all the parity checks that contain that bit are checked. If a number of those parity checks greater than a pre-defined threshold are unsatisfied, then the corresponding bit is flipped. The iterations continue until all the parity checks are satisfied or a predetermined maximum number of iterations is reached.
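A minimal Python sketch of such a decoder is given below. It assumes that all bits exceeding the threshold are flipped in parallel within one iteration; the function name and the parameters `threshold` and `max_iters` are illustrative.

```python
def bit_flip_decode(H, bits, threshold, max_iters=20):
    """Parallel bit-flipping sketch.

    A bit is flipped when the number of unsatisfied parity checks that
    contain it is greater than `threshold`.
    """
    M, N = len(H), len(H[0])
    x = list(bits)
    for _ in range(max_iters):
        # Syndrome: 0 for a satisfied check, 1 for an unsatisfied check.
        syn = [sum(H[m][n] * x[n] for n in range(N)) % 2 for m in range(M)]
        if not any(syn):
            break  # all parity checks satisfied
        # Count unsatisfied checks per bit and flip where above threshold.
        unsat = [sum(syn[m] for m in range(M) if H[m][n]) for n in range(N)]
        x = [b ^ 1 if u > threshold else b for b, u in zip(x, unsat)]
    return x


H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]
# All-zero code-word with the third bit in error.
decoded = bit_flip_decode(H, [0, 0, 1, 0, 0, 0, 0], threshold=2)
```

With this small matrix and a threshold of 2, only a bit that participates in three unsatisfied checks is flipped, so the example is illustrative rather than a practical configuration.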
Other Codes that can be Decoded Iteratively
There are many other codes that can successfully be decoded using iterative decoding methods. Those codes are well-known in the literature and there are too many of them to describe them all in detail. Some of the most notable of those codes are turbo-codes, see C. Berrou and A. Glavieux, “Near-Optimum Error-Correcting Coding and Decoding: Turbo-codes,” IEEE Transactions on Communications, vol. 44, pp. 1261-1271, October 1996; the irregular LDPC codes, see M. A. Shokrollahi, D. A. Spielman, M. G. Luby, and M. Mitzenmacher, “Improved Low-Density Parity Check Codes Using Irregular Graphs,” IEEE Trans. Information Theory, vol. 47, pp. 585-598, February 2001; the repeat-accumulate codes, see D. Divsalar, H. Jin, and R. J. McEliece, “Coding Theorems for ‘Turbo-like’ Codes,” Proc. 36th Allerton Conference on Communication, Control, and Computing, pp. 201-210, September, 1998; the LT codes, see M. Luby, “LT Codes,” Proc. of the 43rd Annual IEEE Symposium on Foundations of Computer Science, pp. 271-282, November 2002; and the Raptor codes, see A. Shokrollahi, “Raptor Codes,” Proceedings of the IEEE International Symposium on Information Theory, p. 36, July 2004.
Linear Programming and Adaptive Linear Programming Decoders
An alternate decoding method for LDPC codes is based on linear programming (LP). LP decoding has some attractive features not available in BP decoding. The LP decoder deterministically converges. Whenever the decoder outputs a codeword, the codeword is guaranteed to be a maximum likelihood (ML) solution. When the LP decoder converges to a non-binary solution, a well-defined “pseudo-codeword” has been found.
The ML decoding problem is equivalent to the following integer optimization problem:
minimize γ^T ĉ subject to ĉ∈C,
where γ is a known vector of log-likelihood ratios, T denotes the transpose, and the nth entry of γ is defined as
$$\gamma_n = \log\left( \frac{\Pr[y_n \mid c_n = 0]}{\Pr[y_n \mid c_n = 1]} \right).$$
When the channel is a BSC with crossover probability p, γ_n=log [p/(1−p)] if the received BPSK symbol is y_n=−1, and γ_n=log [(1−p)/p] if the received BPSK symbol is y_n=1.
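For a very small code, the integer optimization problem can be solved by exhaustive search, which makes the role of the cost vector concrete. The Python sketch below is illustrative only: the function names are hypothetical, the parameter p is taken to be the crossover probability of the BSC, and enumeration of all 2^N words is feasible only for short block lengths.

```python
import math
from itertools import product


def bsc_gamma(y, p):
    """Cost vector for a BSC: log[(1-p)/p] if y_n = +1, log[p/(1-p)] if -1."""
    llr = math.log((1 - p) / p)
    return [llr if s == 1 else -llr for s in y]


def ml_decode(H, gamma):
    """Exhaustive ML decoding: minimize gamma^T c over all code-words c."""
    N = len(H[0])
    best, best_cost = None, math.inf
    for c in product((0, 1), repeat=N):
        # Skip words that violate any parity check.
        if any(sum(h * b for h, b in zip(row, c)) % 2 for row in H):
            continue
        cost = sum(g * b for g, b in zip(gamma, c))
        if cost < best_cost:
            best, best_cost = c, cost
    return list(best)


H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]
# All-zero code-word sent; the third BPSK symbol is received as -1.
gamma = bsc_gamma([1, 1, -1, 1, 1, 1, 1], p=0.1)
decoded = ml_decode(H, gamma)
```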
The variables and constraints in the above integer optimization problem are binary. In a relaxed LP version of the problem, each symbol ĉ_n is relaxed to a corresponding variable b̂_n, which can take values between 0 and 1. Each parity check is replaced by a number of local linear constraints that the codewords must satisfy. The intersection of these constraints defines the polytope over which the LP solver operates. The binary vertexes of the polytope correspond to codewords in the code C. When the LP optimum is at such a vertex, the LP is satisfied, and the ML solution is found. Non-binary solutions are termed pseudo-codewords.
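The local linear constraints are not written out above. One commonly used choice, usually attributed to Feldman et al. and given here only as an illustration of what such a relaxation can look like, uses one inequality for every odd-sized subset S of the symbols in each check:

```latex
\begin{aligned}
\text{minimize}\quad & \gamma^{T}\hat{b} \\
\text{subject to}\quad
  & \sum_{n \in S} \hat{b}_n \;-\; \sum_{n \in N(j)\setminus S} \hat{b}_n \;\le\; |S| - 1
    \quad \text{for every check } j \text{ and every } S \subseteq N(j) \text{ with } |S| \text{ odd}, \\
  & 0 \le \hat{b}_n \le 1 \quad \text{for } 1 \le n \le N .
\end{aligned}
```

Each inequality excludes the binary assignments that set the symbols in S to 1 and the remaining symbols of the check to 0, which would give the check odd parity. For example, for a check on symbols 1, 2, 3, and 5 with S={1}, the inequality reads b̂_1 − b̂_2 − b̂_3 − b̂_5 ≤ 0.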
Unfortunately, LP decoding is more complex than BP decoding. One approach reduces the computational load by using an adaptive linear programming (ALP) decoder, see Taghavi et al., “Adaptive methods for linear programming decoding,” IEEE Transactions on Information Theory, vol. 54, no. 12, pp. 5386-5410, December 2008.
Mixed-Integer Linear Programming Decoders
When the solution to the LP decoding problem is non-binary, one is motivated to find a tightening of the original LP relaxation. The objective of the tightening is to produce a modified LP problem that eliminates the formerly optimum pseudo-codeword, without eliminating any binary vertexes, thereby driving the solution of the modified LP problem towards the ML solution.
One approach is to add a small number of integer constraints resulting in a mixed integer linear program (MILP). In particular, in a MILP decoder, one can identify the symbol whose value is closest to 0.5. For this index n*=arg min_n |b̂_n−0.5|, the decoder includes the integer constraint b̂_{n*}∈{0, 1}, and re-executes the LP decoder including this integer constraint. If, even after adding an integer constraint, the MILP decoder fails to decode, more integer constraints can be added. In the related application by Draper et al., this approach is used for an LDPC code.
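The selection of the index n* can be sketched in a few lines of Python. The function name and the `fixed` argument, which holds the indices that already carry an integer constraint, are illustrative assumptions; the LP solver itself is not shown, and indices are 0-based.

```python
def most_fractional_index(b_hat, fixed=()):
    """Index n* whose relaxed value is closest to 0.5, skipping fixed indices."""
    candidates = [n for n in range(len(b_hat)) if n not in fixed]
    return min(candidates, key=lambda n: abs(b_hat[n] - 0.5))


# Hypothetical non-binary LP solution (a pseudo-codeword).
b_hat = [1.0, 0.75, 0.0, 0.4, 1.0]
first = most_fractional_index(b_hat)               # first integer constraint
second = most_fractional_index(b_hat, fixed={first})  # next one, if needed
```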
Performance of Different Decoders of the Same Code
For each different type of code, a number of different decoders will, in general, be available. For example, for an LDPC code, one can choose between (among others) bit-flipping decoders, quantized BP decoders at different levels of quantization, BP decoders using sum-product or min-sum methods, LP decoders, and MILP decoders.
For other well-known codes, such as Reed-Solomon codes, there also exists a multiplicity of different decoding methods, including “classical” decoding methods found in coding textbooks like the Berlekamp-Massey decoding method, or newer decoding methods based on belief propagation, or list decoding methods.
Different decoding methods are generally compared by plotting the error rate of the decoder as a function of the signal-to-noise ratio (SNR). Two common measures are the “word error rate,” (WER), which measures the fraction of blocks that are not correctly decoded to the transmitted code-word, and the “bit error rate” (BER), which measures the fraction of bits that are incorrectly decoded. Herein, the focus is on WER. A decoder is “better,” or “more powerful” than another at a particular signal-to-noise ratio if its WER is lower.
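The two measures can be computed from a set of transmitted and decoded blocks as in the following Python sketch. The function name and the sample data are illustrative only.

```python
def error_rates(sent, decoded):
    """Return (WER, BER) for paired lists of transmitted and decoded blocks."""
    # A block counts as a word error if it differs anywhere from what was sent.
    word_errors = sum(s != d for s, d in zip(sent, decoded))
    # Bit errors are counted position by position across all blocks.
    bit_errors = sum(
        a != b for s, d in zip(sent, decoded) for a, b in zip(s, d)
    )
    total_bits = sum(len(s) for s in sent)
    return word_errors / len(sent), bit_errors / total_bits


sent = [[0, 0, 0, 0]] * 4
decoded = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
wer, ber = error_rates(sent, decoded)  # 2 of 4 blocks and 3 of 16 bits wrong
```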
FIG. 3 shows the WER as a function of SNR for different decoders operating on the same code. In this example, decoder A 310 is better than decoder B for low signal-to-noise ratios, but decoder B 320 is better than decoder A for high signal-to-noise ratios. FIG. 4 shows an example where decoder A 410 performs better than decoder B 420 at every SNR.
Processing Time of Decoders
Often, there is a trade-off between the WER performance of a decoder and the time it takes the decoder to process a block. As an example, if a BP decoder is allowed to run for more iterations per block before being terminated, it will have a better WER performance. Simple decoders tend to have a small processing time but poor WER performance while more complex decoders tend to have a longer processing time and better WER performance. Sometimes one decoder will have a WER that is orders of magnitude better than a second decoder, but also have a processing time that is orders of magnitude worse than the second decoder.
It is highly desirable to construct decoders that eliminate the tradeoff between performance and processing time, i.e., decoders that simultaneously have excellent performance and a small processing time.