A fundamental problem in the field of data storage and communication is the development of practical decoding methods for error-correcting codes. Chapters 1 through 8 of a textbook by Blahut, “Algebraic Codes for Data Transmission,” Cambridge University Press: Cambridge, 2003, are an excellent source for background information about error-correcting codes. The class of Reed-Solomon codes is one of the most important classes of error-correcting codes, and very widely used in practice, see chapters 6 and 7, ibid. Prior-art decoding methods for Reed-Solomon codes are known to be far from optimal.
Error-Correcting Codes
Any references to “codes” herein specifically mean linear block error-correcting codes. The basic idea behind these codes is to encode a string of k symbols using a string of N symbols, where N>k. The additional N−k symbols are used to correct corrupted messages.
The string of N symbols is also sometimes called a “block” or a “word.” A block of N symbols that satisfies all the constraints of the code is called a “code word.” The symbols are assumed to be drawn from a q-ary alphabet. An important special case is when q=2. In that case, the code is called a “binary” code.
FIG. 1 shows a conventional scenario for which a linear block error-correcting code is used, which is often called “channel coding.” A source 101 produces a string s[a] containing k symbols 102, where the symbols are drawn from a q-ary alphabet. The string is passed to an encoder 110 of the error-correcting code, and is transformed into a code word x[n] containing N symbols 103.
The code word 103 is then transmitted through a channel 115, where the code word is corrupted into a signal y[n] 104. The corrupted signal y[n] 104 is then passed to a decoder 120, which outputs a reconstruction 105 z[n] of the code word x[n].
Parameters of Codes
A linear code is defined by a set of q^k possible code words having a block length N. The parameter k is sometimes called the “dimension” of the code. Codes are normally much more effective when N and k are large. However, as the size of the parameters N and k increases, so does the difficulty of decoding corrupted messages.
The Hamming distance between two code words is defined as the number of symbols that differ in the two words. The distance d of a code is defined as the minimum Hamming distance between all pairs of code words in the code. Codes with a larger value of d have a greater error-correcting capability. Codes with parameters N, k, and q are referred to, as is well known in the art, as [N, k]q codes. If the distance d is also known, then they are referred to as [N, k, d]q codes.
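The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy repetition code are assumptions introduced here, not part of the referenced text.

```python
from itertools import combinations

def hamming_distance(u, v):
    """Number of positions in which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(u, v))

def code_distance(code_words):
    """Minimum Hamming distance over all pairs of distinct code words."""
    return min(hamming_distance(u, v) for u, v in combinations(code_words, 2))

# Hypothetical toy code: the binary repetition code of length 3.
repetition_code = ["000", "111"]

print(hamming_distance("1110100", "0111010"))
print(code_distance(repetition_code))
```

The pairwise search grows quadratically with the number of code words, so it is practical only for very small codes.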
Galois Fields
Linear codes are usually defined in terms of a set of constraints on the q-ary symbols that make up a code word. In order to characterize these constraints, it is useful to define an arithmetic for q-ary symbols. The theory of finite fields, which are also called Galois fields, provides a way to define addition and multiplication over q-ary symbols, see chapter 4, ibid.
In a Galois field, when any two symbols from a q-ary alphabet are added or multiplied together, the result is an element in the same alphabet. There is a multiplicative and additive identity element, and each element has a multiplicative and additive inverse, except that the additive identity element has no multiplicative inverse. The commutative, associative, and distributive laws hold for Galois fields.
Galois fields are denoted GF(q), where q is the number of elements in the alphabet. A Galois field can be specified in terms of its addition and multiplication tables. The simplest Galois field is GF(2), which has two elements 0 and 1, where 0 is the additive identity and 1 is the multiplicative identity.
As shown in FIG. 2, the addition rules for GF(2) are 0+0=1+1=0, and 0+1=1+0=1, and the multiplication rules for GF(2) are 0*0=0*1=1*0=0, and 1*1=1.
As shown in FIG. 3, GF(3) has three elements 0, 1, and 2, where 0 is the additive identity, 1 is the multiplicative identity, and the addition rules are 0+0=1+2=2+1=0, 0+1=1+0=2+2=1, 0+2=1+1=2+0=2, and the multiplication rules are 0*0=0*1=0*2=1*0=2*0=0; 1*1=2*2=1, 1*2=2*1=2.
Galois fields can be defined for any q that is a prime number or an integer power of a prime number. The addition and multiplication rules for any Galois field can be easily derived, see for example chapter 4, ibid.
These rules can be represented using addition and multiplication tables similar to those learned by school children for ordinary arithmetic. FIGS. 4–7 provide respectively the addition and multiplication tables for the Galois fields GF(4), GF(5), GF(8), and GF(9).
The operations of division and subtraction are also defined for Galois fields, and can be derived by adding negatives and multiplying inverses. The negative of x is the number that when added to x gives zero. The inverse of x is the number that gives one when multiplied by x. All sums and multiplications of q-ary symbols described herein use the rules of GF(q).
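For the special case where q is a prime number p, the rules of GF(p) reduce to integer arithmetic modulo p, which gives a compact way to sketch all four operations. The function names below are illustrative assumptions, and the sketch does not apply to prime-power fields such as GF(4), GF(8), or GF(9), which require polynomial arithmetic.

```python
def gf_add(x, y, p):
    return (x + y) % p

def gf_mul(x, y, p):
    return (x * y) % p

def gf_neg(x, p):
    """Negative of x: the element that gives zero when added to x."""
    return (-x) % p

def gf_inv(x, p):
    """Inverse of x: the element that gives one when multiplied by x."""
    if x % p == 0:
        raise ZeroDivisionError("the additive identity has no multiplicative inverse")
    # Fermat's little theorem: x^(p-2) is the inverse of x when p is prime.
    return pow(x, p - 2, p)

def gf_sub(x, y, p):
    return gf_add(x, gf_neg(y, p), p)

def gf_div(x, y, p):
    return gf_mul(x, gf_inv(y, p), p)

# Addition and multiplication tables for GF(3), indexed as table[x][y].
addition_table = [[gf_add(x, y, 3) for y in range(3)] for x in range(3)]
multiplication_table = [[gf_mul(x, y, 3) for y in range(3)] for x in range(3)]
```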
A primitive element of a Galois field is defined to be an element α such that every element of the Galois field except for the zero element can be expressed as a power of α. For example, in the field GF(5), one has 2^1=2, 2^2=4, 2^3=3, 2^4=1, so 2 is a primitive element of GF(5).
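The primitive-element test can be sketched directly from the definition, again assuming a prime field size p so that multiplication is integer multiplication modulo p. The helper names are hypothetical.

```python
def is_primitive(a, p):
    """True if the powers a^1 .. a^(p-1) cover every nonzero element of GF(p)."""
    powers = set()
    x = 1
    for _ in range(p - 1):
        x = (x * a) % p
        powers.add(x)
    return powers == set(range(1, p))

def primitive_elements(p):
    """All primitive elements of GF(p), p prime."""
    return [a for a in range(1, p) if is_primitive(a, p)]

powers_of_two = [pow(2, e, 5) for e in range(1, 5)]
print(powers_of_two)
print(primitive_elements(5))
```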
Generator Matrix Representations of Codes
A block code is “linear” when the sum of any two code words is also a code word. The sum of two code words of N symbols each is defined to be the word of N symbols, obtained by summing the individual symbols, one at a time. For example the sum of the two code words 1110100 and 0111010 using GF(2) is 1001110.
A generator matrix can compactly represent a linear code. In fact, many different generator matrices can represent the same linear code.
A generator matrix representing an [N, k]q code is a matrix of L rows and N columns, where each element in the matrix is a q-ary symbol. The N columns of the matrix correspond to the N symbols in a code word. The generator matrix contains k linearly independent rows. If L>k, then some of the rows of the generator matrix are redundant. All the code words in a code can be obtained by taking linear combinations of the rows of a generator matrix.
An illustrative example of a generator matrix is the following matrix for an [N=4, k=2, d=3]q=3 code known as the “tetra-code”:

$$G = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}. \qquad (1)$$
The tetra-code has block-length N=4, and the number of code words is q^k=9.
As an example, the tetra-code code word 1202 is obtained by summing the first row of the generator matrix with two times the second row, because 1202=1011+2*(0112) using GF(3). In all, the nine code words of the tetra-code are 0000, 1011, 2022, 0112, 1120, 2101, 0221, 1202, and 2210.
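The code words of a small code can be enumerated by forming every linear combination of the generator matrix rows. The sketch below does this for the tetra-code generator matrix of equation (1); the function name `span_code` is an assumption, and the arithmetic is valid only because q=3 is prime.

```python
from itertools import product

def span_code(G, q):
    """All q-ary linear combinations of the rows of G (q prime)."""
    n = len(G[0])
    words = set()
    for coeffs in product(range(q), repeat=len(G)):
        word = tuple(
            sum(c * row[j] for c, row in zip(coeffs, G)) % q for j in range(n)
        )
        words.add("".join(map(str, word)))
    return sorted(words)

TETRA_G = [[1, 0, 1, 1],
           [0, 1, 1, 2]]

tetra_words = span_code(TETRA_G, 3)
print(tetra_words)

# Smallest number of nonzero symbols in a nonzero code word; for a linear
# code this equals the distance d.
min_weight = min(sum(ch != "0" for ch in w) for w in tetra_words if w != "0000")
print(min_weight)
```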
As another example, the following is a generator matrix of the [N=7, k=4, d=3]q=2 binary Hamming code:
$$G = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}. \qquad (2)$$
Encoders for Error-Correcting Codes
The encoder 110 for the linear [N, k]q code transforms the string of k symbols 102 into the code word of N symbols 103. The string of k symbols that is thus transformed into one of q^k different code words is referred to as an “information block,” and the symbols in the information block are referred to as “information symbols.” Encoders can be constructed using the generator matrix for the code.
More specifically, suppose one has an information block s[a], and one desires to construct an N symbol code word x[n] for the code generated by the generator matrix G. Encoding can be done using the equation
$$x[n] = \sum_{a=1}^{k} G[a,n]\, s[a], \qquad (3)$$
where G[a,n] is the value of the symbol in the matrix G in the ath row and nth column.
For example, consider the tetra-code, as represented by the generator matrix given in equation (1). If the information block is 12, then the corresponding code word is 1*1011+2*0112=1011+0221=1202, using the rules of addition and multiplication for GF(3).
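The encoding rule of equation (3) can be sketched as a matrix-vector product over GF(q). The function name `encode` is an assumption, indices are zero-based in the code, and the modular arithmetic is valid only for prime q.

```python
def encode(s, G, q):
    """x[n] = sum over a of G[a][n] * s[a], computed over GF(q) with q prime."""
    k, n = len(G), len(G[0])
    if len(s) != k:
        raise ValueError("information block must contain k symbols")
    return [sum(G[a][j] * s[a] for a in range(k)) % q for j in range(n)]

TETRA_G = [[1, 0, 1, 1],
           [0, 1, 1, 2]]

HAMMING_G = [[1, 0, 1, 1, 0, 0, 0],
             [0, 1, 0, 1, 1, 0, 0],
             [0, 0, 1, 0, 1, 1, 0],
             [0, 0, 0, 1, 0, 1, 1]]

print(encode([1, 2], TETRA_G, 3))
print(encode([1, 0, 1, 1], HAMMING_G, 2))
```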
Parity Check Matrix Representations of Codes
Parity check matrices can also represent linear codes. The parity check matrix representing the [N, k]q code is a matrix of q-ary symbols, with M rows and N columns. The N columns of the parity check matrix correspond to the N symbols of the code. The number of linearly independent rows in the matrix is N−k.
Each row of the parity check matrix represents a constraint. The symbols involved in the constraint represented by a particular row correspond to the columns that have a non-zero symbol in that row. The parity check constraint forces the weighted sum, over GF(q), of those symbols to be equal to zero. For example, for a binary code, the parity check matrix
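$$H = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{bmatrix} \qquad (4)$$

represents the three constraints

$$x[1]+x[2]+x[3]+x[5]=0 \qquad (5)$$
$$x[2]+x[3]+x[4]+x[6]=0 \qquad (6)$$
$$x[3]+x[4]+x[5]+x[7]=0, \qquad (7)$$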
where x[n] is the value of the nth bit. This parity check matrix is another way to represent the [N=7, k=4, d=3]q=2 Hamming code represented by the generator matrix given in equation (2).
For the tetra-code, the generator matrix also happens to be a parity check matrix of the same code. Codes for which this is true are called self-dual codes. Thus, for the tetra-code,
$$H = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}, \qquad (8)$$

which represents the two constraints

$$x[1]+x[3]+x[4]=0 \qquad (9)$$
$$x[2]+x[3]+2x[4]=0. \qquad (10)$$
It is easy to verify that all the code words of the tetra-code satisfy these constraints.
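One way to carry out this verification is to evaluate each parity check constraint for a candidate word and confirm that every weighted sum is zero. The sketch below assumes prime q; the names `syndrome` and `is_code_word` are illustrative.

```python
from itertools import product

def syndrome(H, x, q):
    """Weighted sum over GF(q) for each parity check row (q prime)."""
    return [sum(h * xi for h, xi in zip(row, x)) % q for row in H]

def is_code_word(H, x, q):
    """A word is a code word when every parity check constraint sums to zero."""
    return not any(syndrome(H, x, q))

TETRA_H = [[1, 0, 1, 1],
           [0, 1, 1, 2]]

HAMMING_H = [[1, 1, 1, 0, 1, 0, 0],
             [0, 1, 1, 1, 0, 1, 0],
             [0, 0, 1, 1, 1, 0, 1]]

# Count the words that satisfy all constraints of each code.
tetra_count = sum(is_code_word(TETRA_H, w, 3) for w in product(range(3), repeat=4))
hamming_count = sum(is_code_word(HAMMING_H, w, 2) for w in product(range(2), repeat=7))
print(tetra_count, hamming_count)
```

The counts agree with q^k for each code, which is what one expects when the matrix has N−k linearly independent rows.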
Reed-Solomon Codes
Reed-Solomon codes are a very well-known and popular class of codes that have optimal distance properties. A Reed-Solomon code can be defined for any set of parameters N, k, and q, such that GF(q) is a finite field and N=q−1. The [N,k]q Reed-Solomon code has a distance d=N−k+1, which has been proven to be the maximum distance possible for any [N,k]q code.
A standard way to obtain the [N,k,d]q Reed-Solomon code is to construct a generator matrix G, that has k rows and N columns, according to the following prescription. One first chooses a primitive element α of GF(q). The element in the ath row and nth column of G is given the value α^((a−1)(n−1)).
For example, a [N=4,k=3,d=2]q=5 Reed-Solomon code would have the following generator matrix, assuming that one chose α=2 as the primitive element:
$$G = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 3 \\ 1 & 4 & 1 & 4 \end{pmatrix}. \qquad (11)$$
Notice that when the above specification of a generator matrix is combined with equation (3), it implies that a Reed-Solomon code word x[n] can be obtained from an information block s[a] according to the formula
$$x[n] = \sum_{a=1}^{k} \alpha^{(a-1)(n-1)}\, s[a]. \qquad (12)$$
This formula has the form of a discrete Fourier transform over a Galois field, where N−k input variables have been set to zero. In the discrete Fourier transform over the Galois field, the primitive element α plays the role that an exponential term normally plays in a discrete Fourier transform over complex numbers, as it is a root of unity. This relationship between Reed-Solomon codes and discrete Fourier transforms is known, and is emphasized in chapters 6 and 7, ibid.
A useful property of Reed-Solomon codes is that they are “cyclic” codes. This means that when one cyclically shifts a code word, one obtains another code word. For example, cyclically shifting the code word 1243 of the above [N=4,k=3,d=2]q=5 Reed-Solomon code gives the code word 2431.
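The generator-matrix prescription and the cyclic property can both be checked by brute force for a small prime field. In the sketch below, rows and columns are numbered from zero, so the entry in row a and column j is α raised to the power a·j; the function names are assumptions, and the exhaustive enumeration is feasible only for very small codes.

```python
from itertools import product

def rs_generator_matrix(n, k, q, alpha):
    """Entry (a, j), counted from zero, is alpha^(a*j) in GF(q), q prime."""
    return [[pow(alpha, a * j, q) for j in range(n)] for a in range(k)]

def span_code(G, q):
    """Set of all code words (as tuples) generated by the rows of G."""
    n = len(G[0])
    return {
        tuple(sum(c * row[j] for c, row in zip(coeffs, G)) % q for j in range(n))
        for coeffs in product(range(q), repeat=len(G))
    }

def is_cyclic(words):
    """True if cyclically shifting any code word gives another code word."""
    return all(w[1:] + w[:1] in words for w in words)

G = rs_generator_matrix(4, 3, 5, 2)
code = span_code(G, 5)
# For a linear code, the distance equals the smallest nonzero code word weight.
distance = min(sum(s != 0 for s in w) for w in code if any(w))
print(G, len(code), is_cyclic(code), distance)
```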
Extended Reed-Solomon Codes
Standard Reed-Solomon codes have a block-length N that is related to the parameter q by the equation N=q−1. Closely related codes for which N=q, called “extended Reed-Solomon codes,” can be obtained by appending a single symbol to a Reed-Solomon code. The distance d of an extended Reed-Solomon code is still given by d=N−k+1, and is still the optimal possible distance. To obtain the generator matrix of an extended Reed-Solomon code, one simply adds a single column to the generator matrix of the Reed-Solomon code. That column has a 1 in the first row, and 0's in every other row. For example, the generator matrix of the [N=5,k=3,d=3]q=5 extended Reed-Solomon code is
$$G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 4 & 3 \\ 0 & 1 & 4 & 1 & 4 \end{pmatrix} \qquad (13)$$
Punctured Reed-Solomon Codes
Instead of appending symbols to a Reed-Solomon code, one can also obtain new codes by removing, or “puncturing” symbols. A punctured Reed-Solomon code has a block-length N that is less than q−1. The distance d of the punctured Reed-Solomon code is still given by d=N−k+1, and is still the optimal possible distance. To obtain the generator matrix of the punctured Reed-Solomon code, one removes columns from the generator matrix of the Reed-Solomon code.
Performance Criteria for Error-Correcting Codes
The decoder 120 for a linear [N, k]q code accepts as input a received and perhaps corrupted version y[n] 104 of a transmitted code word x[n] 103, and outputs the reconstruction z[n] 105. The performance of the decoder is measured in terms of failure rates. The failure rates measure how often the reconstruction z[n] fails to match the originally transmitted code word x[n]. The decoding failure rate depends on the amount of noise that the channel introduces: the higher the level of noise, the higher the decoder failure rate.
Optimal decoders output the most likely code word z[n], given the received signal y[n]. An optimal decoder is therefore often called a “maximum likelihood” decoder. Even an optimal decoder can sometimes fail, when the noise from the channel has made the originally transmitted code word x[n] less likely than some other code word.
Hard-Input Decoders for Error-Correcting Codes
A class of decoders, referred to as “hard-input decoders,” accepts inputs such that the corrupted version of the transmitted code word is an N-symbol word y[n], whose symbols take values from the same q-ary alphabet as the error-correcting code.
Such decoders are useful when the channel corrupts q-ary symbols in the code word to other q-ary symbols with some small probability. Making the standard assumption that all transmitted code words are a priori equally likely, an optimal hard-input decoder for such channels outputs the code word z[n] that has the smallest distance from y[n].
An example of a hard input for a tetra-code decoder would be the word 2122. This word is not a code word of the tetra-code, and the code word that has the smallest distance from this word is 2022, so an optimal hard-input decoder would output 2022 when the decoder received the input 2122.
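For a code as small as the tetra-code, an optimal hard-input decoder can be sketched as an exhaustive search for the code word nearest to the received word. The function names are assumptions, and the search is shown only to make the definition concrete; it is not a practical decoding method for large codes.

```python
from itertools import product

TETRA_G = [[1, 0, 1, 1],
           [0, 1, 1, 2]]

def code_words(G, q):
    """All code words generated by G over GF(q), q prime, as tuples."""
    n = len(G[0])
    return [
        tuple(sum(c * row[j] for c, row in zip(coeffs, G)) % q for j in range(n))
        for coeffs in product(range(q), repeat=len(G))
    ]

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def decode_hard(y, words):
    """Return the code word with the smallest Hamming distance from y."""
    return min(words, key=lambda w: hamming_distance(w, y))

words = code_words(TETRA_G, 3)
z = decode_hard((2, 1, 2, 2), words)
print(z, hamming_distance(z, (2, 1, 2, 2)))
```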
Soft-Input Decoders for Error-Correcting Codes
Alternatively, a corrupted signal can first be transformed into a “cost function,” C, and then that cost function is input to the decoder. The cost function is a q×N matrix specifying a cost for each possible state of each code word symbol.
Decoders that accept such cost functions as their input are referred to as “soft-input” decoders. For the tetra-code, which has N=4 and q=3, an example cost function for the soft-input decoder is
$$C = \begin{pmatrix} 0.1 & 0.2 & 0.1 & 0.1 \\ 1.0 & 0.0 & 2.5 & 1.5 \\ 0.0 & 2.0 & 0.0 & 0.0 \end{pmatrix}. \qquad (14)$$
This cost function means that the cost of assigning the first symbol the value ‘0’ is 0.1; the cost of assigning the first symbol the value ‘1’ is 1.0; the cost of assigning the first symbol the value ‘2’ is 0.0; the cost of assigning the second symbol the value ‘0’ is 0.2; and so on.
An optimal soft-input decoder returns a code word z[n] that has a lowest possible summed cost, given the cost function. For example, the code word of the tetra-code that has the lowest cost, given the cost function above, is 2022, which has a cost of 0.0+0.2+0.0+0.0=0.2. By comparison, the code word 0000 has a cost of 0.1+0.2+0.1+0.1=0.5.
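The summed cost of every tetra-code code word under the cost function of equation (14) can be tabulated by exhaustive search, which is one way to check which code word an optimal soft-input decoder would return. The names below are assumptions, and `COST[v][n]` holds the cost of giving symbol n (counted from zero) the value v.

```python
from itertools import product

TETRA_G = [[1, 0, 1, 1],
           [0, 1, 1, 2]]

COST = [[0.1, 0.2, 0.1, 0.1],   # costs of the value 0 for symbols 1 to 4
        [1.0, 0.0, 2.5, 1.5],   # costs of the value 1
        [0.0, 2.0, 0.0, 0.0]]   # costs of the value 2

def code_words(G, q):
    """All code words generated by G over GF(q), q prime, as tuples."""
    n = len(G[0])
    return [
        tuple(sum(c * row[j] for c, row in zip(coeffs, G)) % q for j in range(n))
        for coeffs in product(range(q), repeat=len(G))
    ]

def word_cost(word, cost):
    """Summed cost of assigning each symbol the value it has in the word."""
    return sum(cost[value][n] for n, value in enumerate(word))

def decode_soft(cost, words):
    """Return the code word with the lowest summed cost."""
    return min(words, key=lambda w: word_cost(w, cost))

words = code_words(TETRA_G, 3)
for w in sorted(words, key=lambda w: word_cost(w, COST)):
    print(w, round(word_cost(w, COST), 3))
best = decode_soft(COST, words)
```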
The cost in the soft-input decoder is often taken to be equal to the negative of the log-likelihood for each symbol, given the received signal and the channel model. As mentioned before, optimal decoders are often called “maximum likelihood” decoders, which makes sense because minimizing the cost corresponds to maximizing the likelihood.
Soft input cost functions arise in many cases of practical interest. For example, in many practical communication applications, the q symbols of a q-ary code are transmitted by “modulating” the symbols into q different electromagnetic waveforms. When a waveform is received after passing through the channel, it is compared to the possible transmitted waveforms, and depending on how similar the waveform is to each of the possible transmitted waveforms, a cost is assigned to each of the q possible symbols.
Constructing optimal hard-input or soft-input decoders for error-correcting codes is a problem that becomes intractably complicated for codes with large N and k. For this reason, most decoders used in practice are not optimal.
Non-optimal hard-input decoders attempt to determine the closest code word to the received word, but are not guaranteed to do so, while non-optimal soft-input decoders attempt to determine the code word with the lowest cost, but are not guaranteed to do so.
Bounded Distance Decoders
Most prior-art decoders for Reed-Solomon codes are non-optimal hard-input decoders known as “bounded distance decoders.” The bounded-distance decoder decodes any received hard-input word to a nearest code word, so long as the input word has a Hamming distance to the nearest code word that is less than or equal to the bounded distance decoding radius t, where t=⌊(d−1)/2⌋. Here, the floor function ⌊x⌋ indicates that the fractional part of x is subtracted.
There can be at most one code word within distance t of a word. Therefore, the bounded-distance decoder optimally decodes the input word to the transmitted code word whenever the channel changes t or fewer code word symbols. Conversely, the bounded-distance decoder fails to decode when the received word has a distance from every code word that is greater than the decoding radius t. If the channel changes more than t code word symbols, then the bounded-distance decoder fails to correctly decode the transmitted code word.
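The behavior of a bounded-distance decoder can be illustrated with a brute-force sketch. Practical bounded-distance decoders for Reed-Solomon codes are algebraic and do not enumerate code words; the search below, the function names, and the toy binary repetition code of length 4 are all assumptions chosen only to show the decoding radius and the failure case.

```python
def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def decoding_radius(d):
    """t = floor((d - 1) / 2)."""
    return (d - 1) // 2

def bounded_distance_decode(y, words, d):
    """Return the unique code word within distance t of y, or None on failure."""
    t = decoding_radius(d)
    for w in words:
        if hamming_distance(w, y) <= t:
            return w
    return None

# Hypothetical toy code: binary repetition code with N=4, k=1, d=4, so t=1.
words = ["0000", "1111"]
print(decoding_radius(4))
print(bounded_distance_decode("0001", words, 4))  # one changed symbol
print(bounded_distance_decode("0011", words, 4))  # two changed symbols
```

The second call returns `None` because the word 0011 lies at distance 2 from both code words, outside the decoding radius of either.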
A variety of prior art bounded distance decoding methods have been developed for Reed-Solomon codes, see chapters 6–8, ibid. These decoding methods all depend ultimately on solving systems of algebraic equations over GF(q), and are therefore usually called “algebraic” decoding methods.
The error-correcting performance of a bounded-distance decoder is normally much worse than the performance of the optimal hard-input decoder. Some progress has been made recently in developing so-called “list decoders,” which are hard-input algebraic decoders that perform somewhat better than older bounded-distance decoders, though still not as well as optimal decoders, see V. Guruswami and M. Sudan, “Improved Decoding of Reed-Solomon and Algebraic-Geometric Codes,” IEEE Transactions on Information Theory, vol. 45, pp. 1757–1767, September 1999.
Using Hard-Input Decoders with Soft Inputs
When a soft-input cost function is given, but only a hard-input decoder is available, the hard-input decoder can nevertheless be used as a decoder by first thresholding the cost function to obtain a hard input. To threshold a cost function, one determines the lowest cost value for each symbol separately.
For example, given the cost function of equation (14) above, the lowest cost value for the first symbol is 2, the lowest cost value for the second symbol is 1, the lowest cost value for the third symbol is 2, and the lowest cost value for the fourth symbol is 2. Thus, by thresholding, one converts the soft-input cost function into the hard-input word 2122. An optimal hard-input decoder then decodes to the code word 2022.
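Thresholding amounts to taking, for each symbol, the value whose cost is smallest. A minimal sketch follows, with `COST[v][n]` holding the cost of giving symbol n (counted from zero) the value v; the function name is an assumption, and ties are broken in favor of the smaller value, a choice the text does not specify.

```python
COST = [[0.1, 0.2, 0.1, 0.1],   # costs of the value 0 for symbols 1 to 4
        [1.0, 0.0, 2.5, 1.5],   # costs of the value 1
        [0.0, 2.0, 0.0, 0.0]]   # costs of the value 2

def threshold(cost):
    """For each symbol, pick the value with the lowest cost."""
    q, n = len(cost), len(cost[0])
    return [min(range(q), key=lambda v: cost[v][j]) for j in range(n)]

hard_input = threshold(COST)
print(hard_input)
```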
Note that even if the hard-input decoder is optimal, the thresholding procedure can cause the decoding procedure for a soft-input cost function to become non-optimal. Thresholding discards the relative sizes of the costs, so the code word nearest in Hamming distance to the thresholded word, which is 2022 for the thresholded word 2122 in the example above, is not in general the code word with the lowest summed cost.
It is known that the performance penalty caused by using thresholding and hard-input decoding is quite severe.
Therefore, it would be of great benefit to develop soft-input decoders for Reed-Solomon codes, rather than using the prior art method of thresholding, followed by hard-input decoding.
One such effort in this direction is the algebraic decoding method of R. Koetter and A. Vardy, which builds on the list-decoding method developed by Guruswami and Sudan. See R. Koetter and A. Vardy “Algebraic Soft-Decision Decoding of Reed-Solomon Codes,” IEEE Transactions on Information Theory, vol. 49, pp. 2809–2825, November 2003. Although Koetter and Vardy's decoding method is of considerable interest, simulation results show that it only gives relatively small performance gains compared to simple thresholding followed by bounded-distance decoding, and is far from optimal soft-input decoding.
Turbo-Codes and Low-Density Parity Check Codes
Since 1993, when the outstanding performance of new soft-input decoding methods for a class of codes called “turbo-codes” was described, there has been a great deal of interest in approximate soft-input decoding methods based on iterative “message-passing” methods. These message-passing decoding methods are often called “belief propagation” decoding methods. Such a decoding method was actually first described in 1963 by R. Gallager to decode low-density parity check (LDPC) codes.
The success of turbo-codes rekindled an interest in LDPC codes and in soft-input message-passing decoding methods. There has been a considerable amount of recent work whose goal is to improve the performance of both turbo-codes and LDPC codes. For example a special issue of the IEEE Communications Magazine was devoted to this work in August 2003. For an overview, see C. Berrou, “The Ten-Year-Old Turbo Codes are entering into Service,” IEEE Communications Magazine, vol. 41, pp. 110–117, August 2003 and T. Richardson and R. Urbanke, “The Renaissance of Gallager's Low-Density Parity Check Codes,” IEEE Communications Magazine, vol. 41, pp. 126–131, August 2003.
Unlike Reed-Solomon codes, which are constructed using a regular construction, turbo-codes and LDPC codes are constructed using a random construction. For example, a binary LDPC code is defined in terms of its parity check matrix, which consists only of 0's and 1's, where a small number of 1's are placed randomly within the matrix.
At relatively low signal-to-noise ratios, i.e., when the corruption caused by the channel is relatively large, LDPC codes and turbo-codes can often outperform Reed-Solomon codes that are decoded using bounded-distance decoders.
On the other hand, because of their random construction, LDPC codes and turbo-codes are difficult to analyze theoretically, and it is very difficult to give any guarantees about their performance comparable to the guarantees that one obtains using bounded distance decoders for Reed-Solomon codes. LDPC codes and turbo-codes also suffer from the phenomenon of “error-floors.” When a decoding method has an “error-floor,” that means that as the signal-to-noise ratio becomes large, i.e., as the corruption caused by the channel becomes small, the decoding failure rate becomes smaller, but only very slowly.
Error-floors are a serious problem for LDPC codes and turbo-codes, which means that for high signal-to-noise ratios, or for very low target decoding failure rates, Reed-Solomon codes and other regular codes with good distance properties and bounded distance decoders are often still preferred.
Codes Defined on Graphs
Message-passing decoding methods are best understood when the error-correcting codes that they decode are represented as graphs. Such graphs, now often called “factor graphs,” were first described in 1981 by R. M. Tanner, see R. M. Tanner “A Recursive Approach to Low Complexity Codes,” IEEE Transactions on Information Theory, vol. 27, pp. 533–547, September 1981, and F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor Graphs and the Sum-Product Algorithm,” IEEE Transactions on Information Theory, vol. 47, pp. 498–519, February 2001. There are several essentially equivalent forms of factor graphs.
The following discussion is based on so-called “normal” factor graphs, as described by G. D. Forney, Jr., in “Codes on Graphs: Normal Realizations,” IEEE Transactions on Information Theory, vol. 47, pp. 520–548, February 2001.
As shown in FIG. 8, a normal factor graph can be used to represent both the “hard” constraints that the symbols in a code word must satisfy, as well as the “soft” cost function that is input to a soft-input decoder.
A normal factor graph is drawn as a collection of connected vertices. The connections between the vertices, which are drawn as lines 800, represent “variables.” Some of the variables may be symbols that make up a code word of the code, some may be information symbols, and some may be other, so-called “auxiliary” variables that help to define the code.
Each variable can be in one of a number of different possible states. In all the factor graphs considered herein, each variable can be in q different states. The state of the ith variable is denoted x[i].
The vertices, which are drawn as squares and referred to as “factor nodes,” represent constraints placed on the variables that connect to that factor node. In a “normal” factor graph, each variable can be connected to either one or two factor nodes, and each factor node is connected to one or more variables.
The rule that no variable can be connected to more than two factor nodes may initially appear to be restrictive, but it is not, because a variable can be “copied” by connecting it to an equality constraint, and the copy can then be connected to additional factor nodes.
A marking is placed within the square representing each factor node to indicate what type of constraint it represents. For example, an “=” marking 801 is placed inside a square representing a factor node that constrains the connected variables to be equal.
Associated with each possible configuration of the variables connected to a factor node is a “cost.” The cost can be infinite. For “hard” constraints, which must be absolutely obeyed, some of the costs are in fact infinite. For example, for an equality constraint, one would define the cost for any configuration where the connected variable nodes were equal to be zero, and any configuration where the connected variable nodes were not equal to be infinite.
“Soft” constraints are also easy to represent in a factor graph, simply by using factor nodes that do not have any infinite costs. In the example factor graphs described here, soft constraints 802 are represented by factor nodes marked with a ‘C’ inside the square. The ‘C’ marking is used because it is the first letter of the words “cost” and “channel.” The soft constraints in a factor graph representing a code are obtained from the soft-input cost function coming from the channel.
The cost associated with the ath factor node is denoted C_a. It is a function only of the states of the variables connected to the ath factor node, which are denoted {x[i]}, i ∈ N(a), or more succinctly x_a.
The total cost C of an overall configuration of variables in a normal factor graph is simply the sum of the costs of each factor:
\[ C = \sum_{a} C_a(x_a). \qquad (15) \]
In the factor graph in FIG. 8, there are three variables 800, and one factor 801 that constrains them all to be equal. A second, soft, constraint 802, is attached to one of the variables.
A factor graph by itself does not give all the information needed to determine the cost. In particular, one also needs to know the number of possible states of each variable, and the exact form of all the cost functions for the factor nodes.
Suppose, for the sake of example, that q=2 for the code represented by the factor graph shown in FIG. 8, and that the soft cost function is attached to the first variable node and gives a cost of 0 if that variable is a ‘0’, and 0.5 if that variable is a ‘1’. In that case, the configurations that give non-infinite cost are 000, which has a cost of 0, and 111, which has a cost of 0.5. This factor graph thus represents the binary code that has the generator matrix
\[ G = (1\ 1\ 1), \qquad (16) \]
and has a soft input cost function
\[ C = \begin{pmatrix} 0 & 0 & 0 \\ 0.5 & 0 & 0 \end{pmatrix}. \qquad (17) \]
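The FIG. 8 example can be checked by brute force. The following is a minimal Python sketch, not a decoder: the function names `equality_cost`, `soft_cost`, and `total_cost` are illustrative choices, and the factor costs are taken directly from the example above (q=2, equality constraint on three variables, soft cost on the first variable).

```python
import itertools
import math

Q = 2  # number of states per variable (binary example)

def equality_cost(states):
    """Hard equality constraint: 0 if all connected variables agree, else infinite."""
    return 0.0 if len(set(states)) == 1 else math.inf

def soft_cost(state):
    """Soft constraint on the first variable: cost 0 for '0', cost 0.5 for '1'."""
    return 0.0 if state == 0 else 0.5

def total_cost(config):
    """Total cost of a configuration: the sum of the factor costs."""
    return equality_cost(config) + soft_cost(config[0])

# Keep only the configurations whose total cost is finite.
finite_costs = {
    config: total_cost(config)
    for config in itertools.product(range(Q), repeat=3)
    if math.isfinite(total_cost(config))
}
print(finite_costs)  # {(0, 0, 0): 0.0, (1, 1, 1): 0.5}
```

Only the two code words of the repetition code survive, with the costs stated in the text.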
In general, to represent both a code and also the cost function that is input to a soft-input decoder, a factor graph must have factor nodes that represent the hard constraints defining the code, as well as a soft constraint attached to each variable that is a code word symbol.
Often in the prior art, a slightly different, but completely equivalent, interpretation of factor graphs is used. In that interpretation, the costs C_a(x_a) are replaced by functions f_a(x_a) defined by f_a(x_a)=exp(-C_a(x_a)), so that a low cost corresponds to a high probability and an infinite cost corresponds to zero probability. The factor graph can then be interpreted as generating an overall probability distribution over the configurations given by
\[ p = \frac{1}{Z} \prod_{a} f_a(x_a), \qquad (18) \]
where Z is a normalization constant introduced to ensure that the sum of the probabilities of all the configurations is one.
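As an illustration of the probabilistic interpretation, the sketch below converts the costs of the FIG. 8 example into probabilities. It assumes the convention f_a = exp(-C_a), under which an infinite cost maps to zero probability; the helper names are hypothetical.

```python
import itertools
import math

def f_equality(config):
    """exp(-cost) of the hard equality constraint: 1 if all equal, else 0."""
    cost = 0.0 if len(set(config)) == 1 else math.inf
    return math.exp(-cost)  # math.exp(-inf) evaluates to 0.0

def f_soft(config):
    """exp(-cost) of the soft constraint on the first variable."""
    cost = 0.0 if config[0] == 0 else 0.5
    return math.exp(-cost)

configs = list(itertools.product((0, 1), repeat=3))
# Unnormalized weight of each configuration: the product of its factors.
weights = {c: f_equality(c) * f_soft(c) for c in configs}
Z = sum(weights.values())  # normalization constant
probs = {c: w / Z for c, w in weights.items()}

for config, p in probs.items():
    if p > 0:
        print(config, round(p, 4))
```

Only 000 and 111 receive nonzero probability, and the lower-cost configuration 000 is the more probable of the two.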
Factor Graphs for LDPC Codes
As described above, factor graphs are very often used to represent LDPC codes. Recall that LDPC codes are defined in terms of their parity check matrices. Given a parity check matrix, it is possible to construct a corresponding normal factor graph for the code.
As an example of how this can be done, consider the parity check matrix for the [N=7, k=4, d=3] binary (q=2) Hamming code:
\[ H = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{bmatrix}. \]
As shown in the corresponding normal factor graph in FIG. 9, this code has seven code word variables 900, and three parity checks. Notice that there are seven soft constraint nodes 903, and three hard parity constraint factor nodes 901, marked by a ‘+’.
Each parity factor node is connected to the variables involved in that parity check. There are also seven hard equality constraint nodes 902, which are used to copy the variables representing the code word symbols x[n]. The equality constraints are necessary because of the rule that no variable is attached to more than two constraints in a normal factor graph.
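The connections described above can be read off the parity check matrix mechanically. The following Python sketch lists, for each parity factor node, the code word symbols it is connected to, and confirms by enumeration that the matrix defines a code with 16 code words and minimum distance 3. The names `neighbors` and `satisfies_checks` are illustrative only.

```python
import itertools

# Parity check matrix of the [7, 4, 3] binary Hamming code from the text.
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]

# For each parity factor node, the indices of the connected code word symbols.
neighbors = [[n for n, h in enumerate(row) if h] for row in H]

def satisfies_checks(word, checks):
    """True if every parity check sums to zero modulo 2."""
    return all(sum(h * x for h, x in zip(row, word)) % 2 == 0 for row in checks)

code_words = [
    w for w in itertools.product((0, 1), repeat=7) if satisfies_checks(w, H)
]
# For a linear code, the minimum distance equals the minimum nonzero weight.
min_distance = min(sum(w) for w in code_words if any(w))

print(neighbors)        # [[0, 1, 2, 4], [1, 2, 3, 5], [2, 3, 4, 6]]
print(len(code_words))  # 16
print(min_distance)     # 3
```

Each parity node touches four symbols, and the third symbol takes part in all three checks, which is why its variable must be copied by an equality node.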
Message-Passing on Factor Graphs
Message-passing decoding methods are closely related to factor graphs. There are a variety of known methods and they share some common features. For background, see F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor Graphs and the Sum-Product Algorithm,” IEEE Transactions on Information Theory, vol. 47, pp. 498–519, February 2001. There are also decoding methods based on “generalized belief propagation,” see J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms,” Mitsubishi Electric Research Laboratories, TR2002-35, August 2002.
In those methods, “messages” are sent from one factor node to another factor node, along the connections between the factor nodes. Recall that a connection between factor nodes represents a variable. A message is a vector of q numbers, corresponding to the q possible states of the variable along which it passes. The message represents the cost of the q possible states of the variable, given what the factor node that is sending the message knows from the other messages that it is receiving.
For example, if an equality constraint that connects three variables receives messages from two of them indicating that the cost of a ‘0’ is much lower than the cost of a ‘1’, the constraint sends out a message to the third variable that it should be in the ‘0’ state; that is, the cost of a ‘0’ for that variable should also be much lower than the cost of a ‘1’.
A “belief” that a variable is in one of its possible states can then be determined from the messages arriving at that variable. The “belief” is normally represented as a q-ary vector. The vector contains estimated costs, or equivalently probabilities, that each of the q possible values of a variable node is the correct one. The decoder ultimately selects the state of each symbol by picking the symbol value whose belief has the highest probability or lowest cost.
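The equality-constraint example and the belief computation can be made concrete under one particular choice of update rule. The sketch below assumes a min-sum style rule in which cost messages simply add at an equality node; the text does not commit to a specific rule, and the function names and numeric values are hypothetical.

```python
def equality_message(incoming):
    """Outgoing cost message of an equality node (min-sum rule, an assumption):
    the cost of each state is the sum of the incoming costs for that state."""
    q = len(incoming[0])
    return [sum(msg[s] for msg in incoming) for s in range(q)]

def belief(message_from_one_end, message_from_other_end):
    """Belief of a variable: the sum of the two messages arriving along it."""
    return [a + b for a, b in zip(message_from_one_end, message_from_other_end)]

def decide(belief_costs):
    """Pick the state with the lowest cost."""
    return min(range(len(belief_costs)), key=belief_costs.__getitem__)

# Two variables report that '0' is much cheaper than '1'.
incoming = [[0.0, 3.0], [0.0, 2.5]]
outgoing = equality_message(incoming)        # [0.0, 5.5]

# The third variable's other end mildly prefers '1', but is outvoted.
third_belief = belief(outgoing, [0.4, 0.0])  # [0.4, 5.5]
print(decide(third_belief))                  # 0
```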
In detail, different message-passing methods use different rules to update the messages, but the basic idea of these methods is always the same. One begins with some initial, unbiased messages, and then starts to update them according to the message update rules.
At each iteration, one can determine the state of each variable by the messages that it receives. After an ending criterion, such as a fixed number of iterations having passed, is achieved, the decoding method is stopped, and the state of each variable is determined for the last time. The lowest cost code word encountered during the decoding procedure is output by the decoder.
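The overall procedure can be sketched for the Hamming code of FIG. 9. This is a minimal illustration under stated assumptions, not the method of any particular reference: it uses a min-sum update rule, zero-valued initial messages, a brute-force parity node update that is only practical for small factor nodes, and a fixed iteration count as the ending criterion. The function name `min_sum_decode` and the example costs are hypothetical.

```python
import itertools

H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]
checks = [[n for n, h in enumerate(row) if h] for row in H]

def min_sum_decode(soft, checks, iterations=10):
    """soft[n][s] is the channel cost of symbol n being in state s (0 or 1)."""
    n_vars = len(soft)
    # Unbiased initial messages from each parity node a to each neighbor n.
    check_to_var = {(a, n): [0.0, 0.0]
                    for a, nbrs in enumerate(checks) for n in nbrs}
    best_word, best_cost = None, float("inf")
    for _ in range(iterations):
        # Equality nodes: soft cost plus messages from the other parity nodes.
        var_to_check = {}
        for (a, n) in check_to_var:
            msg = list(soft[n])
            for (b, m), incoming in check_to_var.items():
                if m == n and b != a:
                    msg = [msg[s] + incoming[s] for s in (0, 1)]
            var_to_check[(a, n)] = msg
        # Parity nodes: cheapest setting of the other variables for each state.
        for a, nbrs in enumerate(checks):
            for n in nbrs:
                others = [m for m in nbrs if m != n]
                out = [float("inf"), float("inf")]
                for states in itertools.product((0, 1), repeat=len(others)):
                    s = sum(states) % 2  # state of n that keeps parity even
                    cost = sum(var_to_check[(a, m)][x]
                               for m, x in zip(others, states))
                    out[s] = min(out[s], cost)
                check_to_var[(a, n)] = out
        # Beliefs and tentative decision for every code word symbol.
        word = []
        for n in range(n_vars):
            b = list(soft[n])
            for (a, m), incoming in check_to_var.items():
                if m == n:
                    b = [b[s] + incoming[s] for s in (0, 1)]
            word.append(0 if b[0] <= b[1] else 1)
        is_code_word = all(sum(word[n] for n in nbrs) % 2 == 0 for nbrs in checks)
        cost = sum(soft[n][word[n]] for n in range(n_vars))
        # Remember the lowest cost code word encountered so far.
        if is_code_word and cost < best_cost:
            best_word, best_cost = word, cost
    return best_word, best_cost

# The first symbol weakly favors '1'; the other six favor '0' more strongly.
soft = [[1.0, 0.0]] + [[0.0, 2.0] for _ in range(6)]
print(min_sum_decode(soft, checks))  # ([0, 0, 0, 0, 0, 0, 0], 1.0)
```

In this example the all-zeros code word is found in the first iteration. Because the graph contains cycles, a sketch of this kind carries no general guarantee of finding the lowest cost code word.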
The Importance of Sparse Graphs
As already mentioned, given a parity check matrix for a code, a factor graph for the same code can be straightforwardly constructed. However, it has been observed empirically that message-passing decoding methods only work well on graphs that are sparse. A sparse normal factor graph is one that only has a small number of variables connected to each factor.
An intuitive explanation for the fact that sparse graphs are necessary is that if many messages are input into all the factor nodes, then it is likely that one or more messages into each factor node will carry incorrect information, which causes incorrect information to be sent out of every factor node. In a sparse normal graph, on the other hand, a few factor nodes may send out some incorrect messages, but enough factor nodes will send out correct messages so that the system can eventually settle into a low cost configuration.
The factor graphs that correspond to the parity check matrices of LDPC codes are inherently sparse because of the low-density property of the parity check matrices of LDPC codes.
Other linear codes can also sometimes be represented by sparse generator factor graph representations. For example, Reed-Muller codes and other codes based on finite geometries can be represented by sparse factor graphs, see G. D. Forney, Jr., “Codes on Graphs: Normal Realizations,” IEEE Transactions on Information Theory, vol. 47, pp. 520–548, February 2001; and J. S. Yedidia, J. Chen, and M. Fossorier, “Representing Codes for Belief Propagation Decoding,” Proceedings of the International Symposium on Information Theory, p. 176, 2003. Codes based on finite geometries, including Reed-Muller codes, are of some interest, but are used far less in practice than Reed-Solomon codes because they have much worse distance properties.
The representation of Reed-Muller codes and other codes based on finite geometries used by Yedidia, et al. was a redundant representation. In a redundant representation, extra factor nodes are added, which are not necessary to define the code, because their constraints are already implied by constraints of other factor nodes in the graph. Redundant representations can sometimes be useful in improving the decoding performance of message-passing decoding methods.
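The notion of a redundant representation can be illustrated on a small code. The sketch below uses the [7, 4, 3] Hamming code rather than a Reed-Muller code, purely for brevity; it appends parity checks that are modulo-2 sums of existing checks and confirms that the set of code words is unchanged. All names are illustrative.

```python
import itertools

H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1],
]

def code_words(checks):
    """All binary words of length 7 that satisfy every parity check."""
    return {
        w for w in itertools.product((0, 1), repeat=7)
        if all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in checks)
    }

# Extra checks formed as sums of pairs of rows; each is implied by the originals.
extra = [
    [(a + b) % 2 for a, b in zip(H[i], H[j])]
    for i, j in itertools.combinations(range(len(H)), 2)
]
H_redundant = H + extra

same_code = code_words(H) == code_words(H_redundant)
print(len(H), len(H_redundant), same_code)  # 3 6 True
```

The redundant graph has twice as many parity factor nodes but defines exactly the same code; whether the extra nodes help a message-passing decoder is an empirical question.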
Some short block length rate ½ binary codes with excellent distance properties, including the binary extended Golay code, also have known sparse factor graph representations, see J.-C. Carlach and A. Otmani, “A Systematic Construction of Self-Dual Codes,” IEEE Transactions on Information Theory, vol. 49, pp. 3005–3009, November 2003.
Until now, no sparse factor graph representation of Reed-Solomon codes has been known. There has been no obvious way to construct such a representation, because the parity check matrices of Reed-Solomon codes are completely dense, i.e., every code word symbol is involved in every single constraint.
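The density claim can be illustrated with a small example. The sketch below assumes a common textbook form of a Reed-Solomon parity check matrix, with entry α^(i·j) in row j and column i, which is not necessarily the form intended here. The prime field GF(7), the primitive element α=3, and the parameters N=6, k=4 are arbitrary choices made so that ordinary modular arithmetic suffices.

```python
Q = 7      # field size (prime, so arithmetic is simply modulo Q)
ALPHA = 3  # a primitive element of GF(7)
N = Q - 1  # block length
K = 4      # dimension

# Row j (j = 1 .. N-K) holds ALPHA^(i*j) for symbol positions i = 0 .. N-1.
H_rs = [[pow(ALPHA, i * j, Q) for i in range(N)] for j in range(1, N - K + 1)]

nonzero = sum(entry != 0 for row in H_rs for entry in row)
rs_density = nonzero / (len(H_rs) * N)

for row in H_rs:
    print(row)
print(rs_density)  # 1.0
```

Every entry is a power of a nonzero field element and is therefore nonzero, so each parity factor node in the corresponding factor graph would be connected to all N code word symbols.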
The lack of an appropriate sparse factor graph representation of Reed-Solomon codes has until now prevented the use of message-passing decoding methods to decode Reed-Solomon codes.
Therefore, there is a need for a sparse factor graph representation for Reed-Solomon codes so that message-passing methods can be used to decode Reed-Solomon codes.