Disclosed herein is a method, and related devices and systems, of reducing the memory required to implement Low-Density Parity-Check (LDPC) decoding with no sacrifice in performance.
Error Correction Codes (ECCs) are commonly used in communication and storage systems. Various physical phenomena occurring both in communication channels and in storage devices result in noise effects that corrupt the communicated or stored information. Error correction coding schemes can be used for protecting the communicated or stored information against the resulting errors. This is done by encoding the information before transmission through the communication channel or storage in the memory device. The encoding process transforms an information bit sequence i into a codeword v by adding redundancy to the information. This redundancy can then be used in order to recover the information from a corrupted codeword y through a decoding process. An ECC decoder decodes the corrupted codeword y and recovers a bit sequence î that should be equal to the original information bit sequence i with high probability.
One common ECC class is the class of linear binary block codes. A length N linear binary block code of dimension K is a linear mapping of length K information bit sequences into length N codewords, where N>K. The rate of the code is defined as R=K/N. The encoding process of a codeword v of dimension 1×N is usually done by multiplying the information bit sequence i of dimension 1×K by a generator matrix G of dimension K×N according to
$$v = i \cdot G \tag{1.1}$$
It also is customary to define a parity-check matrix H of dimension M×N, where M=N−K. The parity-check matrix is related to the generator matrix through the following equation:
$$G \cdot H^T = 0 \tag{1.2}$$
The parity-check matrix can be used in order to check whether a length N binary vector is a valid codeword. A 1×N binary vector v is a valid codeword if and only if the following equation holds:
$$H \cdot v^T = 0 \tag{1.3}$$
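Equations (1.1) through (1.3) can be illustrated with a small sketch. The (7,4) Hamming code below, its systematic form G = [I | P] and H = [P^T | I], and the helper names `encode`, `syndrome` and `is_codeword` are assumptions chosen for illustration; they are not taken from the disclosure.

```python
# Illustrative (7,4) Hamming code in systematic form (an assumed example code).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def encode(info_bits, generator):
    """Equation (1.1): v = i * G, with arithmetic over GF(2)."""
    num_cols = len(generator[0])
    return [
        sum(info_bits[k] * generator[k][j] for k in range(len(info_bits))) % 2
        for j in range(num_cols)
    ]

def syndrome(word, parity_check):
    """Left-hand side of equation (1.3): H * v^T over GF(2)."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in parity_check]

def is_codeword(word, parity_check):
    """A word is a valid codeword if and only if its syndrome is all zero."""
    return not any(syndrome(word, parity_check))

codeword = encode([1, 0, 1, 1], G)
corrupted = codeword[:]
corrupted[2] ^= 1  # flip one bit to model a channel or storage error
```

In this sketch every row of G has a zero syndrome, which is equation (1.2), while the corrupted word fails the check of equation (1.3).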
In recent years iterative coding schemes have become very popular. In these schemes the code is constructed as a concatenation of several simple constituent codes and is decoded using an iterative decoding algorithm by exchanging information between the constituent decoders of the simple codes. Another family of iterative decoders operates on a code that can be defined using a bipartite graph describing the interconnections between check nodes and bit nodes. In this case, decoding can be viewed as an iterative passing of messages via the edges of the graph.
One popular class of iterative codes is LDPC codes. An LDPC code is a linear binary block code defined by a sparse parity-check matrix H. As shown in FIG. 1, the code can also be defined by a sparse bipartite graph G=(V,C,E) with a set V of N bit nodes, a set C of M check nodes and a set E of edges connecting bit nodes to check nodes. The bit nodes correspond to the codeword bits and the check nodes correspond to parity-check constraints on the bits, or alternatively to the rows of the parity-check matrix H. A bit node is connected by edges to the check nodes in which the bit node participates.
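The correspondence between a parity-check matrix and its bipartite graph can be sketched as follows. The function name `tanner_graph` and the small matrix are hypothetical and are not the code of FIG. 1; the matrix is far too small to be called sparse and serves only to show the mapping.

```python
def tanner_graph(parity_check):
    """Return (check_neighbors, bit_neighbors) for a parity-check matrix.

    check_neighbors[c] lists the bit nodes joined to check node c, i.e. N(c).
    bit_neighbors[v] lists the check nodes joined to bit node v, i.e. N(v).
    """
    num_checks = len(parity_check)
    num_bits = len(parity_check[0])
    check_neighbors = [
        [v for v in range(num_bits) if parity_check[c][v]]
        for c in range(num_checks)
    ]
    bit_neighbors = [
        [c for c in range(num_checks) if parity_check[c][v]]
        for v in range(num_bits)
    ]
    return check_neighbors, bit_neighbors

# Small illustrative matrix: each 1 in row c, column v is one edge (v, c).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
check_neighbors, bit_neighbors = tanner_graph(H)
num_edges = sum(len(bits) for bits in check_neighbors)  # |E|
```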
LDPC codes can be decoded using iterative message passing decoding algorithms. These algorithms operate by exchanging messages between bit nodes and check nodes via the edges of the underlying bipartite graph that represents the code. The decoder is provided with initial estimates of the codeword bits. The initial estimates are a set of reliability measures. For example, if data are stored in a flash memory, in which the atomic units for holding data are cells, the reliability of each bit is a function of the mapping from a group of bits to a state that is programmed to a flash cell. The reliability of each bit also is a function of the voltage band read from the flash cell. These initial estimates are refined and improved by imposing the parity-check constraints that the bits should satisfy as a valid codeword (according to equation (1.3)). This is done by exchanging information between the bit nodes representing the codeword bits and the check nodes representing parity-check constraints on the codeword bits, using the messages that are passed via the graph edges.
In iterative decoding algorithms, it is common to utilize “soft” bit estimates, which convey both the bit estimate itself and the reliability of the bit estimate.
The bit estimates conveyed by the messages passed via the graph edges can be expressed in various forms. A common measure for expressing a “soft” bit estimate is the Log-Likelihood Ratio (LLR) given by:
$$\mathrm{LLR} = \log\frac{\Pr(v=0 \mid \text{current constraints and observations})}{\Pr(v=1 \mid \text{current constraints and observations})} \tag{1.4}$$
where the “current constraints and observations” are the various parity-check constraints taken into account in computing the message at hand and the observations correspond to measurements (typically of threshold voltage band values, e.g. if the bits represent data stored in a memory device such as a flash memory) of the bits participating in these parity checks. Without loss of generality, LLR notation is used throughout the rest of this document. The sign of the LLR provides the bit estimate (i.e. positive LLR corresponds to bit v=0 and negative LLR corresponds to bit v=1). The magnitude of the LLR provides the reliability of the estimation (i.e. |LLR|=0 means that the estimate is completely unreliable and |LLR|=∞ means that the estimate is completely reliable and the bit value is known).
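The sign and magnitude interpretation of the LLR can be made concrete with a short sketch. The function names are hypothetical, and resolving an LLR of exactly zero to bit 0 is an arbitrary choice made here, since such an estimate carries no information.

```python
import math

def llr_from_probability(prob_zero):
    """Equation (1.4) for one bit, given the probability that the bit is 0."""
    if prob_zero <= 0.0:
        return -math.inf  # bit is known to be 1
    if prob_zero >= 1.0:
        return math.inf   # bit is known to be 0
    return math.log(prob_zero / (1.0 - prob_zero))

def hard_decision(llr):
    """Positive LLR maps to bit 0, negative LLR to bit 1 (ties go to 0 here)."""
    return 0 if llr >= 0 else 1

def reliability(llr):
    """The magnitude of the LLR is the reliability of the estimate."""
    return abs(llr)

confident_zero = llr_from_probability(0.9)  # positive, moderately reliable
no_information = llr_from_probability(0.5)  # zero, completely unreliable
```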
Usually, the messages passed during the decoding operation via the graph edges between bit nodes and check nodes are extrinsic. An extrinsic message m passed from a node n via an edge e may take into account all the values received on edges connected to node n other than edge e (this is why it is called extrinsic: it is based only on new information).
One example of a message passing decoding algorithm is the Belief-Propagation (BP) algorithm, which is generally regarded as the best-performing algorithm in this family of algorithms.
Let
$$P_v = \log\frac{\Pr(v=0 \mid y)}{\Pr(v=1 \mid y)} \tag{1.5}$$
denote the initial decoder estimate for a bit v, based on the received or read symbol y. Note that it is also possible that there is no y observation for some of the bits, in which case there are two possibilities:
First possibility: shortened bits. The bits are known a-priori and Pv=±∞ (depending on whether the bit is 0 or 1).
Second possibility: punctured bits. The bits are unknown a-priori and
$$P_v\big|_{\text{punctured bit}} = \log\frac{\Pr(v=0)}{\Pr(v=1)} \tag{1.6}$$
where Pr(v=0) is the a-priori probability that the bit v is 0 and Pr(v=1) is the a-priori probability that the bit v is 1. Assuming the information bits have equal a-priori probabilities to be 0 or 1 and assuming the code is linear it follows that
$$P_v\big|_{\text{symmetric, punctured}} = \log\frac{1/2}{1/2} = 0 \tag{1.7}$$
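The three cases for the initial estimate can be sketched in a few lines. The labels "observed", "shortened" and "punctured", the function name, and in particular the binary symmetric channel model used for observed bits are assumptions made for illustration; a flash memory read would instead derive the estimate from the bit mapping and the voltage band that was read.

```python
import math

def initial_estimate(kind, read_bit=None, known_bit=None, crossover=0.05):
    """Return an illustrative initial estimate P_v for one bit.

    crossover is the assumed probability that an observed bit was flipped.
    """
    if kind == "shortened":
        # Bit known a-priori: +inf for a 0, -inf for a 1.
        return math.inf if known_bit == 0 else -math.inf
    if kind == "punctured":
        # Equation (1.7): equal a-priori probabilities give P_v = 0.
        return 0.0
    if kind == "observed":
        # Equation (1.5) under the assumed channel model with equal priors.
        magnitude = math.log((1.0 - crossover) / crossover)
        return magnitude if read_bit == 0 else -magnitude
    raise ValueError("unknown kind: " + kind)
```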
Let
$$Q_v = \log\frac{\Pr(v=0 \mid \underline{y},\ H\cdot\underline{v}^T=0)}{\Pr(v=1 \mid \underline{y},\ H\cdot\underline{v}^T=0)} \tag{1.8}$$
denote the final decoder estimation for bit v, based on the entire received or read sequence y and assuming that bit v is part of a codeword (i.e. assuming H·v^T=0). Let Qvc denote a message from bit node v to check node c. Let Rcv denote a message from check node c to bit node v. The BP algorithm utilizes the following update rules for computing the messages:
The bit node to check node computation rule:
$$Q_{vc} = P_v + \sum_{c' \in N(v,G)\setminus c} R_{c'v} \tag{2.1}$$
where N(n,G) denotes the set of neighbors of a node n in the graph G.
The check node to bit node computation rule:
$$R_{cv} = \varphi^{-1}\left(\sum_{v' \in N(c,G)\setminus v} \varphi\left(Q_{v'c}\right)\right) \tag{2.2}$$
where
$$\varphi(x) = \left\{\operatorname{sign}(x),\ -\log\tanh\left(\frac{|x|}{2}\right)\right\}$$
and operations in the φ domain are done over the group {0,1}×R+ (this basically means that the summation here is defined as summation over the magnitudes and XOR over the signs). The final decoder estimation for bit v is:
$$Q_v = P_v + \sum_{c' \in N(v,G)} R_{c'v} \tag{2.3}$$
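The three computation rules can be sketched for a single node as follows. The function names are hypothetical. The sketch relies on the fact that the magnitude map −log tanh(x/2) is its own inverse for x>0, so the same helper serves for both φ and φ⁻¹; the guards for zero and for very large arguments are numerical conveniences added here.

```python
import math

def phi_magnitude(x):
    """-log(tanh(x/2)) for x >= 0; this map is its own inverse on (0, inf)."""
    if x <= 0.0:
        return math.inf
    t = math.tanh(x / 2.0)
    return 0.0 if t >= 1.0 else -math.log(t)

def check_to_bit(q_others):
    """Rule (2.2): R_cv from the Q_v'c of every neighbor v' of c except v."""
    sign_bit = 0
    magnitude_sum = 0.0
    for q in q_others:
        sign_bit ^= 1 if q < 0 else 0            # XOR over the signs
        magnitude_sum += phi_magnitude(abs(q))   # sum over the magnitudes
    magnitude = phi_magnitude(magnitude_sum)     # inverse map on the magnitude
    return -magnitude if sign_bit else magnitude

def bit_to_check(p_v, r_others):
    """Rule (2.1): Q_vc from P_v and the R_c'v of every neighbor c' except c."""
    return p_v + sum(r_others)

def posterior(p_v, r_all):
    """Rule (2.3): Q_v from P_v and all incoming R_c'v."""
    return p_v + sum(r_all)

r_example = check_to_bit([1.2, -0.8, 2.5])
```

For finite nonzero inputs, `check_to_bit` agrees with the equivalent product form 2·atanh(∏ tanh(Q/2)), and the output of `posterior` minus one incoming Rcv equals the output of `bit_to_check` computed without that Rcv.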
The order of passing messages during message passing decoding is called the decoding schedule. BP decoding does not imply utilizing a specific schedule—it only defines the computation rules (2.1), (2.2) and (2.3). The decoding schedule does not affect the expected error correction capability of the code. However, the decoding schedule can significantly influence the convergence rate of the decoder and the complexity of the decoder.
The standard message-passing schedule for decoding LDPC codes is the flooding schedule, in which in each iteration all the bit nodes (also called variable nodes), and subsequently all the check nodes, pass new messages to their neighbors. The standard BP algorithm based on the flooding schedule is given in FIG. 2.
The standard implementation of the BP algorithm based on the flooding schedule is expensive in terms of memory requirements. We need to store a total of 2|V|+2|E| messages (for storing the Pv, Qv, Qvc and Rcv messages). Moreover, the flooding schedule exhibits a low convergence rate and hence requires more decoding logic to provide a required error correction capability at a given decoding throughput.
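A flooding-schedule decoder can be sketched as follows. This is a minimal illustration written from the description above, not a reproduction of FIG. 2. The function names, the clipping constant `LLR_CLIP`, the (7,4) example code and the LLR magnitude of 2.0 are all assumptions. The four message stores P, Q, Q_edge and R correspond to the 2|V|+2|E| messages mentioned above.

```python
import math

LLR_CLIP = 30.0  # numerical safeguard (an assumption, not part of the BP rules)

def phi_magnitude(x):
    """-log(tanh(x/2)) for x >= 0; this map is its own inverse on (0, inf)."""
    if x <= 0.0:
        return math.inf
    t = math.tanh(x / 2.0)
    return 0.0 if t >= 1.0 else -math.log(t)

def check_update(q_others):
    """Rule (2.2) applied to the Q messages of all neighbors but one."""
    sign_bit, total = 0, 0.0
    for q in q_others:
        sign_bit ^= 1 if q < 0 else 0
        total += phi_magnitude(abs(q))
    magnitude = min(phi_magnitude(total), LLR_CLIP)
    return -magnitude if sign_bit else magnitude

def decode_flooding(H, P, max_iterations=20):
    """Flooding-schedule BP; returns (hard decisions, iterations used)."""
    num_checks, num_bits = len(H), len(H[0])
    N_c = [[v for v in range(num_bits) if H[c][v]] for c in range(num_checks)]
    N_v = [[c for c in range(num_checks) if H[c][v]] for v in range(num_bits)]
    R = {(c, v): 0.0 for c in range(num_checks) for v in N_c[c]}  # |E| messages
    Q_edge = {}                                                   # |E| messages
    hard = [0 if p >= 0 else 1 for p in P]
    for iteration in range(1, max_iterations + 1):
        # All bit nodes send Q_vc, rule (2.1).
        for v in range(num_bits):
            for c in N_v[v]:
                Q_edge[(c, v)] = P[v] + sum(R[(c2, v)] for c2 in N_v[v] if c2 != c)
        # Then all check nodes send R_cv, rule (2.2).
        for c in range(num_checks):
            for v in N_c[c]:
                R[(c, v)] = check_update(
                    [Q_edge[(c, v2)] for v2 in N_c[c] if v2 != v])
        # A-posteriori estimates, rule (2.3), and a parity test on hard decisions.
        Q = [P[v] + sum(R[(c, v)] for c in N_v[v]) for v in range(num_bits)]
        hard = [0 if q >= 0 else 1 for q in Q]
        if all(sum(hard[v] for v in N_c[c]) % 2 == 0 for c in range(num_checks)):
            return hard, iteration
    return hard, max_iterations

# Assumed example: (7,4) Hamming code, one read error, LLR magnitude 2.0.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
codeword = [1, 0, 1, 1, 0, 1, 0]
received = codeword[:]
received[2] ^= 1
P = [2.0 if bit == 0 else -2.0 for bit in received]
decoded, iterations = decode_flooding(H, P)
```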
More efficient serial message passing decoding schedules are known in the literature. In a serial message passing schedule, either the bit nodes or the check nodes are serially traversed, and for each node the corresponding messages are sent into and out of the node. For example, a serial message passing schedule can be implemented by serially traversing the check nodes in the graph in some order and, for each check node c∈C, sending the following messages:
1. Qvc for each v∈N(c) (i.e. all Qvc messages into the node c).
2. Rcv for each v∈N(c) (i.e. all Rcv messages from node c).
Serial schedules, in contrast to the flooding schedule, enable faster propagation of information on the graph, resulting in faster convergence (approximately two times faster). Moreover, a serial schedule can be implemented efficiently with a significant reduction in memory requirements. This is achieved by using the Qv messages and the Rcv messages to compute the Qvc messages on the fly, which avoids the need for additional memory to store the Qvc messages: based on equations (2.1) and (2.3), Qvc is expressed as (Qv−Rcv). Furthermore, the memory that is initialized with the a-priori messages Pv is also used for storing the iteratively updated a-posteriori messages Qv. An additional reduction in memory requirements is obtained because the serial schedule needs only the knowledge of N(c) ∀c∈C, whereas the standard implementation of the flooding schedule uses both data structures N(c) ∀c∈C and N(v) ∀v∈V, requiring twice as much memory for storing the code's graph structure. This serially scheduled decoding algorithm is shown in FIG. 3.
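The serial schedule described above can be sketched as follows. This is a minimal illustration written from the text, not a reproduction of FIG. 3. The function names, the clipping constant `LLR_CLIP`, the (7,4) example code and the LLR magnitude of 2.0 are assumptions. Only two message stores are kept, Q (one value per bit, initialized with Pv) and R (one value per edge), and only the N(c) lists are built.

```python
import math

LLR_CLIP = 30.0  # numerical safeguard (an assumption, not part of the BP rules)

def phi_magnitude(x):
    """-log(tanh(x/2)) for x >= 0; this map is its own inverse on (0, inf)."""
    if x <= 0.0:
        return math.inf
    t = math.tanh(x / 2.0)
    return 0.0 if t >= 1.0 else -math.log(t)

def check_update(q_others):
    """Rule (2.2) applied to the Q messages of all neighbors but one."""
    sign_bit, total = 0, 0.0
    for q in q_others:
        sign_bit ^= 1 if q < 0 else 0
        total += phi_magnitude(abs(q))
    magnitude = min(phi_magnitude(total), LLR_CLIP)
    return -magnitude if sign_bit else magnitude

def decode_serial(H, P, max_iterations=20):
    """Check-node serial schedule; returns (hard decisions, iterations used)."""
    num_checks, num_bits = len(H), len(H[0])
    N_c = [[v for v in range(num_bits) if H[c][v]] for c in range(num_checks)]
    Q = list(P)                                # |V| messages, starts as P_v
    R = [[0.0] * len(bits) for bits in N_c]    # |E| messages
    hard = [0 if q >= 0 else 1 for q in Q]
    for iteration in range(1, max_iterations + 1):
        for c in range(num_checks):
            bits = N_c[c]
            # Q_vc computed on the fly as Q_v - R_cv; never stored per edge.
            q_vc = [Q[v] - R[c][k] for k, v in enumerate(bits)]
            for k, v in enumerate(bits):
                R[c][k] = check_update(q_vc[:k] + q_vc[k + 1:])
                Q[v] = q_vc[k] + R[c][k]       # updated a-posteriori message
        hard = [0 if q >= 0 else 1 for q in Q]
        if all(sum(hard[v] for v in N_c[c]) % 2 == 0 for c in range(num_checks)):
            return hard, iteration
    return hard, max_iterations

# Assumed example: (7,4) Hamming code, one read error, LLR magnitude 2.0.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
codeword = [1, 0, 1, 1, 0, 1, 0]
received = codeword[:]
received[2] ^= 1
P = [2.0 if bit == 0 else -2.0 for bit in received]
decoded, iterations = decode_serial(H, P)
```

Because each check node sees the Qv values already updated by the check nodes processed before it in the same iteration, information propagates within an iteration rather than only between iterations.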
To summarize, serial decoding schedules have several advantages:
1) Serial decoding schedules speed up convergence by a factor of approximately two compared to the standard flooding schedule. This means that we need only about half the decoder logic in order to provide a given error correction capability at a given throughput, compared to a decoder based on the flooding schedule.
2) Serial decoding schedules provide a memory-efficient implementation of the decoder. A RAM for storing only |V|+|E| messages is needed (instead of storing 2|V|+2|E| messages as in the standard flooding schedule). Half the ROM size for storing the code's graph structure is needed compared to the standard flooding schedule.
3) “On-the-fly” convergence testing can be implemented as part of the computations done during an iteration, allowing convergence detection during an iteration and decoding termination at any point. This can save on decoding time and energy consumption.
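The message counts in item 2 can be checked with simple arithmetic. The function name and the code dimensions below (1000 bit nodes, each of degree 3) are hypothetical values chosen only to make the comparison concrete.

```python
def message_storage(num_bits, num_edges):
    """Return (flooding, serial) message counts for |V| bits and |E| edges."""
    flooding = 2 * num_bits + 2 * num_edges  # P_v, Q_v, Q_vc and R_cv
    serial = num_bits + num_edges            # Q_v (initialized as P_v) and R_cv
    return flooding, serial

# Hypothetical code: 1000 bit nodes of degree 3, so |E| = 3000.
flooding_count, serial_count = message_storage(1000, 3 * 1000)
```

For these assumed dimensions the flooding schedule stores 8000 messages and the serial schedule 4000, a reduction by half.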
The basic decoder architecture and data path for implementing a serial message passing decoding algorithm is shown in FIG. 4. This architecture includes:
1) Q-RAM: a memory for storing the iteratively updated Qv messages (initialized as Pv messages).
2) R-RAM: a memory for storing the Rcv edge messages.
3) processing units for implementing the computations involved in updating the messages.
4) a routing layer responsible for routing messages from memories to processing units and vice versa.
5) memory for storing the code's graph structure, responsible for memory addressing and for controlling the routing layer's switching.
Iterative coding systems exhibit an undesired effect called error floor, as shown in FIG. 5, in which the Bit Error Rate (BER) at the output of the decoder starts to decrease much more slowly as the “noise” that is responsible for the bit errors, in the communication channel or the memory device, becomes smaller. This effect is problematic, especially in memory systems, in which the required decoder output bit error rate should be very small (~1e−15). Note that in FIG. 5 the noise increases to the right.
It is well known that the error correction capability and the error floor of an iterative coding system improve as the code length increases (this is true for any ECC system, but especially for iterative coding systems, in which the error correction capability is rather poor at short code lengths). Unfortunately, in iterative coding systems, the memory complexity of the decoding hardware is proportional to the code length; hence using long codes incurs the penalty of high complexity.
Sub-optimal decoding algorithms can be used in order to reduce the decoder complexity, both in terms of memory requirements and logic complexity. However, as a result, the decoder exhibits reduced error correction capability, either in the waterfall region or in the error floor region or in both regions.