A solid state drive (SSD) is a data storage device that utilizes solid-state memory to retain data in nonvolatile memory chips. NAND-based flash memories are widely used as the solid-state memory storage in SSDs due to their compactness, low power consumption, low cost, high data throughput and reliability. SSDs commonly employ several NAND-based flash memory chips and a flash controller to manage the flash memory and to transfer data between the flash memory and a host computer.
While NAND-based flash memories are reliable, they are not inherently error-free and often rely on error correction coding (ECC) to correct raw bit errors in the stored data. Various mechanisms may lead to bit errors in flash memories, including noise at the power rails, threshold voltage disturbances during the reading and/or writing of neighboring cells, and retention loss due to leakage within the cells and tunneling. ECC is commonly employed in flash memories to recover stored data that is affected by such error mechanisms. In operation, ECC supplements the user data with parity bits, which store enough extra information for the data to be reconstructed if one or more of the data bits are corrupted. Generally, the number of data bit errors detectable and correctable in the data increases with the number of parity bits in the ECC. In many memory devices, data is stored in a memory location of the memory device along with the ECC for the data. In this way, the data and the ECC may be written to the memory location in a single write operation and read from the memory location in a single read operation. ECC is typically implemented in the flash memory controller.
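The role of parity bits described above can be illustrated with a classic Hamming(7,4) code. This is only a minimal sketch for intuition: flash controllers use much longer and stronger codes, and the function names `hamming74_encode` and `hamming74_decode` are hypothetical names chosen for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    # Each syndrome bit re-checks one parity group. Read together as a
    # binary number, they give the 1-based position of a single bit error.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
stored = hamming74_encode(data)
corrupted = list(stored)
corrupted[4] ^= 1  # simulate one raw bit error in the stored codeword
recovered = hamming74_decode(corrupted)
```

Here three parity bits protect four data bits against any single bit error. Correcting more errors per codeword requires more parity, which is the trade-off the paragraph above describes.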
NAND flash memories are based on floating gate storage. In floating gate storage technologies, two logic states are achieved by altering the number of electrons within the floating gate. The difference between the two logic states (1 and 0) is on the order of a few electrons and is decreasing as the floating gate storage technology advances. The decreasing number of electrons responsible for the difference between the two logic states results in an increased probability of errors in the flash memory cell, requiring more error correction. The fraction of data bits that are corrupted, and therefore contain incorrect data, before applying the ECC is referred to as the raw bit error rate (RBER). As a result of the advances in the floating gate storage technology, the RBER for a flash page of memory cells is increasing and, at technologies with feature sizes in the 1x-nm range (below 20 nm), is nearing the Shannon limit of the communication channel. The increased probability of errors in the stored data results in an increase in the error correction coding necessary to correct the bit errors in the flash memory.
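As a rough illustration of the RBER definition above, the following sketch compares the bytes written to a page with the bytes read back before any correction is applied. The function name `raw_bit_error_rate` and the sample data are assumptions made for this example only.

```python
def raw_bit_error_rate(written: bytes, read_back: bytes) -> float:
    """Fraction of bits that differ between written and raw read-back data."""
    if len(written) != len(read_back) or not written:
        raise ValueError("buffers must be non-empty and of equal length")
    # XOR leaves a 1 in every bit position where the two bytes disagree.
    flipped = sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))
    return flipped / (8 * len(written))


# Two bytes (16 bits) with one flipped bit in each byte.
rber = raw_bit_error_rate(bytes([0xFF, 0x00]), bytes([0xFE, 0x01]))
```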
The error rate observed after application of the ECC is referred to as the uncorrectable bit error rate (UBER). The acceptable UBER often depends upon the application in which the SSD is employed. In price-sensitive consumer applications, which experience a relatively low number of memory accesses during the SSD product lifetime, the SSD may tolerate a higher UBER than in a high-end application experiencing a relatively high number of memory accesses, such as an enterprise application.
One type of error correction coding often employed in a flash storage controller is Bose-Chaudhuri-Hocquenghem (BCH) error correction. Typically, a target UBER for an SSD ranges between 10⁻¹⁵ and 10⁻¹⁶, and the BCH error correction capability is chosen based upon this target UBER. However, due to the increased RBER of the NAND-based flash memory technology, the BCH error correction currently employed in the art for the recovery of data errors in a NAND-based flash memory is impractical for meeting the target UBER.
One of the key features of BCH error correction codes is that during code design, the designer has control over the number of symbol errors that may be correctable by the BCH decoder. As such, a BCH decoder can be designed that exhibits strong error detection and correction capabilities to meet the target UBER. However, there is an upper limit to the number of errors that are detectable and correctable by a BCH error correction code.
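The relationship between RBER, correction capability, and target UBER can be sketched numerically. The example below assumes independent bit errors, a decoder that fails exactly when more than t bits of an n-bit codeword are in error, and the approximation that UBER is the codeword failure probability divided by the number of user data bits. It also holds n fixed while t varies, whereas in a real BCH code the parity length grows with t. All function names and the sample parameters are hypothetical.

```python
import math


def codeword_failure_probability(n_bits, t_correctable, rber):
    """P(more than t of n bits are wrong), assuming independent bit errors."""
    if not 0.0 < rber < 1.0:
        raise ValueError("rber must be strictly between 0 and 1")
    total = 0.0
    for i in range(t_correctable + 1, n_bits + 1):
        # Binomial term C(n, i) * p^i * (1 - p)^(n - i), evaluated in the
        # log domain so large binomial coefficients do not overflow.
        log_term = (math.lgamma(n_bits + 1) - math.lgamma(i + 1)
                    - math.lgamma(n_bits - i + 1)
                    + i * math.log(rber)
                    + (n_bits - i) * math.log1p(-rber))
        term = math.exp(log_term)
        total += term
        # Past the mean the terms shrink, so stop once they are negligible.
        if i > n_bits * rber and term < total * 1e-17:
            break
    return min(total, 1.0)


def estimated_uber(n_bits, k_bits, t_correctable, rber):
    """Approximate UBER as codeword failure probability per user data bit."""
    return codeword_failure_probability(n_bits, t_correctable, rber) / k_bits


def min_correction_capability(n_bits, k_bits, rber, target_uber):
    """Smallest t whose estimated UBER meets the target, or None."""
    for t in range(n_bits + 1):
        if estimated_uber(n_bits, k_bits, t, rber) <= target_uber:
            return t
    return None


# Illustrative parameters only: 1000-bit codeword carrying 900 data bits.
t_needed = min_correction_capability(1000, 900, 1e-3, 1e-15)
```

Re-running the search with a larger `rber` shows how quickly the required t grows, which is the pressure on BCH decoders described above.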
Another type of error correction coding that may be employed in a flash storage controller is low-density parity-check (LDPC) error correction coding. An LDPC code is a linear error correcting code having a parity check matrix with a small number of nonzero elements in each row and column. LDPC codes are capacity-approaching codes that allow the noise threshold to be set very close to the Shannon limit for a symmetric, memoryless channel. The noise threshold defines an upper bound for the channel noise, up to which the probability of lost information can be made as small as desired. LDPC error correction is superior to BCH error correction, with LDPC codes being capable of producing a UBER that is very near the Shannon limit with a lower code rate than is required using BCH error correction. However, LDPC codes may exhibit an error floor that limits the performance of the LDPC error correction. While it is known that the UBER steadily decreases as the signal-to-noise ratio condition of the channel improves, for LDPC codes there exists a point after which the rate of decrease in the UBER flattens. This region is commonly referred to as the error floor region for LDPC error correction. To guarantee a target UBER of between 10⁻¹⁵ and 10⁻¹⁶ with LDPC error correction, it is necessary to know the value of the error floor. The error floor for LDPC cannot be mathematically determined and simulation is necessary to identify the value of the error floor. However, with modern technology, it is not possible to simulate up to 10⁻¹⁶ to identify the value of the error floor, and as such, a target UBER of 10⁻¹⁶ cannot be guaranteed with LDPC error correction.
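The parity check matrix mentioned above defines a code through the condition that every row, multiplied by a valid codeword, sums to zero modulo 2. A minimal sketch of that check follows. The 3-by-6 matrix `H` is a hypothetical toy; practical LDPC matrices have thousands of columns, and only a small fraction of their entries are nonzero.

```python
# Toy parity check matrix: 3 checks over 6 bits (illustrative only).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def syndrome(parity_check_matrix, word):
    """Return H * word (mod 2); an all-zero result means every check passes."""
    return [sum(h * b for h, b in zip(row, word)) % 2
            for row in parity_check_matrix]


def is_codeword(parity_check_matrix, word):
    """True when the word satisfies every parity check."""
    return not any(syndrome(parity_check_matrix, word))


valid = [1, 0, 1, 1, 1, 0]
corrupted = [0, 0, 1, 1, 1, 0]  # first bit flipped

clean_syndrome = syndrome(H, valid)
error_syndrome = syndrome(H, corrupted)
```

The nonzero entries of `error_syndrome` mark exactly the checks that involve the flipped bit, which is the information iterative LDPC decoders work from.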
Various methods for decoding data encoded with LDPC error correction codes are known in the art. Two general LDPC decoding methods known in the art are soft-decision decoding and hard-decision decoding. Soft-decision decoding algorithms, such as the sum-product algorithm (SPA) and the min-sum algorithm (MSA), are iterative and are based on belief propagation. The SPA is known to achieve the best decoding performance, but it is computationally complex. The computational complexity of the SPA necessitates a decoding device having a large number of logic gates, resulting in an increased cost and decreased power efficiency of the device. The MSA is less complex than the SPA, but exhibits a noticeable degradation in decoding performance compared to the SPA. Hard-decision decoding is less complex than soft-decision decoding; however, its simplicity results in a significant performance loss compared to soft-decision decoding solutions. Hard-decision decoding algorithms for LDPC codes known in the art include the bit-flipping (BF) algorithm. While soft-decision decoding typically outperforms hard-decision decoding, soft-decision decoding requires multiple reads from the flash storage, thereby greatly increasing the bandwidth necessary to perform the decoding of the LDPC encoded data.
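To make the hard-decision approach concrete, the following is a minimal sketch of a majority-rule bit-flipping decoder. It is one simple variant and not a description of any particular controller. The toy code used here places 9 bits on a 3-by-3 grid with one parity check per grid row and per grid column; the helper names and the flipping rule (flip a bit when more than half of its checks fail) are assumptions made for illustration.

```python
def grid_parity_checks(rows, cols):
    """One check per grid row and per grid column, as lists of bit indices."""
    checks = [[r * cols + c for c in range(cols)] for r in range(rows)]
    checks += [[r * cols + c for r in range(rows)] for c in range(cols)]
    return checks


def bit_flip_decode(hard_bits, checks, max_iterations=10):
    """Return (decoded_bits, converged) using majority-rule bit flipping."""
    bits = list(hard_bits)
    # Number of checks each bit participates in.
    degree = [0] * len(bits)
    for check in checks:
        for i in check:
            degree[i] += 1
    for _ in range(max_iterations):
        unsatisfied = [chk for chk in checks if sum(bits[i] for i in chk) % 2]
        if not unsatisfied:
            return bits, True
        fail_count = [0] * len(bits)
        for check in unsatisfied:
            for i in check:
                fail_count[i] += 1
        # Flip every bit for which more than half of its checks fail.
        to_flip = [i for i in range(len(bits)) if 2 * fail_count[i] > degree[i]]
        if not to_flip:
            break
        for i in to_flip:
            bits[i] ^= 1
    converged = all(sum(bits[i] for i in chk) % 2 == 0 for chk in checks)
    return bits, converged


checks = grid_parity_checks(3, 3)
codeword = [1, 1, 0,
            1, 1, 0,
            0, 0, 0]
received = list(codeword)
received[8] ^= 1  # one raw bit error
decoded, ok = bit_flip_decode(received, checks)
```

The decoder needs only a single hard read of each bit, which is why it is cheap. With two or more errors this toy decoder can stall or settle on the wrong word, consistent with the performance loss noted above.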
Both hard-decision and soft-decision LDPC decoding may suffer from poor error detection capabilities. It follows that an LDPC decoder may conclude that the encoded data has been successfully decoded when, in reality, errors still exist in the data that need to be corrected to successfully decode the encoded data.
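One way such a false success can arise is when the accumulated errors turn the stored codeword into a different valid codeword, so that every parity check passes even though the data is wrong. The sketch below shows this on a hypothetical toy code with 9 bits on a 3-by-3 grid and one parity check per grid row and column; the names and the chosen error pattern are assumptions for illustration.

```python
def grid_parity_checks(rows, cols):
    """One check per grid row and per grid column, as lists of bit indices."""
    checks = [[r * cols + c for c in range(cols)] for r in range(rows)]
    checks += [[r * cols + c for r in range(rows)] for c in range(cols)]
    return checks


def all_checks_pass(bits, checks):
    """True when every parity check sums to zero modulo 2."""
    return all(sum(bits[i] for i in chk) % 2 == 0 for chk in checks)


checks = grid_parity_checks(3, 3)
stored = [0] * 9  # the all-zero word is a valid codeword

# A four-bit error pattern that is itself a valid codeword of the toy code.
error_pattern = [1, 1, 0,
                 1, 1, 0,
                 0, 0, 0]
read_back = [s ^ e for s, e in zip(stored, error_pattern)]

looks_valid = all_checks_pass(read_back, checks)
actually_correct = read_back == stored
```

Here `looks_valid` is true while `actually_correct` is false: a decoder that relies only on its parity checks would report success for `read_back`.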
Additionally, the RBER of the flash memory device may increase as the flash memory ages. As the RBER increases over time, more correction of the encoded data is required.
Accordingly, what is needed in the art is an improved flash controller that is capable of meeting the target UBER for a nonvolatile memory storage system over the lifetime of the flash memory device.