Solid State Drives (SSDs), also known as Solid State Disks, typically include a flash memory controller and associated non-volatile memory devices, such as NAND flash memories. An individual memory may store a single bit of information (a “0” or a “1”) but high density memories may be multi-level memories in which the stored charge may have more than two possible levels. Error checking and correction techniques may also be included.
Due to the increasing bit density of NAND flash memories and the associated smaller process geometries, there has been greater emphasis on improving the error correction capability provided by NAND flash memory controllers. Error correction is necessary due to the nature of the technology where reliability and endurance become increasing problems as flash density increases.
NAND flash memory technology depends on the storage of a trapped charge on a floating gate of a transistor which comprises the memory cell. The amount of charge which is stored will vary the threshold voltage, VT, which is the voltage when applied to a separate control gate which will cause the transistor to conduct. In order to read the memory cell, a voltage is applied to the control gate and the current which flows between the source and drain is measured. The amount of current will vary according to the charge stored on the floating gate.
Originally, flash memory cells were designed to store only a single bit, where the cell was either programmed to store a charge, or left un-programmed with no charge stored. The threshold voltage when a charge was stored would be much higher than if it were not. In order to distinguish between the two states, a voltage would be applied which was in between the two threshold voltages. If the transistor conducted, it could be assumed that no charge was stored (as the voltage applied would be above the threshold voltage of the un-programmed cell). If, however, the transistor did not conduct, then it could be assumed that a charge was stored (as the voltage applied would be below the threshold voltage of the programmed cell).
However, the mechanism for programming a stored charge is relatively imprecise. In an array of memory cells, there may be variations in cell parameters due to the position or layout of the cells in the array. Also, process variations in the manufacture of the silicon slices to make the flash memory dies may cause variations between dies used in different devices or between multiple dies on the same device. The result of this would be that the stored charge could lie anywhere on a distribution curve, which is often approximated by a normal or Gaussian distribution due to these variations.
Similarly, the mechanism for erasing a stored charge is subject to variation, where a cell that was previously programmed and then erased, may still hold some variable amount of residual charge. Erasing flash cells is conducted in bulk, with a whole block of memory cells erased at a time. Further, with repeated erasure and re-programming, flash cells deteriorate over time and exhibit increased propensity to cell variations, until finally the cells may fail completely.
The stored charge may also be subject to modification due to effects such as leakage of charge over time due to imperfections in the insulating or semiconductor layers comprising the cell, or there may be additional charge variations due to so-called ‘disturb’ effects where adjacent cells being programmed or read may result in addition or leakage of charge to/from surrounding adjacent cells due to parasitic capacitance coupling and other effects.
Hence, there are many statistical and random effects upon a cell, which, while notionally initially ‘programmed’ to a certain charge level, might subsequently hold a charge that was lower than the voltage chosen to distinguish between the charge states, appearing on reading to be a cell that was not programmed. In effect a read error would occur. Equally, a cell that was not programmed might accumulate sufficient charge due to statistical and random effects that makes the cell appear on reading to be programmed, causing a read error in the opposite direction.
This problem is compounded by the trend to move from storing a single bit per cell in SLC (single level cell) memory towards storing 2 or 3 bits per cell in MLC (multi level cell) and TLC (triple level cell). With MLC and TLC, a single cell is still used to store a charge, but as the terms suggest, multiple levels of charge are defined to represent multiple bit states. Where two bits per cell are used, 4 levels of charge are defined, including the erased or non-charged state. Where three bits per cell, 8 levels of charge are defined. When more levels are squeezed in to the same range of charge state, the charge levels and corresponding threshold voltages become closer together. This means that closer tolerances are required in order to distinguish between the different cell charge distributions used to represent the bit states, and it also means that smaller amounts of charge injection or leakage will more easily result in movement of the stored charge from the original programmed level to adjacent levels. The net result is that with multiple bits per cell, read errors become more prevalent.
A flash memory is generally organized in units of pages which are the smallest unit which are individually programmable. A block, which is the smallest unit which can be erased, is composed of multiple pages. A page of memory is provided with a spare area, which is used for the extra bits required for ECC, as well as other functions such as bits for keeping track of wear leveling and other metadata. The spare area was originally sized to be large enough to accommodate enough bits to provide for BCH (Bose Chaudhuri Hocqenghem) type codes for error correction given the expected error rates of memories at the time. BCH error correction codes are extensively used to correct read errors in NAND flash memories, primarily because they have the property that they can be flexibly designed to correct a precise number of errors in a block of data (meaning that a data block of a given size and expected error rate can be exactly reconstructed with certainty), wherever and however they may occur (i.e. randomly distributed, in fixed patterns or in bursts). They are also relatively simple to implement decoders (usually the most complex part of an ECC codec) using the syndrome decoding algebraic method. As such, BCH codes could be specifically designed to work with a given flash memory data page and spare area size. However, the greater requirements placed on the ability to cope with greater error rates in more dense NAND flash memories, along with greater requirements for longer memory endurance in enterprise computing applications as opposed to consumer applications, has meant that BCH codes have become incapable of being economically or feasibly scaled to meet the new requirements.
As a result, Low Density Parity Codes (LDPC) codes are now commonly used. LDPC codes provide greater coding efficiency than BCH (in terms of the number of bits in data block which are in error, compared with the number of extra bits needed to form the codewords from the data block). However they suffer the disadvantage that decoding is a more complex and iterative process which may not converge to an exact answer. Their success at converging on a solution can be improved by providing additional probability information regarding the likelihood or belief about which bits are in error. With BCH codes, the result of a single read operation of a page memory cells using a single sensing threshold voltage is sufficient to operate the decoding operation. Either each bit is returned correctly, or it is in error, no information is provided about where the actual value of stored charge may lie on the possible Gaussian distribution curve. This is termed ‘hard-decision’ memory sensing. Alternative improved schemes have been designed which involve performing multiple read operations using different threshold sensing voltages. The results from these multiple read operations can then be used to provide additional ‘soft information’ which can indicate approximately where on the Gaussian distribution curve the cell charge may lie. This method is termed ‘soft-decision’ memory sensing. However, this method results in a much slower overall read operation, with much increased read latency considerably reducing the read I/O bandwidth. It may also only start to offer advantages as the memory ages or data retention time increases, where the cell charge moves further away from the centre of the Gaussian distribution curve and starts to enter the overlap area of the adjacent charge level distribution curves. In this case, the reduction in memory read I/O performance as the device ages may be an acceptable tradeoff in return for extending the error correction capability.
Therefore, LDPC decoding is generally conducted using hard-decision decoding in the early lifetime of the flash memory, as this offers reasonable decoder error correction capability with no loss in performance due to increased read latency. As the flash memory ages and the error rates increase, the decoding capability is increased if soft-decision decoding is employed as more information is provided to the decoder as to the likelihood of which bits may be in error, but at the expense of increased read latency and reduced read performance.
With BCH codes, as long as the number of errors in the memory page (including the extra error protection bits in the spare area) does not exceed the correction capability of the code, the original data is guaranteed to be decodable. With LDPC, this is no longer the case, and the iterative decoding process may not converge on a solution. In particular, this may happen even if there are only a low number of errors in the page, which is more likely to happen early in the life of a NAND flash memory when error rates are low. If the decoding does not converge on a solution, this means that no information can be discerned with any certainty about any of the bits in the whole page which may be in error, effectively resulting in the whole page being rejected and a page read error being returned, which is a major failure in the read process. This may happen early in the life of the memory, where it would be expected that low rates of error can be corrected easily. It is only when the memory ages or data is retained for long periods that error rates rise to such an extent that the error correction cannot cope.
In soft-decision decoding, the resulting page error rate is very dependent on the quality (accuracy) of the soft information. While multiple reads do provide soft information in terms of the likelihood of the bit being read being a ‘0’ or a ‘1’, it applies only to that instance, where the bit has either been written as a ‘0’ or a ‘1’. However, it is known, for example, that bit positions in flash pages may have asymmetrical error properties, where the likelihood of a ‘0’ written to the flash turning into a ‘1’ is very different from the likelihood of a ‘1’ becoming a ‘0’. Also, error properties may vary between odd and even flash pages and the position of a page within a flash data block.
Alternative non-volatile memory technologies are already in use, including Phase Change Memory (PCM), Magneto-resistive RAM (MRAM), Spin Torque Transfer MRAM (STT-MRAM) Ferro-electric RAM (FeRAM or FRAM). Although based on different technologies, these memories also suffer from memory cell corruption and data read errors, where the use of LDPC using soft-decision decoding may be appropriate as an error correction technique.
Therefore, what is needed is a method to improve upon the quality of the soft information concerning error likelihoods based on multiple reads of a non-volatile memory, by collecting statistics on actual error properties of memory pages which have been corrected in the past to adjust error properties of future memory page reads.