When data is transmitted over a transmission medium to a receiving location, the data may be received in error due to a myriad of causes. Amongst the causes of error are random noise in the channel, bursts of strong noise, drift and nonlinearities in the demodulator, or the like. A broad field of mathematics, which has been embodied in at least one of hardware, firmware, or software, has arisen in an effort to overcome the errors introduced in the transmission of data, so that the received data as corrected by an error correcting code (ECC) does not differ from the transmitted data, at least with a certain probability. Such codes are characterized by a code rate, being the ratio of the information content of a codeword to the overall size of the codeword. For example, for a codeword that contains k data bits and r redundancy bits, the rate is defined by k/(k+r).
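By way of illustration only, the code rate for a codeword having k data bits and r redundancy bits may be computed as in the following minimal sketch. The function name `code_rate` is hypothetical and is introduced here solely for the example; the Hamming (7,4) code is used as a familiar instance and is not drawn from the description above.

```python
from fractions import Fraction


def code_rate(k: int, r: int) -> Fraction:
    """Return the code rate k / (k + r).

    k: number of data (information) bits in the codeword
    r: number of redundancy bits in the codeword
    (hypothetical helper, for illustration only)
    """
    if k <= 0 or r < 0:
        raise ValueError("k must be positive and r must be non-negative")
    return Fraction(k, k + r)


if __name__ == "__main__":
    # Hamming (7,4): 4 data bits and 3 redundancy bits per 7-bit codeword.
    rate = code_rate(4, 3)
    print(rate, float(rate))
```

A rate closer to 1 indicates less redundancy per codeword, and generally a weaker correction capability for a given code family.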
As first theoretically shown by Shannon, and applied and extended by others, a channel may be characterized by a maximum information rate that may be related to the energy-to-noise ratio of the received signal. Practical error correcting code types do not achieve the theoretical performance predicted by Shannon; however, recent work with low-density parity-check (LDPC) codes shows substantial improvement over Bose-Chaudhuri-Hocquenghem (BCH) and similar cyclic codes. Yet the selection of a coding scheme for a channel is determined not only by the theoretical properties of the code, but also by the complexity in computation time, instruction code space, memory or hardware needed to implement the selected coding method, as well as by the model of the channel itself. This remains true whether the operations are performed in, for example, an FPGA, a DSP or a general-purpose processor, although the detailed considerations may be somewhat different.
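As one concrete instance of such a maximum information rate, the capacity of a band-limited channel with additive white Gaussian noise may be written as C = B log2(1 + S/N). The following sketch evaluates that expression; it assumes the AWGN model, which may not hold for a particular channel, and the function names are hypothetical.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio from decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def awgn_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Capacity C = B * log2(1 + S/N) in bits per second.

    Assumes a band-limited AWGN channel; other channel models
    would require a different expression.
    """
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Illustrative values only: 1 MHz bandwidth at 20 dB SNR.
    print(awgn_capacity_bps(1e6, db_to_linear(20.0)))
```

No practical code of finite length reaches this bound; the value serves as a reference against which the performance of a candidate code may be compared.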
Data storage has been modeled by considering the storage medium to be the conceptual equivalent of the transmission channel, and coding schemes designed for communications channels have been adapted to provide error correcting capabilities for RAM, magnetic disk memories, CD-ROMs and the like. The behavior of a storage medium may not be characterized as having the same noise statistics as a communications channel having, for example, additive white Gaussian noise (AWGN), and other channel models may be used in order to evaluate and select the appropriate error correcting code.
An evolving form of data memory is NAND FLASH, which is now being used in large-scale data memory systems. Apart from having substantially asymmetrical write and read times, FLASH memory may exhibit technology-related error characteristics, amongst which are wear-out, read-disturb, write-disturb, data decay, and the like. Read-disturb and write-disturb errors may be considered to be a form of data-dependent error. Wear-out and data decay are forms of data retention error, and the effect of wear-out, generally, is to increase the rate at which the voltage values representing the stored data decay with time. At present, although there is some published data and theoretical work on the characteristics of such FLASH memories, the situation has not stabilized, as the manufacturing technology of the various vendors is still evolving. A variety of techniques are being developed to compensate for and mitigate these characteristics; however, residual errors remain.
There are two generic types of NAND FLASH, single level cell (SLC) and multi-level cell (MLC), characterized as storing one bit, or more than one bit, per memory cell, respectively. Today, the MLC FLASH products are favored as they provide the storage at a lower unit cost than the SLC FLASH products, even though the number of erase operations before wear-out is considerably greater for SLC FLASH than for MLC FLASH. Herein, the term MLC may be used to refer to any data storage format having more than one bit per memory cell.
Many other characteristics of the MLC product are less satisfactory than those of a corresponding SLC product, such as read, write and erase performance times. Here, we address only the error characteristics. There is some indication that these disadvantages of MLC are increasing as the manufacturers strive for increases in device density and lower product cost.
Apart from initial manufacturing defects, which are screened by the manufacturer, and which may result in some blocks of a FLASH memory chip not being made available to the user, all of the blocks of memory of a new FLASH memory device exhibit a very low and consistent error rate. The error correction capability under these circumstances need not be very strong. As the FLASH block is programmed (written), erased, and read, over a period of time, errors begin to occur and, after some number of erase operations or storage time, the error rate begins to markedly increase. At some error rate, the device becomes unusable, as whatever error correcting codes are used have insufficient capability to correct, or even to detect, errors.
A higher level operating system may be operable to extend the life of the device by ensuring a reasonable distribution of use across all of the blocks (wear leveling), but when the error rate exceeds some threshold, error-free recovery of the data is not possible. Before such time, a system policy may be executed that declares the FLASH to be worn out, and may migrate the data to an operable FLASH device so that the worn out device may be replaced, or at least not used. The details of such a policy may vary, but the policy generally has at least some relationship to the observed error rate for recovered data.
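A policy of this kind may be sketched as follows. This is a minimal illustration under stated assumptions and does not represent any particular vendor's method: the names `BlockStats` and `should_retire`, the use of corrected bits per bit read as the observed error rate, and the `margin` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Running counts for one FLASH block (hypothetical bookkeeping)."""
    bits_read: int = 0
    bits_corrected: int = 0

    def record_read(self, bits: int, corrected: int) -> None:
        """Accumulate the size of a read and the bit errors the ECC corrected."""
        self.bits_read += bits
        self.bits_corrected += corrected

    @property
    def observed_error_rate(self) -> float:
        """Corrected bits per bit read; 0.0 before any read is recorded."""
        if self.bits_read == 0:
            return 0.0
        return self.bits_corrected / self.bits_read


def should_retire(stats: BlockStats, ecc_limit_rate: float,
                  margin: float = 0.5) -> bool:
    """Declare the block worn out before the ECC limit is actually reached.

    ecc_limit_rate: assumed error rate beyond which correction fails
    margin: assumed fraction of that limit at which data is migrated
    """
    return stats.observed_error_rate >= margin * ecc_limit_rate


if __name__ == "__main__":
    stats = BlockStats()
    stats.record_read(bits=8192, corrected=2)
    print(should_retire(stats, ecc_limit_rate=1e-3))
    stats.record_read(bits=8192, corrected=10)
    print(should_retire(stats, ecc_limit_rate=1e-3))
```

Acting at a fraction of the assumed limit, rather than at the limit itself, leaves time for the data to be migrated while it is still recoverable.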
In another aspect, the operating parameters of a particular FLASH device may be adjusted over a lifetime so as to mitigate the wear on the device, and some manufacturers of solid-state disks (SSD) using FLASH have developed procedures to perform these adjustments, sometimes working in concert with the manufacturer of the FLASH devices. While extending the lifetime of a particular version of a FLASH device using vendor-specific characteristics and controls may be useful, the manufacturing processes may change, particularly with respect to feature size, and such management of FLASH parameters may lead to inconsistent results.