The present invention relates to error correction of digital data and, more particularly, to a method of error correction for flash memory devices that store multiple bits per cell.
Flash memory devices have been known for many years. Typically, each cell within a flash memory stores one bit of information. Traditionally, the way to store a bit has been by supporting two states of the cell—one state represents a logical “0” and the other state represents a logical “1”. In a flash memory cell the two states are implemented by having a floating gate above the cell's channel (the area connecting the source and drain elements of the cell's transistor), and having two valid states for the amount of charge stored within this floating gate. Typically, one state is with zero charge in the floating gate and is the initial unwritten state of the cell after being erased (commonly defined to represent the “1” state) and another state is with some amount of negative charge in the floating gate (commonly defined to represent the “0” state). Having negative charge in the gate causes the threshold voltage of the cell's transistor (i.e. the voltage that has to be applied to the transistor's control gate in order to cause the transistor to conduct) to increase. Now it is possible to read the stored bit by checking the threshold voltage of the cell: if the threshold voltage is in the higher state then the bit value is “0” and if the threshold voltage is in the lower state then the bit value is “1”. Actually there is no need to accurately read the cell's threshold voltage. All that is needed is to correctly identify in which of the two states the cell is currently located. For that purpose it is enough to make a comparison against a reference voltage value that is in the middle between the two states, and thus to determine if the cell's threshold voltage is below or above this reference value.
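The single-comparison read described above can be sketched as follows. This is a minimal illustration, not a description of any particular device. The function name `read_sbc` and the 0 V reference value are assumptions chosen only for the example. The erased (“1”) state lies below the reference and the programmed (“0”) state lies above it.

```python
# Hypothetical sketch: reading a single-bit (SBC) flash cell with one comparison.
# The reference voltage is an assumed value midway between the two states.
V_REF = 0.0  # volts (illustrative assumption, not a datasheet value)


def read_sbc(threshold_voltage: float, v_ref: float = V_REF) -> int:
    """Return the bit stored in a single-bit cell.

    A threshold above the reference means negative charge is stored on the
    floating gate (programmed, logical 0); otherwise the cell is erased (logical 1).
    """
    return 0 if threshold_voltage > v_ref else 1


print(read_sbc(-2.5))  # erased cell -> 1
print(read_sbc(2.5))   # programmed cell -> 0
```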
FIG. 1A shows graphically how this works. Specifically, FIG. 1A shows the distribution of the threshold voltages of a large population of cells. Because the cells in a flash memory are not exactly identical in their characteristics and behavior (due, for example, to small variations in impurity concentrations or to defects in the silicon structure), applying the same programming operation to all the cells does not cause all of the cells to have exactly the same threshold voltage. (Note that, for historical reasons, writing data to a flash memory is commonly referred to as “programming” the flash memory.) Instead, the threshold voltages are distributed in a manner similar to that shown in FIG. 1A. Cells storing a value of “1” typically have a negative threshold voltage, such that most of the cells have a threshold voltage close to the value shown by the left peak of FIG. 1A, with smaller numbers of cells having lower or higher threshold voltages. Similarly, cells storing a value of “0” typically have a positive threshold voltage, such that most of the cells have a threshold voltage close to the value shown by the right peak of FIG. 1A, with smaller numbers of cells having lower or higher threshold voltages.
In recent years a new kind of flash memory has appeared on the market, using a technique conventionally called “Multi Level Cells” or MLC for short. (This nomenclature is misleading, because the previous type of flash cell also has more than one level: it has two levels, as described above. Therefore, the two kinds of flash cells are referred to herein as “Single Bit Cells” (SBC) and “Multi-Bit Cells” (MBC).) The improvement brought by MBC flash is the storage of two or more bits in each cell. In order for a single cell to store two bits of information, the cell must be able to be in one of four different states. As the cell's “state” is represented by its threshold voltage, a 2-bit MBC cell must support four different valid ranges for its threshold voltage. FIG. 1B shows the threshold voltage distribution for a typical 2-bit MBC cell. As expected, FIG. 1B has four peaks, each corresponding to one state. As for the SBC case, each state is actually a range and not a single number. When reading the cell's contents, all that must be guaranteed is that the range containing the cell's threshold voltage is correctly identified. For a prior art example of an MBC flash memory see U.S. Pat. No. 5,434,825 to Harari.
Similarly, in order for a single cell to store three bits of information the cell must be able to be in one of eight different states. So a 3-bit MBC cell should support eight different valid ranges for its threshold voltage. FIG. 1C shows the threshold voltage distribution for a typical 3-bit MBC cell. As expected, FIG. 1C has eight peaks, each corresponding to one state. FIG. 1D shows the threshold voltage distribution for a 4-bit MBC cell, for which sixteen states, represented by sixteen threshold voltage ranges, are required.
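The pattern above follows directly from counting. A cell storing n bits needs 2^n states. Distinguishing adjacent states needs one boundary (a place for a reference voltage) between each pair, that is, 2^n - 1 boundaries. A short sketch follows; the function name is hypothetical.

```python
def states_and_boundaries(bits_per_cell: int) -> tuple[int, int]:
    """Return (states, boundaries) for a cell storing `bits_per_cell` bits.

    Each added bit doubles the number of threshold-voltage ranges; one
    boundary separates every pair of adjacent ranges.
    """
    if bits_per_cell < 1:
        raise ValueError("a cell stores at least one bit")
    states = 2 ** bits_per_cell
    return states, states - 1


for n in range(1, 5):
    states, boundaries = states_and_boundaries(n)
    print(f"{n} bit(s) per cell: {states} states, {boundaries} boundaries")
```

For one through four bits per cell this gives 2, 4, 8 and 16 states, matching FIG. 1A through FIG. 1D.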
When encoding two bits in an MBC cell via the four states, it is common to have the left-most state in FIG. 1B (typically having a negative threshold voltage) represent the case of both bits having a value of “1”. (In the discussion below the following notation is used: the two bits of a cell are called the “lower bit” and the “upper bit”. An explicit value of the bits is written in the form [“upper bit” “lower bit”], with the lower bit value on the right. So the case of the lower bit being “0” and the upper bit being “1” is written as “10”. One must understand that the selection of this terminology and notation is arbitrary, and other names and encodings are possible.) Using this notation, the left-most state represents the case of “11”. The other three states are typically assigned in the following order from left to right: “10”, “00”, “01”. One can see an example of an implementation of an MBC NAND flash memory using this encoding in U.S. Pat. No. 6,522,580 to Chen, which patent is incorporated by reference for all purposes as if fully set forth herein. See in particular FIG. 8 of the Chen patent. U.S. Pat. No. 6,643,188 to Tanaka also shows a similar implementation of an MBC NAND flash memory, but see FIG. 7 there for a different assignment of the states to bit encodings: “11”, “10”, “01”, “00”. The Chen encoding is the one illustrated in FIG. 1B.
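One property distinguishes the two assignments above. In the Chen encoding each pair of adjacent states differs in exactly one bit (a Gray code), so a cell that drifts into a neighboring state corrupts only one of its bits. In the Tanaka assignment, a move between the second and third states flips both bits. The sketch below counts these differences; the list and function names are illustrative.

```python
# State assignments listed from the left-most (erased) state to the right-most.
# Each string is written as [upper bit][lower bit].
CHEN_ENCODING = ["11", "10", "00", "01"]
TANAKA_ENCODING = ["11", "10", "01", "00"]


def neighbor_bit_differences(encoding: list[str]) -> list[int]:
    """Number of bits that differ between each pair of adjacent states."""
    return [
        sum(a != b for a, b in zip(left, right))
        for left, right in zip(encoding, encoding[1:])
    ]


print(neighbor_bit_differences(CHEN_ENCODING))    # [1, 1, 1]
print(neighbor_bit_differences(TANAKA_ENCODING))  # [1, 2, 1]
```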
We extend the above terminology and notation to the cases of more than two bits per cell, as follows. The left-most unwritten state represents “all ones” (“1 . . . 1”), the string “1 . . . 10” represents the case of only the lowest bit of the cell being written to “0”, and the string “01 . . . 1” represents the case of only the uppermost bit of the cell being written to “0”.
When reading an MBC cell's content, the range that the cell's threshold voltage is in must be identified correctly, but in this case this cannot always be achieved by comparing to a single reference voltage. Instead, several comparisons may be necessary. For example, in the case illustrated in FIG. 1B, to read the lower bit, the cell's threshold voltage is first compared to a reference comparison voltage V1 and then, depending on the outcome of the comparison, to either a zero reference comparison voltage or a reference comparison voltage V2. Alternatively, the lower bit is read by unconditionally comparing the threshold voltage to both a zero reference voltage and a reference comparison voltage V2, again requiring two comparisons. For more than two bits per cell, even more comparisons might be required.
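The two ways of reading the lower bit described above can be sketched as follows for the Chen encoding of FIG. 1B (states “11”, “10”, “00”, “01” from left to right). The numeric values of V1 and V2 are assumptions for illustration only. Under this encoding the upper bit needs just one comparison, against V1, because it is “1” in the two left states and “0” in the two right states.

```python
# Hypothetical reference voltages for the 2-bit cell of FIG. 1B (volts).
V0 = 0.0  # zero reference: boundary between "11" and "10"
V1 = 2.0  # boundary between "10" and "00" (assumed value)
V2 = 4.0  # boundary between "00" and "01" (assumed value)


def read_upper_bit(vt: float) -> int:
    """Upper bit: 1 for states "11" and "10", 0 for "00" and "01"."""
    return 1 if vt < V1 else 0


def read_lower_bit_conditional(vt: float) -> int:
    """Compare to V1 first, then to either V0 or V2 (two comparisons)."""
    if vt < V1:
        # Cell is in "11" or "10": the lower bit is 1 only below V0.
        return 1 if vt < V0 else 0
    # Cell is in "00" or "01": the lower bit is 1 only at or above V2.
    return 1 if vt >= V2 else 0


def read_lower_bit_unconditional(vt: float) -> int:
    """Compare to both V0 and V2; the lower bit is 0 only between them."""
    return 0 if V0 <= vt < V2 else 1


# One sample threshold voltage per state, left to right.
for vt, expected in [(-1.0, "11"), (1.0, "10"), (3.0, "00"), (5.0, "01")]:
    bits = f"{read_upper_bit(vt)}{read_lower_bit_conditional(vt)}"
    assert bits == expected
    assert read_lower_bit_conditional(vt) == read_lower_bit_unconditional(vt)
```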
If a page of the flash memory is defined as the smallest portion of data that can be separately written into the flash memory, then the bits of a single MBC cell may all belong to the same page, or they may be assigned to different pages so that, for example, in a 4-bit-per-cell flash memory the lowest bit is in page 0, the next bit is in page 1, the next bit is in page 2, and the highest bit is in page 3.
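With the per-bit page assignment described above, page k simply collects bit k of every cell. The following is a minimal sketch. It assumes each cell's content is held as an integer whose bit 0 is the lowest bit, and the function name is hypothetical.

```python
def split_into_pages(cell_values: list[int], bits_per_cell: int = 4) -> list[list[int]]:
    """Return one page per bit position: page k holds bit k of every cell."""
    return [[(value >> k) & 1 for value in cell_values] for k in range(bits_per_cell)]


cells = [0b1111, 0b1110, 0b0111]  # erased cell, lowest bit written, highest bit written
pages = split_into_pages(cells)
print(pages[0])  # lowest bits:  [1, 0, 1]
print(pages[3])  # highest bits: [1, 1, 0]
```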
MBC devices provide a significant cost advantage. An MBC device with two bits per cell requires about half the silicon wafer area required by an SBC device of similar capacity. However, there are drawbacks to using MBC flash. Average read and write times of MBC memories are longer than those of SBC memories, resulting in reduced performance. More importantly, the reliability of MBC is lower than that of SBC. The gaps between the threshold voltage ranges in MBC are much smaller than in SBC. Thus, disturbances in the threshold voltage (e.g. leakage of stored charge causing a threshold voltage drift, or interference from operating neighboring cells), which are insignificant in SBC because of the large gap between the two states, may cause an MBC cell to move from one state to another, resulting in an erroneous bit. The end result is a lower performance specification of MBC cells in terms of data retention time or in terms of the endurance of the device to many write/erase cycles.
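The effect of narrower gaps can be made concrete with a simple model, assumed purely for illustration. Every state's threshold distribution is taken to be Gaussian with the same standard deviation, the states are evenly spaced across a fixed threshold window, and each reference voltage sits midway between adjacent states. The window and sigma values are hypothetical. Under those assumptions a cell is misread when its threshold crosses the boundary toward a neighbor. The two edge states have one neighbor each, while the inner states have two.

```python
import math

WINDOW = 6.0  # hypothetical distance between the outermost state centers (volts)
SIGMA = 0.4   # hypothetical standard deviation of each state's distribution (volts)


def gaussian_tail(x: float) -> float:
    """P(Z > x) for a standard normal random variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def state_error_probability(bits_per_cell: int, window: float = WINDOW,
                            sigma: float = SIGMA) -> float:
    """Probability that a cell in a uniformly chosen state is read in the wrong state."""
    m = 2 ** bits_per_cell
    spacing = window / (m - 1)                       # distance between adjacent state centers
    crossing = gaussian_tail(spacing / (2 * sigma))  # beyond the midway boundary on one side
    # Two edge states have one neighbor each; the m - 2 inner states have two.
    return (2 * (m - 1) / m) * crossing


for n in range(1, 5):
    print(f"{n} bit(s) per cell: {state_error_probability(n):.2e}")
```

The absolute numbers depend entirely on the assumed parameters. The qualitative point is that each added bit shrinks the spacing and raises the misread probability steeply.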
Flash memory cells, and especially flash memory cells of the NAND-type, have a non-zero probability of providing erroneous bits when read out. In other words—there is a non-zero (even though small) probability that when writing a specific bit of data into the flash memory device and later reading the bit out of the device, the read value of the bit will not be equal to the previously written value. This fact is typically explicitly stated in the datasheets of NAND-type flash memory devices, and the manufacturer usually provides a recommendation for the amount of error correction that should be applied to the data being read. For SBC flash memory devices it is typical for the manufacturer to recommend the use of an Error Correction Code (ECC) capable of correcting one bit error per each sector of 512 bytes of user data. For 2-bit-per-cell MBC flash memory devices it is typical for the manufacturer to recommend the use of an ECC capable of correcting four bit errors per each sector of 512 bytes of user data. This is in line with the previous observation that MBC cells are less reliable than SBC cells.
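Why a weaker raw reliability calls for a stronger code can be seen by estimating how often a sector holds more errors than the code can correct. The sketch below assumes independent bit errors at a fixed raw bit error rate. For simplicity, it counts only the 4096 user bits of a 512-byte sector and ignores the code's own parity bits. The function name and the example error rate are hypothetical.

```python
import math

SECTOR_BITS = 512 * 8  # user bits in a 512-byte sector


def sector_failure_probability(raw_ber: float, correctable: int,
                               n_bits: int = SECTOR_BITS) -> float:
    """P(more than `correctable` errors among n_bits), assuming independent errors."""
    within_capability = sum(
        math.comb(n_bits, k) * raw_ber**k * (1.0 - raw_ber) ** (n_bits - k)
        for k in range(correctable + 1)
    )
    return max(0.0, 1.0 - within_capability)  # clamp tiny negative rounding error


ber = 1e-4  # hypothetical raw bit error rate
for t in (1, 4):
    print(f"t = {t}: failure probability per sector ~ {sector_failure_probability(ber, t):.1e}")
```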
Error Correction Code implementations include two parts. The first part is called the “encoder” and is activated when writing the data into the memory. The encoder receives the user data as an input, and outputs a “codeword” that is a representation of the user data plus some extra information that will allow overcoming errors in the data should these errors occur. The second part is called the “decoder” and is activated when data are read from the flash memory device. The decoder receives the bits read out from the memory cells. Those bits should ideally be identical to the codeword previously stored, but in reality those bits might include erroneous bits. The decoder's task is then to use the extra information placed in the codeword by the encoder to recover the correct user bits.
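The encoder/decoder division described above can be illustrated with the textbook Hamming(7,4) code. It is chosen only because it is small enough to show in full, not because a flash controller would use it. The encoder appends three parity bits to four user bits. The decoder recomputes the three parity checks, and the resulting syndrome is the position of a single erroneous bit (0 if there is none). The function names are illustrative. This decoder produces its answer in a single pass.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encoder: 4 user bits -> 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # even parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # even parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # even parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(received: list[int]) -> list[int]:
    """Decoder: correct up to one flipped bit, then return the 4 user bits."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


codeword = hamming74_encode([1, 0, 1, 1])
codeword[5] ^= 1  # simulate one bit read back incorrectly
print(hamming74_decode(codeword))  # [1, 0, 1, 1]
```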
ECC decoders can be classified into two types:
a. Iterative decoders
b. Non-iterative decoders
For the purpose of the present invention, iterative decoders are defined as decoders that carry out a decoding algorithm in which a potential value of the decoded user data is generated by the algorithm and tested against a success criterion. If the success criterion is met, the potential value is made the decoded user data. If the success criterion is not met, the algorithm performs another computation, which results in a new potential value of the decoded user data, which in turn is tested against the success criterion according to the above decision logic. Non-iterative decoders are all decoders that are not iterative decoders. It should be noted that both iterative and non-iterative decoders may be implemented in hardware, in software, or in a combination of hardware and software, and all types of implementations are within the scope of the terms “iterative decoder” and “non-iterative decoder”.
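The decision logic in the definition above can be written as a short control loop. This is a generic sketch, not a specific decoder. `propose_initial`, `meets_criterion` and `propose_next` are hypothetical callables standing in for whatever computation a particular iterative algorithm performs. The iteration limit is an added assumption, since a practical decoder must eventually give up.

```python
from typing import Callable, Tuple, TypeVar

Candidate = TypeVar("Candidate")


def iterative_decode(
    propose_initial: Callable[[], Candidate],
    meets_criterion: Callable[[Candidate], bool],
    propose_next: Callable[[Candidate], Candidate],
    max_iterations: int = 50,
) -> Tuple[Candidate, bool]:
    """Generate candidates until one meets the success criterion.

    Returns (candidate, success). If the iteration limit is reached, the last
    candidate is returned together with whether it happens to pass the test.
    """
    candidate = propose_initial()
    for _ in range(max_iterations):
        if meets_criterion(candidate):  # success: this becomes the decoded data
            return candidate, True
        candidate = propose_next(candidate)  # otherwise compute a new candidate
    return candidate, meets_criterion(candidate)
```

A non-iterative decoder, by contrast, would compute its output once without this test-and-retry loop.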
Iterative decoders are typically more complex to implement than non-iterative decoders. On the other hand, the error correction capabilities of iterative decoders usually are superior to the error correction capabilities of non-iterative decoders.
As explained above, iterative decoders process information in iterations, using the output of one iteration as the input to the next iteration. To make this approach work, an iterative code is typically constructed from simpler constituent codes. There are several families of codes that can be efficiently decoded by the iterative procedure. The most popular ones are Convolutional Turbo Codes (CTC), Turbo Product Codes (TPC), and Low Density Parity-Check (LDPC) codes. In CTC the constituent codes are convolutional codes, in TPC the constituent codes are simple block codes (e.g. parity-check, Hamming, or two-error-correcting BCH codes), and in LDPC codes the constituent codes are short parity-check and repetition codes.
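As a concrete example of an iterative decoder, the sketch below implements a simple hard-decision bit-flipping decoder in the spirit of the bit-flipping decoding associated with LDPC codes. Each row of the parity-check matrix is one short parity-check constraint. In every iteration the candidate word is tested against the success criterion (all checks satisfied). If that test fails, the single bit involved in the largest number of failed checks is flipped, producing the next candidate. To keep the example tiny it uses the 3x7 parity-check matrix of the Hamming(7,4) code, which is not low-density in any practical sense. A real LDPC decoder would use a large sparse matrix and usually soft (probabilistic) information. The names are illustrative.

```python
# Parity-check matrix of the Hamming(7,4) code: each row is one parity check.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def failed_checks(word: list[int]) -> list[int]:
    """Indices of the parity checks that the word violates."""
    return [i for i, row in enumerate(H) if sum(h & b for h, b in zip(row, word)) % 2]


def bit_flip_decode(received: list[int], max_iterations: int = 10) -> tuple[list[int], bool]:
    """Single-bit-flipping decoder: returns (word, success)."""
    word = list(received)
    for _ in range(max_iterations):
        failed = failed_checks(word)
        if not failed:  # success criterion: every parity check is satisfied
            return word, True
        # Count, for each bit, how many failed checks it takes part in.
        votes = [sum(H[i][j] for i in failed) for j in range(len(word))]
        word[votes.index(max(votes))] ^= 1  # flip the most suspicious bit
    return word, not failed_checks(word)


received = [1, 0, 0, 0, 0, 0, 0]  # all-zero codeword with its first bit in error
print(bit_flip_decode(received))  # ([0, 0, 0, 0, 0, 0, 0], True)
```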
For a survey of iterative schemes see S. Lin and D. J. Costello, Error Control Coding, Prentice-Hall, 2004.
A detailed description of CTC can be found in C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: Turbo-codes”, IEEE Trans. Com., vol. 44, no. 10, pp. 1261-1271, October 1996.
TPC codes are treated in R. M. Pyndiah, “Near-optimum decoding of product codes: Block turbo codes”, IEEE Trans. Com., vol. 46, pp. 1003-1010, August 1998.
LDPC codes are described in R. G. Gallager, “Low-density parity-check codes”, IRE Trans. Info. Theory, vol. IT-8, pp. 21-28, 1962.
At the present time, iterative decoding is used only in communication applications, and not in data storage applications. In particular, there are no flash memory systems that employ iterative decoders for correcting errors in data read from the flash memory. This is not surprising, given the relatively high implementation costs of iterative decoders.
As of the present time there are no commercially available MBC flash memory devices with more than two bits per cell. The major obstacle preventing such devices from becoming available is the poor reliability of the data read out of the cells of these memory devices. For example, with existing flash memory manufacturing technologies, MBC cells storing four bits per cell may output very unreliable data that requires an ECC capable of correcting hundreds of bit errors.
There is thus a need to find a way of making MBC flash memory devices with more than two bits per cell useful in spite of the large number of errors that these devices introduce into the data read out of them.