The present invention relates to error correction of digital data and, more particularly, to a method of error correction for flash memory devices that store multiple bits per cell.
Flash memory devices have been known for many years. Typically, each cell within a flash memory stores one bit of information. Traditionally, the way to store a bit has been by supporting two states of the cell—one state represents a logical “0” and the other state represents a logical “1”. In a flash memory cell the two states are implemented by having a floating gate above the cell's channel (the area connecting the source and drain elements of the cell's transistor), and having two valid states for the amount of charge stored within this floating gate. Typically, one state is with zero charge in the floating gate and is the initial unwritten state of the cell after being erased (commonly defined to represent the “1” state) and another state is with some amount of negative charge in the floating gate (commonly defined to represent the “0” state). Having negative charge in the gate causes the threshold voltage of the cell's transistor (i.e. the voltage that has to be applied to the transistor's control gate in order to cause the transistor to conduct) to increase. Now it is possible to read the stored bit by checking the threshold voltage of the cell: if the threshold voltage is in the higher state then the bit value is “0” and if the threshold voltage is in the lower state then the bit value is “1”. Actually there is no need to accurately read the cell's threshold voltage. All that is needed is to correctly identify in which of the two states the cell is currently located. For that purpose it is enough to make a comparison against a reference voltage value that is in the middle between the two states, and thus to determine if the cell's threshold voltage is below or above this reference value.
FIG. 1A shows graphically how this works. Specifically, FIG. 1A shows the distribution of the threshold voltages of a large population of cells. Because the cells in a flash memory are not exactly identical in their characteristics and behavior (due, for example, to small variations in impurity concentrations or to defects in the silicon structure), applying the same programming operation to all the cells does not cause all of the cells to have exactly the same threshold voltage. (Note that, for historical reasons, writing data to a flash memory is commonly referred to as “programming” the flash memory.) Instead, the threshold voltage is distributed similarly to the way shown in FIG. 1A. Cells storing a value of “1” typically have a negative threshold voltage, such that most of the cells have a threshold voltage close to the value shown by the left peak of FIG. 1A, with some smaller numbers of cells having lower or higher threshold voltages. Similarly, cells storing a value of “0” typically have a positive threshold voltage, such that most of the cells have a threshold voltage close to the value shown by the right peak of FIG. 1A, with some smaller numbers of cells having lower or higher threshold voltages.
In recent years a new kind of flash memory has appeared on the market, using a technique conventionally called “Multi Level Cells” or MLC for short. (This nomenclature is misleading, because the previous type of flash cells also have more than one level: they have two levels, as described above. Therefore, the two kinds of flash cells are referred to herein as “Single Bit Cells” (SBC) and “Multi-Bit Cells” (MBC).) The improvement brought by the MBC flash is the storing of two or more bits in each cell. In order for a single cell to store two bits of information the cell must be able to be in one of four different states. As the cell's “state” is represented by its threshold voltage, it is clear that a 2-bit MBC cell should support four different valid ranges for its threshold voltage. FIG. 1B shows the threshold voltage distribution for a typical 2-bit MBC cell. As expected, FIG. 1B has four peaks, each corresponding to one state. As for the SBC case, each state is actually a range and not a single number. When reading the cell's contents, all that must be guaranteed is that the range that the cell's threshold voltage is in is correctly identified. For a prior art example of an MBC flash memory see U.S. Pat. No. 5,434,825 to Harari.
Similarly, in order for a single cell to store three bits of information the cell must be able to be in one of eight different states, so a 3-bit MBC cell should support eight different valid ranges for its threshold voltage. FIG. 1C shows the threshold voltage distribution for a typical 3-bit MBC cell. As expected, FIG. 1C has eight peaks, each corresponding to one state. FIG. 1D shows the threshold voltage distribution for a 4-bit MBC cell, for which sixteen states, represented by sixteen threshold voltage ranges, are required.
When encoding two bits in an MBC cell via the four states, it is common to have the left-most state in FIG. 1B (typically having a negative threshold voltage) represent the case of both bits having a value of “1”. (In the discussion below the following notation is used—the two bits of a cell are called the “lower bit” and the “upper bit”. An explicit value of the bits is written in the form [“upper bit” “lower bit”], with the lower bit value on the right. So the case of the lower bit being “0” and the upper bit being “1” is written as “10”. One must understand that the selection of this terminology and notation is arbitrary, and other names and encodings are possible). Using this notation, the left-most state represents the case of “11”. The other three states are typically assigned by the following order from left to right: “10”, “00”, “01”. One can see an example of an implementation of an MBC NAND flash memory using this encoding in U.S. Pat. No. 6,522,580 to Chen, which patent is incorporated by reference for all purposes as if fully set forth herein. See in particular FIG. 8 of the Chen patent. U.S. Pat. No. 6,643,188 to Tanaka also shows a similar implementation of an MBC NAND flash memory, but see FIG. 7 there for a different assignment of the states to bit encodings: “11”, “10”, “01”, “00”. The Chen encoding is the one illustrated in FIG. 1B.
We extend the above terminology and notation to the cases of more than two bits per cell, as follows. The left-most unwritten state represents “all ones” (“1 . . . 1”), the string “1 . . . 10” represents the case of only the lowest bit of the cell being written to “0”, and the string “01 . . . 1” represents the case of only the uppermost bit of the cell being written to “0”.
When reading an MBC cell's content, the range that the cell's threshold voltage is in must be identified correctly; in this case, however, this cannot always be achieved by a comparison to only one reference voltage. Instead, several comparisons may be necessary. For example, in the case illustrated in FIG. 1B, to read the lower bit, the cell's threshold voltage is first compared to a reference comparison voltage V1 and then, depending on the outcome of the comparison, to either a zero reference comparison voltage or a reference comparison voltage V2. Alternatively, the lower bit is read by unconditionally comparing the threshold voltage to both a zero reference voltage and a reference comparison voltage V2, again requiring two comparisons. For more than two bits per cell, even more comparisons might be required.
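The two-comparison read of the lower bit described above can be sketched as follows. This is a minimal illustration only: the numeric values of V1 and V2 and the function names are assumptions, not taken from the cited patents, and the single-comparison read of the upper bit is merely inferred from the left-to-right state order “11”, “10”, “00”, “01” of FIG. 1B.

```python
# Hypothetical reference voltages in volts; the text only implies 0 < V1 < V2.
V1 = 1.0
V2 = 2.0


def read_lower_bit(vt):
    """Read the lower bit with two comparisons: V1 first, then zero or V2."""
    if vt < V1:
        # Left half of FIG. 1B: "11" and "10" are separated by the zero reference.
        return 1 if vt < 0.0 else 0
    # Right half of FIG. 1B: "00" and "01" are separated by V2.
    return 0 if vt < V2 else 1


def read_upper_bit(vt):
    """With this state order, one comparison against V1 suffices for the upper bit."""
    return 1 if vt < V1 else 0


def read_state(vt):
    """Return the cell content in the [upper bit, lower bit] notation."""
    return f"{read_upper_bit(vt)}{read_lower_bit(vt)}"


if __name__ == "__main__":
    for vt in (-1.0, 0.5, 1.5, 2.5):
        print(vt, read_state(vt))
```

Under these assumed voltages, a threshold voltage in each of the four ranges is mapped to a different one of the four bit pairs.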
The bits of a single MBC cell may all belong to the same flash page, or they may be assigned to different pages so that, for example in a 4-bit cell, the lowest bit is in page 0, the next bit is in page 1, the next bit in page 2, and the highest bit is in page 3. (A page is the smallest portion of data that can be separately written in a flash memory).
Lasser, U.S. patent application Ser. No. 11/035,807, deals with methods of encoding bits in flash memory cells storing multiple bits per cell. Lasser, U.S. patent application Ser. No. 11/061,634, and Murin, U.S. patent application Ser. No. 11/078,478, deal with the implications of those bit encoding methods for the distribution of errors across different logical pages of multi-bit flash cells. Specifically, Lasser '634 teaches a method for achieving even distribution of errors across different logical pages, as seen by the user of the data and as dealt with by the Error Correction Code (ECC) circuitry, using a logical-to-physical mapping of bit encodings; and Murin teaches a method for achieving even distribution of errors across different logical pages, as seen by the user of the data and as dealt with by the ECC circuitry, using interleaving of logical pages between physical bit pages. All three of these prior art patent applications are incorporated by reference for all purposes as if fully set forth herein.
Both Lasser '634 and Murin address the same goal: reducing the error rate for which the ECC circuitry should be designed. In the example presented in both applications a group of 15,000 4-bit MBC flash memory cells is used for storing 4 logical pages of data, of 15,000 bits each. The assumed cell error rate is 1 in 1,000. The resulting optimal number of bit errors is 15, and therefore the optimal average bit errors in a logical page is 3.75. The example shows that unless the proposed innovations are used, a specific logical page might end up with a much higher bit error rate—6 bit errors in the example shown. This means that even though the overall average of bit errors across all bits stored in the cells is relatively low (15 in 60,000, or 1 in 4,000), unless special measures are taken the ECC circuitry dealing with correcting errors in a logical page must be designed to handle a relatively high average bit error rate (in that example—6 in 15,000, or 1 in 2,500).
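The arithmetic of this example can be checked with a short sketch. The variable names are illustrative only, and the sketch assumes, as the example does, that each erroneous cell contributes exactly one bit error. The figure of 6 errors in the worst logical page is taken from the example, not computed.

```python
from fractions import Fraction

cells = 15_000
bits_per_cell = 4                      # also the number of logical pages
cell_error_rate = Fraction(1, 1000)

# Assumption: one erroneous cell produces one erroneous bit.
total_bit_errors = cells * cell_error_rate               # 15
avg_errors_per_page = total_bit_errors / bits_per_cell   # 15/4 = 3.75

total_bits = cells * bits_per_cell                       # 60,000
overall_bit_error_rate = total_bit_errors / total_bits   # 1/4000

worst_page_errors = 6                  # value stated in the example
worst_page_error_rate = Fraction(worst_page_errors, cells)  # 1/2500

if __name__ == "__main__":
    print(total_bit_errors, float(avg_errors_per_page))
    print(overall_bit_error_rate, worst_page_error_rate)
```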
A recent US patent application by the inventors of the present application and entitled “METHOD OF ERROR CORRECTION IN MBC FLASH MEMORY” (herein “Litsyn et al.”) discloses a different approach to the same goal. That patent application is incorporated by reference for all purposes as if fully set forth herein. Instead of dealing with each logical page separately for the purpose of error correction, Litsyn et al. deal with all logical pages sharing the same group of cells at the same time, treating all bits of all those multiple logical pages as one ECC codeword. This causes the average bit error rate which the ECC circuitry has to cope with to be lower—only 1 in 4,000 in the example above.
In most ECC implementations all bits are treated the same and no bits are considered more reliable or less reliable than the average. However, as is evident from the above, when reading multiple logical pages from a group of MBC flash memory cells, the bits stored in different bit pages have different error probabilities. Some of the prior art methods for averaging the error distribution discussed above (Lasser '634, Murin) succeed in causing all logical pages to have the same number of bit errors on average, but different individual bits still have different reliabilities.
Information about bit error rates of individual bits in a codeword that is to be error corrected is very useful for an error correction module. We shall demonstrate this using a very simplified example. Assume a group of four bits protected against a single error by a parity bit, such that if an error is detected the ECC selects one of the bits to be flipped and provides this as the correction result. If all five bits in the codeword (four data bits and one parity bit) are equally likely to be in error, then the decision as to which bit to flip upon detecting an error can only be made at random. This leads to only 20% correct decisions. But if one of the bits is known to be six times less reliable than any of the other four bits in the codeword, then selecting that bit to be flipped upon detecting an error results in 60% correct decisions. While this example is extremely simplified and in real-world ECC implementations the methods of calculation and decision taking are much more complicated, it does serve the purpose of demonstrating the usefulness of reliability data for individual bits for improving the performance of error correction schemes.
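The 20% and 60% figures can be reproduced with a short sketch. It assumes that exactly one bit is in error and treats the relative error likelihoods of the five bits as weights, as the arithmetic of the example implies. The function name and the weight lists are hypothetical.

```python
from fractions import Fraction


def prob_correct_flip(weights, chosen):
    """Probability that bit `chosen` is the erroneous one, given that
    exactly one bit is in error and `weights` are relative likelihoods."""
    return Fraction(weights[chosen], sum(weights))


# Five bits (four data bits and one parity bit), all equally likely to be in error.
uniform = [1, 1, 1, 1, 1]

# One bit is six times less reliable than each of the other four.
skewed = [6, 1, 1, 1, 1]

if __name__ == "__main__":
    # With equal weights, any choice (including a random one) is right 1/5 of the time.
    print(prob_correct_flip(uniform, 0))
    # Always flipping the least reliable bit is right 6/10 of the time.
    print(prob_correct_flip(skewed, 0))
```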
There are prior art systems in which extra reliability information affects the way ECC circuitry handles different bits. See for example U.S. patent application Ser. No. 10/867,645 to Ban et al. filed Jun. 16, 2004 and entitled “METHODS OF INCREASING THE RELIABILITY OF A FLASH MEMORY”. In Ban et al. the data stored in a flash cell are read using a higher resolution than is required for separating the state of a cell into its possible values. For example, if a cell is written into one of 16 states (i.e. the cell stores 4 bits), then the cell is read as if it had 5 bits. This is called using “fractional levels” by Ban et al. but others use different terms such as “soft bits”. Others also use more than one bit of extra reading to provide even a higher resolution. The extra bits provided by that high resolution reading are used by the ECC module for estimating reliability of other “true” data bits, as they provide evidence regarding the exact state of a cell compared to the borders separating its state (as it was actually read) from the neighboring states. A cell located near a border is more likely to be in error than a cell located in the middle of the band and far away from the borders. There are also prior art communication systems that utilize this approach, where sometimes many extra high resolution bits are used for improving the error correction performance of a channel.
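The idea that a cell read near a border is less trustworthy than one read in the middle of its band can be illustrated with a minimal sketch. This is not the method of Ban et al.; the border voltages, the margin, and the function name are all hypothetical, and a 2-bit cell is used for brevity.

```python
# Hypothetical borders (volts) separating the four states of a 2-bit cell.
BORDERS = [0.0, 1.0, 2.0]

# Hypothetical margin: a read closer than this to a border is flagged as unreliable.
MARGIN = 0.25


def read_with_reliability(vt):
    """Return (state_index, reliable) for a threshold voltage `vt`.

    state_index counts the borders at or below vt, so it runs from 0 (left-most
    state) to 3 (right-most state). reliable is False when vt is near a border.
    """
    state = sum(vt >= b for b in BORDERS)
    near_border = any(abs(vt - b) < MARGIN for b in BORDERS)
    return state, not near_border


if __name__ == "__main__":
    for vt in (-1.0, 0.5, 0.9, 2.1):
        print(vt, read_with_reliability(vt))
```

An ECC module could weight the bits of a cell flagged as unreliable as more likely to be in error, in the manner of the parity example above.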
In all these prior art systems, the extra reliability information is in addition to the information inherent in just the stored bits themselves. Such ECC would be simplified if it could be based only on what is inherent in the stored bits themselves. For example, ECC based on extra reliability information could be implemented without reading the cells of an MBC flash memory with more resolution than is needed to read the bits stored in the cells.