The present invention relates to the protection of data from corruption and, more particularly, to a method and related systems that use multiple error correction schemes to protect data from corruption.
Flash memory devices have been known for many years. NAND-type flash memories differ from other types of flash memories (e.g., NOR-type) in several characteristics, among them the fact that a certain number of the information bits written to the memory may be read back in a “flipped” state (i.e., different from the state in which the original bits were written).
In order to overcome this phenomenon and to make NAND-type memories usable by real applications, it is a common technique to use Error Correction Codes (ECC) in conjunction with these memories. A general overview of using ECC in flash memories is presented below and includes the following steps:
    (1) Before writing data to the memory, an ECC algorithm is applied to the data in order to compute additional (i.e., redundant) bits, which are later used for error detection and correction. These redundant bits are often called “parity bits” or “parity”. A combination of the data input into an ECC module and the parity output by that module is called a codeword. Each different value of input data to an ECC module results in a different codeword.
    (2) The entire codeword (i.e., the original data and the parity) is recorded to the flash memory. It should be noted that the actual size of NAND-type flash memory is larger than the size of the original data, and the memory is designed to accommodate parity as well.
    (3) When the data are retrieved from the memory, the entire codeword is read again, and an ECC algorithm is applied to the data and the parity in order to detect and correct possible “bit flips” (i.e., errors).
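The three steps above can be sketched with a very small systematic code. The example below is only an illustration, assuming a Hamming(7,4) code (4 data bits, 3 parity bits, one correctable bit flip); the function names `encode` and `decode` and the syndrome table are hypothetical and are not taken from any particular flash controller.

```python
# Illustrative sketch only: systematic Hamming(7,4), which corrects one bit flip.
# All names here are hypothetical, chosen for this example.

def encode(data):
    """Step (1): compute 3 parity bits and append them to the 4 data bits."""
    d1, d2, d3, d4 = data
    parity = [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]
    return list(data) + parity  # codeword = data followed by parity

# Each single-bit error produces a distinct nonzero syndrome.
# The table maps a syndrome to the index of the flipped bit in the codeword.
SYNDROME_TO_INDEX = {
    (1, 1, 0): 0, (1, 0, 1): 1, (0, 1, 1): 2, (1, 1, 1): 3,  # data bits
    (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6,                # parity bits
}

def decode(codeword):
    """Step (3): recompute parity, locate a single flipped bit, return the data."""
    cw = list(codeword)
    d1, d2, d3, d4, p1, p2, p3 = cw
    syndrome = (p1 ^ d1 ^ d2 ^ d4, p2 ^ d1 ^ d3 ^ d4, p3 ^ d2 ^ d3 ^ d4)
    if syndrome != (0, 0, 0):
        cw[SYNDROME_TO_INDEX[syndrome]] ^= 1  # flip the erroneous bit back
    return cw[:4]

data = [1, 0, 1, 1]
stored = encode(data)        # step (2): the whole codeword is written
read_back = list(stored)
read_back[2] ^= 1            # simulate one bit flip in the memory
assert decode(read_back) == data
```

A code this small corrects only one error per codeword; two or more flips in the same codeword would be miscorrected, which is why practical systems use stronger codes over larger chunks.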
It should be noted that the implementation of ECC may be done by hardware, software, or a combination of hardware and software. Furthermore, ECC may be implemented within a memory device, within a memory device controller, within a host computer, or may be “distributed” among these components of a system.
The design of ECC algorithms is well known in the art. The algorithms in common use include Reed-Solomon, BCH, Hamming, and many others. Each ECC algorithm is composed of two parts: the part that receives the data bits and generates the parity bits (or equivalently, generates the codeword), and the part that receives the codeword and generates the corrected data bits. The first part is called the “encoder” and is used during writing, and the second part is called the “decoder” and is used during reading. Each of the two parts may be implemented in either hardware or software, and it is also possible to have one part implemented in hardware while the other part is implemented in software.
Receiving the data bits and generating the corresponding codeword is termed “encoding” herein. Receiving the codeword and generating the corrected data bits is termed “decoding” herein.
It should be noted that there actually are two kinds of ECC. The kind of ECC described above, in which the identity of the data bits is preserved in the codeword, is called “systematic” ECC. In “nonsystematic” ECC, the data bits are converted to a codeword in which the identity of the original data bits is not preserved.
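The distinction can be illustrated with a single code encoded in two ways. The sketch below assumes the cyclic (7,4) Hamming code with generator polynomial x^3 + x + 1 and represents polynomials over GF(2) as Python integers (bit i is the coefficient of x^i); the helper names are hypothetical.

```python
# Illustrative sketch only: systematic vs. nonsystematic encoding of the same
# cyclic (7,4) code. Polynomials over GF(2) are stored as integers.

G = 0b1011        # assumed generator polynomial x^3 + x + 1
PARITY_BITS = 3   # degree of G

def gf2_mul(a, b):
    """Carry-less (GF(2)) polynomial multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_mod(a, g):
    """Remainder of GF(2) polynomial division of a by g."""
    while a.bit_length() >= g.bit_length():
        a ^= g << (a.bit_length() - g.bit_length())
    return a

def encode_systematic(m):
    """Data bits stay visible in the top 4 bits; the remainder is the parity."""
    shifted = m << PARITY_BITS
    return shifted | gf2_mod(shifted, G)

def encode_nonsystematic(m):
    """Codeword is m(x) * g(x); the data bits do not appear verbatim."""
    return gf2_mul(m, G)

m = 0b1101
print(format(encode_systematic(m), "07b"))     # 1101001: data 1101, parity 001
print(format(encode_nonsystematic(m), "07b"))  # 1111111: data not visible
```

Both outputs are valid codewords of the same code (each is divisible by the generator polynomial); only the mapping from data to codeword differs.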
Selecting an algorithm, like BCH, as the ECC algorithm to be used in a flash memory system does not uniquely define the selected solution. Any such ECC algorithm is actually not a single algorithm but a family of algorithms. The algorithms within the same family differ among themselves in the number of data bits they are able to protect. An algorithm that needs to protect 100 data bits is not identical to an algorithm that needs to protect 10,000 data bits, even though the two algorithms are typically quite similar and operate on the same principles.
But even two algorithms of the same family that both protect the same number of data bits are not necessarily identical. The algorithms may differ in the level of reliability provided or, equivalently, in the number of bit errors in the data that they are able to correct. For example, one system may require the protection of chunks of 1,000 data bits against any combination of up to 3 bit errors (but not against the occurrence of 4 or more bit errors), while another system, in which much higher reliability is desired, may require the protection of chunks of 1,000 data bits against any combination of up to 10 bit errors. Typically, protecting against more errors requires more parity bits (or longer codewords), making the ECC scheme less “efficient”, where efficiency is measured by the ratio of the number of data bits in a codeword to the total number of bits in the codeword (including, in systematic ECC, both data bits and parity bits). This measure is typically called the “rate” of the ECC coding.
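To put rough numbers on the 3-error and 10-error examples, the sketch below assumes a binary BCH code over GF(2^m), for which correcting t errors needs at most m·t parity bits, and assumes the code is shortened to fit the chunk of data. The function names are hypothetical, and the figures are upper bounds for this assumed code family rather than properties of any specific product.

```python
# Illustrative sketch only: parity-bit upper bound and resulting rate for an
# assumed (shortened) binary BCH code correcting t errors.

def bch_parity_upper_bound(data_bits, t):
    """Smallest m with 2^m - 1 >= data_bits + m*t, then at most m*t parity bits."""
    m = 1
    while (2**m - 1) < data_bits + m * t:
        m += 1
    return m * t

def code_rate(data_bits, parity_bits):
    """Rate = data bits / total bits in the codeword."""
    return data_bits / (data_bits + parity_bits)

for t in (3, 10):
    parity = bch_parity_upper_bound(1000, t)
    print(t, parity, round(code_rate(1000, parity), 3))
# t = 3  -> at most 33 parity bits,  rate about 0.968
# t = 10 -> at most 110 parity bits, rate about 0.901
```

Under these assumptions, raising the correction capability from 3 to 10 errors roughly triples the parity overhead for the same 1,000 data bits.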
Different ECC algorithms and implementations also differ in other aspects—speed of the encoding process, speed of the decoding process, complexity of the encoding process, complexity of the decoding process, acceptable error rate in the input to the decoder (defined according to the quality of the storage cells), and more. The complexity of encoding and decoding is important not only because it affects the speed of the operation, but also because it affects the power consumption and silicon area of hardware implementations of the ECC scheme.
It is thus evident that the selection of an ECC solution for a memory system involves a complex trade-off between multiple considerations. Some non-limiting rules-of-thumb typical in the art of ECC designs are:
    a. For a given memory reliability, the better the output reliability (or equivalently the higher the number of correctable errors), the lower the rate of the code (or equivalently, for systematic ECC, the more parity bits are required).
    b. For a given memory reliability, the better the output reliability, the more complex is the decoder.
    c. For a given level of output reliability, the higher the rate of the code, the more complex is the decoder.
    d. For a given level of output reliability, the higher the rate of the code, the slower is the decoding.
When designing an ECC solution, one typically starts from the error rate at the decoder's input (dictated by the quality of the storage cells) and the desired output reliability (dictated by the application's requirements). Based on these numbers one typically selects a specific ECC family, calculates the required number of parity bits, and then estimates the speed and complexity of the encoder and decoder.
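The first part of this design flow can be sketched numerically. The example below is a simplified model, not a prescribed procedure: it assumes independent bit flips with a fixed raw bit error rate, assumes a shortened binary BCH code needing at most m·t parity bits, and treats a codeword as failed whenever it contains more than t errors. All function and parameter names are hypothetical.

```python
import math

# Illustrative sketch only: pick the smallest correction capability t that
# meets a target codeword failure probability under the stated assumptions.

def prob_more_than_t_errors(n, p, t):
    """P(more than t of n bits flip), for independent flips with probability p."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(t + 1, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i), computed in the log domain
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total

def parity_bits(data_bits, t):
    """Assumed BCH upper bound: m*t parity bits, with 2^m - 1 >= codeword length."""
    m = 1
    while 2**m - 1 < data_bits + m * t:
        m += 1
    return m * t

def choose_t(data_bits, raw_bit_error_rate, target_failure_prob, max_t=100):
    """Return (t, codeword_length) for the smallest t meeting the target."""
    for t in range(max_t + 1):
        n = data_bits + parity_bits(data_bits, t)
        if prob_more_than_t_errors(n, raw_bit_error_rate, t) <= target_failure_prob:
            return t, n
    raise ValueError("no t up to max_t meets the target")

t, n = choose_t(data_bits=1000, raw_bit_error_rate=1e-3, target_failure_prob=1e-9)
print(t, n, 1000 / n)  # correctable errors, codeword length, resulting rate
```

The speed and complexity of the encoder and decoder, which the text names as the last step of the flow, depend on the implementation and are not modeled here.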
In some cases the most important consideration for the system's designer is the speed of the decoding, as this may put a limit on the speed of reading the data out from the memory. In such cases the designer may encounter a dilemma—the ECC scheme required for meeting the output reliability requirements may turn out to result in a quite complex decoder with slow operation, not satisfying the speed target of the system. But on the other hand, selecting an ECC scheme that is relatively simple, and that results in fast decoding, does not provide the required output reliability level.
There is thus a widely recognized need for, and it would be highly advantageous to have, an error correction solution that satisfies both requirements (i.e. both speed and reliability) at the same time, even when there is no ECC scheme known in the art that achieves this goal.