Current enterprise-level mass storage relies on hard drives that are typically characterized by a 3.5″ form factor, a 15,000 rpm spindle motor and a storage capacity between 73 GB and 450 GB. The mechanical design follows the traditional hard drive with a single actuator and 8 read/write heads moving across 8 surfaces. The constraints of the head/media technology limit the read/write capabilities to using only one active head at a time. All data requests that are sent to the drive are handled in a serial manner, with long delays between operations, as the actuator moves the read/write head to the required position and the media rotates to place the data under the read/write head.
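The rotational part of the delay described above can be estimated from the spindle speed alone. The following is a minimal sketch, assuming that on average the media must turn half a revolution before the requested data reaches the head; the function name is hypothetical, and seek time is not modeled because the text gives no figure for it.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the data to rotate under the head, in milliseconds.

    Assumption: the requested sector is, on average, half a revolution away.
    """
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2.0


latency = avg_rotational_latency_ms(15_000)  # the 15,000 rpm drive described above
```

For a 15,000 rpm spindle, one revolution takes 4 ms, so this estimate gives an average rotational delay of 2 ms per request before any actuator movement is counted.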
A solid state memory device is attractive in an enterprise mass-storage environment. For that environment, the flash memory is a good candidate among various solid state memory devices, since it does not have the mechanical delays associated with hard drives, thereby allowing higher performance and commensurately lower cost, and better usage of power and space.
Flash memory is a form of non-volatile memory, specifically a type of EEPROM (electrically erasable programmable read-only memory). A memory cell in a flash memory array generally includes a transistor having a control gate and drain and source diffusion regions formed in a substrate. The transistor has a floating gate under the control gate, thus forming an electron storage device. A channel region lies under the floating gate, isolated from it by an insulation layer (e.g. a tunnel oxide layer). The energy barrier imposed by the insulation layer against charge-carrier movement into or out of the floating gate can be overcome by applying a sufficiently high electric field across the insulation layer. The charge stored in the floating gate determines the threshold voltage (Vt) of the cell, which represents the stored data of the cell. Charge stored in the floating gate causes the cell to have a higher Vt. To change the Vt of a cell to a higher or lower value, the charge stored in the floating gate is increased or decreased by applying appropriate voltages at the control gate, the drain and source diffusion regions, and the channel region. The appropriate voltages cause charge to move between one or more of these regions and the floating gate through the insulation layer.
A memory cell in a single-level cell (SLC) flash memory device has a single threshold voltage Vt and can store one bit of data. A memory cell in a multiple-level cell (MLC) flash memory device has multiple threshold voltages and, depending on the amount of charge stored in the floating gate, can represent more than one bit of data. Because an MLC flash memory device enables the storage of multiple data bits per cell, high density mass storage applications (such as 512 Mb and beyond) are readily achievable. In a typical four-level two-bit MLC flash memory device, the cell threshold voltage Vt can be set at any of four levels to represent data “00”, “01”, “10”, and “11”. To program the memory cell to a given level, the cell may be programmed multiple times. Before each write, a flash memory array is erased to reset every cell in the array to a default state. As a result, the multiple data bits that share the same cell, and hence the same electronic state and threshold voltage Vt, are interdependent to the point that an unexpected power interruption can have unpredictable consequences. In a real system, variations in the electronic states of the memory cells also spread each threshold voltage over a range. Table 1 below shows the electronic states and the threshold voltage ranges in a two-bit MLC.
TABLE 1
Threshold voltages and bit values in a two-bit MLC memory cell

Vt                      Bit 1   Bit 2
−4.25 V to −1.75 V      1       1
−1.75 V to 0.75 V       1       0
0.75 V to 3.25 V        0       1
3.25 V to 5.75 V        0       0
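The mapping in Table 1 can be expressed as a simple lookup. The following is a minimal sketch; the names `VT_WINDOWS` and `decode_vt` are hypothetical, and the treatment of a voltage that falls exactly on a window boundary (assigned here to the higher window) is an assumption, since the table does not specify it.

```python
from typing import Optional, Tuple

# Threshold-voltage windows from Table 1: (low V, high V, (bit 1, bit 2)).
VT_WINDOWS = [
    (-4.25, -1.75, (1, 1)),
    (-1.75, 0.75, (1, 0)),
    (0.75, 3.25, (0, 1)),
    (3.25, 5.75, (0, 0)),
]


def decode_vt(vt: float) -> Optional[Tuple[int, int]]:
    """Return (bit 1, bit 2) for a threshold voltage, or None if out of range.

    Assumption: windows are half-open, low <= vt < high.
    """
    for low, high, bits in VT_WINDOWS:
        if low <= vt < high:
            return bits
    return None
```

For example, under these assumptions `decode_vt(2.0)` falls in the 0.75 V to 3.25 V window and yields `(0, 1)`, while a voltage outside all four windows yields `None`.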
In spite of the advantages of MLC over SLC, MLC flash memory devices have not traditionally been used because of certain technical constraints, among which data corruption presents one of the most severe challenges.
All flash memories have a finite number of erase-write cycles. MLC flash memory devices are more vulnerable to data corruption than SLC flash memory devices. The specified erase cycle limit for each flash memory page is typically on the order of 100,000 cycles for SLC flash memory devices and on the order of 10,000 cycles for MLC flash memory devices. The lower cycle limit in MLC flash memory devices poses particular problems for data centers that operate with unpredictable data streams. Such data streams may cause “hot spots”, in which certain highly-used areas of memory are subjected to a large number of erase cycles.
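The effect of a hot spot on the erase cycle budget can be illustrated with a short tally. The following is a sketch only, assuming that every rewrite of a block costs one erase cycle and that no wear leveling is applied; the function names and the sample write stream are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

MLC_ERASE_LIMIT = 10_000  # order-of-magnitude figure quoted above for MLC devices


def erase_counts(block_writes: Iterable[int]) -> Counter:
    """Count erases per block, assuming one erase per rewrite of a block."""
    return Counter(block_writes)


def worn_blocks(counts: Counter, limit: int = MLC_ERASE_LIMIT) -> List[int]:
    """Return the blocks whose erase count has reached the limit."""
    return sorted(block for block, n in counts.items() if n >= limit)


# Hypothetical stream: block 0 is a hot spot, blocks 1 and 2 are rarely rewritten.
stream = [0] * 10_000 + [1] * 50 + [2] * 50
exhausted = worn_blocks(erase_counts(stream))
```

In this example only block 0 reaches the limit, even though the total number of writes is small relative to the capacity of the three blocks taken together.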
In addition, various factors in normal operation can also affect flash memory integrity, including read disturbs and program disturbs. These disturbs lead to unpredictable loss of data bits in a memory cell as a result of the reading or writing of memory cells adjacent to the disturbed cell. Sudden data losses in MLC flash memory devices due to unexpected power interruptions require frequent data recoveries. Because some data levels require more than one write operation to achieve, and because more than one bit of data shares the same memory cell, a power change or a program error during a write operation leaves the data in a wrong state. When the power returns, the memory cell can be in an erratic state. A power interruption is therefore a major risk to the integrity of data stored in MLC flash memory devices.
Flash media typically are written in units called “pages”; each page typically includes between 2000 bytes and 8000 bytes. Flash media typically are erased in units called “blocks”; each block typically includes between 16 and 64 pages. Pages in MLC flash memory devices are coupled into paired pages. The number of paired pages may be two for a 2-bit MLC and may rise to three, four or more for higher-bit MLCs. The paired pages may reside in shared MLC flash memory cells. If a power failure occurs while the MLC is in the middle of an operation that changes the contents of the flash media (e.g., in the middle of writing a page of data or in the middle of erasing a block of data), the electrical states of the interrupted page or block are unpredictable after the device is powered up again. The electrical states can even be random: at the time power is interrupted, some of the affected bits may already be in the states assigned to them by the operation, while other bits may be lagging behind and have not yet reached their target values. Furthermore, some bits might be caught in intermediate states and thus be unreliable, so that reading these bits returns different results under different read operations. Therefore a power loss while programming a certain page can corrupt a paired page.
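The paired-page failure described above can be illustrated with a toy model of a single two-bit cell. Only the voltage windows are taken from Table 1. The rest is assumed for illustration: that bit 1 and bit 2 belong to two different pages, that the page holding bit 2 was written first, that programming raises Vt linearly in ten steps toward the centre of the target window, and that a power loss simply stops the ramp. Real devices may order and shape their programming differently, and all names are hypothetical.

```python
# Toy model only: Vt windows are from Table 1; everything else is assumed.
WINDOWS = [
    (-4.25, -1.75, (1, 1)),
    (-1.75, 0.75, (1, 0)),
    (0.75, 3.25, (0, 1)),
    (3.25, 5.75, (0, 0)),
]
# Assumed programming target: the centre of each window.
CENTER = {bits: (low + high) / 2 for low, high, bits in WINDOWS}


def read_cell(vt):
    """Decode a threshold voltage into (bit 1, bit 2)."""
    for low, high, bits in WINDOWS:
        if low <= vt < high:
            return bits
    raise ValueError("Vt outside every window")


def program(vt_start, target_bits, steps_done, total_steps=10):
    """Move Vt linearly toward the target; fewer steps models a power loss."""
    vt_target = CENTER[target_bits]
    return vt_start + (vt_target - vt_start) * steps_done / total_steps


# The page holding bit 2 has already stored a 0, so the cell holds (1, 0).
vt = CENTER[(1, 0)]
# The paired page now writes bit 1 = 0, so the target state is (0, 0).
vt_complete = program(vt, (0, 0), steps_done=10)
vt_interrupted = program(vt, (0, 0), steps_done=5)  # power lost halfway
```

In this model the completed operation reads back as (0, 0), but the interrupted cell stops at 2.0 V, inside the 0.75 V to 3.25 V window, and reads as (0, 1). Bit 2, which belongs to the page written earlier, is returned as 1 instead of 0, even though that page was not the one being programmed when power was lost.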
In the prior art, error correction codes (ECC) and Redundant Array of Inexpensive Disks (RAID) techniques have been used to mitigate data corruption. In one instance, data corruption is prevented by writing parity pages at a different page address. Those techniques require either additional memory or complicated error-searching and data-rebuilding procedures after power returns. Such requirements make the process costly to implement and place significant strain on the processing power of a conventional flash memory controller, which generally includes only a single processor. Furthermore, if a power failure occurs during the writing of a page, the paired page data can become corrupt in an MLC flash memory device. Therefore even the conventional paired page technique is susceptible to a sudden power interruption. In fact, the severity of the possible corruption is high; in some cases, every 10th data bit can be lost. Relying on conventional ECC techniques to make an MLC flash memory system reliable would be impractical.
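The parity-page approach mentioned above can be sketched with a bytewise XOR, as used in RAID-style parity. The text does not state which parity scheme the prior art uses, so XOR is an assumption here, and the function names and sample page contents are hypothetical.

```python
from typing import List, Sequence


def xor_parity(pages: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equally sized pages (assumed parity scheme)."""
    parity = bytearray(len(pages[0]))
    for page in pages:
        if len(page) != len(parity):
            raise ValueError("all pages must have the same length")
        for i, byte in enumerate(page):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_page(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Recover one lost page from the surviving pages plus the parity page."""
    return xor_parity(list(surviving) + [parity])


# Hypothetical two-byte pages; real pages are thousands of bytes long.
pages: List[bytes] = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity_page = xor_parity(pages)
recovered = rebuild_page([pages[0], pages[2]], parity_page)
```

Here `recovered` equals the lost middle page. The sketch also shows the costs noted above: one extra page of storage per group, and a rebuild that must read every surviving page in the group, which only recovers the data if a single page in the group was lost.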
NAND flash memory data corruption can also result from program/erase cycle wear-out. Electrons are injected and removed by tunneling through thin-film oxide insulators. Repeated program/erase cycles damage the oxide and reduce its effectiveness. As device dimensions (e.g., oxide film thickness) shrink, data integrity problems from device wear-out can become more severe. One factor that influences this wear-out process is the speed at which the program and erase cycles are performed. However, if programming and erasing are slowed to reduce wear, overall performance can be impacted significantly.
Currently, a technique exists which applies a lower sense voltage to measure the charge states of the flash memory, in order to extend the lifetime of the memory device. A flash memory device is a charge-trap device that uses sense circuits to detect whether a cell contains a given charge level. However, as the device wears out, its ability to store a charge is compromised. A worn-out memory device allows the stored charge on the floating gate to leak. Consequently, a sense circuit will detect a reduced voltage from the device. One current recovery mechanism reduces the sense voltage that is used to determine the logic value a cell contains. However, a lower sense voltage also returns a lower detected voltage, thus resulting in incorrect charge tracking.