This disclosure relates to data processing and data storage, and more specifically, to mitigating asymmetric transient errors in a non-volatile memory system by proactive data relocation.
NAND flash memory is an electrically programmable and erasable non-volatile memory technology that stores one or more bits of data per memory cell as a charge on the floating gate of a transistor or a similar charge trap structure. The amount of charge on the floating gate modulates the threshold voltage of the transistor. By applying a proper read voltage and measuring the amount of current, the programmed threshold voltage of the memory cell can be determined and thus the stored information can be detected. Memories storing one, two, three and four bits per cell are respectively referred to in the art as Single Level Cell (SLC), Multi-Level Cell (MLC), Three Level Cell (TLC), and Quad Level Cell (QLC) memories.
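By way of illustration only, the detection step described above can be sketched as a comparison of a cell's threshold voltage against an ordered set of read voltages. The function names and voltage values below are hypothetical and chosen for readability; they are not taken from any particular device. The mapping from a detected state to stored bit values is omitted.

```python
from bisect import bisect_right

# Bits stored per cell for the cell types named in the text.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def num_states(cell_type):
    """Number of distinguishable charge states needed for a cell type."""
    return 2 ** BITS_PER_CELL[cell_type]


def detect_state(threshold_voltage, read_voltages):
    """Return the index of the charge state a cell is in.

    read_voltages must be sorted in ascending order. A cell with n bits
    needs 2**n - 1 read voltages to separate its 2**n states.
    This is a simplified model (assumption): a real device senses
    current at each applied read voltage rather than measuring the
    threshold voltage directly.
    """
    return bisect_right(read_voltages, threshold_voltage)


# Hypothetical read voltages for a 2-bit (MLC) cell: 3 boundaries, 4 states.
mlc_read_voltages = [0.0, 1.5, 3.0]
state = detect_state(2.0, mlc_read_voltages)  # falls between 1.5 and 3.0
```

In this sketch, a cell whose threshold voltage lies between the second and third read voltages is detected as being in state 2 of the four MLC states.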
In a typical implementation, a NAND flash memory array is organized in physical blocks (also referred to as “erase blocks”) of physical memory, each of which includes multiple physical pages, each in turn containing a multiplicity of memory cells. By virtue of the arrangement of the word and bit lines utilized to access memory cells, flash memory arrays have generally been programmed on a physical page basis, but erased on a physical block basis. Blocks must be erased prior to being programmed.
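The page-program and block-erase granularity described above can be illustrated with a minimal model. The class and method names are hypothetical, and the model ignores constraints that real devices may impose, such as the order in which pages are programmed.

```python
class EraseBlock:
    """Toy model of a physical block: programmed per page, erased as a whole."""

    def __init__(self, pages_per_block):
        # None marks a page that is erased and therefore programmable.
        self.pages = [None] * pages_per_block
        self.pe_cycles = 0  # number of erase operations performed so far

    def program_page(self, page_index, data):
        """Program a single page; fails if the page has not been erased."""
        if self.pages[page_index] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_index] = data

    def erase(self):
        """Erase every page in the block in one operation."""
        self.pages = [None] * len(self.pages)
        self.pe_cycles += 1


block = EraseBlock(pages_per_block=4)
block.program_page(0, b"first")
block.erase()                      # the whole block is erased, not just page 0
block.program_page(0, b"second")   # page 0 can now be programmed again
```

Updating a single page in place is not possible in this model: the only way to make a programmed page writable again is to erase the entire block that contains it.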
As is well known in the art, NAND flash memory is inherently susceptible to bit errors, including errors caused by program disturb effects, over-programming effects, read disturb effects, data retention (i.e., errors attributable to decay of the gate charge of programmed cells over time), and wear (i.e., errors attributable to damage to the gate dielectric due to the number of cell program/erase (PE) cycles to which the cell is subjected). In general, the bit error rate (BER) attributable to wear is permanent and increases monotonically over the life of a NAND flash memory. Similarly, program disturb and over-programming effects can be viewed as permanent; even though they disappear after an erase operation, these two types of effects influence the BER immediately after the pages are programmed. Errors such as those caused by read disturbs and data retention are more transient and, although generally increasing over time, disappear upon erasure of the affected blocks. After a page is programmed, these transient effects begin to reappear gradually with increasing time and an increasing number of reads.
Data storage systems employing flash memory as a storage medium generally implement flash management functions to mitigate these inherent error characteristics of flash memory. Existing systems commonly integrate at least some of these flash management functions into the data path (e.g., error correcting code (ECC) encoding and RAID-like data protection schemes), while other flash management functions operate in the background independently of any external requests for the data stored in the flash memory. Examples of background flash management functions common in enterprise-class flash arrays include read sweeping, which entails reading individual flash pages to detect bit errors; wear leveling, which seeks to equalize the program/erase cycle counts for all flash pages; and block calibration, which determines appropriate read threshold voltages.
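Two of the background functions named above, wear leveling and read sweeping, can be sketched in simplified form as follows. The function names, the per-block tracking of program/erase counts, and the error limit are assumptions made for illustration; this is not a description of any specific controller.

```python
def pick_block_for_write(pe_counts):
    """Wear leveling (simplified): choose the block with the fewest P/E cycles.

    pe_counts maps a block identifier to its program/erase cycle count.
    Counts are tracked per block here (assumption), since erasure is
    performed on a block basis.
    """
    return min(pe_counts, key=pe_counts.get)


def read_sweep(bit_errors_per_page, error_limit):
    """Read sweeping (simplified): report pages with too many bit errors.

    bit_errors_per_page maps a page identifier to the number of bit
    errors detected when the page was read in the background.
    error_limit is a hypothetical threshold.
    """
    return [page for page, errors in bit_errors_per_page.items()
            if errors > error_limit]


next_block = pick_block_for_write({"b0": 120, "b1": 45, "b2": 300})
flagged = read_sweep({0: 2, 1: 17, 2: 9}, error_limit=8)
```

Here the least-worn block, "b1", is selected for the next write, and pages 1 and 2 are flagged because their detected error counts exceed the assumed limit.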