Non-volatile memory devices such as Solid State Drives (SSDs) are finding new applications in consumer electronics. For example, they are replacing Hard Disk Drives (HDDs), which typically comprise rapidly rotating disks (platters). Non-volatile memories, sometimes referred to as ‘flash memories’ or ‘flash memory devices’ (for example, NAND and NOR flash memory devices), are used in media storage, cameras, mobile phones, mobile computers, laptop computers, USB flash drives, etc. Non-volatile memory can provide a relatively reliable, compact, cost-effective, and easily accessible method of storing data when the power is off.
NAND flash devices are generally made up of blocks, each comprising a number of pages. Each page can comprise multiple NAND flash cells, e.g., hundreds or thousands. A NAND flash cell may be a single level cell (SLC) that can represent one bit per cell, or a multi-level cell (MLC) that can represent two or more bits per cell. Each of the physically identical flash cells can hold a voltage that indicates the value stored in that cell. For example, an SLC can store one of two values, “1” or “0”, using a single bit. An MLC representing two bits can store one of four values, “10”, “01”, “11” or “00”.
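The relationship between bits per cell and the number of distinguishable values can be illustrated with a minimal sketch. The function name `cell_states` is hypothetical, and the mapping of each bit pattern to a particular voltage level is device-specific and deliberately not modeled here.

```python
from itertools import product
from typing import List


def cell_states(bits_per_cell: int) -> List[str]:
    """List every bit pattern a cell with the given number of bits can represent.

    Hypothetical helper for illustration only; it does not model how a
    real device assigns patterns to voltage levels.
    """
    if bits_per_cell < 1:
        raise ValueError("a cell must represent at least one bit")
    return ["".join(bits) for bits in product("01", repeat=bits_per_cell)]


slc = cell_states(1)  # single level cell: 2 values
mlc = cell_states(2)  # two-bit multi-level cell: 4 values
print(len(slc), len(mlc))
```

In general a cell representing n bits must distinguish 2**n voltage levels, which is why cells that store more bits leave less margin between adjacent levels.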
NAND flash devices are not always manufactured perfectly. For example, some blocks may have physical defects which can become worse with time. Some blocks can be screened out at the factory and marked as factory bad blocks. In some instances, a few bad blocks may develop during usage. Blocks that go bad during usage are called developed bad blocks and may fail much earlier than normal blocks. Bad blocks can develop erratically and may be hard to predict. For example, in some instances, elevated heat or multiple program-erase (P/E) cycles can make the NAND flash devices susceptible to bit errors, making them more likely to fail.
Use of error correction codes (ECC), e.g., Hamming codes, parity, cyclic codes, etc., to detect and correct bit errors in NAND flash devices is known. A bit error rate (BER), or an error rate, may be defined as the ratio of the number of bits with errors to the total number of bits, and may be expressed as a percentage. In most instances, ECC can minimize possible errors and can help extend the life of the flash devices. However, in some instances, not all errors can be corrected. Under some approaches, SSD technologies can use a “chipkill” method to rescue data that other methods cannot recover in case of an Uncorrectable by Error Correction Code (UECC) failure. For example, support of the chipkill feature may require an additional NAND flash device, similar to a RAID (redundant array of inexpensive disks) solution, which can be used to recover the data upon failure of certain blocks. Chipkill can be a costly feature in flash devices, as it can affect the size of the system data area, the cost of development (hardware/firmware), Application Specific Integrated Circuit (ASIC) layout real estate, and power. Technology that can remove the need for chipkill can be beneficial.
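The BER definition and the RAID-like recovery idea can be sketched as follows. This is an illustrative assumption, not the chipkill implementation of any particular device: the function names are hypothetical, and simple byte-wise XOR parity stands in for whatever redundancy scheme a real controller uses.

```python
from functools import reduce
from typing import Sequence


def bit_error_rate(written: bytes, read_back: bytes) -> float:
    """Ratio of differing bits to total bits between two equal-length buffers."""
    if len(written) != len(read_back):
        raise ValueError("buffers must have the same length")
    if not written:
        return 0.0
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(written, read_back))
    return flipped / (8 * len(written))


def xor_parity(chunks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks (assumed stand-in for the redundancy)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))


# Three data chunks, plus one parity chunk stored on an additional device.
data = [b"\x0f\xf0", b"\x33\xcc", b"\x55\xaa"]
parity = xor_parity(data)

# If the second chunk becomes unreadable, XOR of the survivors and the
# parity chunk reconstructs it.
recovered = xor_parity([data[0], data[2], parity])
assert recovered == data[1]
```

The sketch also shows where the cost comes from: the parity chunk occupies storage that holds no user data, and it must be recomputed whenever any of the protected chunks is rewritten.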
Data have shown that most of the catastrophic failures taking place in the developed bad blocks are localized on a single word-line (WL) or two neighboring word-lines, and that they are caused by gradually developed defects. Background media scan (BGMS) can be performed during idle/free time to periodically monitor the reliability of the data written to the flash memory. BGMS can detect performance degradation on NAND superblocks due to retention, and refresh (garbage collect) the superblock. If the refresh cannot improve the performance, the block can be retired when necessary. Under one approach, the BGMS can catch the worst WL BER and compare it with a pre-determined threshold. However, such an approach cannot detect the signs of a developed bad block and/or predict its failure.
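The threshold-based approach described above can be sketched as follows, under stated assumptions: the threshold value, the function names, and the per-word-line BER figures are all hypothetical, and the way a real controller measures per-word-line BER is not modeled.

```python
from typing import Sequence

# Hypothetical threshold; a real limit depends on the device and its ECC.
BER_THRESHOLD = 1e-3


def worst_wordline_ber(wl_bers: Sequence[float]) -> float:
    """Return the highest per-word-line BER measured in one block."""
    if not wl_bers:
        raise ValueError("at least one word-line measurement is required")
    return max(wl_bers)


def scan_block(wl_bers: Sequence[float], threshold: float = BER_THRESHOLD) -> str:
    """Baseline check: 'refresh' if the worst WL BER exceeds the threshold."""
    return "refresh" if worst_wordline_ber(wl_bers) > threshold else "ok"


def after_refresh(wl_bers: Sequence[float], threshold: float = BER_THRESHOLD) -> str:
    """Follow-up check: 'retire' if a refresh did not bring the BER down."""
    return "retire" if worst_wordline_ber(wl_bers) > threshold else "keep"


# Two made-up blocks with the same worst-case BER but different profiles.
uniform_block = [4e-4] * 8              # every word-line is similar
localized_block = [5e-5] * 7 + [4e-4]   # one word-line stands out
print(scan_block(uniform_block), scan_block(localized_block))
```

Both blocks yield the same result because the check reduces each block to a single worst-case number. A word-line that stands out from its neighbors while still below the threshold is therefore indistinguishable from uniform wear, which illustrates the limitation noted above.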
According to some embodiments disclosed, an improved BGMS method can be used to detect the developed bad blocks during flash memory usage, with the aim of reducing or eliminating the need for “chipkill” in the flash memory devices.