This disclosure relates to data processing and storage, and more specifically, to management of a non-volatile memory system, such as a flash memory system, to support data recovery in the event of multi-page failures.
NAND flash memory is an electrically programmable and erasable non-volatile memory technology that stores one or more bits of data per memory cell as a charge on the floating gate of a transistor or a similar charge trap structure. In a typical implementation, a NAND flash memory array is organized in blocks (also referred to as “erase blocks”) of physical memory, each of which includes multiple physical pages, each in turn containing many memory cells. Because of the arrangement of the word lines and bit lines used to access the memory cells, flash memory arrays are generally programmed on a page basis but erased on a block basis.
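The page-program, block-erase asymmetry can be illustrated with a minimal Python sketch. The `EraseBlock` class, its method names, and the page count are hypothetical; the model captures only the granularity rules (a page can be programmed only while it is erased, and only a whole block can be erased), not any electrical behavior.

```python
from typing import List, Optional

# Hypothetical toy parameters, chosen for illustration only.
PAGES_PER_BLOCK = 4
ERASED = None  # marker for an erased (unprogrammed) page


class EraseBlock:
    """Toy NAND erase block: pages are programmed individually,
    but can only be returned to the erased state all together."""

    def __init__(self, pages_per_block: int = PAGES_PER_BLOCK) -> None:
        self.pages: List[Optional[bytes]] = [ERASED] * pages_per_block

    def program(self, page_index: int, data: bytes) -> None:
        # A page may be programmed only while it is still erased;
        # NAND does not support overwriting a page in place.
        if self.pages[page_index] is not ERASED:
            raise ValueError(f"page {page_index} must be erased before programming")
        self.pages[page_index] = data

    def read(self, page_index: int) -> Optional[bytes]:
        return self.pages[page_index]

    def erase(self) -> None:
        # Erase always operates on the whole block, never on a single page.
        self.pages = [ERASED] * len(self.pages)


blk = EraseBlock()
blk.program(0, b"A")
blk.program(1, b"B")
blk.erase()           # both pages return to the erased state
blk.program(0, b"C")  # allowed again only after the block erase
```

Programming page 0 a second time without the intervening `erase()` would raise `ValueError`; after the block erase, the same page can be programmed again.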
As is known in the art, blocks of NAND flash memory must be erased before being programmed with new data. A block of NAND flash memory cells is erased by applying a high positive erase voltage pulse to the p-well bulk area of the selected block while biasing all of the word lines of the memory cells to be erased to ground. The erase pulse promotes tunneling of electrons off the floating gates of the grounded memory cells, giving the floating gates a net positive charge and thus shifting the threshold voltages of the memory cells toward the erased state. Each erase pulse is generally followed by an erase verify operation that reads the erase block to determine whether the erase operation succeeded, for example, by verifying that fewer than a threshold number of memory cells in the erase block remain unsuccessfully erased. In general, erase pulses continue to be applied to the erase block until the erase verify operation succeeds or until a predetermined number of erase pulses have been used (i.e., the erase pulse budget is exhausted).
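The erase loop (pulse, verify, repeat until success or until the pulse budget is exhausted) can be sketched as follows. This is a minimal illustration under stated assumptions: the hardware is abstracted as two callbacks, and `SimulatedBlock`, which erases a fixed fraction of the remaining unerased cells per pulse, is a hypothetical stand-in for real device behavior.

```python
from typing import Callable, Tuple


def erase_with_verify(
    apply_erase_pulse: Callable[[], None],
    count_unerased_cells: Callable[[], int],
    fail_threshold: int,
    max_pulses: int,
) -> Tuple[bool, int]:
    """Apply erase pulses until erase verify passes or the pulse budget is spent.

    Returns (success, pulses_used).
    """
    for pulse in range(1, max_pulses + 1):
        apply_erase_pulse()
        # Erase verify: pass if fewer than fail_threshold cells remain unerased.
        if count_unerased_cells() < fail_threshold:
            return True, pulse
    # Pulse budget exhausted without a successful verify.
    return False, max_pulses


class SimulatedBlock:
    """Hypothetical block model: each pulse erases a fixed fraction
    of the cells that are still unerased."""

    def __init__(self, unerased: int, fraction_per_pulse: float) -> None:
        self.unerased = unerased
        self.fraction = fraction_per_pulse

    def pulse(self) -> None:
        self.unerased -= int(self.unerased * self.fraction)

    def count_unerased(self) -> int:
        return self.unerased


block = SimulatedBlock(unerased=1000, fraction_per_pulse=0.5)
result = erase_with_verify(block.pulse, block.count_unerased,
                           fail_threshold=10, max_pulses=8)
```

With 1000 initially unerased cells, half of the remainder erased per pulse, and a verify threshold of 10, the sketch succeeds on the seventh pulse; with a budget of only five pulses it reports failure.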
A NAND flash memory cell can be programmed by applying a high positive program voltage to the word line of the memory cell to be programmed and by applying an intermediate pass voltage to the memory cells in the same string for which programming is to be inhibited. The program voltage causes electrons to tunnel onto the floating gate, changing its state from the initial erased state to a programmed state having a net negative charge. Following programming, the programmed page is typically read in a read verify operation to ensure that the program operation succeeded, for example, by verifying that fewer than a threshold number of memory cells in the programmed page contain bit errors. In general, program and read verify operations are applied to the page until the read verify operation succeeds or until a predetermined number of program pulses have been used (i.e., the program pulse budget is exhausted).
Enterprise-class data storage systems employing all-flash storage media often organize data within a flash card or solid state disk (SSD) into page stripes, in which physical pages of flash memory from different channels/lanes are grouped together to add data redundancy and/or to optimize parallel processing of write requests. For example, a page stripe may be formed across a set of blocks of memory from physical pages having a common page index. The integrity of the data forming the page stripe may be improved by appending parity to the page stripe (one parity page for a scheme similar to RAID 5, or two for a scheme similar to RAID 6).
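A page stripe with a single RAID 5-style parity page can be sketched by taking the page at the same index from each channel's block and appending the bytewise XOR of those pages. The list-of-blocks layout and the function names are illustrative assumptions; a RAID 6-like scheme would need a second, independently computed parity page, which is not shown.

```python
from typing import List


def xor_pages(pages: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length pages."""
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, b in enumerate(page):
            out[i] ^= b
    return bytes(out)


def build_page_stripe(blocks: List[List[bytes]], page_index: int) -> List[bytes]:
    """Group the page at page_index from each channel's block (one block
    per channel) and append a single XOR parity page."""
    data_pages = [block[page_index] for block in blocks]
    return data_pages + [xor_pages(data_pages)]


# Three channels, each contributing a block of two 2-byte pages (toy sizes).
blocks = [
    [b"\x01\x02", b"\x10\x20"],
    [b"\x04\x08", b"\x40\x80"],
    [b"\x03\x00", b"\x00\x01"],
]
stripe0 = build_page_stripe(blocks, 0)
```

Because the parity page is the XOR of the data pages, XORing every page in a complete stripe yields all zeros, which is the property used to rebuild a missing page.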
In at least some cases, a flash card is faulted when a multi-page failure occurs within a single page stripe and the number of failed pages exceeds what the selected RAID parity scheme can correct. For example, RAID 5 can correct single-page failures but cannot correct errors in two or more pages of the same page stripe, while RAID 6 can correct double-page failures but cannot correct errors in three or more pages of the same page stripe. A typical response to a faulted flash card is to reconstruct its entire contents. Reconstructing the flash card not only degrades performance of the data storage system while the contents of the flash card are being recovered, but also exposes the flash array to the additional risk of a second flash card failure during the reconstruction process, which can result in unrecoverable data loss.
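The single-failure limit of a RAID 5-style stripe follows directly from XOR parity: one missing page equals the XOR of all surviving pages, including parity, but with two missing pages only their XOR is known, not either page individually. A minimal sketch, assuming failed pages are represented as `None` and using a hypothetical `StripeUnrecoverable` exception to stand in for the point at which the flash card would be faulted:

```python
from typing import List, Optional


class StripeUnrecoverable(Exception):
    """Hypothetical: more pages failed than single parity can correct."""


def xor_pages(pages: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length pages."""
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, b in enumerate(page):
            out[i] ^= b
    return bytes(out)


def recover_raid5_stripe(stripe: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild a stripe of data pages plus one XOR parity page."""
    failed = [i for i, page in enumerate(stripe) if page is None]
    if len(failed) > 1:
        # Two or more unknowns, one parity equation: cannot be solved.
        raise StripeUnrecoverable(
            f"{len(failed)} failed pages; single parity corrects at most 1")
    if not failed:
        return list(stripe)
    survivors = [page for page in stripe if page is not None]
    rebuilt = list(stripe)
    # The missing page is the XOR of every surviving page, parity included.
    rebuilt[failed[0]] = xor_pages(survivors)
    return rebuilt


good = [b"\x01\x02", b"\x04\x08", b"\x03\x00", b"\x06\x0a"]  # last page is parity
repaired = recover_raid5_stripe([good[0], None, good[2], good[3]])
```

A RAID 6-like scheme raises this limit to two failed pages per stripe, but the same reasoning applies once the number of failures exceeds the number of independent parity pages.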