This disclosure relates to data processing and storage, and more specifically, to improving the ability of a data storage system to efficiently perform page retirement.
In certain data storage systems, data is stored in multiple storage devices. For example, in some such systems, multiple individual hard disks or memory chips are used to store data, and the data stored in one or more of the storage devices is associated with data stored in other storage devices in such a manner that data errors in one or more storage devices can be detected and possibly corrected. One such approach is to store a given quantity of data across multiple storage locations by dividing the quantity of data into portions of equal size (the individual portions sometimes being referred to as “data pages”) and then storing the data pages such that one data page is stored in each of multiple storage devices. In connection with this approach, a further storage device may be used to store a page of data protection information, where a given page of data protection information is associated with a specific set of data pages stored in the multiple storage locations. In some instances, the set of data pages in the multiple locations that is used to store associated data is referred to as a “data stripe” or “page stripe.”
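By way of illustration only, the striping approach described above can be sketched as follows. The sketch assumes that the data protection information is a byte-wise XOR (parity) of the data pages; the text above does not specify the protection scheme, and other schemes are possible. The function names `split_into_pages`, `xor_pages`, and `recover_page` are hypothetical and chosen for this example.

```python
from functools import reduce


def split_into_pages(data: bytes, page_size: int) -> list[bytes]:
    """Divide data into equal-size data pages, zero-padding the last page."""
    padded = data + bytes(-len(data) % page_size)
    return [padded[i:i + page_size] for i in range(0, len(padded), page_size)]


def xor_pages(pages: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized pages.

    Assumption: XOR parity stands in for the "data protection information".
    """
    if not pages:
        raise ValueError("at least one page is required")
    size = len(pages[0])
    if any(len(p) != size for p in pages):
        raise ValueError("all pages in a stripe must be the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def recover_page(surviving_pages: list[bytes], protection_page: bytes) -> bytes:
    """Rebuild one lost data page from the surviving pages and the parity page."""
    return xor_pages(surviving_pages + [protection_page])


# One data page per (simulated) storage device, plus one protection page.
stripe = split_into_pages(b"stripe example data!", page_size=8)
protection = xor_pages(stripe)

# Simulate losing the page held by the second device and rebuilding it.
rebuilt = recover_page(stripe[:1] + stripe[2:], protection)
```

With XOR parity, a single lost page per stripe can be rebuilt; detecting or correcting more errors would require a stronger protection scheme than the one assumed here.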
In addition to the data protection information for each data stripe, individual data pages may also be protected by an error correcting code (ECC) that may be utilized to detect errors and to correct some number of errors within the page. ECC protection is applied to a codeword, the size of which is often referred to as the codeword (or block) length. There may be multiple codewords within a data page. When errors reach some number or rate of occurrence, the data storage system may determine to withdraw from use (retire) the portions of the data storage that are the source of the errors. In data storage systems employing NAND flash memory, a data page is the smallest granule of storage that can be accessed by read and write operations, and a block (or erase block), which contains many pages, is the smallest granule of storage that can be erased. Consequently, it is conventional for a data storage system to retire an entire erase block from use in response to an ECC failure of even a single codeword on a single page within the erase block.
The present disclosure recognizes that this conventional block retirement policy is over-inclusive and can unnecessarily shorten the life of a NAND flash storage device because the NAND flash storage device will itself be retired when a threshold number of its erase blocks are retired.
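By way of illustration only, the conventional block retirement policy and its effect on device lifetime can be modeled with the minimal sketch below. The class `FlashDevice`, its fields, and the example geometry and threshold values are hypothetical and are not taken from any particular device.

```python
from dataclasses import dataclass, field


@dataclass
class FlashDevice:
    """Toy model of a NAND flash device under conventional block retirement."""
    num_blocks: int
    pages_per_block: int
    retired_block_threshold: int  # hypothetical device-retirement threshold
    retired_blocks: set = field(default_factory=set)

    def report_ecc_failure(self, block: int, page: int) -> None:
        """Conventional policy: an ECC failure of any codeword on any page
        retires the entire erase block that contains the page."""
        if not 0 <= block < self.num_blocks:
            raise IndexError("block out of range")
        if not 0 <= page < self.pages_per_block:
            raise IndexError("page out of range")
        self.retired_blocks.add(block)

    @property
    def usable_pages(self) -> int:
        """Pages remaining in blocks that have not been retired."""
        return (self.num_blocks - len(self.retired_blocks)) * self.pages_per_block

    @property
    def device_retired(self) -> bool:
        """The device itself is retired once enough erase blocks are retired."""
        return len(self.retired_blocks) >= self.retired_block_threshold


device = FlashDevice(num_blocks=4, pages_per_block=256, retired_block_threshold=2)
device.report_ecc_failure(block=0, page=17)  # one bad page costs 256 pages
device.report_ecc_failure(block=3, page=0)   # second block retired: device retired
```

In this model, two failing pages remove 512 pages from use and retire the device, which illustrates why retiring at erase block granularity can be over-inclusive.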