Non-volatile memory systems, such as flash memory, are used in digital computing systems as a means to store data and have been widely adopted for use in consumer products. Flash memory may be found in different forms, for example in the form of a portable memory card that can be carried between host devices or as a solid state disk (SSD) embedded in a host device. These memory systems typically work with data units called “pages” that can be written, and groups of pages called “blocks” that can be read and erased, by a storage manager often residing in the memory system.
Flash storage systems generally fail when they run out of spare blocks to replace blocks that have been retired because they failed to erase. Blocks in a typical storage system tend not to fail at the same time. Rather, blocks often fail at different program/erase cycle levels depending on varying local fabrication parameters across a memory die and/or from die to die in a multi-die memory. FIG. 1 shows one hypothetical example of the frequency of erase failure as a function of cycle count. As is seen in this hypothetical example, the erase cycle at which the blocks fail varies by approximately 900 cycles (approximately 25%) between the earliest and latest failure.
A standard approach to extending the life of a storage system is to apply wear leveling that keeps the number of program/erase cycles applied to each block as even as possible. Although this may avoid problems with concentrating program/erase cycles on only a few blocks, it is based on the assumption that blocks have the same lifespan (in terms of having the same number of program/erase cycles before failure). For a storage system that has a failure distribution such as shown in FIG. 1, this approach may result in a storage system that fails based on the weakest blocks and wastes blocks with more remaining life.
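By way of illustration only, hot count wear leveling of the kind described above may be sketched as follows. The function names, the list-based hot count table, and the lowest-index tie-breaking rule are assumptions made for this sketch and are not taken from any particular storage manager.

```python
def pick_block_hot_count(hot_counts, free_blocks):
    """Return the free block with the fewest program/erase cycles.

    hot_counts: list where hot_counts[b] is the cycle count of block b.
    free_blocks: iterable of candidate block indices.
    Ties are broken by lowest block index (an assumption of this sketch).
    """
    return min(free_blocks, key=lambda b: (hot_counts[b], b))


def simulate_writes(hot_counts, writes):
    """Apply `writes` program/erase cycles, always to the least-cycled block."""
    counts = list(hot_counts)  # copy so the caller's list is left untouched
    for _ in range(writes):
        block = pick_block_hot_count(counts, range(len(counts)))
        counts[block] += 1
    return counts


# Hypothetical starting hot counts for three blocks.
leveled = simulate_writes([5, 0, 2], writes=8)
print(leveled)
```

The selection rule looks only at the number of cycles each block has received, not at how many cycles each block can actually endure, which is the assumption of equal lifespan noted above.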
FIG. 2 illustrates a hypothetical example of how the block failure distribution such as in FIG. 1 may translate to an end of life scenario for a storage system that uses standard hot count wear leveling. A typical low-cost flash storage system may only have a few spare blocks available and, once the number of spares available has been consumed due to eventual failure of some of the blocks, the storage system itself can no longer accept data (i.e., fails) and becomes a read-only device. In FIG. 2, it is assumed that the number of spare blocks is 6 and the hypothetical failure distribution shows an example failure of the system at 3100 cycles because six blocks have failed by 3100 cycles. The remaining blocks in the storage system may have a considerable amount of erase cycle life left in them; in this example the average being perhaps 3500 cycles and some as good as 3900 cycles. However, this erase cycle life in the remaining good blocks is wasted because of the worst few blocks that failed first in this hot-count based wear leveling system.
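The end of life reasoning above may be sketched in a few lines, under the simplifying assumption that perfectly even wear leveling gives every block the same cycle count, so that the system fails at the cycle count where the number of failed blocks reaches the number of spares. The function names and the ten per-block failure values below are invented for illustration; they are not the data of FIG. 2 and were chosen only to echo the 3100-cycle example.

```python
from statistics import mean


def end_of_life_cycle(failure_cycles, spare_blocks):
    """Cycle count at which `spare_blocks` blocks have failed.

    Assumes perfectly even wear, so all blocks share one cycle count and
    the system fails when the last spare is consumed.
    """
    if not 1 <= spare_blocks <= len(failure_cycles):
        raise ValueError("spare_blocks must be between 1 and the number of blocks")
    return sorted(failure_cycles)[spare_blocks - 1]


def unused_cycles(failure_cycles, spare_blocks):
    """Remaining erase cycle life of each block still good at end of life."""
    eol = end_of_life_cycle(failure_cycles, spare_blocks)
    return [limit - eol for limit in failure_cycles if limit > eol]


# Hypothetical per-block failure points (cycles), not taken from the figures.
example_failure_cycles = [3000, 3020, 3050, 3070, 3090, 3100,
                          3400, 3500, 3600, 3900]

eol = end_of_life_cycle(example_failure_cycles, spare_blocks=6)
leftover = unused_cycles(example_failure_cycles, spare_blocks=6)
print(eol, leftover, mean(leftover))
```

In this invented data set the four surviving blocks still have several hundred cycles of life each when the system becomes read-only, which is the wasted erase cycle life described above.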