In any die of a solid-state drive (SSD), there are likely to be a number of bad memory blocks, due to process technology and manufacturing variations, among other factors. Moreover, endurance varies from block to block. In the early life of a die, most of the blocks are good. There are, however, some initial failures. During the bulk of the life of the die, random bit errors occur. Eventually, towards the end of life of the die, a wear effect manifests itself, in which the error rate increases. Every block goes through this lifecycle, albeit potentially at a different rate. Indeed, some blocks take a long time to go through this lifecycle, while others take a comparatively short time. To provide an adequate safety margin, however, conventional SSD systems are provisioned according to the worst-performing blocks.
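The cost of provisioning according to the worst-performing blocks can be illustrated with a minimal sketch. The function name and the per-block endurance figures below are hypothetical and chosen only for illustration; they are not measurements from any device.

```python
from typing import List, Tuple


def worst_case_rating(block_endurance: List[int]) -> Tuple[int, int]:
    """Rate every block at the endurance of the weakest block.

    block_endurance holds an assumed number of Program/Erase cycles each
    block could sustain. Returns (rated_cycles, unused_cycles), where
    unused_cycles is the total endurance left on the table when all
    blocks are retired at the worst-case rating.
    """
    if not block_endurance:
        raise ValueError("at least one block is required")
    rated = min(block_endurance)
    unused = sum(cycles - rated for cycles in block_endurance)
    return rated, unused


if __name__ == "__main__":
    # Hypothetical endurance values for four blocks of one die.
    endurance = [3000, 5200, 4100, 8000]
    rated, unused = worst_case_rating(endurance)
    print(f"rated at {rated} cycles, {unused} cycles unused")
```

Under these assumed values, every block is rated at the endurance of the weakest one, so the longer-lived blocks are retired with a large share of their usable life unspent.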
Bits in a flash memory may be read incorrectly (i.e., develop bit errors) after being programmed. The charge level on a flash cell will change due to several conditions (e.g., time, temperature, accesses to other pages in the block, etc.). Eventually, when an affected cell is read, the wrong value is returned. Flash manufacturers specify a maximum number of bit errors for a flash page based on the process technology, cell design, lab testing, simulation, operating conditions, and the like. The bit error specification is usually expressed as P errors per M bytes. In some cases, the controller manufacturer is responsible for implementing an Error Correcting Code (ECC) that satisfies or exceeds the specification. Types of ECC include Reed-Solomon, BCH, and Low-Density Parity-Check (LDPC) codes, which are methods of correcting bit errors in a block of data bits. The life (measured in Program/Erase (PE) cycles) of a flash device specified by a flash manufacturer is based on the implementation of the specified error correction requirements. Flash manufacturers provide extra bytes in a flash page to accommodate the number of expected ECC bits plus a small amount of space for other metadata such as, for example, a Cyclic Redundancy Check (CRC) field, a sector number, and the like.
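To make the "P errors per M bytes" specification concrete, the following sketch estimates how many extra bytes a page would need for ECC plus metadata. It assumes a binary BCH code, for which correcting t errors over GF(2^m) needs at most m·t parity bits, with m chosen so that the codeword fits in 2^m − 1 bits. The function names, the page geometry, and the metadata size are illustrative assumptions, not values taken from any manufacturer's datasheet.

```python
import math
from typing import Tuple


def bch_parity_bits(data_bytes: int, t: int) -> int:
    """Upper bound on parity bits for a binary BCH code correcting t
    bit errors in data_bytes of data (the 'P errors per M bytes' spec,
    with P = t and M = data_bytes)."""
    data_bits = data_bytes * 8
    m = 1
    while True:
        parity = m * t
        # The (possibly shortened) codeword must fit in 2^m - 1 bits.
        if (1 << m) - 1 >= data_bits + parity:
            return parity
        m += 1


def spare_budget(page_data_bytes: int, spare_bytes: int,
                 codeword_bytes: int, t: int,
                 meta_bytes_per_codeword: int) -> Tuple[int, int, bool]:
    """Return (ecc_bytes, meta_bytes, fits) for one page.

    All parameters are hypothetical inputs; fits is True when ECC plus
    metadata do not exceed the extra bytes the page provides.
    """
    codewords = page_data_bytes // codeword_bytes
    ecc_bytes = codewords * math.ceil(bch_parity_bits(codeword_bytes, t) / 8)
    meta_bytes = codewords * meta_bytes_per_codeword
    return ecc_bytes, meta_bytes, ecc_bytes + meta_bytes <= spare_bytes


if __name__ == "__main__":
    # Assumed example: 4096-byte page with 224 extra bytes, a spec of
    # 8 errors per 512 bytes, and 8 bytes of metadata per 512-byte sector.
    print(spare_budget(4096, 224, 512, 8, 8))
```

For the assumed specification of 8 errors per 512 bytes, the bound gives m = 13 and 104 parity bits, i.e., 13 bytes of ECC per 512-byte codeword. Reed-Solomon and LDPC codes have different overheads, so this sketch applies only to the BCH case.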
The Open NAND Flash Interface (ONFI) specification, version 2.3, defines a flash page as containing a data area and a spare area. The spare area is intended to hold ECC check bits and metadata, while the data area is assumed to contain sector (e.g., logical block) data. Errors can occur in data portions of specific pages and in entire pages. Different ECC codes and different error correction strategies are required for each type of error.
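The division of a page into a data area and a spare area can be sketched as a simple data structure. The class and function names below are hypothetical, and the sketch assumes the spare area immediately follows the data area in the raw page buffer; how ECC check bits and metadata are arranged within the spare area is left to the controller and is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PageGeometry:
    data_bytes: int   # size of the data area (sector data)
    spare_bytes: int  # size of the spare area (ECC check bits, metadata)

    @property
    def raw_bytes(self) -> int:
        return self.data_bytes + self.spare_bytes


def split_page(raw: bytes, geom: PageGeometry) -> Tuple[bytes, bytes]:
    """Split a raw page buffer into (data_area, spare_area)."""
    if len(raw) != geom.raw_bytes:
        raise ValueError(f"expected {geom.raw_bytes} bytes, got {len(raw)}")
    return raw[:geom.data_bytes], raw[geom.data_bytes:]


def split_sectors(data_area: bytes, sector_bytes: int) -> List[bytes]:
    """Cut the data area into fixed-size sectors (logical blocks)."""
    if sector_bytes <= 0 or len(data_area) % sector_bytes != 0:
        raise ValueError("data area is not a whole number of sectors")
    return [data_area[i:i + sector_bytes]
            for i in range(0, len(data_area), sector_bytes)]


if __name__ == "__main__":
    # Assumed geometry: 4096 data bytes plus 224 spare bytes per page.
    geom = PageGeometry(data_bytes=4096, spare_bytes=224)
    data, spare = split_page(bytes(geom.raw_bytes), geom)
    print(len(data), len(spare), len(split_sectors(data, 512)))
```

Keeping the geometry in one immutable object means the same splitting code can serve devices with different page sizes, which a controller would normally learn from the device rather than hard-code.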