Flash-based storage is currently the most common nonvolatile memory technology used in solid-state drives (SSDs), and other technologies, such as phase-change memory (PCM), a type of storage-class memory, are expected to be used in solid-state storage systems in the near future. The usual approach to achieving high I/O performance is to use multiple independent channels that are accessed in parallel. The data rate achieved in each channel is limited mainly by two factors. The first is the ‘Page Write’ and ‘Page Read’ times that the flash device needs to complete the respective operations internally. The second is the clock rate of the device interface.
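A back-of-the-envelope model of this per-channel limit can be sketched in Python. The model rests on several hypothetical assumptions. The page transfer over the interface and the internal ‘Page Write’ or ‘Page Read’ operation run strictly one after the other. The interface clock rate is expressed as a byte transfer rate. Channels are fully independent. Real devices overlap these phases using techniques such as cache operations and die interleaving, so this is a simplified sketch and not a device model. The function names and parameter values are illustrative and are not taken from any datasheet.

```python
def channel_throughput(page_bytes: int, t_op_s: float, bus_bytes_per_s: float) -> float:
    """Bytes/s for one channel, assuming the page transfer and the internal
    page operation are serialized (no pipelining or interleaving)."""
    t_transfer = page_bytes / bus_bytes_per_s  # time set by the interface rate
    return page_bytes / (t_transfer + t_op_s)


def ssd_throughput(n_channels: int, per_channel: float) -> float:
    """Aggregate bytes/s, assuming independent, perfectly parallel channels."""
    return n_channels * per_channel


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    PAGE_BYTES = 16 * 1024   # 16 KiB page
    T_PAGE_WRITE = 600e-6    # 'Page Write' (program) time, seconds
    T_PAGE_READ = 50e-6      # 'Page Read' time, seconds
    BUS_RATE = 200e6         # interface transfer rate, bytes/s
    N_CHANNELS = 8

    w = channel_throughput(PAGE_BYTES, T_PAGE_WRITE, BUS_RATE)
    r = channel_throughput(PAGE_BYTES, T_PAGE_READ, BUS_RATE)
    print(f"per channel: write {w / 1e6:.1f} MB/s, read {r / 1e6:.1f} MB/s")
    print(f"{N_CHANNELS} channels: write {ssd_throughput(N_CHANNELS, w) / 1e6:.1f} MB/s")
```

With values like these, the page-write time dominates the write path. Doubling the interface rate raises per-channel write throughput by only a few percent. Doubling the number of channels doubles the aggregate, which is why parallel channels are the usual route to higher I/O performance.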
Along with cost and I/O performance, reliability and durability are among the major issues of using flash chips, or flash integrated-circuit devices (ICs), in SSDs today. These issues arise from the limited number of write/erase operations that can be performed on the flash cells, a phenomenon known as the endurance problem. The typical maximum number of write/erase operations for flash cells is in the range of 10,000 to 100,000. By comparison, the typical maximum number of write operations for PCM cells is in the range of 1 million to 100 million. Furthermore, measurements on deployed flash-based SSDs indicate that their flash chips have a higher-than-expected failure rate, meaning that one or more flash ICs fail. This is especially the case in server applications, where high I/O rates are required.
Since each SSD contains a large number of flash ICs (usually a few tens), the probability that at least one IC inside an SSD fails is significant. Depending on the SSD's architecture, the failure of a single flash device causes part of the SSD, and in some cases the whole SSD, to fail. In existing SSDs, a user sector, an encoded user sector, or a codeword is stored in a single flash IC. The additional parity symbols, which are generated using a first error-correction code, are used to correct random and burst errors, i.e., to provide data reliability and increased endurance. In this arrangement, when a flash IC fails, the codewords stored in that IC cannot be recovered, because their parity symbols reside on the same failed device. Existing SSDs therefore use a second error-detection/correction code to handle device failure. However, this increases the complexity of the storage-system implementation.
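The Python sketch below illustrates two points: why the failure probability grows with the number of ICs, and what a second, cross-IC code can do. It makes two simplifying assumptions. First, ICs fail independently, each with the same probability p over the period of interest, so the probability that at least one of N ICs fails is P = 1 - (1 - p)^N. Second, the second code is a single XOR parity chunk computed across ICs (RAID-5-like). This is only one possible choice and not necessarily what any particular SSD uses. The per-IC probability and all function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def p_at_least_one_failure(p_ic: float, n_ics: int) -> float:
    """P(at least one of n_ics fails), assuming independent failures,
    each with probability p_ic over the period of interest."""
    return 1.0 - (1.0 - p_ic) ** n_ics


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: list[bytes]) -> bytes:
    """Hypothetical second code: one XOR parity chunk over equal-length
    chunks, each chunk stored on a different IC."""
    return reduce(xor_bytes, chunks)


def rebuild(chunks: list[bytes | None], parity: bytes) -> list[bytes]:
    """Recover the single missing chunk (marked None) from the survivors."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing chunk")
    survivors = [c for c in chunks if c is not None]
    # parity XOR all surviving chunks == the missing chunk
    recovered = reduce(xor_bytes, survivors, parity)
    out = list(chunks)
    out[missing[0]] = recovered
    return out


if __name__ == "__main__":
    # Hypothetical figures: 32 flash ICs, each with a 1% chance of failing.
    print(f"P(at least one IC fails) = {p_at_least_one_failure(0.01, 32):.3f}")

    data = [b"IC0-data", b"IC1-data", b"IC2-data", b"IC3-data"]
    parity = make_parity(data)
    damaged = [data[0], None, data[2], data[3]]  # IC1 has failed
    assert rebuild(damaged, parity) == data
```

Under these assumptions, a 1% per-IC failure probability across 32 ICs gives about a 27.5% chance that at least one IC fails. The XOR parity can restore the contents of any one failed IC. However, it needs its own encoder, storage for the parity chunk, and a rebuild path. This is the kind of added implementation complexity that a second code introduces.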