The present invention relates to data storage and, more specifically, to a read buffer architecture capable of supporting integrated XOR-based and read-retry data reconstruction for non-volatile random access memory (NVRAM) systems.
NVRAM, such as flash memory (including “negated AND or NOT AND” (NAND) flash memory, NOR flash memory, and multi-level cell (MLC) NAND flash memory), phase change memory (PCM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), etc., provides a non-volatile, electrically reprogrammable storage medium at a lower cost and with higher performance than hard disk drives (HDDs) due to its higher data density. This higher density, although beneficial, is not without its own problems. One such problem is a higher error rate and shorter data retention time for data stored to MLC NAND flash memory. To enable MLC NAND flash memory to be a viable medium for enterprise-level storage, several techniques are conventionally used to improve its error performance and long-term reliability.
A first technique is a robust error correction code (ECC). A tradeoff is made between error correction strength (the number of bits that can be corrected per unit of data) and the additional space required for the redundant information used in the error correction. As NAND flash memories “age,” the number of errors per unit of stored data may exceed the error correction capability of even the strongest error correction schemes. When a sector of data is uncorrectable, other techniques must be used to recover the originally stored data.
Data retrieval from MLC NAND flash memories is highly sensitive to the threshold voltage used to distinguish between bit values (e.g., 0's and 1's). This is especially true in MLC flash memory because a single memory cell encodes the values of multiple bits and therefore requires multiple threshold voltages. Current storage devices provide the ability to adjust their read threshold voltages. Simply rereading the data from the flash memory with a different threshold voltage is often sufficient for retrieving data units that are otherwise uncorrectable. Provisions are made for: a) recognizing that an uncorrectable unit of data has been read; b) adjusting the threshold voltage(s); c) reissuing the original read operation; d) storing the read data; and e) recognizing when a successful read has occurred, or recognizing when a predetermined number of retry attempts has failed.
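The retry provisions (a) through (e) above can be illustrated with a minimal sketch. It is not taken from any particular device: the callback shape, the names read_with_retry and make_fake_unit, and the use of integer threshold offsets are all assumptions made for illustration. A real controller would adjust thresholds through device-specific commands.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

# Hypothetical read callback: takes a threshold offset and returns
# (data, ecc_ok). ecc_ok is False when the unit is uncorrectable.
ReadFn = Callable[[int], Tuple[bytes, bool]]


def read_with_retry(
    read_unit: ReadFn, offsets: Sequence[int], max_attempts: int
) -> Tuple[Optional[bytes], int]:
    """Reread one data unit with successive threshold offsets.

    Returns (data, attempts_used); data is None when every attempt failed.
    """
    attempts = 0
    for offset in offsets[:max_attempts]:
        attempts += 1
        # (b) adjust the threshold and (c) reissue the original read
        data, ecc_ok = read_unit(offset)
        if ecc_ok:
            # (d) keep the read data, (e) report the successful read
            return data, attempts
    # (e) the predetermined number of retries has been exhausted
    return None, attempts


def make_fake_unit(stored: bytes, good_offsets: Set[int]) -> ReadFn:
    """Stand-in for a flash page that decodes only at certain offsets."""
    def read_unit(offset: int) -> Tuple[bytes, bool]:
        if offset in good_offsets:
            return stored, True
        return b"\x00" * len(stored), False
    return read_unit


if __name__ == "__main__":
    unit = make_fake_unit(b"abc", {2})
    print(read_with_retry(unit, [0, 1, 2, 3], max_attempts=4))
```

Bounding the loop by max_attempts mirrors the "predetermined number of retry attempts" in provision (e); the caller decides what to do when None is returned.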
Another method for recovering data in the presence of read errors uses a variation on the concepts of redundant array of inexpensive disks (RAID) striping. In RAID, multiple identically sized units of independent data are grouped in a “stripe,” along with an additional “parity” bit or unit. As the units that make up the stripe are written to flash memory, XOR parity is accumulated across the stripe. When all the data units have been written, the accumulated XOR unit is written to complete the stripe. Should any data unit in the stripe exhibit uncorrectable errors, the original data can be recovered by XORing the data from all other units in the stripe, including the parity unit. To recover data from a RAID stripe, provisions are made for: a) recognizing that an uncorrectable unit of data has been read; b) initiating reads of the other units in the stripe; c) accumulating parity as the stripe is read; d) monitoring error status as each unit is read; and e) recognizing when the complete stripe has been read and data has been successfully recovered. Note that reading the full RAID stripe for reconstruction imposes a significantly larger penalty on system performance than retrying reads with adjusted threshold voltage(s).
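The XOR property that makes stripe recovery possible can be shown with a short sketch. The function names (xor_units, build_stripe, reconstruct) and the choice to store parity as the last unit of the stripe are assumptions for illustration only.

```python
from functools import reduce
from typing import List, Sequence


def xor_units(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two identically sized units."""
    if len(a) != len(b):
        raise ValueError("stripe units must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def build_stripe(data_units: Sequence[bytes]) -> List[bytes]:
    """Accumulate XOR parity across the data units and append it."""
    parity = reduce(xor_units, data_units)
    return list(data_units) + [parity]


def reconstruct(stripe: Sequence[bytes], failed_index: int) -> bytes:
    """Recover one unit by XORing every other unit, parity included."""
    survivors = [u for i, u in enumerate(stripe) if i != failed_index]
    return reduce(xor_units, survivors)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    stripe = build_stripe(data)
    # Pretend unit 1 is uncorrectable and rebuild it from the rest.
    assert reconstruct(stripe, 1) == data[1]
```

Because every surviving unit must be read to rebuild a single failed unit, the cost of reconstruct grows with stripe width, which is the performance penalty noted above.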
A typical read error recovery scenario, then, involves: a) some number of threshold-adjusted rereads of the failing data unit; and b) if the rereads fail to correct the error, reading all the other units in the stripe to recover the original data (RAID reconstruct). A complication arises when errors are encountered during a RAID reconstruct operation, potentially invalidating the XOR accumulation. These types of errors are not easily recoverable and may lead to a RAID reconstruction failure.
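The two-stage scenario above can be sketched as follows. This is an illustrative model of the conventional flow only, under several assumptions: the names recover and fake_unit are hypothetical, each unit is modeled as a callback that returns (data, ecc_ok) for a given threshold offset, and surviving units are read once at a default offset with no further retries, so a second uncorrectable unit simply aborts the reconstruction.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

# Hypothetical per-unit read callback: threshold offset -> (data, ecc_ok).
ReadFn = Callable[[int], Tuple[bytes, bool]]


def recover(
    readers: Sequence[ReadFn],
    failed: int,
    retry_offsets: Sequence[int],
    default_offset: int = 0,
) -> Tuple[Optional[bytes], str]:
    """Try read-retry first, then fall back to RAID reconstruct."""
    # Stage (a): threshold-adjusted rereads of the failing unit.
    for offset in retry_offsets:
        data, ecc_ok = readers[failed](offset)
        if ecc_ok:
            return data, "read-retry"

    # Stage (b): read every other unit in the stripe and accumulate XOR.
    acc: Optional[bytes] = None
    for i, reader in enumerate(readers):
        if i == failed:
            continue
        unit, ecc_ok = reader(default_offset)
        if not ecc_ok:
            # An error here invalidates the XOR accumulation.
            return None, "reconstruct-failed"
        acc = unit if acc is None else bytes(x ^ y for x, y in zip(acc, unit))
    return acc, "raid-reconstruct"


def fake_unit(stored: bytes, good_offsets: Set[int]) -> ReadFn:
    """Stand-in for a flash unit that decodes only at certain offsets."""
    def read(offset: int) -> Tuple[bytes, bool]:
        ok = offset in good_offsets
        return (stored if ok else b"\x00" * len(stored)), ok
    return read


if __name__ == "__main__":
    d0, d1 = b"\x0f\xf0", b"\x33\x55"
    parity = bytes(x ^ y for x, y in zip(d0, d1))
    stripe = [fake_unit(d0, set()), fake_unit(d1, {0}), fake_unit(parity, {0})]
    print(recover(stripe, failed=0, retry_offsets=[1, 2]))
```

The "reconstruct-failed" branch corresponds to the situation described above, in which an error during the reconstruct pass leaves the accumulated XOR unusable.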