1. Field of the Invention
The present invention is directed to a method for correcting errors in data read from an array of solid-state storage devices.
More particularly, the present invention is directed to an auto-correction method for detecting and repairing errors in stripes of data across the array of the solid-state storage devices.
Additionally, the present invention is directed to an auto-correction method utilizing a dual parity generation engine which may detect and correct errors in individual data words arranged in a block to prevent retirement of the entire block.
Still further, the present invention is directed to an error correcting technique for a memory system which takes advantage of the dual parity generation engine to transfer data from a cache memory to a stage buffer memory and to deliberately map out the data from each of the solid-state storage devices of an array thereof in a sequential manner as data is repetitively transferred between the cache memory and the stage buffer memory. Responsive to the dual parity generation engine identifying that valid data is obtained when the mapped-out solid-state storage device is treated as a known single device fault, the mapped-out solid-state storage device is identified as the solid-state storage device in error. The valid data reconstructed by the dual parity generation engine and transferred to the stage buffer memory is subsequently transferred to a processor requesting the data to complete the read operation.
2. Background of the Invention
Computer systems often employ arrays of solid-state storage devices for storage and retrieval of large amounts of data.
Solid-state memory devices use electrical charges to change the physical properties of a medium to store information in a non-volatile manner. Among the inherent advantages of solid-state memory are fast data access time and low energy consumption.
Among the solid-state memory available, a flash memory is a memory which uses floating-gate transistors arranged in NOR or NAND gates to store data for electronic devices. Flash memory is one of the most widely used non-volatile memories. However, it is prone to degradation which often shortens its lifespan considerably. Another type of solid-state storage is a phase-change memory which uses the difference in electrical resistivity between two states of a material to store data for electronic devices whereby the material changes states when heated and cooled.
Solid-state memory also includes memristor memory and carbon nanotube memory. Memristor memory uses the change in resistivity of the media in relation to the direction of the current applied to it to store data. Electrical resistance increases if current is applied in one direction, and decreases if current is applied in a second direction. Carbon nanotube memory uses the position of carbon nanotubes between two substrates to store data. A properly applied charge changes the position of the carbon nanotubes which changes their electrical resistance.
In a flash memory, the data may be quickly erased and reprogrammed electrically. Unlike traditional memories, such as DRAM or SRAM, the flash memory does not require power to retain the data. Unlike rotating media, there are no moving parts in flash memory. This allows flash memory to have higher bandwidth and lower latencies than traditional rotating media. Flash memory also consumes less power since no motors are required to spin or move any part of the device. Flash memory is more expensive than rotating media but the advantages in speed, latency and power are making it increasingly attractive in mass storage systems.
There are two types of flash memory, i.e., NAND flash and NOR flash. Their names refer to the logic gates that are the fundamental building blocks of the memory. NOR flash is faster and more reliable than NAND flash but is more expensive. NAND flash has higher densities than NOR flash and is significantly cheaper but does not have an operational life as long as the NOR flash.
NAND flash may be implemented either as single-level cell (SLC) devices which store only one bit of information per cell, or as multi-level cell (MLC) devices which are able to store more than one bit per cell by choosing between multiple levels of electrical charge applied to the floating gates of the cells. This gives MLC devices a higher storage density than SLC devices but also may cause them to be more prone to errors and reduce the number of program/erase cycles before the cells wear out.
In memory systems, flash cells are grouped into pages. A page is the smallest “chunk” of data that may be read or programmed in one operation. Pages are typically formed with 2112 bytes of data for SLC devices, and 4314 bytes of data for MLC devices. The extra data bytes are used to store ECC (Error Correcting Code) bits and metadata information. The flash pages are then arranged into groups of 64 or 128 pages called blocks. The pages of a block may be read individually but the pages in the block must be programmed sequentially. All of the pages of a block are erased together. If a page can no longer be programmed correctly, or the block cannot be erased, then the entire block is retired.
Thousands of blocks are arranged together into a single die. A single die may only have one operation outstanding at a time, i.e., a “read”, “program”, or “erase”. A flash part usually has two dies per part. Flash storage devices typically have a number of parts to allow for multiple concurrent accesses.
Flash devices are not 100% reliable and often ship with 1-2% of the cells being defective. Data in flash cells also degrades over time and is subject to disturbances from external influences such as, for example, temperature. Extra bits for ECC are required to detect and correct the errors in the data. To prevent loss of the data, the flash storage device must monitor IO access to the device and the ECC bits, and additionally use wear leveling to correct problems.
Wear leveling is a method of moving data between flash cells to prolong the life of the flash cells. Typically, SLC devices wear out after 100K program/erase cycles and MLC devices wear out after 5K to 7K program/erase cycles.
The most prevalent source of corruption of cell data is “Program disturb”, which may occur when cells not being programmed are disturbed by nearby program operations. This does not damage the cells but may corrupt the data contained therein.
“Read disturb” is also an issue with flash devices. Although minor for each individual operation, this problem may accumulate over time. Traditionally the cell data will need to be refreshed after approximately one million “read” cycles for SLC and 100K “read” cycles for MLC.
Flash cells also may wear out over time since the gate levels trend towards a quiescent level. The cells also wear out with each “program”/“erase” cycle as excess charge accumulates in the dielectric. This puts a limit on the lifespan of the flash parts since the cells can no longer be used when either the “program” or “erase” operation fails. Wear leveling is used to limit the number of “program”/“erase” cycles applied to individual cells by spreading the “program”/“erase” cycles across all of the flash cells.
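The spreading of “program”/“erase” cycles described above may be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions: the class name WearLeveler, its methods, and the four-block device are hypothetical, and the sketch only selects the least-worn block for each cycle rather than moving live data as a real flash controller would.

```python
class WearLeveler:
    """Toy model: direct every program/erase cycle to the least-worn block."""

    def __init__(self, num_blocks):
        # One program/erase counter per block (hypothetical bookkeeping).
        self.erase_counts = [0] * num_blocks

    def pick_block(self):
        # Choose the block with the fewest program/erase cycles so far;
        # ties resolve to the lowest block index.
        return min(range(len(self.erase_counts)),
                   key=lambda block: self.erase_counts[block])

    def program_erase(self):
        # Charge one program/erase cycle to the chosen block.
        block = self.pick_block()
        self.erase_counts[block] += 1
        return block


leveler = WearLeveler(num_blocks=4)
for _ in range(10):
    leveler.program_erase()

# Difference between the most-worn and least-worn block.
spread = max(leveler.erase_counts) - min(leveler.erase_counts)
print(leveler.erase_counts, spread)
```

With this selection rule, the cycle counts of any two blocks never differ by more than one, which is the property wear leveling aims for.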
The current state of flash technology may suffer from the following drawbacks:
(a) latency increases as cells wear out since more time is needed to erase cells, program the cells, and accurately read the data in the cells;
(b) “writes” typically take an order of magnitude longer than “reads”;
(c) “erase” operations may take significantly longer than “writes” but are performed on an entire block;
(d) wear leveling may introduce large delays due to reading the entire block, programming to new block, and erasing the old block;
(e) wear out of cells in pages may cause delays when the pages need to be re-mapped. Pages are arranged in groups called blocks. If a single page fails then the data in the entire block of pages must be moved so the block can be retired. Controllers must check the status of “program” and “erase” operations. Blocks are marked as “bad” and can no longer be used when these operations fail;
(f) flash parts rely on ECC to protect data. Flash controllers detect and correct many errors but error correction may not always be possible. “Read” operations only return “good”, “marginal” and “bad” status, where “good” indicates the data is healthy or minimal ECC correction is needed to recover the data, “marginal” indicates that the data was read but many errors were recovered or little ECC protection remains, and “bad” indicates that the data had too many errors and could not be read correctly.
Examples of Typical MLC and SLC Parts:
Typical part size      (MLC)         (SLC)
bytes per page         4314          2112
pages per block        128           64
ECC (per 512 bytes)    4+            1
Endurance              ~5K           ~100K
Read (max)             50 us         25 us
Program                600-900 us    200-300 us
Erase                  3000 us       2000 us
Latency Examples of MLC Parts:
Total time to read entire block=50 us/page*128 pages/block=6400 us (6.4 msec).
Total Min time to program entire block=600 us/page*128 pages/block=76800 us (76.8 msec).
Total Max time to program entire block=900 us/page*128 pages/block=115200 us (115.2 msec).
Total time to erase entire block=3000 us (3.0 msec).
Total time to wear level one block=(50 us/page+600 us/page)*128 pages/block+3000 us/block=86200 us (86.2 msec).
It is clear that the recovery operation due to a single failed page may cause significant IO delays.
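The latency arithmetic for the MLC part may be checked with a short Python sketch. The constants are the per-page and per-block timings quoted above for a typical MLC part; the function names are hypothetical, and the wear-leveling estimate assumes the whole block is read, programmed to a new block at the minimum program time, and the old block then erased.

```python
# Timings for a typical MLC part, in microseconds, as quoted in the text.
READ_US_PER_PAGE = 50
PROGRAM_MIN_US_PER_PAGE = 600
PROGRAM_MAX_US_PER_PAGE = 900
ERASE_US_PER_BLOCK = 3000
PAGES_PER_BLOCK = 128


def block_read_us():
    """Time to read every page of one block."""
    return READ_US_PER_PAGE * PAGES_PER_BLOCK


def block_program_us(program_us_per_page):
    """Time to program every page of one block."""
    return program_us_per_page * PAGES_PER_BLOCK


def wear_level_us(program_us_per_page):
    """Read the whole block, program it to a new block, erase the old block."""
    return (block_read_us()
            + block_program_us(program_us_per_page)
            + ERASE_US_PER_BLOCK)


print(block_read_us())                             # 6400
print(block_program_us(PROGRAM_MIN_US_PER_PAGE))   # 76800
print(block_program_us(PROGRAM_MAX_US_PER_PAGE))   # 115200
print(wear_level_us(PROGRAM_MIN_US_PER_PAGE))      # 86200
```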
Phase-change memory (also known as PCME, PRAM, PCRAM or C-RAM) uses the unique properties of amorphous and crystalline chalcogenide glass to store data. By carefully heating chalcogenide glass, it may be switched between the amorphous state which has high electrical resistance and the crystalline state which has a lower electrical resistance.
Phase-change memory has the potential to scale beyond the capacity of flash devices with faster access times. It also may endure millions of “program”/“erase” cycles before the part degrades. Unlike flash, the data stored in PRAM devices are not subject to disturbance from “reads” and nearby “writes”.
However, as a disadvantage, PRAM is more susceptible to temperature changes. A device may be erased simply by heating it to a sufficient temperature.
PRAM devices are also susceptible to data deterioration due to mechanical stress on the cells since the data in a PRAM device is stored through a physical change in the properties of the media. The chalcogenide glass expands and contracts slightly with every “program” operation and this may cause an undesirable contact between the glass and the adjacent dielectric.
In view of the drawbacks of the current state of solid-state storage systems, reliable prevention of data corruption, together with error detection and correction, constitutes an important issue.
To improve the reliability of data storage systems, redundant arrays of disk drives have been utilized. Redundant Arrays of Independent Disks (RAID) have recently grown in usage. Of the originally proposed five levels of RAID systems, RAID-5 systems have gained great popularity for use in local area networks and independent personal computer systems, such as media database systems. In RAID-5 systems, data is interleaved by stripe units across the various disk drives of the array along with error correcting parity information. However, unlike RAID-3 systems wherein there is a dedicated parity disk, RAID-5 systems distribute parity across all of the disk drives in an interleaved fashion.
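As an illustration of how stripe parity supports recovery, the following Python sketch assumes that the parity of a stripe is the byte-wise XOR of its data stripe units, which is the conventional single-parity construction. The function name xor_parity and the sample bytes are hypothetical and are not drawn from any particular RAID implementation.

```python
from functools import reduce


def xor_parity(units):
    """Return the byte-wise XOR of equal-length stripe units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


# Three data stripe units, one per drive (sample values are arbitrary).
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_parity(data)

# Suppose the drive holding data[1] is known to have failed: XOR-ing the
# surviving units with the parity reproduces the missing unit.
survivors = [data[0], data[2], parity]
rebuilt = xor_parity(survivors)
print(rebuilt == data[1])
```

The recovery step works only because the failed drive is known in advance; the parity alone does not say which unit is wrong.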
The parity data in a RAID-5 system provides the ability to correct data only for a failure of a single disk drive of the array. Data storage systems requiring greater fault tolerance utilize the later proposed RAID-6 system. In RAID-6 systems, data is interleaved in stripe units distributed with parity information across all of the disk drives. To overcome the RAID-5 system's inability to correct for a failure of more than one disk drive, the RAID-6 system utilizes a redundancy scheme that can recover from a failure of any two disk drives. The RAID-6 parity scheme typically utilizes either a two-dimensional XOR algorithm or a Reed-Solomon code in a P+Q redundancy scheme.
Even with the RAID-6 architecture, such systems, while having the ability to detect failures in up to two disk drives, cannot correct the data unless each disk drive in error is identified. Such is the case in the storage system architecture disclosed in U.S. Pat. No. 7,127,668, but modified with an additional parity drive for use with a dual parity engine. Without the ability to identify the disk storage channel in error, the more fault tolerant parity algorithm of the RAID-6 system is unable to provide corrected data to the requesting processor, and must therefore report a “read error” to the processor requesting the data. Thus, there is a need to provide a means for identifying the disk drive in error in such instances.
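The sequential map-out idea outlined in the Field of the Invention can be pictured with a minimal Python sketch. It is not the dual parity generation engine itself but an illustration under stated assumptions: P is the byte-wise XOR of the data units, Q is a Reed-Solomon style weighted sum over GF(2^8) with reduction polynomial 0x11d and generator 2, exactly one data unit is corrupted, and P and Q are intact. All function names and sample data are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow2(exponent):
    """Return the generator 2 raised to the given power in GF(2^8)."""
    value = 1
    for _ in range(exponent):
        value = gf_mul(value, 2)
    return value


def pq_parity(units):
    """Compute P (plain XOR) and Q (weighted XOR) over equal-length units."""
    p = bytearray(len(units[0]))
    q = bytearray(len(units[0]))
    for i, unit in enumerate(units):
        coeff = gf_pow2(i)
        for k, byte in enumerate(unit):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def rebuild_from_p(units, p, missing):
    """Rebuild the unit at index `missing` from P and the other units."""
    rebuilt = bytearray(p)
    for i, unit in enumerate(units):
        if i != missing:
            for k, byte in enumerate(unit):
                rebuilt[k] ^= byte
    return bytes(rebuilt)


def locate_and_repair(units, p, q):
    """Map out each unit in turn; return (index, repaired units) for the
    map-out that makes both parities consistent, or (None, units) if the
    stripe is already consistent."""
    units = list(units)
    if pq_parity(units) == (p, q):
        return None, units
    for i in range(len(units)):
        trial = units[:i] + [rebuild_from_p(units, p, i)] + units[i + 1:]
        if pq_parity(trial)[1] == q:  # Q agrees only for the unit in error
            return i, trial
    raise ValueError("no single map-out yields consistent parity")


data = [b"flash", b"array", b"error", b"fixed"]
p, q = pq_parity(data)
corrupted = [data[0], data[1], b"errOr", data[3]]  # silent error in unit 2
index, repaired = locate_and_repair(corrupted, p, q)
print(index, repaired == data)
```

Mapping out a healthy unit leaves the error in the stripe and also perturbs the rebuilt unit, so the Q check fails; only mapping out the unit actually in error yields data consistent with both parities.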
In comparison with disk drives, solid-state memory storage devices provide a faster access time than rotating media due to the absence of moving parts, and require less energy for operation due to the absence of motors to move the media. It therefore would be advantageous to apply the principles of reliability used in RAID systems to the inherent advantages provided by the solid-state storage devices, and to further advance such a “hybrid” memory storage system towards substantially error-free data “read” operation through identification and auto-correction of detected errors.