Computer systems often employ solid-state storage devices for storage and retrieval of large amounts of data.
Solid-state memory devices use electrical charges to change the physical properties of a medium so that information is stored in a non-volatile manner. Inherent advantages of solid-state memory include fast data access and low energy consumption.
Among the available types of solid-state memory, flash memory uses floating-gate transistors arranged in NOR or NAND configurations to store data for electronic devices. Flash memory is one of the most widely used non-volatile memories. However, flash memory is prone to degradation, which often shortens its lifespan considerably. Another type of solid-state storage is phase-change memory, which stores data using the difference in electrical resistivity between two states of a material; the material changes state when it is heated and cooled.
Solid-state memory also includes memristor memory and carbon nanotube memory. Memristor memory stores data using the change in resistivity of the medium in response to the direction of the current applied to it: electrical resistance increases when current is applied in one direction and decreases when current is applied in the opposite direction. Carbon nanotube memory stores data using the position of carbon nanotubes between two substrates; a properly applied charge changes the position of the nanotubes, which changes their electrical resistance.
In flash memory, data may be electrically erased and reprogrammed quickly. Unlike volatile memories such as DRAM or SRAM, flash memory does not require power to retain data. Unlike rotating media, flash memory has no moving parts, which allows it to offer higher bandwidth and lower latency than traditional rotating media. Flash memory also consumes less power, since no motors are required to spin or move any part of the device. Flash memory is more expensive than rotating media, but its advantages in speed, latency, and power are making it increasingly attractive for mass storage systems.
There are two types of flash memory: NAND flash and NOR flash. Their names refer to the logic gates that form the fundamental building blocks of the memory. NOR flash is faster and more reliable than NAND flash but more expensive. NAND flash offers higher density than NOR flash and is significantly cheaper, but it does not have as long an operational life as NOR flash.
NAND flash may be implemented either as single-level cell (SLC) devices, which store only one bit of information per cell, or as multi-level cell (MLC) devices, which store more than one bit per cell by choosing between multiple levels of electrical charge applied to the floating gates of the cells. This gives MLC devices a higher storage density than SLC devices, but it may also make them more prone to errors and reduce the number of program/erase cycles before the cells wear out.
In memory systems, flash cells are grouped into pages. A page is the smallest “chunk” of data that may be read or programmed in one operation. Pages typically hold 2112 bytes for SLC devices and 4314 bytes for MLC devices; the extra bytes beyond the user data area are used to store ECC (Error Correcting Code) bits and metadata. Flash pages are arranged into groups of 64 or 128 pages called blocks. The pages of a block may be read individually, but they must be programmed sequentially, and all of the pages of a block are erased together. If a page can no longer be programmed correctly, or the block cannot be erased, the entire block is retired.
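The page and block rules above can be modeled in a few lines. This Python sketch is a simplified illustration rather than a controller design: `FlashBlock` is a hypothetical class, its defaults assume the SLC figures from the text (64 pages of 2112 bytes), and it enforces only the rules stated here: pages are read individually, programmed strictly in order, erased as a whole block, and the block is retired on failure.

```python
class FlashBlock:
    """Minimal model of one flash block (hypothetical class, for illustration only)."""

    def __init__(self, pages_per_block=64, page_size=2112):
        self.pages_per_block = pages_per_block
        self.page_size = page_size
        self.pages = [None] * pages_per_block  # None marks an erased page
        self.next_page = 0                     # only this page may be programmed next
        self.retired = False

    def read(self, index):
        # Pages may be read individually, in any order.
        return self.pages[index]

    def program(self, index, data):
        if self.retired:
            raise RuntimeError("block is retired")
        if self.next_page >= self.pages_per_block:
            raise ValueError("block is full; erase it before programming again")
        # Pages within a block must be programmed sequentially.
        if index != self.next_page:
            raise ValueError(f"must program page {self.next_page} next, not {index}")
        if len(data) > self.page_size:
            raise ValueError("data larger than one page")
        self.pages[index] = bytes(data)
        self.next_page += 1

    def erase(self):
        # Erase is block-granular: every page is cleared together.
        if self.retired:
            raise RuntimeError("block is retired")
        self.pages = [None] * self.pages_per_block
        self.next_page = 0

    def retire(self):
        # Used when a page can no longer be programmed or the block cannot be erased.
        self.retired = True
```

For example, trying to program page 5 of a freshly erased block raises `ValueError`, because pages 0 through 4 must be programmed first; the only way to rewrite page 0 is to erase the whole block.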
Thousands of blocks are arranged together into a single die. A die may have only one operation (a “read”, “program”, or “erase”) outstanding at a time. A flash part usually has two dies, and flash storage devices typically contain a number of parts to allow multiple concurrent accesses.
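To make the one-operation-per-die rule concrete, here is a minimal Python sketch. It assumes that operations on the same die are strictly serialized, that different dies are fully independent, and it ignores channel or bus contention; `makespan_us` is a hypothetical helper, and the latencies in the usage note are the MLC figures from the table of typical parts.

```python
from collections import defaultdict


def makespan_us(ops, latency_us):
    """Estimate the completion time of a batch of flash operations.

    ops: list of (die_id, op_name) pairs; die_id is any hashable label,
         e.g. (part, die).
    latency_us: mapping from op_name to latency in microseconds.
    Operations on one die run back to back; different dies run in parallel.
    """
    busy = defaultdict(int)  # accumulated busy time per die
    for die, op in ops:
        busy[die] += latency_us[op]
    return max(busy.values(), default=0)
```

For example, with `{"read": 50, "program": 600, "erase": 3000}`, four reads queued on one die take 200 us, while the same four reads spread across four dies (say, two parts with two dies each) finish in 50 us.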
Flash devices are not 100% reliable and often ship with 1-2% of their cells defective. Data in flash cells also degrades over time and is subject to disturbance from external influences such as temperature. Extra ECC bits are required to detect and correct errors in the data. To prevent data loss, the flash storage device must monitor I/O access to the device and the ECC bits, and must additionally use wear leveling to correct problems.
Wear leveling is a method of moving data between flash cells to prolong their life. Typically, SLC devices wear out after 100K program/erase cycles and MLC devices wear out after 5K to 7K program/erase cycles.
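One simple form of wear leveling can be sketched as follows, assuming the controller keeps a per-block count of program/erase cycles. This is only dynamic wear leveling (new writes go to the least-worn free block); a real controller would also relocate static, rarely rewritten data. `pick_block_for_write`, `record_erase`, and the `ENDURANCE` table are hypothetical names, with limits taken from the lower end of the figures quoted above.

```python
# Rated P/E cycles, using the conservative end of the ranges in the text.
ENDURANCE = {"SLC": 100_000, "MLC": 5_000}


def pick_block_for_write(erase_counts, free_blocks):
    """Return the free block with the fewest completed program/erase cycles.

    erase_counts: dict mapping block id -> completed P/E cycles.
    free_blocks: iterable of ids of erased blocks available for writing.
    """
    return min(free_blocks, key=lambda b: erase_counts[b])


def record_erase(erase_counts, block, cell_type):
    """Count one P/E cycle; return True once the block reaches its rated endurance."""
    erase_counts[block] += 1
    return erase_counts[block] >= ENDURANCE[cell_type]
```

Always writing to the least-worn free block spreads erase cycles evenly, so no single block reaches its endurance limit long before the others.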
The most prevalent source of corruption of cell data is “Program disturb”, which may occur when cells not being programmed are disturbed by nearby program operations. This does not damage the cells but may corrupt the data contained therein.
“Read disturb” is also an issue with flash devices. Although its effect is minor for each individual operation, it accumulates over time. Typically, the cell data must be refreshed after approximately one million “read” cycles for SLC and 100K “read” cycles for MLC.
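A controller can handle read disturb by counting reads per block and scheduling a refresh when the count reaches a limit. The Python sketch below assumes counting at block granularity and uses the approximate limits quoted above; `ReadDisturbTracker` and its methods are hypothetical names.

```python
# Approximate read limits before a refresh, from the figures in the text.
READ_REFRESH_LIMIT = {"SLC": 1_000_000, "MLC": 100_000}


class ReadDisturbTracker:
    """Counts reads per block and flags blocks whose data should be refreshed."""

    def __init__(self, cell_type):
        self.limit = READ_REFRESH_LIMIT[cell_type]
        self.reads = {}

    def on_read(self, block):
        """Record one read of the block; return True when it is due for a refresh."""
        self.reads[block] = self.reads.get(block, 0) + 1
        return self.reads[block] >= self.limit

    def on_refresh(self, block):
        # After the data is rewritten and the block erased, its disturbance count starts over.
        self.reads[block] = 0
```

For an MLC block, the 100,000th read returns `True`; after `on_refresh`, counting restarts from zero.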
Flash cells also wear out over time as the charge levels on their gates trend toward a quiescent level. The cells also wear out with each “program”/“erase” cycle as excess charge accumulates in the dielectric. This limits the lifespan of flash parts, since cells can no longer be used once either the “program” or the “erase” operation fails. Wear leveling limits the number of “program”/“erase” cycles applied to individual cells by spreading these cycles across all of the flash cells.
The current state of flash technology may suffer from the following drawbacks:
(a) latency increases as cells wear out since more time is needed to erase cells, program the cells, and accurately read the data in the cells;
(b) “writes” typically take an order of magnitude longer than “reads”;
(c) “erase” operations may take significantly longer than “writes” but are performed on an entire block;
(d) wear leveling may introduce large delays due to reading the entire block, programming it into a new block, and erasing the old block;
(e) wear-out of cells in pages may cause delays when the pages need to be remapped. Because pages are grouped into blocks, if a single page fails, the data in the entire block must be moved so that the block can be retired. Controllers must check the status of “program” and “erase” operations; when these operations fail, the block is marked “bad” and can no longer be used;
(f) flash parts rely on ECC to protect data. Flash controllers detect and correct many errors, but error correction is not always possible. “Read” operations return only a “good”, “marginal”, or “failed” status, where “good” indicates that the data is healthy or needed only minimal ECC correction, “marginal” indicates that the data was read but many errors were corrected or little ECC protection margin remains, and “failed” indicates that the data had too many errors to be read correctly.
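The three read statuses in item (f) can be derived from the ECC decoder's result. In this Python sketch, `classify_read` is a hypothetical helper, the decoder is assumed to report either the number of corrected bits in a codeword or `None` for an uncorrectable codeword, and the 75% "marginal" threshold is an illustrative assumption, not a value from any datasheet.

```python
def classify_read(corrected_bits, ecc_capability, marginal_fraction=0.75):
    """Map an ECC decode result to a "good", "marginal", or "failed" status.

    corrected_bits: bits corrected in the codeword, or None if the decoder
        reported the codeword as uncorrectable.
    ecc_capability: maximum correctable bits per codeword (for example,
        4 bits per 512 bytes).
    marginal_fraction: hypothetical tuning knob; at or above this fraction of
        the correction capability, little ECC margin is considered to remain.
    """
    if corrected_bits is None or corrected_bits > ecc_capability:
        return "failed"
    if corrected_bits >= marginal_fraction * ecc_capability:
        return "marginal"
    return "good"
```

With a 4-bit code, 0 to 2 corrected bits would read as "good", 3 or 4 as "marginal", and an uncorrectable codeword as "failed". A controller might use the "marginal" status as a trigger to relocate the data before it becomes unreadable.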
Examples of Typical MLC and SLC Parts:
Typical part               MLC          SLC
Bytes per page             4314         2112
Pages per block            128          64
ECC (per 512 bytes)        4+           1
Endurance (P/E cycles)     ~5K          ~100K
Read (max)                 50 us        25 us
Program                    600-900 us   200-300 us
Erase                      3000 us      2000 us
Latency Examples of MLC Parts:
Total time to read entire block = 50 us/page * 128 pages/block = 6400 us (6.4 msec).
Total min time to program entire block = 600 us/page * 128 pages/block = 76800 us (76.8 msec).
Total max time to program entire block = 900 us/page * 128 pages/block = 115200 us (115.2 msec).
Total time to erase entire block = 3000 us (3.0 msec).
Total time to wear level one block = (50 us/page + 600 us/page) * 128 pages/block + 3000 us/block = 86200 us (86.2 msec).
Recovering from a single failed page may therefore cause significant I/O delays.
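The block-level figures above follow directly from the per-page numbers in the table of typical parts. This Python sketch recomputes them for both the MLC and SLC columns. Like the text, it assumes that pages are read and programmed one after another, with no overlap, and that wear leveling a block costs one full read, one full program, and one erase. `PARTS` and `block_latencies_us` are illustrative names.

```python
# Per-page timings from the table of typical parts (microseconds).
PARTS = {
    "MLC": {"pages_per_block": 128, "read_us": 50, "program_us": (600, 900), "erase_us": 3000},
    "SLC": {"pages_per_block": 64, "read_us": 25, "program_us": (200, 300), "erase_us": 2000},
}


def block_latencies_us(part):
    """Block-level latencies for one part type, assuming strictly serial page operations."""
    p = PARTS[part]
    n = p["pages_per_block"]
    prog_min, prog_max = p["program_us"]
    return {
        "read_block": p["read_us"] * n,
        "program_block_min": prog_min * n,
        "program_block_max": prog_max * n,
        "erase_block": p["erase_us"],
        # Wear leveling: read every page, program it into a new block, erase the old block.
        "wear_level_min": (p["read_us"] + prog_min) * n + p["erase_us"],
        "wear_level_max": (p["read_us"] + prog_max) * n + p["erase_us"],
    }
```

For MLC this gives 6400 us to read a block, 76800 to 115200 us to program it, and 86200 to 124600 us to wear level it. The SLC column comes out roughly five times faster, because its blocks are half as large and its program times are about a third of the MLC values.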
Phase-change memory (also known as PCME, PRAM, PCRAM or C-RAM) uses the unique properties of amorphous and crystalline chalcogenide glass to store data. Through carefully controlled heating, chalcogenide glass may be switched between the amorphous state, which has high electrical resistance, and the crystalline state, which has lower electrical resistance.
Phase-change memory has the potential to scale beyond the capacity of flash devices, with faster access times. It may also endure millions of “program”/“erase” cycles before the part degrades. Unlike data in flash, the data stored in PRAM devices is not subject to disturbance from “reads” and nearby “writes”.
However, PRAM is more susceptible to temperature changes: a device may be erased simply by heating it to a sufficient temperature.
PRAM devices are also susceptible to data deterioration caused by mechanical stress on the cells, since data in a PRAM device is stored as a physical change in the properties of the medium. The chalcogenide glass expands and contracts slightly with every “program” operation, which may cause undesirable contact between the glass and the adjacent dielectric.
In view of these drawbacks of current solid-state storage systems, reliable prevention of data corruption, together with error detection and correction, is an important issue.
To improve the reliability of data storage systems, redundant arrays of disk drives have been used, and Redundant Arrays of Independent Disks (RAID) have grown in usage. Of the five originally proposed RAID levels, RAID-5 has gained great popularity in local area networks and independent personal computer systems, such as media database systems. In RAID-5, data is interleaved by stripe units across the disk drives of the array, along with error-correcting parity information. However, unlike RAID-3, in which data and parity information are stored on dedicated physical disk drives, RAID-5 distributes the data and parity information across all of the disk drives in an interleaved fashion. The data and parity information are stored in logical disk drives. The parity data in a RAID-5 system can correct for the loss of valid data from only a single disk drive of the array.
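XOR parity is what makes single-drive recovery possible in RAID-5. The Python sketch below shows one stripe only and keeps the parity unit at the end of the list. A real RAID-5 array rotates the parity position from stripe to stripe, which this sketch does not model. `xor_blocks`, `make_stripe`, and `rebuild` are hypothetical helper names.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def make_stripe(data_units):
    """Return the data units followed by one parity unit P = D0 ^ D1 ^ ... ^ Dn-1."""
    return list(data_units) + [xor_blocks(data_units)]


def rebuild(stripe, lost_index):
    """Recover the unit at lost_index (data or parity) by XORing all surviving units."""
    survivors = [unit for i, unit in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

`rebuild` must be told which unit is missing. XOR parity can reconstruct a drive that is known to be bad, and it can detect that a stripe is inconsistent, but it cannot by itself identify which drive returned bad data.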
RAID-6 systems have since been developed for data storage systems requiring greater fault tolerance. In RAID-6, data is interleaved in stripe units distributed with parity information across all of the disk drives, as in RAID-5. However, to overcome RAID-5's inability to correct faulty data retrieved from more than one disk drive, RAID-6 uses a redundancy scheme that can recover from invalid data received from any two of the disk drives. Although this scheme also uses logical disk drives, an additional disk drive is added to the array to provide the storage required for the second level of parity data. The RAID-6 parity scheme typically uses either a two-dimensional XOR algorithm or a Reed-Solomon code in a P+Q redundancy scheme.
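The P+Q variant can be sketched as follows. This Python sketch assumes Reed-Solomon-style arithmetic in GF(2^8) with the field polynomial 0x11d and generator g = 2, which is the convention used, for example, by the Linux md RAID-6 code. P is the XOR of the data units, and Q is the sum of g^i * D_i. Given P and Q, any two data drives can be rebuilt, provided their positions x and y are known. All function names are hypothetical.

```python
# GF(2^8) tables for the polynomial x^8+x^4+x^3+x^2+1 (0x11d), generator g = 2.
EXP = [0] * 510  # EXP[i] = g^i, stored twice so EXP[a + b] needs no modular reduction
LOG = [0] * 256  # LOG[g^i] = i; LOG[0] is unused
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v = (v << 1) ^ (0x11D if v & 0x80 else 0)


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def pq_parity(data):
    """P = XOR of all data units; Q = sum of g^i * D_i. Supports up to 255 data drives."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, d in enumerate(data):
        for k in range(n):
            p[k] ^= d[k]
            q[k] ^= gf_mul(EXP[i], d[k])
    return bytes(p), bytes(q)


def recover_two(data, x, y, p, q):
    """Rebuild data drives x and y (x != y, both known to be bad) from the
    surviving drives plus P and Q. The entries data[x] and data[y] are ignored."""
    n = len(p)
    pxy, qxy = bytearray(p), bytearray(q)
    for i, d in enumerate(data):
        if i in (x, y):
            continue
        for k in range(n):
            pxy[k] ^= d[k]                  # leaves D_x ^ D_y
            qxy[k] ^= gf_mul(EXP[i], d[k])  # leaves g^x*D_x ^ g^y*D_y
    denom = EXP[x] ^ EXP[y]  # g^x + g^y, nonzero because x != y
    # D_x = (Qxy + g^y * Pxy) / (g^x + g^y); then D_y = Pxy + D_x.
    dx = bytes(gf_div(qxy[k] ^ gf_mul(EXP[y], pxy[k]), denom) for k in range(n))
    dy = bytes(pxy[k] ^ dx[k] for k in range(n))
    return dx, dy
```

Solving the two parity equations for two unknowns is what gives RAID-6 its second level of fault tolerance. As with RAID-5, the sketch needs the failed positions x and y as input.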
Even with the RAID-6 architecture, such systems, while able to detect failures in up to two disk drives, cannot correct the data unless each disk drive in error is identified. This is the case for the storage system architecture disclosed in U.S. Pat. No. 7,127,668 when it is modified with an additional parity drive for use with a dual parity engine. Without the ability to identify the disk storage channel in error, the more fault-tolerant parity algorithm of the RAID-6 system cannot provide corrected data to the requesting processor and must therefore report a “read error” to it. Thus, there is a need for a means of identifying the disk drive in error in such instances.
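When only one drive in a P+Q stripe is wrong, its position can be recovered from the two parity syndromes. This Python sketch illustrates that well-known property and is not presented as the method of any particular system. It assumes the same GF(2^8) convention as a typical P+Q encoder (polynomial 0x11d, generator g = 2), and `locate_single_error` is a hypothetical name. It cannot locate two bad drives of unknown position. Also, a double error can occasionally produce syndromes that look like a single error, so this check is not a guarantee.

```python
# GF(2^8) tables (polynomial 0x11d, generator g = 2).
EXP = [0] * 510
LOG = [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v = (v << 1) ^ (0x11D if v & 0x80 else 0)


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def locate_single_error(data, p, q):
    """Identify which drive of a P+Q stripe is in error, assuming at most one is.

    Returns None if the stripe is consistent, "P" or "Q" if only that parity
    unit disagrees, or the index of the single data drive in error.
    Raises ValueError when the syndromes do not point to exactly one drive.
    """
    n = len(p)
    ps, qs = bytearray(p), bytearray(q)
    for i, d in enumerate(data):
        for k in range(n):
            ps[k] ^= d[k]                  # P syndrome
            qs[k] ^= gf_mul(EXP[i], d[k])  # Q syndrome
    suspects = set()
    for k in range(n):
        if ps[k] == 0 and qs[k] == 0:
            continue
        if qs[k] == 0:
            suspects.add("P")
        elif ps[k] == 0:
            suspects.add("Q")
        else:
            # Error e on data drive z gives ps = e and qs = g^z * e, so z = log(qs) - log(ps).
            suspects.add((LOG[qs[k]] - LOG[ps[k]]) % 255)
    if not suspects:
        return None
    if len(suspects) > 1:
        raise ValueError("syndromes implicate more than one drive")
    z = suspects.pop()
    if isinstance(z, int) and z >= len(data):
        raise ValueError("syndromes point past the last data drive")
    return z
```

Once a single data drive z has been identified, XORing the P syndrome into that drive's bytes restores the original data. This is the step that plain XOR parity cannot perform on its own.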
To provide large data capacity, a large number of disk drives are often arrayed, and the additional disk drives required for two or more levels of parity data further increase the total number of disk drives in the array. Because these systems send the same command to all of the disk drives and then wait for all of the disks to finish that command before sending a new one, the data transfer rate of the memory array is limited by the “slowest” disk drive of the array. This characteristic can be particularly limiting because disk drives often exhibit unduly long access times as they begin a failure process in which their performance degrades, and it may be an extended period of time before the memory system or the drive itself identifies them as having failed.
Current RAID-3 systems attempt to overcome this latency problem by starting data transfers early, before all of the disk drives have completed a read command. Such a transfer is started as long as the needed data is already in the cache memory or can be reconstructed using parity data. However, RAID-3 systems employing this latency reduction technique cannot verify the integrity of the data being transferred to the initiator. The latency improvement thus comes at the cost of data integrity, which is not an acceptable trade-off. Thus, there is a need for a method that reduces latency while preserving the integrity of the data provided by the memory system.
In comparison with disk drives, solid-state memory storage devices provide faster access times because they have no moving parts, and they require less energy because there are no motors to move the media. It would therefore be advantageous to apply the reliability principles used in RAID systems to the inherent advantages of solid-state storage devices, and to advance such a “hybrid” memory storage system toward substantially error-free data “read” operations through the identification and automatic correction of detected errors.