The present invention generally relates to memory devices used with computers and other processing apparatuses, and more particularly to mass storage devices that use non-volatile (permanent) memory components for permanent storage of data.
All current operating systems for personal computers and servers were co-developed with hard disk drives (HDDs) as the prevalent storage medium. Hardly anything better reflects the fact that disk drives are at the core of the operating system than the acronym MS-DOS, short for Microsoft Disk Operating System. Even though MS-DOS is, by today's standards, only a niche product, its legacy has ramified into all existing Microsoft operating systems, with the side effect that the file system at their core centers on the special needs and strengths of the rotatable storage media used in electromechanical disk drives.
A. Hard Disk Drives Vs. Solid State Drives
Hard disk drives are electro-mechanical devices with a single channel to access the internal media, whereas solid state drives (SSDs) have only electronics as functional units and access the media over multiple parallel channels. The key differences between HDDs and SSDs are probably best summarized by the following characteristics:
Characteristic | HDD | NAND flash-based SSD
--- | --- | ---
Cost per byte | Extremely low cost per byte | Moderate cost per byte
Access latency | High access latencies | Low access latencies
Power envelope | High power consumption | Low power consumption
Optimal data structure | Optimized for sequential transfers | Sequential or random IOs
Data R/W access mode | Single channel/single thread | Multi-channel, independent parallel threads
Overwriting of existing data | Direct overwriting of existing data | No direct overwriting of existing data possible
Data retention | Unlimited data retention | Limited data retention
Mechanical stability | Sensitive to mechanical shock, vibration, humidity, temperature | Insensitive to mechanical shock, vibration and humidity; minor sensitivity to temperature
Cost per byte, data retention, power considerations and mechanical stability are parameters that primarily affect market acceptance and purchasing decisions for new and additional storage media. Access latency, on the other hand, can be tied directly to overall storage subsystem performance. For the following considerations, these characteristics are of only ancillary importance; therefore, the following discussion will focus specifically on how data are stored in traditional HDDs as opposed to SSDs and elaborate on the shortcomings of existing file or disk operating systems.
B. Concurrent Evolution of Hard Disk Drives and File Systems
Hard disk drives store data on rotatable platters divided into sectors that move under a read/write head. The read/write head is mounted on an actuator, which in turn moves the head across the different tracks from the outer to the inner diameter of the platters. Track density is constantly increasing, requiring more precise positioning of the head over the track with every drive generation. This is achieved through embedded positional signals in the form of servo tracks interspersed with the data sectors. However, because of the required positional precision, it is not practical to read simultaneously from two heads, for the simple reason that even small temperature gradients in the actuator assembly could skew one head relative to the other, causing one of the heads to miss its target track.
Consequently, at any given time, transfer to or from the media is limited to a single bit-stream transferred via a single head. During a read access, the bit-stream is converted by the internal logic into an 8 bit/10 bit encoded host data stream that is then decoded on the system level into the actual bytes requested by the host. Writing data to the storage device reverses the process: a command, address and data package is sent from the host to the drive via the writeFirstPartyDMA command, after which the 8 bit/10 bit encoded stream is decoded by the drive and the data are written to physical sectors on the rotating platters. File updates are preferentially done by overwriting the entire file to the same sectors used by the original file. The entire file system is sector based, wherein each sector corresponds to a logical block address (LBA) hard encoded on the platters. In the case of File Allocation Table (FAT)-based file systems, the minimum allocation unit (cluster) is established by dividing the entire LBA space by the number of clusters that can be addressed with the available address bits (for example, in FAT-32), whereas the Windows NT File System (NTFS) uses a fixed cluster size (typically 4 kBytes, corresponding to 8 sectors of 512 Bytes).
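The cluster arithmetic described above can be illustrated with a minimal sketch. It is a simplified model, not the FAT or NTFS on-disk logic: the function names are hypothetical, reserved sectors, the FAT itself and formatter default cluster sizes are ignored, and LBAs are assumed to be counted from the start of the volume. FAT32 cluster numbers use 28 of the 32 bits in each table entry.

```python
SECTOR_BYTES = 512  # sector size assumed throughout this sketch


def fat_min_cluster_bytes(total_sectors: int, cluster_address_bits: int) -> int:
    """Smallest power-of-two cluster size (bytes) that lets every sector of the
    volume be reached with 2**cluster_address_bits cluster numbers.
    Hypothetical helper; ignores reserved sectors and the FAT itself."""
    max_clusters = 2 ** cluster_address_bits
    sectors_per_cluster = 1
    while total_sectors > max_clusters * sectors_per_cluster:
        sectors_per_cluster *= 2
    return sectors_per_cluster * SECTOR_BYTES


def ntfs_cluster_of(lba: int, cluster_bytes: int = 4096) -> int:
    """Index of the fixed-size NTFS cluster containing a volume-relative LBA."""
    sectors_per_cluster = cluster_bytes // SECTOR_BYTES  # 4096 // 512 == 8
    return lba // sectors_per_cluster


# A 2 TiB volume has 2**32 sectors of 512 bytes; FAT32 cluster numbers use 28 bits.
print(fat_min_cluster_bytes(2**32, 28))  # 8192
print(ntfs_cluster_of(17))               # 2
```

In this model a 2 TiB FAT32 volume needs clusters of at least 16 sectors, whereas NTFS simply groups every 8 consecutive LBAs into one 4 kByte cluster regardless of volume size.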
C. Physical Contiguity of Files as Prerequisite for Performance of Rotatable Media
Hard disk drive performance strongly depends on the physical location of data or sectors with respect to contiguity and eccentricity (radial position). Any interruption in the sequence of LBAs causes significant seek and rotational latencies. Moreover, transfer rates depend on the linear speed of the media under the read/write head; therefore, tracks at the outer diameter have much higher sequential transfer rates than tracks at the inner diameter. Any HDD will show its highest performance if contiguous data structures are established, either during the initial write process or through defragmentation, and if that contiguity is maintained even when the files are modified. A simplified way of describing this mode of operation of a hard disk drive is the term read-modify-write, that is, data are read from the media, modified by the host and then written back, preferentially to the same LBAs, using additional LBAs as overflow only if the file size increases over the previous version. If the newer version of the file is smaller than the original, a gap consisting of invalid data is created at the end of the file and typically persists until the drive is defragmented.
Hard disk drives and their specific capabilities and limitations have been crucial elements in the evolution of operating systems. It is therefore not surprising that file systems have had to be optimized to preserve the physical coherency of data as the key to maintaining disk performance. Moreover, since only a single bit-stream can be committed to the media at any time, the host system, at least in the case of ATA, will refrain from sending parallel requests or from mixing batches of different files simultaneously.
Regardless of the shortcomings of HDD technology, one of its strong points is essentially unlimited data retention. Barring any mechanical or logic failure of the drive, data committed to the media are permanent, at least on the time scales relevant to digital storage. Data retention is independent of whether the drive is powered up or offline, since no refresh cycles are necessary to counteract leakage resulting in bit rot.
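The cost of broken LBA sequences can be made concrete with a rough access-time model. This is a minimal sketch, not a drive model: the function names, the 8.5 ms average seek time and the 150 MB/s media rate are hypothetical illustration values, and only the half-revolution rotational latency follows directly from the spindle speed. Each break in the LBA sequence is charged one seek plus one average rotational latency, and the radius dependence of the transfer rate assumes constant linear bit density.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away from the head."""
    return 0.5 * 60_000.0 / rpm


def count_discontinuities(lbas: list[int]) -> int:
    """Breaks in an LBA sequence; each one forces a new seek in this model."""
    return sum(1 for prev, cur in zip(lbas, lbas[1:]) if cur != prev + 1)


def estimate_read_ms(lbas: list[int], rpm: float = 7200, seek_ms: float = 8.5,
                     media_mb_per_s: float = 150.0, sector_bytes: int = 512) -> float:
    """Rough read time: one positioning step (seek + rotational latency) per
    contiguous run, plus media transfer time. seek_ms and media_mb_per_s are
    hypothetical example values, not data for a specific drive."""
    if not lbas:
        return 0.0
    runs = 1 + count_discontinuities(lbas)
    positioning_ms = runs * (seek_ms + avg_rotational_latency_ms(rpm))
    transfer_ms = len(lbas) * sector_bytes / (media_mb_per_s * 1e6) * 1000.0
    return positioning_ms + transfer_ms


def relative_transfer_rate(radius_mm: float, outer_radius_mm: float) -> float:
    """At constant RPM and (assumed) constant linear bit density, the sequential
    rate scales with the linear speed under the head, i.e. with the radius."""
    return radius_mm / outer_radius_mm


contiguous = list(range(0, 64))
fragmented = list(range(0, 128, 2))  # same sector count, every LBA starts a new run
print(estimate_read_ms(contiguous) < estimate_read_ms(fragmented))  # True
```

At 7200 RPM the average rotational latency alone is about 4.17 ms, so in this model every extra fragment costs milliseconds, while transferring 512 bytes takes only a few microseconds.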
D. Solid State Media
Unlike conventional HDDs, solid state media do not rely on a single head to commit data to storage; rather, solid state storage media typically write data to the memory devices in a highly parallel fashion. The broadest interpretation of the term solid state memory encompasses any type of IC-based memory technology, but based on cost per bit and overall distribution, only NAND flash memory is relevant to the current storage landscape.
E. Strength and Limitations of NAND-Flash Media
NAND flash memory is a compromise between cost per bit, speed, data retention and write endurance, with some of the parameters being at opposite ends of the spectrum. In the context of re-writable mass storage, the biggest functional differences between NAND flash memory and HDD platters are the greatly reduced access latencies, the parallel data paths and, last but not least, the fact that NAND flash memory cannot simply be overwritten with new data. NAND flash memory cells are made up of floating gate transistors that can be programmed only in a unidirectional manner. Moreover, in order to simplify the design of NAND flash memory and also to avoid artifacts stemming from the electrical fields associated with Fowler-Nordheim quantum mechanical tunneling, erase processes have to be carried out on a per-block basis. As a consequence, every page of NAND flash memory needs to be pre-erased before its individual cells can be programmed.
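The unidirectional programming and block-level erase constraints can be captured in a toy model. This is a minimal sketch under stated assumptions: the class name and tiny page sizes are hypothetical, erased cells read as 1 and programming can only clear bits, and real MLC devices typically permit only a single program operation per page between erases, which this bit-level model does not enforce.

```python
class NandBlock:
    """Toy NAND block (hypothetical): erased cells read as 1, programming can
    only move bits from 1 to 0, and only a whole-block erase restores 1s."""

    def __init__(self, pages: int = 4, page_bytes: int = 4):
        self.page_bytes = page_bytes
        self.pages = [bytes([0xFF] * page_bytes) for _ in range(pages)]

    def program(self, page: int, data: bytes) -> None:
        if len(data) != self.page_bytes:
            raise ValueError("data must fill exactly one page")
        current = self.pages[page]
        # Any bit that is 1 in the new data but already 0 in the cell would need
        # a 0 -> 1 transition, which programming cannot perform.
        if any(d & ~c for c, d in zip(current, data)):
            raise ValueError("needs 0->1 transition: erase the block first")
        self.pages[page] = bytes(c & d for c, d in zip(current, data))

    def erase(self) -> None:
        """Erase acts on every page of the block, never on a single page."""
        self.pages = [bytes([0xFF] * self.page_bytes) for _ in self.pages]


blk = NandBlock(pages=2, page_bytes=2)
blk.program(0, b"\x0f\xf0")      # allowed: only 1 -> 0 transitions
try:
    blk.program(0, b"\xff\xff")  # overwrite in place is rejected
except ValueError as err:
    print(err)
blk.erase()                      # the whole block returns to all 1s
```

Rejecting the in-place overwrite is exactly what forces the append-mode operation discussed in the next section.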
F. Append Vs. Overwrite
The requirement of NAND flash to pre-erase blocks before they can be re-programmed precludes the use of the simple “read-modify-write” scheme discussed above in the context of HDDs. Instead, NAND flash based solid state drives have to operate strictly in “append” mode, meaning that data are written to virgin pages as long as those are available. Because no mechanical parts are involved, the actual locality of the data is largely inconsequential, even though it is advantageous to distribute logically coherent data such that all available channels can be used to access them in a parallel fashion. In analogy to the read-modify-write scheme used by HDDs, an adequate term would be read-modify-append, followed by an invalidate and finally an erase operation for the original location.
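The read-modify-append sequence can be sketched with a page-mapped translation table. This is a minimal illustration, not a production flash translation layer: the class and method names are hypothetical, physical pages are modeled as a flat list, and erasing invalidated pages is deferred to garbage collection, which is not shown here.

```python
from typing import Callable


class AppendOnlyFTL:
    """Hypothetical page-mapped FTL sketch: every write goes to the next virgin
    page; the previous physical copy is only marked invalid, never overwritten."""

    def __init__(self, num_pages: int):
        self.flash: list = [None] * num_pages  # None == virgin (erased) page
        self.l2p: dict[int, int] = {}          # logical page -> physical page
        self.invalid: set[int] = set()         # physical pages holding stale data
        self.next_free = 0                     # append pointer

    def write(self, lpn: int, data: bytes) -> int:
        if self.next_free >= len(self.flash):
            raise RuntimeError("out of virgin pages: garbage collection needed")
        ppn = self.next_free
        self.flash[ppn] = data                 # append to a virgin page
        self.next_free += 1
        old = self.l2p.get(lpn)
        if old is not None:
            self.invalid.add(old)              # invalidate; erase happens later
        self.l2p[lpn] = ppn
        return ppn

    def read(self, lpn: int) -> bytes:
        return self.flash[self.l2p[lpn]]

    def update(self, lpn: int, modify: Callable[[bytes], bytes]) -> int:
        """Read-modify-append: read, modify, then append to a new location."""
        return self.write(lpn, modify(self.read(lpn)))


ftl = AppendOnlyFTL(num_pages=4)
ftl.write(0, b"a")                     # logical page 0 -> physical page 0
ftl.update(0, lambda d: d + b"b")      # new copy lands on physical page 1
print(ftl.read(0), ftl.invalid)        # b'ab' {0}
```

Physical location plays no role in the mapping itself, which reflects the point that locality is largely inconsequential for NAND flash.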
Once the drive runs out of virgin pages, data have to be moved around and consolidated in order to free up entire blocks, which are then erased before they can be cycled into the next write access. One fact to be taken into account in this regard is that consolidation of data does not mean physical coalescing of logically coherent data, as in the defragmentation of HDDs. On the contrary, as in the case of a write access, it is advantageous for recurrent read transfers to keep logically coherent data distributed over as many independent channels as possible. The flash translation layer provides the logical-to-physical block mapping.
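Consolidation can be sketched as a greedy garbage-collection step. This is an illustration under assumptions, not the method of any particular drive: the `Block` class and `garbage_collect` function are hypothetical, the victim is simply the full block with the fewest valid pages, and surviving pages are spread round-robin over free blocks on other channels rather than coalesced, in line with the text. Updating the logical-to-physical map is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical block model: each page entry is (logical page, valid flag)."""
    channel: int
    pages: list = field(default_factory=list)

    def valid_lpns(self) -> list:
        return [lpn for lpn, valid in self.pages if valid]


def garbage_collect(blocks: list, free_blocks: list, pages_per_block: int) -> Block:
    """Greedy GC sketch: pick the full block with the fewest valid pages, move
    its valid pages round-robin into free blocks on other channels, then erase it."""
    full = [b for b in blocks if len(b.pages) == pages_per_block]
    victim = min(full, key=lambda b: len(b.valid_lpns()))
    # Prefer destinations on other channels to keep data spread out.
    targets = [b for b in free_blocks if b.channel != victim.channel] or free_blocks
    for i, lpn in enumerate(victim.valid_lpns()):
        dest = targets[i % len(targets)]
        if len(dest.pages) >= pages_per_block:
            raise RuntimeError("not enough free space for relocation")
        dest.pages.append((lpn, True))  # caller must update the L2P map
    victim.pages.clear()                # block erase: all pages virgin again
    return victim


b0 = Block(0, [(0, False), (1, True), (2, False), (3, False)])
b1 = Block(1, [(4, True), (5, True), (6, True), (7, False)])
free = [Block(0), Block(1), Block(2)]
erased = garbage_collect([b0, b1], free, pages_per_block=4)
print(erased is b0, free[1].pages)  # True [(1, True)]
```

Choosing the block with the fewest valid pages minimizes the number of relocations per reclaimed block, which is a common heuristic but only one of several possible victim policies.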
G. Data Retention, Refresh and Remap
A second important difference between HDDs and flash memory in any of its iterations is the limited data retention of flash, caused either by leakage current or by read-disturb effects, as described in more detail below. Flash memory stores data in the form of charges in the floating gate of the floating gate transistor, but eventually those charges dissipate through the oxide layer into the substrate, resulting in bit rot. This process typically takes anywhere from one to several years but, especially in the case of archived data, leakage currents leading to bit rot still become an important factor.
Read-disturb refers to a different phenomenon: because of the specific architecture of NAND flash, a read access of a single page requires biasing all unselected pages in the same block via their word lines, typically to about 5 V. Over time, the cumulative electrical fields applied through the word lines have an effect similar to programming charges applied through the very same word lines to the control gates. The result can be a creeping charge on the floating gates that alters the bit values of the NAND flash cells; this is known as read-disturb. Admittedly, read-disturb takes thousands of read accesses to the same block before any noticeable effect occurs. However, at least in the case of MLC NAND flash with 256 pages per block, only a limited number of sequential scrolls through all pages is required to reach the threshold at which shifts in programming charge result in an increased number of bit errors.
Any modern SSD takes countermeasures against both leakage current and read-disturb by measuring the bit error rate on read accesses, which can be determined by comparing the raw data read from the media against the version corrected by the implemented ECC algorithms. If the bit error rate increases, the data are refreshed, that is, rewritten; this rewriting entails moving the data to a different block on the same NAND device or to a different channel altogether. The result is a dynamic, constant change of the logical-to-physical address mapping. Even though this mapping is done at the level of the flash translation layer and is transparent to the host, it requires more sophisticated metadata that can track the different mapping units as well as their status, that is, whether they hold valid data or have been invalidated by the file system, in order to schedule the pages or blocks for garbage collection and TRIM-based erasing.
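Because every page read stresses the whole block, read-disturb is naturally tracked with a per-block counter. The sketch below assumes a hypothetical budget of 10,000 reads per block between refreshes (real limits are device specific and not given in the text); with 256 pages per block, that budget is exhausted after only 40 full sequential scrolls. The class and function names are illustrative only.

```python
PAGES_PER_BLOCK = 256         # MLC example from the text
READ_DISTURB_LIMIT = 10_000   # hypothetical per-block read budget; device specific


class ReadDisturbTracker:
    """Hypothetical per-block read counter used to trigger refreshes."""

    def __init__(self):
        self.reads: dict[int, int] = {}  # block -> reads since last erase/refresh

    def on_page_read(self, block: int) -> bool:
        """Any page read biases the word lines of the whole block, so the count is
        kept per block. Returns True once the block should be refreshed."""
        self.reads[block] = self.reads.get(block, 0) + 1
        return self.reads[block] >= READ_DISTURB_LIMIT

    def on_refresh(self, block: int) -> None:
        self.reads[block] = 0


def full_scrolls_to_limit(limit: int = READ_DISTURB_LIMIT,
                          pages: int = PAGES_PER_BLOCK) -> int:
    """Complete sequential passes over all pages of a block needed to hit the limit."""
    return -(-limit // pages)  # ceiling division


print(full_scrolls_to_limit())  # 40
```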
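The refresh decision can be sketched by counting the bits the ECC had to correct. This is a minimal sketch under assumptions: the 1e-3 threshold is a hypothetical value, and `allocate_page` and `write_page` are hypothetical callbacks standing in for the flash translation layer's allocator and program path. The returned old location is what would subsequently be marked invalid for garbage collection.

```python
from typing import Callable, Optional

REFRESH_BER = 1e-3  # hypothetical refresh threshold; depends on the NAND and the ECC


def bit_error_rate(raw: bytes, corrected: bytes) -> float:
    """Fraction of bits that differ between the raw read and the ECC-corrected data."""
    flipped = sum(bin(r ^ c).count("1") for r, c in zip(raw, corrected))
    return flipped / (8 * len(raw))


def maybe_refresh(lpn: int, raw: bytes, corrected: bytes, l2p: dict,
                  allocate_page: Callable, write_page: Callable) -> Optional[object]:
    """If the BER exceeds the threshold, rewrite the corrected data to a new location
    (possibly on another channel) and remap the logical page. allocate_page and
    write_page are hypothetical FTL callbacks. Returns the stale location or None."""
    if bit_error_rate(raw, corrected) <= REFRESH_BER:
        return None
    old = l2p[lpn]
    new = allocate_page(exclude=old)
    write_page(new, corrected)
    l2p[lpn] = new
    return old  # caller marks this page invalid for garbage collection


corrected = bytes(128)              # 1024 bits of corrected data
raw = bytes([0b11]) + bytes(127)    # two bit errors in the raw read
print(bit_error_rate(raw, corrected))  # 0.001953125
```

Here locations are opaque values (for example `(channel, page)` tuples), so the same logic covers moving data within one NAND device or to a different channel.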
H. File System Challenges
As discussed above, currently prevailing operating systems have evolved on the legacy of disk operating systems, regardless of whether they are Microsoft-, Unix- or Linux-based. Historically, the single bit-stream of HDDs matched the requirements of a single-core processor, since at any given time only a single thread was being processed. This situation has changed dramatically with the introduction of multi-core processors and of thread-level parallelism using Intel's HyperThreading. Multiple data streams are processed in parallel, and eventually all data need to be written to the drive. On the system level, this can be accomplished by queuing up requests to serially transfer data from the host to the drive, using the same strategies that have been used for decades in HDD technology. On the level of the drive, native command queuing streamlines the workload through intelligent reordering and scheduling of the queued commands to minimize mechanical movement and wear of the drive. Data are updated using an “in place” strategy, that is, by using the read-modify-write method discussed above to preserve contiguous physical data structures as much as possible.
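The effect of reordering queued commands can be illustrated with a simple elevator (SCAN) ordering. This is a stand-in technique, not the actual native command queuing algorithm, which is firmware specific and also accounts for rotational position; here LBA distance serves only as a crude proxy for actuator movement, and the function names are hypothetical.

```python
def elevator_order(head_lba: int, requests: list[int]) -> list[int]:
    """SCAN-style ordering: serve requests ahead of the head in ascending LBA
    order, then sweep back through the remaining ones in descending order."""
    ahead = sorted(r for r in requests if r >= head_lba)
    behind = sorted((r for r in requests if r < head_lba), reverse=True)
    return ahead + behind


def travel(head_lba: int, order: list[int]) -> int:
    """Total LBA distance covered, a crude proxy for actuator movement."""
    dist, pos = 0, head_lba
    for lba in order:
        dist += abs(lba - pos)
        pos = lba
    return dist


queue = [900, 100, 520, 480, 950]                 # arrival (FIFO) order
print(travel(500, queue))                         # 2130
print(travel(500, elevator_order(500, queue)))    # 1300
```

For this queue the reordered sequence covers less than two thirds of the head travel of first-come, first-served processing.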
In combination with the transition from a parallel ATA to a serial ATA host interface, these measures have greatly improved the way data are moved between the host and the storage devices, resulting in an adequate match between the OS handling data through the file system and existing HDD technology. However, current file systems hardly take advantage of the capabilities of solid state drives.
The currently used method of interfacing NAND flash with the operating system simply employs a standard SATA interface and uses NAND flash ICs at the back-end. The standard SATA protocol interfaces the drive with the system and the drive then uses the flash translation layer to map logical to physical block addresses. Native command queuing is adapted to fill the parallel NAND channels. However, NAND flash does not allow “in place” updates, rather, all updates need to be written to a new location, whereupon the metadata are changed to reflect the new physical data structure. For maximum efficiency, physical contiguity of the data structures needs to be disrupted and the individual file fragments need to be distributed over as many channels as possible in order to allow the fastest possible parallel load/store accesses.
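Distributing file fragments over channels can be sketched as round-robin striping. The sketch assumes fully independent channels and a hypothetical uniform per-page read and transfer time of 50 µs; under those assumptions the access time is set by the busiest channel. Function names are illustrative only.

```python
def stripe(num_pages: int, channels: int) -> dict[int, list[int]]:
    """Round-robin placement of a file's logical pages over NAND channels."""
    layout: dict[int, list[int]] = {ch: [] for ch in range(channels)}
    for page in range(num_pages):
        layout[page % channels].append(page)
    return layout


def parallel_read_time_us(num_pages: int, channels: int, t_page_us: float = 50.0) -> float:
    """With independent channels, total time is set by the busiest channel.
    t_page_us is a hypothetical per-page read + transfer time."""
    busiest = max(len(p) for p in stripe(num_pages, channels).values())
    return busiest * t_page_us


print(stripe(10, 4))                  # {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
print(parallel_read_time_us(10, 1))   # 500.0
print(parallel_read_time_us(10, 4))   # 150.0
```

Keeping all ten pages physically contiguous on one channel would forfeit the parallelism, which is why contiguity is deliberately broken up on NAND flash.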
While the currently used adaptation of the disk-based file systems for NAND flash is a reasonable compromise, it is clear that better file systems are needed to take advantage of the special features of NAND flash or other solid state memory media.