When accessing storage devices, it is desirable to optimize utilization of the devices. For example, flash memory devices have memory cells that tolerate only a finite number of program and erase cycles before the cells experience errors and become unusable. Frequently accessed host data should generally not be stored in memory cells that have few program and erase cycles remaining. Rather, frequently accessed host data should be stored in dynamic memory cells, i.e., cells with a comparatively larger number of remaining program and erase cycles.
In addition, it is desirable to store data with the same anticipated host access pattern together in a memory storage device because of differences in access granularity between a host and the memory device. For example, in a flash memory device, it may not be desirable to store a frequently accessed temporary file in the same memory block as an infrequently accessed image file, because accesses to the memory block that stores the temporary file would also access the memory cells that store the image file. Because the storage device only sees I/O operations that specify ranges of memory addresses, neither the characteristics of the data being stored nor the way the host will access the data in the future is explicitly communicated to the storage device. As a result, data may be stored in suboptimal locations of the memory device.
The protocol stack through which a host system, such as a host computer, accesses a storage device is referred to as the host storage stack. The host storage stack includes a number of layers, among them the file system driver, that abstract application logic from the logical blocks that represent the storage device. These layers include caching layers, memory-mapped buffers, and file systems, which allow an application developer to store data in files rather than managing the actual block device interface.
Over the last 30 years, the storage stack has evolved from linear-access technologies (such as tape) to random-access devices that have a seek penalty (such as floppy disks and hard disk drives (HDDs)) to random-access flash devices, such as solid state drives (SSDs), which have no inherent seek penalty but which access physically sequential data more efficiently than randomly located data. Because of this abstraction, it is difficult for a storage device to determine the access pattern desired by the host. In the case of flash storage devices, the device can adjust its storage strategy if it has advance information about what the host's read pattern will be for certain logical block address (LBA) sequences. As an example, if the device has information that a certain LBA range will be read sequentially at boot, it may make that range available for access before it completes internal initialization. As another example, if the device has information that a certain LBA sequence will only hold temporary files with a lifetime of one host power cycle, it may choose particular flash regions that are tuned for lower retention, or it may keep data destined for these LBAs in RAM. As alluded to above, most of the knowledge regarding LBA sequences is maintained in the upper layers of the host storage stack (such as the file system) and is not communicated down to the storage device.
Storage protocols such as hybrid serial advanced technology attachment (SATA) and non-volatile memory express (NVMe) include the ability for the host to create “hints”, which advise the device of characteristics of specific LBA ranges. These hints do not require the device to change its behavior, but optionally allow for optimization of specific ranges. Sending hints effectively from the host to the storage device requires the cooperation of multiple parts of the storage stack, since the metadata used to determine a hint is typically abstracted away within the file system.
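As a rough illustration of the advisory nature of such hints, the following sketch shows how a device might consult a table of host-supplied hints when choosing where to place data for an LBA. It does not reproduce the hint formats of hybrid SATA or NVMe. The names LbaHint, HintTable, and choose_region, the hint labels, and the region names are all hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LbaHint:
    """Hypothetical advisory hint covering LBAs in [start, start + length)."""
    start: int
    length: int
    access: str  # illustrative labels only, e.g. "sequential-read", "temporary"


class HintTable:
    """Hypothetical device-side table of hints received from the host."""

    def __init__(self) -> None:
        self._hints: List[LbaHint] = []

    def add(self, hint: LbaHint) -> None:
        self._hints.append(hint)

    def lookup(self, lba: int) -> Optional[LbaHint]:
        # When ranges overlap, the most recently added hint wins (an assumption).
        for hint in reversed(self._hints):
            if hint.start <= lba < hint.start + hint.length:
                return hint
        return None  # no hint for this LBA


def choose_region(table: HintTable, lba: int) -> str:
    """Pick an illustrative placement policy for an LBA."""
    hint = table.lookup(lba)
    if hint is None:
        return "default"  # hints are optional; unhinted LBAs use default handling
    if hint.access == "temporary":
        return "low-retention"
    if hint.access == "sequential-read":
        return "sequential-optimized"
    return "default"  # unrecognized hints are ignored, since hints are advisory
```

In this sketch an LBA that falls outside every hinted range, or whose hint the device does not recognize, is handled exactly as it would be without hinting, which mirrors the statement that hints do not require the device to change its behavior.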
Although current operating systems may send hints on an extremely limited basis, such hints are not effective for solid state drive optimization. New solutions are therefore needed to bridge the host-device gap in hinting.
One particular type of non-volatile memory in which it is desirable to optimize utilization of the storage media is NAND flash memory. A NAND flash memory is organized in terms of blocks, and each block is further divided into a fixed number of pages. A block is the basic unit for erase operations, while reads and writes are processed in units of one page. A page cannot be overwritten unless the block containing it is first erased.
Due to the special write constraints of multi-level cell (MLC) flash memory, pages of MLC flash memory can only be written sequentially in a block, and partial programming of a page is not possible. These write constraints introduce extra overhead to writes over flash memory and reduce the efficiency of existing flash translation layer (FTL) designs (e.g., designs implementing different address translation tables) and other flash memory management schemes.
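The block and page constraints described above can be sketched with a simplified model. This is a minimal illustration and not a model of any particular device. The class NandBlock and its methods are hypothetical names, and the model assumes a page is represented by a bytes object and that the next programmable page is tracked by a single counter.

```python
from typing import List, Optional


class NandBlock:
    """Simplified, hypothetical model of one NAND block."""

    def __init__(self, pages_per_block: int) -> None:
        self.pages: List[Optional[bytes]] = [None] * pages_per_block
        self.next_page = 0    # index of the next page that may be programmed
        self.erase_count = 0  # number of erase cycles this block has seen

    def program(self, page_index: int, data: bytes) -> None:
        """Write one whole page; pages must be written in order."""
        if self.next_page >= len(self.pages):
            raise RuntimeError("block is full; erase it before programming")
        if page_index != self.next_page:
            # Rejects both overwriting an earlier page and skipping ahead.
            raise ValueError("pages must be programmed sequentially")
        self.pages[page_index] = data
        self.next_page += 1

    def read(self, page_index: int) -> Optional[bytes]:
        """Read one page; returns None for a page that is not programmed."""
        return self.pages[page_index]

    def erase(self) -> None:
        """Erase works on the whole block, never on a single page."""
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.erase_count += 1
```

In this model, updating the contents of an already programmed page is only possible by writing the new data elsewhere or by erasing the entire block, which is the source of the extra write overhead noted above.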
One existing problem associated with storage devices based on flash memory is that the flash memory management schemes do not have awareness of the file system of the host. As a result, they introduce overhead caused by write operations of different sizes, including overhead caused by live page copying, in which valid pages are copied from victim blocks containing invalid pages to free pages in other blocks, such that system performance is significantly affected. Currently, host write operations are not managed in the storage device based on the memory locations of memory blocks that are yet to be written to the file system.
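The cost of live page copying can be made concrete with a small sketch. The greedy victim selection shown here (choosing the block with the fewest valid pages) is an assumption made for illustration and is not stated in the text above. The names pick_victim and reclaim and the three page states are likewise hypothetical.

```python
from typing import List

VALID, INVALID, FREE = "valid", "invalid", "free"


def pick_victim(blocks: List[List[str]]) -> int:
    """Greedy choice (an assumption): among blocks that contain invalid
    pages, pick the one with the fewest valid pages to copy."""
    candidates = [i for i, blk in enumerate(blocks) if INVALID in blk]
    if not candidates:
        raise ValueError("no block contains invalid pages")
    return min(candidates, key=lambda i: blocks[i].count(VALID))


def reclaim(blocks: List[List[str]], victim: int, target: int) -> int:
    """Copy valid pages from the victim block into free pages of the
    target block, erase the victim, and return the number of copies."""
    if victim == target:
        raise ValueError("victim and target must be different blocks")
    copied = 0
    for state in blocks[victim]:
        if state == VALID:
            # index() raises ValueError if the target has no free page left.
            free_slot = blocks[target].index(FREE)
            blocks[target][free_slot] = VALID
            copied += 1
    blocks[victim] = [FREE] * len(blocks[victim])  # whole-block erase
    return copied
```

Every page counted by reclaim is a write that the host never requested. A device with no knowledge of which data the host will soon invalidate cannot avoid mixing short-lived and long-lived pages in one block, so this count tends to grow.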
File system metadata in host write operations may provide an indication of where data will be written in the future to the non-volatile storage device. File system metadata signifying the beginning and end of data to be written to a storage device may allow the storage device to perform memory management operations between write operations. However, as stated above, such file system metadata, while known to the host, is typically not known to the non-volatile storage device.
Accordingly, there exists a need for storage devices and methods for optimizing use of storage devices based on storage device parsing of file system metadata in host write operations.