When accessing storage devices, it is desirable to optimize utilization of the devices. For example, flash memory devices have memory cells that tolerate only a finite number of program and erase cycles before the cells experience errors and become unusable. Frequently accessed host data should generally not be stored in memory cells with low endurance. Rather, frequently accessed host data should be stored in dynamic memory cells, i.e., cells that tolerate a comparatively larger number of program and erase cycles.
In addition, it is desirable to store data with the same anticipated host access pattern together in a memory storage device, because the host and the memory device access data at different granularities. For example, in a flash memory device, it may not be desirable to store a frequently accessed temporary file in the same memory block as an infrequently accessed image file, because accesses to the memory block that stores the temporary file would also access the memory cells that store the image file. The storage device sees only I/O operations that specify ranges of memory addresses. As a result, neither the characteristics of the stored data nor the way the host will access that data in the future are explicitly communicated to the storage device, and data may be stored in suboptimal locations of the memory device.
The protocol stack through which a host system, such as a host computer, accesses a storage device is referred to as the host storage stack, and is sometimes loosely called the file system driver. The host storage stack includes a number of layers that abstract application logic from the logical blocks that represent the storage device. These layers include caches, memory-mapped buffers, and file systems, which allow an application developer to store data in files rather than manage the underlying block device interface directly.
Over the last 30 years, the storage stack has evolved through three stages:

- linear-access technologies, such as tape;
- random-access devices that have a seek penalty, such as floppy disks and hard disk drives (HDDs);
- random-access flash devices, such as solid state drives (SSDs), which have no inherent seek penalty but still access physically sequential data more efficiently than randomly placed data.

Because of this layered abstraction, it is difficult for a storage device to determine the access pattern the host will actually use. A flash storage device can adjust its storage strategy if it knows in advance what the host's read pattern will be for certain logical block address (LBA) sequences. For example, if the device knows that a certain LBA range will be read sequentially at boot, it may make that range available for access before it completes internal initialization. As another example, if the device knows that a certain LBA sequence will only hold temporary files with a lifetime of one host power cycle, it may place that data in flash regions tuned for lower retention, or keep data destined for those LBAs in RAM. As noted above, most of the knowledge about LBA sequences is maintained in the upper layers of the storage stack, such as the file system, and is not communicated down to the storage device.
Storage protocols such as Hybrid serial advanced technology attachment (SATA) and non-volatile memory express (NVMe) allow the host to send "hints," which advise the device of characteristics of specific LBA ranges. These hints do not require the device to change its behavior, but they allow the device to optimize its handling of specific ranges. Generating useful hints requires cooperation among multiple layers of the storage stack, because the metadata needed to determine a hint is typically abstracted away within the file system.
Although some current operating systems send hints on a very limited basis, those hints are not effective for optimizing solid state drives. New solutions are therefore needed to bridge the information gap between host and device with respect to hinting.
Accordingly, there exists a need for methods, systems, and computer readable media for automatically deriving hints from accesses to a storage device and from file system metadata and for optimizing utilization of the storage device based on the hints.