The continuous expansion of the Internet, the growth and sophistication of enterprise computing networks and systems, the proliferation of content stored and accessible over the Internet, and numerous other factors continue to drive the need for large sophisticated data storage systems. Consequently, as the demand for data storage continues to increase, larger and more sophisticated storage systems are being designed and deployed. Many large scale data storage systems utilize storage appliances that include arrays of storage media. Typically, these storage systems include a file system for storing and accessing files. In addition to storing system files (e.g., operating system files, device driver files, etc.), the file system provides storage of and access to user data files. For a user to access a file, one or more input/output (I/O) requests are generated to retrieve data blocks associated with the file. Any time an I/O operation is performed, the processing speed of the storage system is impacted as the requested data is retrieved from the storage media. The latency in fulfilling an I/O request depends on the type of storage media storing the requested data. For example, retrieving data from cache memory is faster than retrieving data from random access memory (RAM), which is faster than retrieving data from persistent storage media, such as spinning disks.
To reduce latency in fulfilling I/O requests, data blocks may be prefetched from slower storage media into faster storage media in anticipation of the data blocks being requested later as part of an access pattern. Stated differently, if a file system can predict which data blocks will be requested by identifying an access pattern, the data blocks may be retrieved from slower storage media into faster storage media, so that they are available in the faster storage media when requested. Identifying access patterns, however, is complicated and generally involves considerable overhead. For example, conventional prefetching mechanisms often track significant state about each active access stream, increasing complexity and the incidence of mistakes. Additionally, when a user is accessing a file sequentially, the I/O requests may not be received in sequential order. In this case, many conventional prefetching mechanisms fail to recognize the sequential access stream. These challenges are further exacerbated by the existence of concurrent sequential access streams. Many conventional prefetching mechanisms limit the number of concurrent sequential streams due to the high overhead in maintaining stream state.
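For illustration only, the general idea of recognizing a sequential access stream whose requests arrive slightly out of order may be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular conventional mechanism or the mechanism of the present disclosure. The class name `SequentialDetector` and the parameters `window` and `threshold` are hypothetical, and the sketch tracks only a single candidate stream.

```python
class SequentialDetector:
    """Hypothetical single-stream detector that tolerates out-of-order requests."""

    def __init__(self, window=4, threshold=3):
        self.window = window        # how far (in blocks) a request may stray from the stream head
        self.threshold = threshold  # near-head requests needed before prefetching starts
        self.head = None            # highest block number seen in the candidate stream
        self.hits = 0               # requests so far that landed near the head

    def access(self, block):
        """Record a request; return the block numbers to prefetch (possibly empty)."""
        if self.head is not None and abs(block - self.head) <= self.window:
            # Close enough to the stream head to count, even if it arrived out of order.
            self.hits += 1
            self.head = max(self.head, block)
        else:
            # Far from the tracked stream: start over with a new candidate stream.
            self.head = block
            self.hits = 0
        if self.hits >= self.threshold:
            # Stream looks sequential: suggest the next `window` blocks past the head.
            return list(range(self.head + 1, self.head + 1 + self.window))
        return []


detector = SequentialDetector(window=4, threshold=3)
results = [detector.access(b) for b in (10, 12, 11, 13)]
```

In this sketch, the requests for blocks 10, 12, 11, and 13 are treated as one stream despite their ordering, and prefetching is suggested only after the fourth request. Because the detector holds state for just one stream, a second concurrent stream would repeatedly reset it, which illustrates why supporting many concurrent streams requires additional per-stream state.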
In addition to the challenges associated with identifying access streams and prefetching data accordingly, keeping data blocks available in faster storage media, such as cache memory, must be balanced against memory pressure. Stated differently, if too much data is prefetched, the cache becomes polluted with data that may never be accessed, and prefetched data competes with other data being accessed in the cache. On the other hand, if data is being accessed faster than it is prefetched, the user may experience increased latency.
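The tradeoff described above may be illustrated with a simple feedback rule that adjusts how far ahead data is prefetched. This is a sketch under assumptions, not a description of any specific system: the function name `adjust_prefetch_depth` and the counters `late_hits` (requests that arrived before their prefetch completed) and `unused_evictions` (prefetched blocks evicted without being accessed) are hypothetical.

```python
def adjust_prefetch_depth(depth, late_hits, unused_evictions,
                          min_depth=1, max_depth=64):
    """Return a new prefetch depth from two hypothetical per-interval counters."""
    if unused_evictions > late_hits:
        # Prefetched blocks were evicted unused: the cache is being polluted, so back off.
        depth = depth // 2
    elif late_hits > unused_evictions:
        # The reader is outrunning the prefetcher: fetch further ahead.
        depth = depth * 2
    # Keep the depth within the assumed bounds.
    return max(min_depth, min(max_depth, depth))
```

Doubling and halving are chosen here only for brevity; any rule that grows the depth when requests outpace prefetching and shrinks it when prefetched data goes unused would illustrate the same balance.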
It is with these observations in mind, among others, that various aspects of the present disclosure were conceived and developed.