Computer data storage devices, such as disk drives and Redundant Array of Independent Disks (RAID) systems, typically use a cache memory in combination with mass storage media (e.g., magnetic tape or disk) to save and retrieve data in response to requests from a host device. Cache memory, often referred to simply as “cache”, offers improved performance over implementations without cache. Cache typically includes one or more integrated circuit memory device(s), which provide a very high data rate in comparison to the data rate of the non-cache mass storage medium. Due to unit cost and space considerations, cache memory is usually limited to a relatively small fraction (e.g., 256 kilobytes in a single disk drive) of the mass storage medium capacity (e.g., 256 gigabytes). As a result, the limited cache memory should be used as efficiently and effectively as possible.
Cache is typically used to temporarily store data that is the most likely to be requested by a host computer. By read pre-fetching (i.e., retrieving data from the mass storage media before the host requests it), data rate may be improved. Cache is also used to temporarily store data from the host device that is destined for the mass storage medium. When the host device is saving data, the storage device saves the data in cache at the time the host computer requests a write. The storage device typically notifies the host that the data has been saved, even though the data has been stored in cache only; later, such as during an idle time, the storage device “de-stages” the data from cache (i.e., moves the data from cache to mass storage media). Thus, cache is typically divided into a read cache portion and a write cache portion. Data in cache is typically processed on a page basis. The size of a page can vary and is generally implementation specific; a typical page size is 64 kilobytes.
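The write-caching and de-staging behavior described above can be sketched as follows. This is a minimal illustrative model, not an actual storage device implementation; the class and method names (`WriteCache`, `write`, `destage`) and the dictionary-backed "media" are assumptions made for clarity.

```python
# Hypothetical sketch of write caching with deferred de-staging.
PAGE_SIZE = 64 * 1024  # bytes; the typical page size mentioned in the text

class WriteCache:
    def __init__(self):
        self.dirty_pages = {}   # page number -> data not yet on mass storage
        self.media = {}         # stand-in for the mass storage medium

    def write(self, page_no, data):
        """Store data in cache and acknowledge the host immediately."""
        self.dirty_pages[page_no] = data
        return "acknowledged"   # host is told the write is complete

    def destage(self):
        """During idle time, move dirty pages from cache to mass storage."""
        while self.dirty_pages:
            page_no, data = self.dirty_pages.popitem()
            self.media[page_no] = data

cache = WriteCache()
cache.write(3, b"x" * PAGE_SIZE)  # host sees completion before media I/O
cache.destage()                   # later, data actually reaches the media
```

Note the key property: the host's write returns as soon as the data lands in cache, and the slower transfer to the mass storage medium is deferred to idle time.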
Generally, storage device performance improves as read cache hit rate goes up. Read cache hit rate is a measure of the frequency with which requested data is found in the read cache rather than retrieved from the mass media (e.g., a disk). As is generally understood, the mass media typically takes much longer to access than the read cache. Thus, by increasing the read cache hit rate, data input/output (I/O) rate to the host can be increased. In order to take advantage of the relatively faster read cache, typical storage devices attempt to predict what data a host device will request in the near future and have that data available in the cache when the host actually requests it.
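The effect of hit rate on performance can be made concrete with an expected-latency calculation. The latency figures below (50 microseconds for a cache hit, 5000 microseconds for a media access) are illustrative assumptions, not figures from this document.

```python
# Illustrative: expected access time as a weighted average of cache hits
# and mass-media misses.
def average_access_time(hit_rate, cache_time_us, media_time_us):
    """Expected latency: hits served from cache, misses from mass media."""
    return hit_rate * cache_time_us + (1.0 - hit_rate) * media_time_us

# Assumed example figures: 50 us cache access vs 5000 us media access.
low_hit  = average_access_time(0.50, 50.0, 5000.0)   # 2525.0 us
high_hit = average_access_time(0.95, 50.0, 5000.0)   # 297.5 us
```

Under these assumed numbers, raising the hit rate from 50% to 95% cuts the expected access time by roughly an order of magnitude, which is why hit rate dominates read performance.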
Typical storage devices attempt to identify “sequential workloads” during operation in order to predict which data the host will request. A sequential workload is generally a host workload that includes request(s) for data at logical addresses that are substantially sequential. After detecting a sequential workload, the storage device can read ahead in the host address space and pre-fetch sequential data into the read cache. Pre-fetching data involves reading data from the mass storage media before the data is requested by the host and storing that data in the read cache. By reading ahead and pre-fetching data, system performance can be improved, particularly when a host is accessing relatively large and/or contiguous blocks of data, such as a text or video document.
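One simple way to picture detect-then-prefetch behavior is a run-length check on consecutive request addresses. This sketch is hypothetical: the threshold, the pre-fetch depth, and the names (`Prefetcher`, `on_request`) are assumptions for illustration, not the method of any particular device.

```python
# Hypothetical sketch: detect a sequential workload from consecutive
# request addresses, then read ahead into the read cache.
SEQ_THRESHOLD = 3    # assumed: sequential requests needed before reading ahead
PREFETCH_DEPTH = 4   # assumed: number of blocks to pre-fetch once detected

class Prefetcher:
    def __init__(self):
        self.last_addr = None
        self.run_length = 0
        self.read_cache = set()  # block addresses already staged in cache

    def on_request(self, addr):
        # Track how many requests in a row hit consecutive logical addresses.
        if self.last_addr is not None and addr == self.last_addr + 1:
            self.run_length += 1
        else:
            self.run_length = 1
        self.last_addr = addr
        if self.run_length >= SEQ_THRESHOLD:
            # Read ahead: stage the next blocks before the host asks for them.
            for ahead in range(1, PREFETCH_DEPTH + 1):
                self.read_cache.add(addr + ahead)

p = Prefetcher()
for a in (10, 11, 12):   # three sequential requests trigger pre-fetching
    p.on_request(a)
```

After the third consecutive request, blocks 13 through 16 are staged in the read cache, so the host's next sequential reads become cache hits.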
One problem with existing systems relates to detecting sequential workloads. Typical systems employ processes for detecting sequential workloads that are highly resource-intensive. Typically, a number of host requests are stored in memory. Addresses associated with the stored host requests are then sorted, typically in numerical order. After sorting, the storage device employs algorithm(s) to identify a sequential pattern in the addresses. A number of sequential pattern recognition algorithms based on sorted addresses are known and used. The memory required to separately store host requests, and the processor time and memory required to sort and identify pattern(s) in the request addresses, can result in inefficient use of storage device resources. Any resource (e.g., memory or processor time) that is used to detect a sequential workload, therefore, may not be available for host I/O.
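The conventional buffer-sort-scan approach described above can be sketched as follows. The function name and the minimum run length are illustrative assumptions; the point is the cost structure, not any particular device's algorithm: the request buffer consumes memory, and the sort consumes O(N log N) processor time before pattern recognition even begins.

```python
# Sketch of the conventional, resource-heavy approach: buffer N request
# addresses, sort them, then scan the sorted list for sequential runs.
def find_sequential_runs(request_addresses, min_run=3):
    """Sort buffered addresses and report runs of consecutive values."""
    ordered = sorted(request_addresses)          # O(N log N) sort cost
    runs, start = [], 0
    for i in range(1, len(ordered) + 1):
        # A run ends at the list boundary or where consecutiveness breaks.
        if i == len(ordered) or ordered[i] != ordered[i - 1] + 1:
            if i - start >= min_run:
                runs.append((ordered[start], ordered[i - 1]))
            start = i
    return runs

# Two interleaved sequential streams hidden in arrival order:
runs = find_sequential_runs([100, 7, 101, 8, 102, 9, 50])
```

Here the buffered addresses reveal two sequential runs (7 through 9 and 100 through 102) only after the full buffer is sorted, which is precisely the memory and processor cost the passage above identifies.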
Thus, traditional methods of detecting sequential workloads, which enable the benefits of read cache pre-fetching, typically utilize existing storage device resources inefficiently.