Storage systems are auxiliary computing storage devices, each of which includes a large number of disk drive units. Large enterprise-level storage systems must provide relatively high performance to match that of the high-performing application servers, file servers and database systems they support. In these storage systems, reading data from and writing data to the disk drive units is fairly time-consuming because of the lengthy mechanical operations within the disk drives. Examples of these operations include the arm movement in a disk drive and the rotational delay of the disk associated with getting a read/write head into a reading or writing position. To provide fast access to frequently accessed data, cache memories are typically used in the storage systems to temporarily hold this data. Since the read latency for data in a cache memory is less than that for a disk drive unit, the presence of the cache memory significantly improves the overall throughput of a storage system.
To further reduce the read latency, storage systems also use prestage operations to retrieve data from a disk drive into the cache before the data is requested by the next host I/O. This can be done by the host issuing a prestage command, such as the extent channel command used in mainframe programs to indicate that a sequential access will take place, to instruct the storage system to move data to cache. Alternatively, the storage system anticipates the next host I/O request and retrieves the data without any special hint or command from the host application.
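The read-ahead behavior described above can be illustrated with a minimal sketch. The function name, the `PRESTAGE_DEPTH` constant, and the dictionary-based cache and disk models are all hypothetical simplifications, not part of any actual storage system implementation:

```python
PRESTAGE_DEPTH = 2  # number of blocks to read ahead (illustrative value)

def read_block(lba, cache, disk, sequential_hint=False):
    """Serve a host read; on a sequential hint, prestage blocks ahead.

    `cache` and `disk` are plain dicts mapping LBA -> data, standing in
    for the cache memory and the disk drive unit, respectively.
    """
    if lba not in cache:
        cache[lba] = disk[lba]  # cache miss: slow mechanical read from disk
    if sequential_hint:
        # Prestage the next blocks so subsequent sequential reads hit in cache.
        for ahead in range(1, PRESTAGE_DEPTH + 1):
            nxt = lba + ahead
            if nxt in disk and nxt not in cache:
                cache[nxt] = disk[nxt]
    return cache[lba]
```

After a hinted read of block 0, blocks 1 and 2 would already sit in the cache, so the next sequential host I/Os avoid the disk latency entirely.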
Since a cache memory has a much higher cost per byte than disk storage, its size is significantly smaller than the total storage capacity. Resource management of the cache is typically done through a Least-Recently-Used (LRU) algorithm. The time elapsed since the last use of data in the cache is an indication of its frequency of use. Data stored in the cache memory is aged from the point in time of its last use. Due to the limited capacity of the cache, data is continuously removed from the cache as it becomes the least recently used data. While infrequently accessed data periodically enters the cache, it will tend to age and fall out of the cache under the Least-Recently-Used algorithm.
Prior art prestage algorithms assume that sequentially located data is likely to be accessed in close temporal order, and vice versa. In current storage systems, prestage algorithms use metadata to identify a perfectly sequential I/O access pattern to signal to the systems that a prestage operation is necessary. As an example, assume that LBAx, LBAx+y, LBAx+2y, LBAx+3y and so on are contiguous chunks of logical block addresses (LBAs) on a disk drive unit. The host can access this data in different orders, which affect the prestage algorithms as a result.
If the order of accesses is LBAx, LBAx+y, LBAx+2y and LBAx+3y, the storage system will prestage LBAx+2y and LBAx+3y only after LBAx, LBAx+y and further perfectly sequential data are accessed one following the other.
If the order of accesses is LBAx, LBAx+2y, LBAx+y and LBAx+3y, the storage system will fail to prestage LBAx+3y after LBAx is accessed.
If the order of accesses is LBAx, LBAx+2y, LBAx+4y, LBAx+6y and so on, the storage system will fail to issue any prestage in the region at all.
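The failure modes in the three examples above can be reproduced with a sketch of a strict prior-art-style detector. The function, the `TRIGGER` threshold and the stride parameter are hypothetical choices made for illustration:

```python
def detect_prestage(accesses, stride):
    """Strict sequential detector, as in prior-art prestage algorithms.

    Prestages the next block only after TRIGGER perfectly consecutive
    accesses (each exactly `stride` apart) are observed in order.
    Returns the list of LBAs that would be prestaged.
    """
    TRIGGER = 3  # consecutive in-order accesses required (illustrative)
    run = 1
    prestaged = []
    for prev, cur in zip(accesses, accesses[1:]):
        run = run + 1 if cur == prev + stride else 1  # any gap resets the run
        if run >= TRIGGER:
            prestaged.append(cur + stride)  # read the next block ahead
    return prestaged
```

Under this sketch, the perfectly ordered stream LBAx, LBAx+y, LBAx+2y, LBAx+3y eventually triggers prestaging, whereas the reordered stream LBAx, LBAx+2y, LBAx+y, LBAx+3y and the strided stream LBAx, LBAx+2y, LBAx+4y, LBAx+6y never do, mirroring the three cases above.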
The size of the metadata associated with perfectly sequential access data is limited by the size of the memory in the storage systems. This configuration limits the overall accuracy and comprehensiveness of the recorded metadata. Since the gap between the size of the metadata memory in a storage system and its total storage capacity is growing rapidly, the recorded metadata does not accurately represent the I/O behavior.
U.S. Pat. No. 6,449,697 describes a method for prestaging data into a cache to prepare for data transfer operations. The method determines addressable locations of the data to be cached and generates a data structure capable of indicating contiguous and non-contiguous addressable locations in the storage system. A prestage command then causes the data at the addressable locations in the data structure to be prestaged into the cache. This prestage method does not take into consideration relative changes in the data access frequency or relative improvements in previous prestage operations.
U.S. Pat. No. 6,260,115 describes a method for prestaging data in a storage system by detecting a sequential access pattern and then prestaging a number of data tracks ahead of the current request based on the available storage. Data accesses are maintained in a list in most-recently-used order, from which sequential access patterns are detected. A key disadvantage of this method is that its benefits are realized only if the I/O stream is perfectly sequential.
Therefore, there remains a need for a storage system and method for efficiently prestaging data without the drawbacks of the prior art methods described above.