The invention relates generally to data prefetching operations in data storage systems.
A typical data prefetching operation makes two important decisions. First, it determines when a prefetch task should be initiated. Second, it determines how much data the prefetch task should prefetch from storage. One known approach to prefetching determines that a prefetch task begin when a sequence of a certain length (i.e., a sequence satisfying a predetermined tail parameter) is observed. Once prefetch activity has commenced, it attempts to remain ahead of the requesting host by a margin that is based on the number of prefetched tracks that are actually used by the host.
Such an approach is not well suited to handling short sequences, however. Because a short sequence has a very short lifetime, prefetch activity for a short sequence cannot afford to wait for a sequence to be formed. Rather, to be effective in those instances, it needs to begin early in the sequence.
In one aspect of the invention, prefetching data from a storage device includes maintaining a history of sequences and determining an amount of data to be prefetched from a storage device for a new I/O request using the history of the sequences.
Embodiments of the invention may include one or more of the following features.
The history of sequences can comprise at least one histogram having n count fields, each for storing a count value for a corresponding sequence length in a range of 1 track to n tracks and the count value indicating a number of occurrences of sequences of the corresponding sequence length. There can be one histogram per logical volume.
Maintaining the histogram can include observing completion of a sequence of a given sequence length and incrementing the count value in any of the count fields for which the corresponding sequence length is less than or equal to the given sequence length.
Determining the amount of data to be prefetched can include predicting that a current sequence of a current sequence length will reach a next sequence length by computing a probability as a ratio of the count value for the corresponding sequence length that equals the next consecutive sequence length and count value for the corresponding sequence length that equals the current sequence length. It can further include applying a threshold to the prediction. Applying the threshold to the prediction can include comparing the threshold to the prediction determining if the probability is less than the threshold. The prediction and threshold application are repeated for each next sequence length until it is determined for such next sequence length that the probability is less than the threshold. A prefetch amount equal to such next sequence length minus the current sequence length is returned when the results of the comparison indicate that the probability is less than the threshold.
The threshold can be adjusted based on system activity metrics. The system activity metrics can include processor utilization and average memory access time.
The value of xe2x80x988xe2x80x99 can be selected for n.
One or more aspects of the invention may include one or more of the following advantages. Unlike prior prefetch mechanisms that wait to see a sequence of a predetermined length before creating a prefetch task, the prefetch mechanism of the present invention enables a prefetch task to begin as soon as a new read request arrives, thus providing for higher cache hit ratios and read response times for short sequences. In addition, the prefetch mechanism adjusts itself with changing system activity levels (or load conditions) so that prefetching is as aggressive as possible without having an adverse impact on overall system performance.
Other features and advantages of the invention will be apparent from the following detailed description and from the claims.