1. Field of the Invention
The present invention relates, in general, to a mass prefetching method for a disk array that can improve the performance of sequential or non-sequential reads exhibiting spatial locality by using an online disk simulation to determine whether strip-based or stripe-based prefetching in the disk array is beneficial in terms of read service time.
2. Description of the Related Art
Prefetching is needed to reduce or hide the latency between a processor and a main memory, as well as between a main memory and a storage subsystem that consists of disks. Some prefetching schemes for processors can be applied to disks with slight modification. In addition, many prefetching techniques dedicated to disks have been studied.
The most frequently addressed goal of disk prefetching is to make data available in a cache before the data is consumed; in this way, computational operations are overlapped with the transfer of data from the disk. The other goal is to enhance disk throughput by aggregating multiple contiguous blocks into a single request. Prefetching schemes designed for a single disk may give rise to problems in striped disk arrays. Therefore, there is a need for a special scheme for multiple disks that takes the characteristics of striped disk arrays into account.
Conventional prefetching technologies include offline prefetching, history-based online prefetching, prefetching using hints from application programs, and sequential prefetching. Currently, conventional prefetching schemes other than sequential prefetching are not used in actual systems due to their high overhead and low benefit.
Traditional prefetching schemes ignore the data placement of striped disk arrays and therefore suffer from independency loss for concurrent multiple reads. The prefetching requests of traditional prefetching schemes are not aligned with the strip. Hence, a request may be split across several disks, and each disk must then serve many more accesses. We call this problem independency loss. If each prefetching request is dedicated to only one disk, the independency loss is resolved.
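The independency loss described above can be illustrated with a minimal sketch. It assumes a simple RAID-0 style layout in which strip k resides on disk k mod N; this layout, the function names, and the block-based units are hypothetical and chosen only for illustration, not taken from the invention.

```python
def disks_touched(start_block, num_blocks, strip_blocks, num_disks):
    """Return the sorted list of disks that one request touches.

    Assumed layout (illustrative only): strip k holds blocks
    [k * strip_blocks, (k + 1) * strip_blocks) and resides on
    disk k % num_disks.
    """
    if num_blocks <= 0:
        return []
    first_strip = start_block // strip_blocks
    last_strip = (start_block + num_blocks - 1) // strip_blocks
    return sorted({s % num_disks for s in range(first_strip, last_strip + 1)})


def align_to_strip(start_block, strip_blocks):
    """Return (start, length) of the single strip that contains start_block."""
    strip_start = (start_block // strip_blocks) * strip_blocks
    return strip_start, strip_blocks


# Strip size of 16 blocks, 4 member disks (hypothetical values).
# An unaligned 16-block prefetch starting at block 12 crosses a strip boundary.
unaligned = disks_touched(12, 16, 16, 4)

# The strip-aligned alternative stays on one disk.
aligned_start, aligned_len = align_to_strip(12, 16)
aligned = disks_touched(aligned_start, aligned_len, 16, 4)
```

Under these assumptions, the unaligned request occupies two disks while the aligned request occupies one, which is the sense in which a strip-aligned request preserves the independence of the member disks.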
If the prefetching size is much less than the stripe size and the number of concurrent sequential reads is much less than the number of member disks that compose a striped disk array, some of the disks become idle, and the parallelism of the disks is lost. This case exemplifies what we call parallelism loss. A large prefetching size laid across multiple disks can prevent parallelism loss. However, if independency loss is to be resolved, the prefetching request must be aligned with the strip and its size must be adequate to prevent prefetching wastage. Such a prefetching size is much less than the stripe size and, as a result, suffers from parallelism loss for a small number of concurrent reads. The two problems, independency loss and parallelism loss, conflict with each other: if one problem is resolved, the other arises.
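The parallelism loss described above can be sketched in the same way. The example assumes a RAID-0 style layout in which strip k resides on disk k mod N, and it models each concurrent read as a single outstanding prefetch request; the layout, the function names, and the numeric values are assumptions made only for illustration.

```python
def busy_disks(requests, strip_blocks, num_disks):
    """Return the set of disks touched by a list of (start, length) requests.

    Assumed layout (illustrative only): strip k resides on disk k % num_disks.
    """
    busy = set()
    for start, length in requests:
        if length <= 0:
            continue
        first = start // strip_blocks
        last = (start + length - 1) // strip_blocks
        # A request spanning num_disks or more strips already touches every disk,
        # so there is no need to walk further than num_disks strips.
        for strip in range(first, min(last, first + num_disks - 1) + 1):
            busy.add(strip % num_disks)
    return busy


def idle_disk_count(requests, strip_blocks, num_disks):
    """Return how many member disks receive no request at all."""
    return num_disks - len(busy_disks(requests, strip_blocks, num_disks))


NUM_DISKS = 8                 # hypothetical array width
STRIP = 16                    # hypothetical strip size in blocks
STRIPE = STRIP * NUM_DISKS    # stripe size in blocks

# Two concurrent reads, each prefetching one aligned strip.
small_prefetch = [(0, STRIP), (64, STRIP)]

# One read prefetching a whole stripe.
large_prefetch = [(0, STRIPE)]

idle_small = idle_disk_count(small_prefetch, STRIP, NUM_DISKS)
idle_large = idle_disk_count(large_prefetch, STRIP, NUM_DISKS)
```

With these hypothetical values, the two strip-sized prefetches leave six of the eight disks idle, whereas the stripe-sized prefetch keeps every disk busy. The strip-aligned size that avoids independency loss is therefore the same size that exposes parallelism loss when few reads are concurrent.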