The explosive growth of the Internet has ushered in a new era in which information is continually exchanged and accessed. In response to this growth, there has been an increase in the size of the resources being shared. Users are demanding more than standard HTML documents, wanting access to a variety of resources, such as audio data, video data, image data, and programming data. Thus, there is a need for resource storage that can accommodate large sets of resources while providing fast and reliable access to them.
Accessing resource data stored in file systems has historically been a bottleneck for computer systems. Processor speeds, memory sizes, and network speeds have greatly increased, but disk I/O (Input/Output) performance has not increased at the same rate, making disk I/O operations inefficient, especially for large resource files.
One response to this problem has been to prefetch portions of resource files before they are requested and to store them in a cache. Because the cache I/O performance is much better than that of the disk drive, the portions of the resource files stored in the cache can be accessed much faster than if they resided on the disk. Accordingly, disk caching can lead to improvements in file system performance and throughput.
However, this response to the problem raises the question of how to decide which portions of the resource to prefetch. One answer to this question has been to adopt a simple read-ahead protocol in which a prefetch instruction is issued for a fixed number of blocks of data, such as ten or twenty, stored ahead of the block requested by a user or client application.
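The simple read-ahead protocol described above can be illustrated with a minimal sketch. The class name `FixedReadAheadCache`, the in-memory list standing in for the disk, and the window of ten blocks are all assumptions made for illustration; they are not taken from any particular file system.

```python
READ_AHEAD_BLOCKS = 10  # assumed fixed window, e.g. ten blocks


class FixedReadAheadCache:
    """Hypothetical cache that always prefetches a fixed number of blocks."""

    def __init__(self, disk_blocks, read_ahead=READ_AHEAD_BLOCKS):
        self.disk = disk_blocks      # a list standing in for the disk
        self.read_ahead = read_ahead
        self.cache = {}              # block number -> block data
        self.demand_misses = 0       # requested blocks that were not cached
        self.prefetched = 0          # blocks loaded ahead of any request

    def read(self, n):
        # Serve the requested block, going to the "disk" on a miss.
        if n not in self.cache:
            self.demand_misses += 1
            self.cache[n] = self.disk[n]
        data = self.cache[n]

        # Issue prefetches for the fixed number of blocks stored ahead
        # of the requested block, stopping at the end of the file.
        last = min(n + self.read_ahead, len(self.disk) - 1)
        for k in range(n + 1, last + 1):
            if k not in self.cache:
                self.cache[k] = self.disk[k]
                self.prefetched += 1
        return data


disk = [f"block-{i}" for i in range(100)]
cache = FixedReadAheadCache(disk)
first = cache.read(0)   # miss on block 0, then blocks 1..10 are prefetched
second = cache.read(1)  # served from the cache; block 11 is prefetched
```

In this sketch the window size never changes, regardless of how the file is being accessed or how quickly blocks are consumed.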
The simple read-ahead approach suffers from numerous problems. Special code is needed to issue prefetches when reading of a file commences. If reading commences at a new position in the file and then proceeds sequentially, blocks of the file ahead of the new position will not have been prefetched and will be read slowly. If files are randomly accessed, the system wastes resources by prefetching blocks that will never be used. Finally, the number of blocks prefetched is typically fixed and independent of the read speed of the disk. Therefore, too many blocks are prefetched when the read speed is slow, which wastes cache space, and conversely too few blocks are prefetched when the read speed is high, which degrades performance and throughput.
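The cost of fixed read-ahead under non-sequential access can be made concrete with a small simulation. This is a sketch under stated assumptions: the function `prefetch_waste`, the 100-block file, and the two access patterns are hypothetical and chosen only to illustrate the problem described above.

```python
def prefetch_waste(accesses, total_blocks, read_ahead):
    """Return (demand_misses, wasted_prefetches) for a fixed read-ahead.

    A prefetched block counts as wasted if it is never requested.
    """
    cached = set()
    prefetched = set()
    misses = 0
    for n in accesses:
        if n not in cached:
            misses += 1
            cached.add(n)
        # Fixed window: always prefetch the next `read_ahead` blocks.
        last = min(n + read_ahead, total_blocks - 1)
        for k in range(n + 1, last + 1):
            if k not in cached:
                cached.add(k)
                prefetched.add(k)
    wasted = len(prefetched - set(accesses))
    return misses, wasted


# Sequential read of the first 50 blocks of a 100-block file.
sequential = prefetch_waste(range(50), total_blocks=100, read_ahead=10)

# Scattered reads standing in for random access to the same file.
scattered = prefetch_waste([0, 40, 80, 20, 60], total_blocks=100, read_ahead=10)

print(sequential)  # (1, 10)
print(scattered)   # (5, 50)
```

In the sequential case only the final window is unused, whereas in the scattered case every request misses the cache and all fifty prefetched blocks are never read.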