Random-access MSDs, such as optical or magnetic disk drives, and other storage and file subsystems characterized by slow data access times, frequently have at least one associated cache which stores portions of data retrieved from the storage systems. The slow access times, which may be in the range of 500 microseconds to 500 milliseconds, allow the cache to enhance the performance of the MSD for applications which require that the data retrieved from the MSD be frequently re-used. The cache stores the frequently used data so that the MSD does not have to repeatedly retrieve it from the storage system using time-consuming techniques.
The data which is retrieved from the MSD and stored in a cache is generally data which is requested by one or more processes. The processes may have request referencing patterns which let the cache enhance the performance of the MSD, or the processes may have referencing patterns which prevent the cache from benefitting the storage system at all. In practice, those processes which have referencing patterns which do not permit effective cache utilization tend to degrade the performance of the cache for other processes.
The performance of a cache is measured by the percentage of the total requests that can be satisfied by the cache. The use of a cache eliminates the slow mechanical operations associated with re-acquiring or repeatedly retrieving the requested data from the storage system. Process referencing patterns allow the cache to perform well if the process repeatedly requests data from the same MSD locations so that the data is found in the cache on subsequent references. The cache performs poorly if the processes request data from distinct MSD locations only once or infrequently.
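The hit-percentage measure described above can be illustrated with a minimal sketch. The function name `hit_ratio` and the two reference traces are hypothetical, and the cache is assumed to be large enough that nothing is ever discarded, so only the referencing pattern affects the result.

```python
def hit_ratio(references):
    """Return the fraction of references satisfied by an unbounded cache.

    Assumption for illustration: the cache never discards data, so a
    reference hits whenever the same location was requested before.
    """
    cached = set()
    hits = 0
    for block in references:
        if block in cached:
            hits += 1          # data already in the cache: no MSD access
        else:
            cached.add(block)  # miss: data is fetched from the MSD and kept
    return hits / len(references) if references else 0.0


# A process that repeatedly requests the same three locations.
repeated = [1, 2, 3] * 4
# A process that requests twelve distinct locations once each.
distinct = list(range(12))

repeated_ratio = hit_ratio(repeated)
distinct_ratio = hit_ratio(distinct)
```

Under these assumptions the repeated pattern misses only on its first three references, while the pattern of distinct locations never hits at all.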
If a cache has a storage capacity which is smaller than the amount of data referenced by a sequence of references to distinct data items requested by a process, all of the previous contents of the cache can be pushed out of the cache. Furthermore, this flooding effect reduces the effectiveness of the cache for other processes whenever long sequential references occur which replace a substantial fraction of the cache storage capacity.
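The flooding effect can be reproduced with a small simulation. The sketch below assumes a cache that discards its least recently used block, a capacity of four blocks, and arbitrary block numbers; the helper `run_lru` is a hypothetical name introduced only for this illustration.

```python
from collections import OrderedDict


def run_lru(cache, capacity, references):
    """Apply references to an LRU-ordered cache and return the hit count.

    `cache` is an OrderedDict whose first entry is the least recently
    used block and whose last entry is the most recently used block.
    """
    hits = 0
    for block in references:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # refresh the block's position
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # discard the oldest block
            cache[block] = True
    return hits


capacity = 4
working_set = [0, 1, 2, 3]   # blocks one process re-uses frequently
cache = OrderedDict()

run_lru(cache, capacity, working_set)                # warm the cache
hits_before = run_lru(cache, capacity, working_set)  # every reference hits

# Another process makes one long sequential reference to 8 distinct blocks.
run_lru(cache, capacity, range(100, 108))

hits_after = run_lru(cache, capacity, working_set)   # working set is gone
```

In this sketch the sequential reference is twice the cache capacity, so every block of the working set is pushed out and the first process misses on all four of its subsequent references.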
One example of such a long sequential reference to an MSD such as a magnetic disk drive is a disk backup, which effectively makes one long sequential reference to the entire disk. A conventional cache will be flooded by this useless data. Typically this problem is solved by modifying the backup process to bypass the cache. However, only the worst and most predictably pathological processes can be dealt with in this way. Therefore, it would be desirable to compensate for long sequential references without requiring that they be identified in advance or that pathological processes be modified.
The least recently used (LRU), least frequently used (LFU), and first-in, first-out (FIFO) replacement algorithms have been used to sort data in the cache to enhance its performance with process request referencing patterns. The LRU replacement algorithm works by organizing the data in the cache in a list of data blocks which is sorted according to the length of time since the most recent reference to each data block. The most recently used (MRU) data is at one end of the list, while the least recently used (LRU) data is at the other. New data is added to the MRU end of the list. When data is to be discarded by the cache for accommodating the receipt of new data, the discarded data comes from the LRU end of the list. However, the LRU algorithm does not eliminate the long sequential reference problem.
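The LRU list described above might be sketched as follows. The class name `LRUCache`, its methods, and the sample references are assumptions made for illustration, and the capacity is assumed to be at least one block.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU sketch: an ordered list of blocks, LRU end first."""

    def __init__(self, capacity):
        self.capacity = capacity       # assumed to be >= 1
        self.blocks = OrderedDict()    # first key = LRU end, last = MRU end

    def reference(self, block):
        """Record a reference; return True on a hit, False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)    # hit: move to the MRU end
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # discard from the LRU end
        self.blocks[block] = None             # new data joins the MRU end
        return False

    def order(self):
        """Return the blocks from the LRU end to the MRU end."""
        return list(self.blocks)


cache = LRUCache(3)
results = [cache.reference(b) for b in ["a", "b", "c", "a", "d"]]
final_order = cache.order()
```

Here the second reference to "a" hits and moves it to the MRU end, so the later miss on "d" discards "b" rather than "a".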
The LFU replacement algorithm works by sorting the data in the cache according to the number of times that it has been used. The LFU algorithm organizes the cache data in a list, with the most frequently used (MFU) data at one end of the list while the least frequently used (LFU) data is at the other. New data which has only been used once is added to the LFU end of the list. Although the LFU algorithm is immune to the long sequential reference problem, it has other drawbacks. With the pure LFU algorithm, there is no way for data which was once heavily used to leave the cache when it is no longer needed. The overhead of computing the LFU order is higher than that of the LRU order. When an aging scheme is implemented to remove old data from the cache using the LFU algorithm, the overhead is much higher than with the LRU replacement algorithm.
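A pure LFU policy without aging can be sketched as below. The class name `LFUCache`, the tie-breaking rule (the earliest-inserted block among those with the lowest count is discarded), and the sample references are assumptions for illustration only.

```python
class LFUCache:
    """Minimal pure-LFU sketch with per-block use counts and no aging."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed to be >= 1
        self.counts = {}           # block -> number of references so far

    def reference(self, block):
        """Record a reference; return True on a hit, False on a miss."""
        if block in self.counts:
            self.counts[block] += 1
            return True
        if len(self.counts) >= self.capacity:
            # Linear scan for the least frequently used block; this search
            # is one source of the extra overhead compared with LRU.
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[block] = 1     # new data enters at the LFU end
        return False


cache = LFUCache(4)
for block in [0, 1, 2] * 3:        # three heavily used blocks
    cache.reference(block)
for block in range(100, 108):      # long sequential reference
    cache.reference(block)

survivors = sorted(b for b in cache.counts if b < 100)
```

In this sketch each block of the sequential reference displaces only the previous once-used block, so the three heavily used blocks survive. The same property means that those blocks would remain in the cache indefinitely even if they were never referenced again, which is the drawback noted above.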
The first-in, first-out (FIFO) replacement algorithm is the simplest, sorting the data in the order of the first reference that missed, failing to find the data in the cache. The FIFO algorithm organizes the cache data in a list, with the most recently missed (MRM) data at one end of the list (MRU), and the least recently missed (LRM) data at the other (LRU). When data is to be discarded by the cache for accommodating the receipt of new data, the discarded data comes from the LRM end of the list. FIFO works well when data is likely to be used several times close together in time. However, it does not serve to distinguish data that might be worth keeping for a longer time. Furthermore, the FIFO algorithm is most susceptible to the long sequential reference problem, since the higher the rate of data passing through the cache, the shorter the time available for its reuse.
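The FIFO ordering can be sketched in the same style. The class name `FIFOCache` and the sample references are hypothetical; the essential point of the sketch is that a hit leaves the position of a block in the list unchanged.

```python
from collections import deque


class FIFOCache:
    """Minimal FIFO sketch: blocks are ordered by the time of their miss."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed to be >= 1
        self.queue = deque()       # left = LRM end, right = MRM end
        self.resident = set()      # fast membership test

    def reference(self, block):
        """Record a reference; return True on a hit, False on a miss."""
        if block in self.resident:
            return True            # hit: position in the list is unchanged
        if len(self.queue) >= self.capacity:
            # Discard the least recently missed block.
            self.resident.discard(self.queue.popleft())
        self.queue.append(block)   # newly missed data joins the MRM end
        self.resident.add(block)
        return False


cache = FIFOCache(3)
results = [cache.reference(b) for b in ["a", "b", "c", "a", "d", "a"]]
final_order = list(cache.queue)
```

In this sketch the hit on "a" does not protect it: "a" is still the least recently missed block when "d" arrives, so it is discarded and the final reference to "a" misses. With the same references, a list ordered by recency of use would have retained "a".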