The present invention relates in general to data processing systems, and in particular, to the caching of data for use by a processor.
In microprocessor systems, processor cycle time continues to decrease as technology continues to improve. Also, design techniques of speculative execution, deeper pipelines, more execution elements and the like, continue to improve the performance of processing systems. The improved performance puts a heavier burden on the memory interface since the processor demands data and instructions more rapidly from memory. To increase the performance of processing systems, cache memory systems are often implemented.
Processing systems employing cache memories are well known in the art. Cache memories are very high-speed memory devices that increase the speed of a data processing system by making current programs and data available to a processor ("CPU") with a minimal amount of latency. Large on-chip caches (L1 or primary caches) are implemented to help reduce the memory latency, and they are often augmented by larger off-chip caches (L2 or secondary caches).
The primary advantage behind cache memory systems is that by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time of the overall processing system will approach the access time of the cache. Although cache memory is only a small fraction of the size of main memory, a large fraction of memory requests are successfully found in the fast cache memory because of the "locality of reference" property of programs. This property holds that memory references during any given time interval tend to be confined to a few localized areas of memory.
The basic operation of cache memories is well-known. When the CPU needs to access memory, the cache is examined. If the word addressed by the CPU is found in the cache, it is read from the fast memory. If the word addressed by the CPU is not found in the cache, the main memory is accessed to read the word. A block of words containing the word being accessed is then transferred from main memory to cache memory. In this manner, additional data is transferred to cache (pre-fetched) so that future references to memory will likely find the required words in the fast cache memory.
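The read path described above can be illustrated with a minimal sketch. The class name `BlockCache`, the constant `BLOCK_SIZE`, and the absence of any capacity limit or replacement policy are assumptions made only for illustration; they are not taken from the present description.

```python
BLOCK_SIZE = 4  # words per block (assumed value, for illustration only)


class BlockCache:
    """Toy cache that transfers a whole block of words on every miss."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # list of words, indexed by address
        self.blocks = {}                # block number -> list of words
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_no, offset = divmod(address, BLOCK_SIZE)
        if block_no in self.blocks:
            self.hits += 1
        else:
            # Miss: fetch the entire block containing the requested word,
            # so that neighbouring words are already cached for later reads.
            self.misses += 1
            start = block_no * BLOCK_SIZE
            self.blocks[block_no] = self.main_memory[start:start + BLOCK_SIZE]
        return self.blocks[block_no][offset]


memory = list(range(100, 116))            # 16 words of "main memory"
cache = BlockCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 3, 4)]
# Addresses 0 and 4 miss (one per block); addresses 1, 2 and 3 hit.
```

In this sketch a single miss on address 0 brings in the words at addresses 1 through 3 as well, which is why the following three reads are served from the cache.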
The average memory access time of the computer system can be improved considerably by use of a cache. The performance of cache memory is frequently measured in terms of a quantity called "hit ratio." When the CPU accesses memory and finds the word in cache, a cache "hit" results. If the word is found not in cache memory but in main memory, a cache "miss" results. If the CPU finds the word in cache most of the time, instead of main memory, a high hit ratio results and the average access time is close to the access time of the fast cache memory.
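The relationship between hit ratio and average access time may be made concrete with a short calculation. The cost model below, in which a miss pays for the cache lookup and then for the main memory access, is a common textbook assumption rather than something stated in this description, and the timing values are arbitrary.

```python
def hit_ratio(hits, misses):
    """Fraction of memory accesses that were satisfied by the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def average_access_time(h, t_cache, t_main):
    """Average access time for hit ratio h.

    Assumed model: a hit costs t_cache; a miss costs t_cache + t_main,
    because the cache is examined before main memory is accessed.
    """
    return h * t_cache + (1 - h) * (t_cache + t_main)


h = hit_ratio(hits=95, misses=5)                         # 0.95
t_avg = average_access_time(h, t_cache=1, t_main=100)    # about 6 time units
```

With these assumed figures, the average access time is far closer to the 1-unit cache access time than to the 100-unit main memory access time.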
Access patterns described by an I/O request stream can be random or sequential. In a sequential access pattern, access to the file occurs in logically consecutive blocks (e.g., cache lines), whereas random access patterns do not display any regularity in the request stream. The requested blocks in both the random and sequential accesses can be overlapped or disjointed. In the overlapped requests, two or more of the requested blocks are the same, whereas in the disjointed case, no two requests are to the same block. These four access patterns combine together into two distinct categories as follows: (a) overlapped access patterns (both random and sequential) display the property of temporal locality; and (b) sequential access patterns (both overlapped and disjointed) display the property of spatial locality.
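The four access patterns and the two locality categories can be sketched as a small classifier over a list of requested block numbers. The function names and the simplification that each request targets exactly one block (so that a sequential stream advances by zero or one block per request) are assumptions for illustration.

```python
def classify(requests):
    """Return (order, overlap) for a list of requested block numbers."""
    # Assumed test for "sequential": each request targets the same block
    # as the previous request or the logically next one.
    sequential = len(requests) > 1 and all(
        0 <= b - a <= 1 for a, b in zip(requests, requests[1:])
    )
    # Overlapped: at least two requests are to the same block.
    overlapped = len(set(requests)) < len(requests)
    return (
        "sequential" if sequential else "random",
        "overlapped" if overlapped else "disjoint",
    )


def locality(requests):
    """Map an access pattern onto the locality properties it displays."""
    order, overlap = classify(requests)
    return {
        "temporal": overlap == "overlapped",  # category (a)
        "spatial": order == "sequential",     # category (b)
    }


patterns = {
    "sequential disjoint": [1, 2, 3, 4],
    "sequential overlapped": [1, 2, 2, 3],
    "random disjoint": [7, 2, 9, 4],
    "random overlapped": [7, 2, 7, 4],
}
results = {name: classify(reqs) for name, reqs in patterns.items()}
```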
Temporal locality means that if a given disk (e.g., hard drive) location is fetched, there is a higher probability that it will be fetched again early in the reference stream rather than later. With temporal locality, the same location is requested two or more times. Spatial locality means that if a given disk location is fetched, there is a higher probability that locations whose addresses are close successors (or predecessors) to it will also be fetched, rather than locations that are distant. Spatial locality is exhibited by workloads that are sequential in nature and are referred to as sequential I/O request streams. Read lookahead (prefetch) techniques are used to exploit spatial locality. Techniques that exploit temporal locality can be designed in ways that relate to the data prefetch, in which case the two are said to be integrated. A cache can be designed to take advantage of both temporal and spatial locality.
A primary concern of cache design is the determination of a data block's potential for future reference. While overlapped access patterns do not indicate an increased likelihood for spatial locality, they do exhibit strong temporal locality; that is, there is no reason to assume a random block will not be referenced again, and likely, sooner rather than later. In contrast, sequential streams indicate strong spatial locality, but due to the nature of the request stream, demonstrate little likelihood that the block will be referenced again in the stream. Data blocks cached as a result of overlapped access patterns have a higher probability of future reference, whereas data blocks cached as a result of sequential access patterns have a small probability of future reference outside a tight locality window.
The occurrence of sequential I/O request streams can affect the operation of the cache because the cache can be flooded with unproductive data blocks. If the blocks from the sequential stream are stored in the cache after being fetched from disk storage, it is unlikely that these blocks will be referenced again. Furthermore, in order to store the blocks from the sequential stream in the cache, other blocks have to be evicted in order to create space for these blocks. The problem is that the evicted blocks could have been referenced later, but were replaced by blocks that are unproductive after their initial reference. Therefore, the overall effect of loading blocks from a sequential stream into the cache will likely reduce cache hit ratio. The cache is flooded with data that is stale with respect to future accesses.
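The flooding effect can be reproduced with a small simulation of a conventional LRU cache. The cache capacity, the block numbers, and the names `LRUCache` and `count_hits` are assumptions chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Plain LRU cache of block numbers with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry first

    def access(self, block):
        """Return True on a hit; on a miss, load the block, evicting the
        least recently used block if the cache is full."""
        if block in self.entries:
            self.entries.move_to_end(block)
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[block] = None
        return False


def count_hits(cache, requests):
    return sum(cache.access(b) for b in requests)


hot = [0, 1, 2, 3]             # blocks that are referenced repeatedly
scan = list(range(100, 108))   # one-time sequential stream

# Case 1: no sequential stream between the two passes over the hot blocks.
quiet = LRUCache(capacity=4)
count_hits(quiet, hot)                      # warm-up pass, all misses
hits_without_scan = count_hits(quiet, hot)  # 4: every hot block is still cached

# Case 2: a sequential stream passes through the cache in between.
flooded = LRUCache(capacity=4)
count_hits(flooded, hot)
count_hits(flooded, scan)                   # evicts every hot block
hits_after_scan = count_hits(flooded, hot)  # 0: hot blocks must be refetched
```

The blocks of the sequential stream are never referenced a second time, yet they displace all four hot blocks, so the second pass over the hot blocks misses entirely.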
Read lookahead techniques can be used to take advantage of the sequential streams in the I/O request stream. However, a chronic problem with prefetch techniques has been that the cache can be flooded with unproductive prefetched blocks, and read lookahead can actually reduce the performance of the storage subsystem if the prefetched blocks are never referenced. Furthermore, the prefetched blocks can replace cache blocks that would otherwise have been referenced had they remained resident in the cache. Therefore, prefetching does not always improve performance and can actually make cache flooding more serious. The cache is flooded with data that is said to be purely speculative with respect to future accesses.
To take advantage of temporal locality, the retrieved blocks, or lines, of data are time-stamped and organized within the cache in a least recently used ("LRU") order. However, a conflict occurs since LRU-based caches are not as effective for taking advantage of spatial locality. Instead, read lookahead (prefetching) techniques are preferred for spatial locality. But, as noted above, such techniques can evict needed data from the cache.
The present invention describes a method that alleviates cache flooding when handling sequential I/O request streams in storage subsystem caches. Both on-demand and prefetched data are addressed. A mechanism is provided by which the blocks that are read from the disk array as part of a sequential I/O request stream minimize the evictions of potentially useful data blocks from the cache, thus avoiding the flooding of the cache by stale data.
The system comprises a cache, including the cache directory, and a sequential I/O request stream tracer, which serves as a secondary directory subservient to the cache directory. As a tracer, the secondary directory (Sequential Stream Tracer) in this design may not contain any associated data, except for the data in the cache with which the Sequential Stream Tracer is associated through the main directory. The Sequential Stream Tracer may consist of references to the cache directory entries that correspond to data blocks constituting sequential access patterns. In this manner, the tracer merely tracks the sequential stream data blocks, providing a single service point for controlling cache floods. The tracer may be organized as an array of multiple least recently used (LRU) stacks, where each LRU stack corresponds to a sequential access pattern. The tracer table maintains a number of LRU stacks responsible for storing primary cache directory entries that in turn point to the most recent data blocks fetched from disk to satisfy that sequential stream's requests (including those blocks, if any, that are fetched as a result of native prefetch algorithms).
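One possible arrangement of a cache directory with a subservient tracer is sketched below. This summary does not state the replacement policy, so the rule used here, under which a stream whose LRU stack is full gives up its own oldest block before a new block of that stream is loaded, is an assumption. The names `TracedCache`, `stack_depth`, and `stream_id` are likewise hypothetical.

```python
from collections import OrderedDict, deque


class TracedCache:
    """Cache directory plus a tracer that holds one LRU stack per stream."""

    def __init__(self, capacity, stack_depth):
        self.capacity = capacity
        self.stack_depth = stack_depth   # assumed per-stream limit
        self.directory = OrderedDict()   # main directory: block -> data
        self.tracer = {}                 # stream id -> deque of block numbers

    def _evict_lru(self):
        # Ordinary LRU eviction from the main directory; any tracer
        # reference to the evicted block is dropped as well.
        victim, _ = self.directory.popitem(last=False)
        for stack in self.tracer.values():
            if victim in stack:
                stack.remove(victim)

    def load(self, block, stream_id=None):
        """Cache a block fetched from disk. A non-None stream_id marks the
        block as belonging to a detected sequential stream."""
        if block in self.directory:
            self.directory.move_to_end(block)
            return
        if stream_id is not None:
            stack = self.tracer.setdefault(stream_id, deque())
            if len(stack) >= self.stack_depth:
                # Assumed policy: the stream recycles its own oldest block
                # rather than evicting an unrelated block.
                oldest = stack.popleft()
                self.directory.pop(oldest, None)
            stack.append(block)          # the tracer stores a reference only
        if len(self.directory) >= self.capacity:
            self._evict_lru()
        self.directory[block] = f"data-{block}"


hot = [0, 1, 2, 3]
cache = TracedCache(capacity=6, stack_depth=2)
for b in hot:
    cache.load(b)                        # blocks from non-sequential requests
for b in range(100, 110):
    cache.load(b, stream_id="s0")        # ten-block sequential stream
survivors = [b for b in hot if b in cache.directory]   # all four hot blocks
```

Under this assumed policy the ten-block stream never occupies more than two cache entries, so the four hot blocks remain resident. The tracer holds only block references; the data itself stays in the main directory.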
The I/O requests from each stream do not normally occur in consecutive positions in the request target, but are instead interleaved with the entries from other sequential streams. These multiple streams may be detected, with the requests of each stream stored in a separate tracer entry. The larger the run length of the sequential request stream associated with the I/O request target, the fewer individual transactions may be handled by the request. Following this argument, one can also determine that for a sequential stream with large run lengths, there can only be a smaller number of sequential streams appearing in the request stream. On the other hand, one can also determine that for a sequential stream with small run lengths, or with no sequential streams present, there can be a relatively larger number of sequential streams and non-sequential requests appearing at the request target. This is even more true when compared with a purely random request stream (for which the average run length is 0).
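Separating interleaved sequential streams can be sketched as follows. The detection rule, which extends a stream when a request targets the block immediately after that stream's most recent block, and the function name `assign_streams` are assumptions for illustration; the description does not specify a detection algorithm here.

```python
def assign_streams(requests):
    """Group interleaved block requests into runs of consecutive blocks.

    Returns a list of streams, each a list of block numbers in request
    order. A stream of length 1 is a request that matched no existing run.
    """
    streams = []         # one list per tracer entry
    next_expected = {}   # next block number -> index of the stream awaiting it
    for block in requests:
        idx = next_expected.pop(block, None)
        if idx is None:
            # No stream was waiting for this block: start a new entry.
            idx = len(streams)
            streams.append([])
        streams[idx].append(block)
        next_expected[block + 1] = idx
    return streams


# Two sequential streams (10.. and 50..) interleaved with one isolated request.
requests = [10, 50, 11, 51, 7, 12, 52]
streams = assign_streams(requests)
sequential = [s for s in streams if len(s) > 1]
# streams    == [[10, 11, 12], [50, 51, 52], [7]]
# sequential == [[10, 11, 12], [50, 51, 52]]
```

Each detected run corresponds to one tracer entry, and the length of a run is its run length; the isolated request to block 7 forms no run.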
One advantage of the tracer design of the present invention is that it provides a low-cost solution requiring only a small number of buffers (as required to detect the beginning of new sequential streams in the request stream and in the directory, and to store these streams).
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.