A conventional data processing system may include a processor coupled to a system memory where the processor may be associated with one or more levels of cache. A cache includes a relatively small, high speed memory (“cache memory”) that contains a copy of information from one or more portions of the system memory. Frequently, the cache memory is physically distinct from the system memory. A Level-1 (L1) cache or primary cache may be built into the integrated circuit of the processor. The processor may be associated with additional levels of cache, such as a Level-2 (L2) cache and a Level-3 (L3) cache. These higher level caches, e.g., L2, L3, may be employed to stage data to the L1 cache and typically have progressively larger storage capacities but longer access latencies.
The cache memory may be organized as a collection of spatially mapped, fixed size storage region pools commonly referred to as “congruence classes.” Each of these storage region pools typically comprises one or more storage regions of fixed granularity. These storage regions may be freely associated with any equally granular storage region in the system as long as the storage region spatially maps to a congruence class. The position of the storage region within the pool may be referred to as the “set.” The intersection of each congruence class and set contains a cache line. The size of the storage granule may be referred to as the “cache line size.” A unique tag may be derived from an address of a given storage granule to indicate its residency in a given congruence class and set.
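As a minimal sketch of how a tag and a congruence class may be derived from an address, the following assumes a simple modulo mapping. The line size, the number of congruence classes, and the function name `decompose` are hypothetical values chosen only for illustration; an actual cache may use a different geometry or indexing function.

```python
# Hypothetical geometry, chosen only for illustration.
LINE_SIZE = 128     # bytes per cache line (assumed)
NUM_CLASSES = 512   # number of congruence classes (assumed)


def decompose(address):
    """Split a byte address into (tag, congruence_class, offset)."""
    offset = address % LINE_SIZE                  # byte position inside the cache line
    line_number = address // LINE_SIZE            # index of the storage granule
    congruence_class = line_number % NUM_CLASSES  # pool the granule spatially maps to
    tag = line_number // NUM_CLASSES              # distinguishes granules sharing a class
    return tag, congruence_class, offset
```

Under these assumptions, two addresses whose line numbers differ by a multiple of `NUM_CLASSES` map to the same congruence class and are told apart only by their tags.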
When a processor generates a request for data at a given address (a read request) and the requested data resides in its cache memory, e.g., L1 cache memory, then a “cache hit” is said to take place. The processor may then obtain the data from the cache memory without having to access the system memory. If the data is not in the cache memory, then a “cache miss” is said to occur. The memory request may be forwarded to the system memory and the data may subsequently be retrieved from the system memory as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from the system memory may be provided to the processor and may also be written into the cache memory due to the statistical likelihood that this data will be requested again by that processor. Likewise, if a processor generates a write request, the write data may be written to the cache memory without having to access the system memory over the system bus.
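The read path described above can be sketched as follows. This is a toy model under stated assumptions: the class name `ReadCache`, the 64-byte line size, and the use of a dictionary as a stand-in for system memory are all hypothetical, and capacity limits and eviction are deliberately ignored.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)


class ReadCache:
    """Toy read-only cache keyed by line number; capacity and eviction are ignored."""

    def __init__(self, system_memory):
        self.system_memory = system_memory  # dict: line number -> data (stand-in for system memory)
        self.lines = {}                     # cached copies of lines
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line = address // LINE_SIZE
        if line in self.lines:
            # Cache hit: serve the data without touching system memory.
            self.hits += 1
            return self.lines[line]
        # Cache miss: fetch from system memory and keep a copy for later requests.
        self.misses += 1
        data = self.system_memory[line]
        self.lines[line] = data
        return data
```

In this sketch, a second read to any address within an already fetched line counts as a hit, because the whole line was written into the cache on the first miss.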
As is known to those skilled in the art, a wide variety of cache configurations or organizations are commonly available. For example, a “direct-mapped” cache is organized such that for each addressed location in main memory, there exists one and only one location in a cache data array that could include a copy of such data. In an “n-way set-associative” cache, the cache is configured such that for any one addressed location in main memory, there exists n possible locations within the cache data array that might include a copy of such data.
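To illustrate the difference between the two organizations, the sketch below lists the locations in which a given line may reside. The function name `candidate_slots` and the modulo indexing are assumptions made for illustration; a direct-mapped cache is treated here as the special case of one way.

```python
def candidate_slots(line_number, num_lines, ways):
    """Return the (congruence_class, way) slots that may hold the given line.

    num_lines is the total number of lines in the cache data array and is
    assumed to be a multiple of ways.
    """
    num_classes = num_lines // ways
    congruence_class = line_number % num_classes  # assumed modulo indexing
    return [(congruence_class, way) for way in range(ways)]
```

For an eight-line cache, `candidate_slots(37, 8, 1)` yields the single slot `(5, 0)`, whereas `candidate_slots(37, 8, 4)` yields four slots, all in congruence class 1.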
Many cache design methods seek to increase the cache hit rate, thereby improving the performance of the cache. A “cache hit rate” may refer to the rate at which cache hits occur relative to the total number of accesses that are made to the cache. By improving the cache hit rate, the performance of the system may be improved, i.e., fewer requests need to be serviced from system memory.
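Written as a formula, with the symbols $N_{\text{hits}}$ and $N_{\text{misses}}$ introduced here only for illustration, the cache hit rate defined above may be expressed as:

```latex
\text{hit rate} = \frac{N_{\text{hits}}}{N_{\text{hits}} + N_{\text{misses}}}
```

For example, a cache that services 900 of 1000 accesses has a hit rate of 0.9, so the remaining 100 accesses must be serviced from system memory.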
In an “n-way set-associative” cache, one way to improve the performance of the cache is to use a Least Recently Used (LRU) replacement method to assist in determining how data is to be managed in the cache. The LRU replacement method uses a single logical stack construct composed of “n” elements for each of the congruence classes in an n-way set-associative cache, where each cache entry stores particular data. A congruence class may refer to “n” cache lines (corresponding to the number of ways) whose addresses are congruent to one another under the modulus used to index the cache. As stated above, if an item, e.g., data, requested by the processor is present in the cache memory, a “cache hit” is said to occur. When a cache hit occurs, the cache entry comprising the information, e.g., data, requested is considered to become the “most recently used” item in its congruence class and is logically moved from its current location in the stack to the top of the stack. The entry in the congruence class that can logically be viewed as being at the bottom of the stack is the “least recently used” item in the congruence class. As stated above, if an item, e.g., data, requested by the processor is not present in the cache memory, a “cache miss” is said to occur. When a cache miss occurs, the requested item is retrieved from system memory and then stored in the top stack position. When a new entry (cache line) is inserted in the stack, the cache entry (cache line) in the bottom stack position of the stack is evicted. The information, e.g., data, at that entry may subsequently be discarded, or written back to system memory if the cache entry contains a recent update. When there is a cache hit to an entry in the middle of the stack, that entry is moved to the top of the stack, and those entries that were located above the requested entry are each shifted down one position to fill the void left by the entry that moved to the top of the stack.
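The logical stack for a single congruence class can be sketched as below. The class name `LRUStack` and the use of a Python list are assumptions for illustration only; hardware implementations typically encode the ordering in a few state bits rather than physically moving entries.

```python
class LRUStack:
    """Logical LRU stack for one congruence class of an n-way cache."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []  # index 0 is the top (most recently used); last is the bottom

    def access(self, tag):
        """Reference a tag and return (hit, evicted_tag)."""
        if tag in self.stack:
            # Hit: move the entry to the top; entries above it shift down one position.
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            return True, None
        evicted = None
        if len(self.stack) == self.ways:
            # Miss with a full class: evict the entry in the bottom stack position.
            evicted = self.stack.pop()
        # The newly fetched line is stored in the top stack position.
        self.stack.insert(0, tag)
        return False, evicted
```

For a 4-way class referenced with tags A, B, C, D in that order, the stack from top to bottom is D, C, B, A. A subsequent hit on B reorders it to B, D, C, A, and a miss on E then evicts A.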
When using the LRU replacement method briefly described above, a new cache line replaces an old cache line that has not been requested (or what may be referred to as “referenced”) by the processor for the longest time. Some cache lines are only referenced once but may remain in the cache memory waiting for a second reference that may never come. For example, streaming input and output files as well as random references to large table or chain pointers may only be referenced once. By holding on to such data in the cache memory, other data or instructions that might be reused may be replaced in the cache memory to make room for the data that is not reused. By replacing data in the cache memory that might be reused to make room for the data that is not reused, the cache hit rate may be reduced, thereby diminishing performance.
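The effect can be reproduced with a small simulation of one 4-way congruence class. The traces below are hypothetical and constructed only to make the behavior visible: three reused lines (A, B, C) are referenced repeatedly, first alone and then interleaved with streaming lines that are each referenced exactly once.

```python
def lru_hit_rate(trace, ways):
    """Simulate plain LRU on one congruence class and return the hit rate."""
    stack = []  # index 0 is the most recently used entry
    hits = 0
    for tag in trace:
        if tag in stack:
            hits += 1
            stack.remove(tag)
        elif len(stack) == ways:
            stack.pop()  # evict the least recently used entry
        stack.insert(0, tag)
    return hits / len(trace)


reused = ["A", "B", "C"]

# Ten rounds over the reused lines only.
reuse_only = reused * 10

# Ten rounds, each followed by two streaming lines that are never referenced again.
with_stream = []
for i in range(10):
    with_stream += reused + [f"S{2 * i}", f"S{2 * i + 1}"]

print(lru_hit_rate(reuse_only, ways=4))   # 0.9
print(lru_hit_rate(with_stream, ways=4))  # 0.0
```

In this constructed trace, the two streaming lines inserted per round push the reused lines out of the class just before they are referenced again, so every access misses even though the reused lines alone would fit comfortably in four ways.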
If, however, the reuse characteristics of the cache data were detected before reusable data is replaced to make room for data that is not reused, then data (cache line) that may not be reused may be replaced with the new incoming cache line prior to replacing data (cache line) that may be reused. By replacing data in the cache memory that might not be reused prior to replacing data that might be reused, the cache hit rate may be improved, thereby improving performance.
Therefore, there is a need in the art to detect data that has been reused thereby ensuring that non-reusable data may be replaced prior to reusable data in the LRU replacement method.