A cache may be arranged to store data and/or instructions fetched from a memory so that they are subsequently readily accessible by a device having access to that cache, for example a processing unit with which the cache may be associated. Hereafter, the term “data value” will be used to refer generically to either instructions or data, unless it is clear from the context that only a single variant (i.e. instructions or data) is being referred to. However, it is envisaged that the techniques of embodiments of the present invention will find more general applicability when used in association with data rather than instructions.
A cache typically has a plurality of cache lines, with each cache line typically being able to store a plurality of data values. When a processing unit wishes to have access (either read or write) to a data value which is not stored in the cache (referred to as a cache miss), then this typically results in a linefill process, during which a cache line's worth of data values is stored in the cache, that cache line including the data value to be accessed. Often it is necessary as an initial part of the linefill process to evict a cache line's worth of data values from the cache to make room for the new cache line of data. Should a data value in the cache line being evicted have been altered, then it is usual to ensure that the altered data value is re-written to memory, either at the time the data value is altered, or as part of the above-mentioned eviction process.
Each cache line typically has a valid flag associated therewith, and when a cache line is evicted from the cache, it is then marked as invalid. Further, when evicting a cache line, it is normal to assess whether that cache line is “clean” (i.e. whether the data values therein are already stored in memory, in which case the line is clean, or whether one or more of those data values is more up to date than the value stored in memory, in which case that cache line is not clean, also referred to as dirty). If the cache line is dirty, then on eviction that cache line will be cleaned, during which process at least any data values in the cache line that are more up to date than the corresponding values in memory will be re-written to memory. Typically the entire cache line is written back to memory.
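By way of illustration only, the linefill, eviction and cleaning behaviour described above may be sketched in software as follows. The sketch assumes a direct-mapped, write-back cache with a single dirty flag per line; the names `CacheLine`, `DirectMappedCache`, `LINE_WORDS` and `writebacks` are hypothetical and are not taken from any particular implementation.

```python
from dataclasses import dataclass, field

LINE_WORDS = 4  # data values per cache line (illustrative assumption)


@dataclass
class CacheLine:
    valid: bool = False
    dirty: bool = False
    tag: int = -1
    data: list = field(default_factory=lambda: [0] * LINE_WORDS)


class DirectMappedCache:
    """Toy write-back cache; `memory` is a plain list of data values."""

    def __init__(self, memory, num_lines=2):
        self.memory = memory
        self.lines = [CacheLine() for _ in range(num_lines)]
        self.writebacks = 0  # number of dirty lines cleaned on eviction

    def _locate(self, addr):
        # Split an address into (line index, tag, offset within the line).
        line_no = addr // LINE_WORDS
        return line_no % len(self.lines), line_no // len(self.lines), addr % LINE_WORDS

    def _base(self, index, tag):
        # Memory address of the first data value held by (index, tag).
        return (tag * len(self.lines) + index) * LINE_WORDS

    def _evict(self, index):
        line = self.lines[index]
        if line.valid and line.dirty:
            # Dirty line: clean it by writing the whole line back to memory.
            base = self._base(index, line.tag)
            self.memory[base:base + LINE_WORDS] = line.data
            self.writebacks += 1
        line.valid = False  # an evicted line is marked invalid
        line.dirty = False

    def _linefill(self, index, tag):
        self._evict(index)  # make room before fetching the new line
        base = self._base(index, tag)
        line = self.lines[index]
        line.data = self.memory[base:base + LINE_WORDS]
        line.tag, line.valid, line.dirty = tag, True, False

    def _access(self, addr):
        index, tag, offset = self._locate(addr)
        line = self.lines[index]
        if not (line.valid and line.tag == tag):  # cache miss
            self._linefill(index, tag)
        return line, offset

    def read(self, addr):
        line, offset = self._access(addr)
        return line.data[offset]

    def write(self, addr, value):
        line, offset = self._access(addr)
        line.data[offset] = value
        line.dirty = True  # now more up to date than memory
```

In this sketch, writing to address 0 and then reading address 8 (which maps to the same line when `num_lines=2`) forces the dirty line to be written back before the new line is filled, so that memory only receives the altered value at eviction time.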
In addition to cleaning and/or invalidating cache lines in a cache during a standard eviction process resulting from a cache miss, there are other scenarios where it is generally useful to be able to clean and/or invalidate a line from a cache in order to ensure correct behaviour; often software-accessible cache maintenance operations are added to provide this capability. For example, such a process may be used during software-managed inter-processor cache coherency (i.e. where no cache coherency hardware is available), which serves to ensure that in a system where there are multiple processors, each having access to its own cache, each cache stores the most up-to-date version of a data value. If one processor updates the data value in its associated cache, and if another cache is already storing a copy of that data value, then the cache coherency technique may be used either to invalidate that copy, given that it is now out of date, or to cause that copy to be updated to reflect the most up-to-date value. Another situation where it is useful to be able to clean and/or invalidate a cache line from a cache in order to ensure correct behaviour is where page table descriptors are changed, which will typically result in the contents of a cache being flushed to memory with each data value being cleaned as required. A further example is when employing power management techniques, for example where a processor is about to enter a low power mode, and any data in its associated cache must be saved to another level in the memory hierarchy given that that cache will lose its data when entering the low power mode.
From the above discussions, it will be appreciated that it is common to provide cache maintenance operations to allow the cleaning and/or invalidation of lines from a cache as and when required to ensure correct operation of the cache. Often these cache maintenance operations are provided as privileged operations accessible only from supervisor (OS) code. This is particularly the case with operations that could have adverse side effects; for example, invalidating a cache line without cleaning it can cause inconsistency if not handled very carefully.
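The distinction between the clean, invalidate, and clean-and-invalidate maintenance operations, and the hazard of invalidating without cleaning, may be illustrated by the following minimal sketch. It assumes, purely for brevity, one data value per cache line and a dictionary standing in for memory; the names `Line`, `clean`, `invalidate` and `clean_and_invalidate` are hypothetical and do not correspond to the instructions of any particular architecture.

```python
class Line:
    """Toy cache line holding a single data value (simplifying assumption)."""

    def __init__(self, addr, value):
        self.addr = addr
        self.value = value
        self.valid = True
        self.dirty = False


def clean(line, memory):
    """Write a dirty line back to memory; the line remains valid."""
    if line.valid and line.dirty:
        memory[line.addr] = line.value
        line.dirty = False


def invalidate(line):
    """Discard the line WITHOUT writing it back."""
    line.valid = False
    line.dirty = False


def clean_and_invalidate(line, memory):
    """Write back if dirty, then discard: the safe way to drop a line."""
    clean(line, memory)
    invalidate(line)


# Hazard: invalidating a dirty line silently loses the update.
memory = {0x100: 1}
lost = Line(0x100, 1)
lost.value, lost.dirty = 2, True
invalidate(lost)
# memory[0x100] is still 1 here; the value 2 has been lost.

# Safe sequence: the update reaches memory before the line is dropped.
kept = Line(0x100, 1)
kept.value, kept.dirty = 2, True
clean_and_invalidate(kept, memory)
# memory[0x100] is now 2.
```

The first sequence shows why an invalidate-only operation is typically restricted to privileged code: the altered value never reaches memory, leaving memory and the processor's view of the data inconsistent.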
The aforementioned cache maintenance operations generally instigate the requested management on the cache line immediately. If a large block of memory affecting a number of cache lines is being processed, then this can result in a significant amount of memory traffic, likely causing the associated processing unit's pipeline to back up or stall in the process.
In addition to using cleaning and/or invalidating processes to ensure correct behaviour, it would be desirable to be able to also use such processes to improve overall performance of a data processing apparatus by ensuring better cache utilisation. However, the above two aspects of conventional techniques, namely the privileged access and the immediacy of effect, result in conventional cache maintenance operations being less than ideal for such purposes.
The article “Using the Compiler to Improve Cache Replacement Decisions” by Z. Wang et al, Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 02), describes a compiler mechanism that guides cache replacements by selectively predicting when data will or will not be reused, with the aim of improving replacement decisions in set-associative caches, thereby improving memory system performance. In accordance with the implementation described, a single tag bit called the evict-me bit is provided per cache line, and the instruction set architecture is extended with a new set of memory instructions which set the evict-me tags and are otherwise the same as the original set of memory instructions. The embodiment uses five extra bits in each memory instruction that the compiler sets to resolve run time spatial locality. An alternative hardware implementation uses a new instruction to store the five-bit constant into a special register, and the following memory operations then access the special register and constant to detect spatial reuse, and then set the relevant evict-me bit accordingly.
The article “Cooperative Caching with Keep-Me and Evict-Me” by J. B. Sartor et al, The Ninth Annual Workshop on Interactions between Compilers and Computer Architectures, San Francisco, Calif., February 2005, discusses a cooperative caching technique that seeks to improve memory system performance by using compiler locality hints to assist hardware cache decisions. In accordance with this technique, the compiler suggests cache lines to keep or evict in set-associative caches. A compiler analysis predicts data that will be and will not be reused, and annotates the corresponding memory operations with a keep-me or evict-me hint. The architecture maintains these hints on a cache line and only acts on them on a cache miss. In particular, this paper discusses a keep-me caching policy, which retains keep-me lines if possible. Otherwise the default replacement algorithm evicts the least-recently-used (LRU) line in the set. The paper describes the use of a keep-me hint, the associated compiler analysis, and architectural support. The paper also discusses the use of special keep-me and evict-me instructions with additional bits for spatial, temporal, and counter tags.
Both of the above papers hence describe techniques where compiler analysis is used to predict data that will be or will not be reused, with modified instructions then being used to set the required keep-me or evict-me bits associated with cache lines, which can then be referenced when seeking to select a candidate cache line for eviction.
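The way in which such hint bits might be referenced when selecting a candidate cache line for eviction can be sketched as follows. This is an illustrative sketch only: the order in which the evict-me and keep-me hints are combined, and the names `Way` and `choose_victim`, are assumptions made here for the purpose of the example and are not taken from either paper.

```python
from dataclasses import dataclass


@dataclass
class Way:
    """One cache line (way) of a set, with hypothetical hint bits."""
    tag: int
    last_used: int          # smaller value = used longer ago
    evict_me: bool = False
    keep_me: bool = False


def choose_victim(ways):
    """Return the index of the way to evict from one set, on a cache miss."""

    def lru(candidates):
        # Least-recently-used way among the candidate indices.
        return min(candidates, key=lambda i: ways[i].last_used)

    indices = range(len(ways))

    # 1. Prefer a line the compiler has marked as unlikely to be reused.
    flagged = [i for i in indices if ways[i].evict_me]
    if flagged:
        return lru(flagged)

    # 2. Otherwise retain keep-me lines if possible.
    unprotected = [i for i in indices if not ways[i].keep_me]
    if unprotected:
        return lru(unprotected)

    # 3. Every line is marked keep-me: fall back to plain LRU.
    return lru(indices)


ways = [
    Way(tag=0xA, last_used=5),
    Way(tag=0xB, last_used=1, keep_me=True),
    Way(tag=0xC, last_used=9, evict_me=True),
    Way(tag=0xD, last_used=3),
]
victim = choose_victim(ways)  # index 2: the evict-me line wins despite recent use
```

Note that in this sketch the hints are consulted only at the point of victim selection, consistent with the description above of hints that are maintained on a cache line and acted upon only when a cache miss occurs.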
Whilst the above-described techniques can achieve better utilisation of available cache storage by preferentially evicting data perceived to be of less importance, the techniques require compiler analysis and use of particular instructions within the code to provide the required keep-me and evict-me hints. It would be desirable to provide a more automated technique for achieving better utilisation of available cache storage, which would not require any modification to the program code.