The invention relates to managing reuse information with multiple translation stages.
Many modern processors support hierarchical cache systems with multiple levels of cache, including one or more levels within the processor or within each core of a multi-core processor, and one or more levels external to the processor or cores, up to a last level cache (LLC) that is accessed just before main memory is accessed. At each level of the hierarchy, the cache stores copies of a subset of data to speed access to that data by the processor relative to the speed of a higher level cache (or relative to the speed of the main memory for the LLC). Lower level caches are closer to the processor (or core), whereas higher level caches are farther away from the processor (or core). The LLC is typically shared by all of the cores of a multi-core processor.
At each level, the cache system will load blocks of data into entries and evict blocks of data from entries in units of ‘cache lines’ (also called ‘cache blocks’). Each cache line includes a number of ‘words’ of data, each word consisting of a predetermined number of bytes. Each cache entry includes space for storing the data words of a particular cache line along with bits for a tag (which contains a number of the most significant bits of an address, which are common to the words of that entry) and space for other information (e.g., a valid bit and any flags or error correction code bits).
For a set associative cache, before comparing a tag portion of a memory address of desired data, the cache system uses an index portion of the address to determine in which of multiple sets the cache line containing that data may be stored. For an N-way set associative cache, the tag comparison is performed N times (possibly in parallel), once for each of N ‘ways’ in which the cache line containing the data may be stored. The lowest order bits of an address (also called a ‘block offset’) are used to select a particular word from a cache line that is found in the cache (i.e., a ‘cache hit’).
If the cache line is not found in the cache (i.e., a ‘cache miss’), then the cache system attempts to retrieve the cache line from a higher level cache, or from the main memory (in the case of the LLC).
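By way of illustration only, the lookup described above may be sketched as follows. The line size, number of sets, associativity, and the names `split_address`, `lookup`, and `fill` are assumptions chosen for the example and are not taken from any particular processor. Each set is modeled simply as a list of N tags, with `None` standing in for an entry whose valid bit is clear, and the replacement choice in `fill` is a placeholder rather than a recommended policy.

```python
LINE_BYTES = 64   # assumed cache line size in bytes
NUM_SETS = 128    # assumed number of sets
NUM_WAYS = 4      # assumed associativity (the "N" in N-way)

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 6 bits of block offset
INDEX_BITS = NUM_SETS.bit_length() - 1     # 7 bits of set index


def split_address(addr):
    """Split an address into (tag, set index, block offset)."""
    offset = addr & (LINE_BYTES - 1)                # lowest order bits
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)  # selects the set
    tag = addr >> (OFFSET_BITS + INDEX_BITS)        # most significant bits
    return tag, index, offset


def new_cache():
    """Create an empty cache: NUM_SETS sets of NUM_WAYS invalid entries."""
    return [[None] * NUM_WAYS for _ in range(NUM_SETS)]


def lookup(cache, addr):
    """Return the way holding the line for addr (a hit), or None (a miss)."""
    tag, index, _ = split_address(addr)
    # One tag comparison per way; hardware may do these in parallel.
    for way, stored_tag in enumerate(cache[index]):
        if stored_tag == tag:
            return way
    return None


def fill(cache, addr):
    """Install the line for addr after a miss and return the way used."""
    tag, index, _ = split_address(addr)
    ways = cache[index]
    # Placeholder policy: first invalid way, otherwise evict way 0.
    way = ways.index(None) if None in ways else 0
    ways[way] = tag
    return way
```

In this sketch, a `None` result from `lookup` corresponds to a cache miss, after which the line would be requested from a higher level cache or from main memory and then installed with `fill`.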
One issue that may arise in the operation of a computing system that includes a cache is ‘cache pollution’, in which cache lines with lower reusability displace cache lines with higher reusability. Reusability refers to the likelihood that data in a particular cache line will be accessed again after being loaded into the cache and before being evicted. One solution for mitigating cache pollution is the use of a ‘pollute buffer’, which is a portion of the cache used to store cache lines with low reusability, preserving most of the cache for cache lines with high reusability. For example, using ‘page coloring’, a particular portion of a virtual address can be associated with a particular ‘color’ such that virtual addresses with different colors are guaranteed not to overlap in the cache (e.g., by limiting each color to one or more sets of a set associative cache). Page coloring has been used to mitigate cache pollution in some virtualization schemes.
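As a minimal sketch of how a color may be derived from an address, the following example assumes a 4 KB page, a 64-byte cache line, and a cache with 1024 sets that is indexed by the same address that is being colored. These parameters and the names `set_index`, `page_color`, and `sets_for_color` are hypothetical. Under these assumptions the color is the group of page-number bits that also fall within the set index, so each color is confined to its own contiguous group of sets.

```python
PAGE_BITS = 12    # assumed 4 KB pages
OFFSET_BITS = 6   # assumed 64-byte cache lines
INDEX_BITS = 10   # assumed 1024 sets
NUM_SETS = 1 << INDEX_BITS

# Set index bits that lie above the page offset determine the color.
COLOR_BITS = OFFSET_BITS + INDEX_BITS - PAGE_BITS  # 4 bits
NUM_COLORS = 1 << COLOR_BITS                       # 16 colors
SETS_PER_COLOR = NUM_SETS // NUM_COLORS            # 64 sets per color


def set_index(addr):
    """Set selected by addr in the assumed set associative cache."""
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1)


def page_color(addr):
    """Color of the page containing addr (low bits of the page number)."""
    return (addr >> PAGE_BITS) & (NUM_COLORS - 1)


def sets_for_color(color):
    """Range of set indices that pages of the given color can occupy."""
    start = color * SETS_PER_COLOR
    return range(start, start + SETS_PER_COLOR)
```

With these assumed parameters, address `0x3000` has color 3 and maps to set 192, which lies within `sets_for_color(3)`. Reserving one or more colors for pages with low reusability is one way a pollute buffer could be carved out of such a cache.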