In this description and claims, a “working set” is abstractly defined to be a collection of instructions and/or associated data. For example, a working set could describe all instructions and data allocated by a particular process or thread, a particular data structure within a process's address space, or a thread's or process's most frequently accessed subset of instructions and data. A working set may belong, for example, to any of the following entities: a process, thread or fiber, or an application or service composed of multiple processes. In this description and claims, “entity” is defined as the container or owner of the working set.
As data and instructions are required by the processor of a computer, they are transferred from the main memory of the computer to the processor. The latency inherent in obtaining items from the main memory may be quite large. A cache is memory that is smaller and accessed more quickly by the processor than the main memory. The processor cache may be located on the chip with the processor, on the processor socket or elsewhere. A page is the unit of memory that is used in main memory management and allocation. Each page is composed of several cache lines, also known as “cache blocks”, which are the units used in cache memory management. A failed attempt by the processor to access an item in its cache, known as a “cache miss”, causes the item to be accessed from the main memory, which adds latency.
Applications running on the computer describe the location of data and instructions using virtual addresses that refer to a virtual address space. The operating system maps or translates the virtual addresses to corresponding physical addresses as needed. If the processor cache is a physically-indexed cache, a portion of the physical address, known as the cache index bits, is used to determine where in the cache the item will be stored. Therefore, translation or mapping of a virtual address to a physical address by the operating system implicitly selects the location in the cache where the page will be stored. Since the processor cache is smaller than the main memory and only a portion of the physical address is used as the cache index bits, several physical addresses will map to the same location in the cache.
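By way of illustration only, the mapping from a physical address to a cache location may be sketched as follows. The line size, the number of sets, the example addresses and the function name `cache_set_index` are assumptions chosen for the sketch and do not describe any particular processor.

```python
# Illustrative sketch only: the geometry below is assumed, not taken from real hardware.
LINE_SIZE = 64     # bytes per cache line (assumed)
NUM_SETS = 8192    # number of distinct index values in the cache (assumed)

def cache_set_index(phys_addr: int) -> int:
    """Return the cache location selected by the index bits of a physical address."""
    # Drop the byte-offset bits, then keep only the index bits.
    return (phys_addr // LINE_SIZE) % NUM_SETS

addr_a = 0x12340
# addr_b differs from addr_a only in bits above the index bits.
addr_b = addr_a + LINE_SIZE * NUM_SETS

same_location = cache_set_index(addr_a) == cache_set_index(addr_b)
```

In this sketch, `addr_a` and `addr_b` are distinct physical addresses that select the same cache location, which is the aliasing effect described above.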
Most processor caches are N-way set-associative caches. In an N-way set-associative cache, a page whose physical address has a given value of cache index bits can be stored in any of N locations in the cache. Likewise, a portion of the page corresponding to a single cache line can be stored in any of N locations in the cache; the N cache lines that share a given value of the cache index bits form a set. If all N cache lines in a set are occupied and the processor needs to write another item to that set, then one of the cache lines must be evicted to make room for the new item.
A cache replacement policy determines which of the cache lines is evicted. Two common cache replacement policies are random cache replacement and Least Recently Used (LRU) or pseudo-LRU cache replacement. In a random cache replacement policy, cached items are randomly selected for eviction. In an LRU cache replacement policy, the cached item selected for eviction is the one that has been unused for the longest time compared to the other cached items in the set. Modified LRU policies have been proposed so that lines that exhibit temporal locality, i.e., that have a high probability of being reused in the near future, are not replaced as readily as those that do not appear to exhibit temporal locality. However, in general, hardware-only cache replacement algorithms are restricted by the hardware's limited view of workload behavior.
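As a minimal sketch of LRU replacement within a single set of an N-way set-associative cache, the following software model may be considered. The class name `LRUCacheSet`, the use of string tags and the access trace are hypothetical and serve only to illustrate the policy; actual hardware typically implements LRU or pseudo-LRU with per-set state bits rather than an ordered map.

```python
from collections import OrderedDict

class LRUCacheSet:
    """Software model of one set of an N-way set-associative cache (illustrative)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Access `tag`; return the evicted tag, or None if nothing was evicted."""
        if tag in self.lines:
            # Cache hit: mark the line as most recently used.
            self.lines.move_to_end(tag)
            return None
        evicted = None
        if len(self.lines) >= self.ways:
            # Set is full: evict the least recently used line.
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None
        return evicted

cache_set = LRUCacheSet(ways=2)
trace = ["A", "B", "A", "C", "B"]
evictions = [cache_set.access(tag) for tag in trace]
```

In this trace, the access to "C" evicts "B" even though "B" is used again immediately afterwards, which illustrates how a policy with only a hardware view of past accesses can evict a line that the workload is about to reuse.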
It has been proposed to allow software to include cache management instructions, namely prefetch, allocate and victimize instructions. A victimize instruction may be used to preempt the hardware's cache replacement policy by selecting a particular cached item for eviction notwithstanding the cache replacement policy. However, if the same software is to be run on different hardware having different cache sizes and/or configurations, it may be difficult for the software programmer to anticipate what cache management instructions to use in each case.
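The effect of a victimize instruction on victim selection may be modeled, purely as an assumption for illustration, as follows. The function `choose_victim` and its arguments are hypothetical; the sketch assumes that a line flagged by software is evicted in preference to the line the LRU policy would otherwise select, and it does not represent the semantics of any actual instruction set.

```python
def choose_victim(lru_order, victimized):
    """Pick the line to evict from a full set (illustrative model only).

    lru_order:  tags in the set, least recently used first.
    victimized: tags that software has flagged for eviction (assumed mechanism).
    """
    for tag in lru_order:
        if tag in victimized:
            # A software hint preempts the hardware replacement policy.
            return tag
    # No hint applies: fall back to plain LRU.
    return lru_order[0]

hinted_choice = choose_victim(["A", "B", "C"], {"C"})
default_choice = choose_victim(["A", "B", "C"], set())
```

With the hint, the most recently used line "C" is evicted; without it, the LRU line "A" is evicted. Whether such a hint is beneficial depends on the number of ways and the cache size, which is the portability difficulty noted above.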