Caching is a well-known technique for improving computer performance, and caches of various types are found in almost every modern computer. A processor cache (that is, a CPU cache) is usually a relatively small but fast hardware memory structure in which copies of frequently needed information (instructions and data) are stored so as to be more readily accessible. Traditionally, computer processors have employed a simple mapping from physical memory addresses to processor cache sets, in which the low-order bits of the physical page number are used as the high-order bits of the cache set index. The term “page” generally refers to a contiguous, aligned region of memory, and is typically used as a unit for address translation and memory management. For example, systems having the x86 architecture commonly use 4 KB pages.
A hardware “cache set” contains space for caching a limited number of memory units, typically referred to as cache “lines”. For example, on modern x86 hardware, the cache line granularity is 64 bytes, with 64-byte alignment. On the Intel Sandy Bridge x86 processor, a single last-level cache set consists of 20 lines; i.e., the cache is 20-way set associative.
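By way of illustration only, the traditional mapping from a physical address to a cache set can be sketched as follows. The 64-byte line size and 4 KB page size are taken from the examples above; the set count `NUM_SETS` and the function names are hypothetical values chosen for the sketch and do not describe any particular processor.

```python
LINE_SIZE = 64      # bytes per cache line (x86 example from the text)
PAGE_SIZE = 4096    # bytes per page (x86 example from the text)
NUM_SETS = 2048     # ASSUMPTION: illustrative set count, must be a power of two

LINE_BITS = LINE_SIZE.bit_length() - 1      # 6 offset bits within a line
PAGE_BITS = PAGE_SIZE.bit_length() - 1      # 12 offset bits within a page
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE     # 64 lines in one page
PAGE_GROUPS = NUM_SETS // LINES_PER_PAGE    # 32 groups of page-sized set ranges


def set_index(phys_addr: int) -> int:
    """Traditional mapping: drop the line offset, keep the next set-index bits."""
    return (phys_addr >> LINE_BITS) & (NUM_SETS - 1)


def set_index_from_page(phys_addr: int) -> int:
    """Same mapping, written to show where each part of the index comes from.

    The high-order part of the set index is the low-order part of the
    physical page number; the low-order part is the line's position
    within its page.
    """
    page_number = phys_addr >> PAGE_BITS
    line_in_page = (phys_addr >> LINE_BITS) & (LINES_PER_PAGE - 1)
    high_part = page_number & (PAGE_GROUPS - 1)
    return high_part * LINES_PER_PAGE + line_in_page
```

Under these assumptions, the two functions agree for every address, and two addresses that differ by a multiple of `NUM_SETS * LINE_SIZE` bytes fall in the same set and therefore compete for that set's limited number of lines.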
The traditional, straightforward hardware mapping of physical pages to cache sets has been leveraged for many years by operating systems and hypervisors, using a well-known technique called “page coloring”. Pages are partitioned into disjoint sets called “colors”, such that pages with different colors do not conflict in the cache. A page's color can be computed trivially from its physical address, for example, using a simple shift-and-mask technique. Page coloring has been used in many systems to improve performance by reducing cache conflict misses and to control the isolation or sharing of cache memory between software contexts.
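As a minimal sketch of the shift-and-mask computation mentioned above, the color of a page may be derived from its physical address as shown below. The cache geometry in the usage note (2 MiB, 16-way) and the names `num_colors` and `page_color` are assumptions made for illustration; the sketch also assumes the traditional mapping and a power-of-two number of colors.

```python
def num_colors(cache_bytes: int, ways: int, page_size: int = 4096) -> int:
    """Number of page colors for a cache using the traditional mapping.

    One way of the cache spans cache_bytes // ways bytes; each page-sized
    slice of that span is one color.
    """
    return cache_bytes // (ways * page_size)


def page_color(phys_addr: int, colors: int, page_size: int = 4096) -> int:
    """Shift away the page offset, then mask to the low-order page-number bits.

    ASSUMPTION: colors and page_size are both powers of two.
    """
    page_bits = page_size.bit_length() - 1
    return (phys_addr >> page_bits) & (colors - 1)
```

For example, a hypothetical 2 MiB, 16-way cache with 4 KB pages yields `num_colors(2 * 1024 * 1024, 16) == 32`. Pages at physical addresses `0x00000` and `0x20000` then share color 0 and may conflict, whereas the page at `0x01000` has color 1 and does not conflict with either.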
However, some recent processors, such as those based on the Intel Sandy Bridge (SNB) and Ivy Bridge (IVB) x86 micro-architectures, now use “complex cache indexing” to map physical addresses to cache sets in the processor's last-level cache (LLC). The hardware that realizes this mapping can be implemented using an arbitrarily complicated, undocumented, proprietary hash function that may use any of the bits in the physical memory address to index into the cache. As a result, small contiguous memory regions may be scattered across many discontiguous sets throughout the cache, and traditional page coloring techniques may no longer work. The mapping function may also vary across different processor implementations or configurations, even within the same processor family.
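The effect of such indexing on shift-and-mask reasoning can be illustrated with a toy example. The XOR-folding function below is purely hypothetical: it is not the hash used by any Intel processor, which as noted is undocumented, and the constants and function names are assumptions chosen for the sketch. It shows only that once higher-order address bits participate in the index, the simple index no longer predicts which addresses conflict.

```python
LINE_BITS = 6                    # 64-byte lines, as in the text
SET_BITS = 11                    # ASSUMPTION: 2048 sets, for illustration only
SET_MASK = (1 << SET_BITS) - 1


def simple_index(phys_addr: int) -> int:
    """Traditional mapping: set index taken directly from the address bits."""
    return (phys_addr >> LINE_BITS) & SET_MASK


def toy_hashed_index(phys_addr: int) -> int:
    """HYPOTHETICAL hash: XOR-fold every SET_BITS-wide chunk of the line number.

    Unlike simple_index, every address bit above the line offset can
    influence the result.
    """
    line_number = phys_addr >> LINE_BITS
    folded = 0
    while line_number:
        folded ^= line_number & SET_MASK
        line_number >>= SET_BITS
    return folded


addr_a = 0
addr_b = 1 << (LINE_BITS + SET_BITS)        # same simple index as addr_a
addr_c = addr_b | (1 << LINE_BITS)          # different simple index from addr_a
```

With this toy function, `addr_a` and `addr_b` share a simple index but are assigned to different hashed sets, while `addr_a` and `addr_c` have different simple indices yet collide in the same hashed set. A color computed by shift-and-mask would therefore misclassify both pairs.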
It would therefore be advantageous to have an automated software-based approach that can partition memory units (such as pages or lines) into sets, such that units in different sets do not contend for the same limited space within the processor cache. Preferably, this method should work even for processors that employ opaque complex cache indexing to map physical addresses to their corresponding cache sets. Such a capability would enable software, including operating systems and hypervisors, to manage or eliminate cache conflict misses by consulting this partitioning when making memory management decisions, enjoying benefits similar to traditional page coloring. This capability is especially useful in the context of a software cryptoprocessor system, such as the vCage system provided by PrivateCore, Inc., in which the ability to control cache residency and prevent evictions helps maintain confidentiality and integrity.