Many computer systems available today have cache memory.
Cache memories are high speed memory systems that store a partial copy of the contents of a larger, slower, memory system. In addition to storage, known herein as cache data memory, cache memory systems also have mapping apparatus for identifying those portions of the larger, slower, memory system held in cache, and mapping those portions into corresponding locations in the cache. This mapping apparatus often takes the form of a cache tag memory.
Many modern computer systems implement a hierarchy of cache memory systems. Many common processors, including Intel Pentium-II and Pentium-III circuits, have two levels of cache. Systems have been built implementing three, or even four, levels of cache memory. For purposes of this document, a low level of cache is relatively closer in hierarchy to the processor than a high level of cache, and a high level of cache is relatively closer in the hierarchy to main memory.
Multilevel cache systems typically have a separate cache tag memory for each level of cache; a three-level cache memory, for example, typically has three separate tag memories.
These cache systems have cache tag memory subsystems and cache data memory subsystems. Each cache data memory typically operates on units of data of a predetermined size, known as a cache line. The size of a cache line is often different for each level in a multilevel cache system; typically being larger for higher levels of cache. Typically, the size of the cache data memory is also larger for higher levels of cache.
In typical cache memory systems, when a memory location at a particular main-memory address is to be read, a cache-line set address is derived from part of the main-memory address. The cache-line set address is typically presented to both the cache tag memory and the cache data memory, and a read operation is performed on both memories.
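As an illustration of how a set address may be derived from a main-memory address, the following minimal sketch splits an address into a tag, a set address, and an offset within the cache line. The function name `split_address`, the 64-byte line size, and the 256-set geometry are assumptions chosen for the example and are not taken from any particular cache design.

```python
def split_address(addr, line_size=64, num_sets=256):
    """Split a main-memory address into (tag, set_address, offset).

    Assumes line_size and num_sets are powers of two (illustrative values).
    """
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    set_bits = num_sets.bit_length() - 1       # log2(num_sets)
    offset = addr & (line_size - 1)                        # position within the line
    set_address = (addr >> offset_bits) & (num_sets - 1)   # presented to tag and data memories
    tag = addr >> (offset_bits + set_bits)                 # remaining high-order bits
    return tag, set_address, offset


tag, set_address, offset = split_address(0x12345)
```

In this sketch the set address selects the entry read from both memories, and the remaining high-order bits form the value later compared against the stored address tag.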
Cache tag memory typically contains one or more address tag fields. Each address tag field is compared to part or all of a main memory address to determine whether any part of data read from the cache data memory corresponds to data at the desired main-memory address. If the tag indicates that the desired data is in the cache data memory, that data is presented to the processor and next lower-level cache; if not, then the read operation is passed up to the next higher-level cache. If there is no higher-level cache, the read operation is passed to main memory.
Many caches are of the “set associative” type. In set associative caches, a “set” is a group of cache lines within a cache that share the same “set address”, the portion of cache line address presented to both the cache data memory and the cache tag memory. Each cache line within the set typically has a separate address tag associated with it. In addition to a set address, locating data in cache typically also requires a word-in-cache-line address. N-way, set-associative, caches have N cache lines located at the same set address, and typically perform N comparisons of address tag fields to portions of the desired data address simultaneously. Each cache line has an associated way number.
Typically, a tag memory contains status information as well as address tag information. This status information may include “dirty” flags that indicate whether information in the cache has been written to but not yet updated in higher-level memory, and “valid” flags indicating that information in the cache is a valid copy of information in higher levels of the memory system.
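The tag comparison and status flags described above can be modeled in software roughly as follows. This is a sketch only: the names `TagEntry` and `lookup_way` are hypothetical, and the loop checks the ways one after another, whereas hardware typically performs the N comparisons simultaneously.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    """One way of a tag-memory set: an address tag plus status flags."""
    valid: bool = False   # entry holds a valid copy of higher-level data
    tag: int = 0          # address tag compared against the requested address
    dirty: bool = False   # written to, but not yet updated in higher-level memory


def lookup_way(tag_set, tag):
    """Return the way number whose valid tag matches, or None if no way matches."""
    for way, entry in enumerate(tag_set):
        if entry.valid and entry.tag == tag:
            return way
    return None


# A 4-way set: ways 0 and 2 hold valid lines, ways 1 and 3 do not.
example_set = [TagEntry(True, 7), TagEntry(False, 9), TagEntry(True, 3), TagEntry()]
```

Here `lookup_way(example_set, 3)` selects way 2, while a request for tag 9 finds no match because that entry is not marked valid.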
A cache “hit” occurs whenever a memory access to the cache occurs and the cache system finds, through inspecting its tag memory, that the requested data is present and valid in the cache. A cache “miss” occurs whenever a memory access to the cache occurs and the cache system finds, through inspecting its tag memory, that the requested data is not present and valid in the cache.
When a cache “miss” occurs in a low level cache of a typical multilevel cache system, the main-memory address is typically passed up to the next level of cache, where it is checked in the higher-level cache tag memory in order to determine if there is a “hit” or a “miss” at that higher level. When a cache “miss” occurs at the highest level of cache, a memory reference is performed in main memory.
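The passing of a read up through the levels can be sketched as below. The model is deliberately simplified and rests on several assumptions: each cache level is represented by a plain dictionary keyed by address, capacity limits and evictions are ignored, and the function name `read` is hypothetical.

```python
def read(address, levels, main_memory):
    """Read an address through a list of cache levels, lowest level first.

    Returns (data, source), where source names the level that supplied the data.
    Levels that missed are filled with the data on the way back.
    """
    missed = []
    for number, cache in enumerate(levels, start=1):
        if address in cache:                 # hit at this level
            data, source = cache[address], f"L{number}"
            break
        missed.append(cache)                 # miss: pass the read up one level
    else:
        # Missed at the highest level of cache: reference main memory.
        data, source = main_memory[address], "main memory"
    for cache in missed:
        cache[address] = data                # present the data to lower levels
    return data, source


l1, l2 = {}, {0x40: "b"}
memory = {0x40: "b", 0x80: "c"}
first = read(0x40, [l1, l2], memory)    # misses in L1, hits in L2
second = read(0x40, [l1, l2], memory)   # now hits in L1
```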
Since access times generally are greater at higher levels of a multilevel memory system, it is desirable that the “hit rate,” the fraction of cache accesses that result in cache “hits,” be high in a system.
A cache “eviction” occurs whenever data in a cache is discarded to make room for data newly fetched from higher level cache or main memory. Since the discarded, or evicted, data is no longer in the cache, future references to the evicted data will result in a cache miss. Computer systems having frequent cache misses to recently evicted data, causing a low hit rate, are described as thrashing the cache.
Since a cache memory is smaller than higher-level cache or main memory, multiple portions of the higher-level memory will map to each cache line location. When several of these portions are frequently accessed, cache thrashing may occur at that cache line location.
Cache thrashing can be controlled by designing cache system hardware with a high number of ways of associativity. When the number of ways is greater than the number of frequently accessed locations of memory mapping to each cache line location, cache thrashing is less likely than when the number of ways is lower. Increasing the number of ways is expensive, however, since a separate tag comparator is required for each way, and doing so requires redesign of the memory system hardware.
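The effect of associativity on thrashing can be illustrated with a small simulation of a single set. The sketch assumes a least-recently-used replacement policy, which the discussion above does not specify, and the function name `hit_count` is hypothetical.

```python
from collections import OrderedDict


def hit_count(accesses, ways):
    """Count hits for one cache set with `ways` lines and LRU replacement (assumed)."""
    lines = OrderedDict()            # resident tags, least recently used first
    hits = 0
    for tag in accesses:
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)   # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)   # evict the least recently used line
            lines[tag] = None
    return hits


# Three frequently accessed locations that all map to the same set.
pattern = [0, 1, 2] * 100
few_ways = hit_count(pattern, ways=2)    # fewer ways than hot locations
many_ways = hit_count(pattern, ways=4)   # more ways than hot locations
```

Under these assumptions, with two ways every access evicts a line that is needed again shortly, so none of the 300 accesses hits; with four ways only the first three accesses miss.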
A simple but common cache design derives cache-line set addresses of length M bits from memory addresses of length L bits by extracting a group of M address bits from the memory address. Caches of this type, herein known as direct-mapped caches, have the advantage that fewer bits of address tag are required than with certain other cache architectures. It has been observed that large, page-aligned, dynamically allocated memory blocks have a significant likelihood of having hot spots that map to the same locations in cache systems of this type. The larger the page or block size, the more likely hot spots in each block are to map to the same addresses and induce cache thrashing.
Hot spots in each block are most likely to map to the same set address and cause thrashing when block sizes are large, and are particularly likely to map to the same set address when block sizes are a multiple of the cache size divided by the number of ways of associativity. Cache thrashing may result at the hot sets in the cache where hot spots in multiple blocks are mapped.
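The arithmetic behind this observation can be checked with a short sketch. All parameters are assumptions made for illustration: a 256-kilobyte, 4-way cache with 64-byte lines (so the cache size divided by the number of ways is 64 kilobytes), blocks laid out contiguously from address zero, and a hot spot at the same offset in every block. The function names are hypothetical.

```python
def set_address(addr, line_size, num_sets):
    """Set address taken directly from the address bits above the line offset."""
    return (addr // line_size) % num_sets


def hot_spot_sets(block_size, block_count, hot_offset,
                  cache_size=256 * 1024, ways=4, line_size=64):
    """Return the distinct set addresses used by the same hot offset in each block."""
    num_sets = cache_size // (line_size * ways)
    return {
        set_address(i * block_size + hot_offset, line_size, num_sets)
        for i in range(block_count)
    }


# Eight 4-megabyte blocks: the block size is a multiple of 64 kilobytes.
aligned = hot_spot_sets(4 * 1024 * 1024, 8, hot_offset=128)
# Eight blocks whose size is not a multiple of 64 kilobytes.
other = hot_spot_sets(4 * 1024 * 1024 + 4096, 8, hot_offset=128)
```

With these values, the hot spots of all eight 4-megabyte blocks fall into a single set, which has only four ways to hold them, whereas the hot spots of the second group are spread over eight different sets.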
It is known that the likelihood of cache thrashing in systems of this type can be reduced by modifying hardware such that cache-line set addresses are derived, through a more complex algorithm, from a greater number of bits of the memory address. For example, a group of M high-order memory address bits may be XOR-ed with a group of M lower-order bits to generate an M-bit set address. Again, avoidance of cache thrashing in this way requires redesign of the memory system hardware.
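A minimal sketch of such an XOR-based derivation follows. Which high-order bits are used is an assumption of this example (the M bits immediately above the ordinary set-address bits), as are the 64-byte line size, the 10-bit set address, and the function names.

```python
def plain_set_address(addr, line_size=64, set_bits=10):
    """Set address formed by extracting M = set_bits address bits directly."""
    offset_bits = line_size.bit_length() - 1
    return (addr >> offset_bits) & ((1 << set_bits) - 1)


def xor_set_address(addr, line_size=64, set_bits=10):
    """Set address formed by XOR-ing M low-order bits with the M bits above them."""
    offset_bits = line_size.bit_length() - 1
    mask = (1 << set_bits) - 1
    low = (addr >> offset_bits) & mask                 # ordinary set-address bits
    high = (addr >> (offset_bits + set_bits)) & mask   # next M higher-order bits (assumed choice)
    return low ^ high


# The same hot offset in eight contiguous 4-megabyte blocks.
addresses = [i * 4 * 1024 * 1024 + 128 for i in range(8)]
plain = {plain_set_address(a) for a in addresses}
hashed = {xor_set_address(a) for a in addresses}
```

For these addresses the plain extraction places every hot spot in one set, while the XOR-ed set address spreads them over eight sets.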
Memory is dynamically allocated by a dynamic memory management module incorporated into many operating systems, such as Microsoft Windows, Linux, and Unix. System and application programs running on these systems typically may request that a block of a requested size be allocated for their use; the operating system allocates the block and returns a starting address of the allocated block to the requesting program. Application software, such as database software, may also incorporate a dynamic memory management module. Some application programs may superimpose their own dynamic memory allocation schemes upon an operating system dynamic memory allocation system.
Many systems also provide for garbage collection. Garbage collection is a mechanism for consolidating unused memory space, such as previously allocated memory blocks that have been released, into larger blocks. These larger memory blocks can then be allocated when large blocks are requested by system and application programs. Garbage collection may involve relocating used memory blocks within memory such that unused memory space between can be consolidated for reuse.
Dynamically allocated memory associated with a process often may include more than one block of more than one type. A process may, for example, be allocated a stack frame as well as one or more data blocks. Dynamically allocated memory is often accessed through a translation lookaside buffer (TLB).
Many computer systems have dynamic memory allocation software that allocates memory blocks such that blocks start at locations that are aligned to pages; that is, each memory block begins at a location that is an integer multiple of a page size. The page size is typically an integer power of two.
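Page alignment of a block's starting address can be sketched as follows. The 4096-byte default page size and the function name `align_up` are assumptions for illustration; the masking technique relies on the page size being a power of two.

```python
def align_up(address, page_size=4096):
    """Round an address up to the next multiple of page_size (a power of two)."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page_size must be a power of two")
    # Adding page_size - 1 and clearing the low bits rounds up to a page boundary.
    return (address + page_size - 1) & ~(page_size - 1)


start = align_up(5000)   # first page boundary at or above address 5000
```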
Some programs are known to request dynamically allocated memory in large blocks; Oracle database software, for example, is known to allocate memory in block sizes as large as four megabytes.
A “hot spot” in a memory block is a set of memory locations in the block that are frequently accessed. Should multiple hot spots in multiple memory blocks map to the same cache location, cache thrashing can occur. Hot spots in memory blocks may arise in many ways; for example, a database program may store index information at the start of each block, where the index information is accessed more frequently than individual data records at other locations within the block. The frequently accessed index information may produce a hot spot in the memory block.
Redesign of memory system hardware is expensive, time consuming, and can be accomplished only by hardware manufacturers. It is desirable to prevent cache thrashing in a way that can be implemented on existing hardware.