As main memories continue to grow larger and processors run faster, the disparity in their operating speeds has widened. As a result, a cache that bridges the gap by storing a portion of main memory in a smaller and faster structure has become increasingly important. When the processor core needs data, it first checks the cache. If the cache presently contains the requested data (a cache hit), the data can be retrieved far faster than if the processor must resort to main memory (a cache miss).
There are often multiple caches between the processing core and main memory in what is referred to as a memory hierarchy. Referring to FIG. 1A, a generic two level cache architecture 10 is shown. A processing core 12 communicates with a level one (L1) cache 14 which in turn communicates with a level two (L2) cache 16. The L2 cache 16 communicates with main memory 18. Hierarchies including even a third (L3) cache are not uncommon. The hierarchy levels nearest the processing core 12 are the fastest, but store the least amount of data.
In a typical 32-bit system, each individual 32-bit address refers to a single byte of memory. Many 32-bit processors access memory one word at a time, where a word is equal to four bytes. Caches usually store data in groups of words called cache lines. For illustrative purposes, consider an exemplary cache having eight words per cache line. The four addressable bytes in each word require that the two least significant bits (2^2 = 4 bytes in each word) in the 32-bit address select a particular byte from a word. With eight words in a cache line, the next three least significant bits (2^3 = 8 words in each line) in the address select a word from a given cache line.
A cache contains storage space for a limited number of cache lines. The cache controller must therefore decide which lines of memory are to be stored in the cache and where they are to be placed. In the most straightforward placement method, direct mapping, there is only one location in the cache where a given line of memory may be stored. In a two-way set-associative cache, there are two locations in the cache where a given line of memory may be stored. Similarly, in an n-way set-associative cache, there are n locations in the cache where a specific line of memory may be stored. In the extreme case, where n is equal to the number of lines in the cache, the cache is referred to as fully associative, and a line of memory may be stored in any location within the cache.
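By way of illustration only, the placement options described above may be sketched as follows. The function name candidate_locations, the use of the memory line number modulo the number of sets to choose a set, and the numbering of each set as consecutive cache locations are assumptions made for this sketch and are not taken from any particular cache design.

```python
def candidate_locations(line_number, num_lines, ways):
    """Return the cache locations where a memory line may be stored.

    line_number: number of the memory line (hypothetical numbering)
    num_lines:   total number of lines the cache can hold
    ways:        associativity n (1 = direct mapped, num_lines = fully associative)
    """
    num_sets = num_lines // ways
    # Assumption: the set is chosen from the low order part of the line number.
    set_index = line_number % num_sets
    # Assumption: each set occupies `ways` consecutive cache locations.
    return [set_index * ways + way for way in range(ways)]


# An eight-line cache and memory line 13:
direct = candidate_locations(13, 8, 1)   # exactly one permitted location
two_way = candidate_locations(13, 8, 2)  # two permitted locations
fully = candidate_locations(13, 8, 8)    # any location in the cache
```

In this sketch the direct-mapped case yields a single location, the two-way case yields two, and the fully associative case yields every location in the cache.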
Direct mapping generally uses the low order bits of the address to select the cache location in which to store the memory line. For instance, if there are 2^k cache lines, k low order address bits determine the cache location in which the data from the memory line is stored. These k address bits are often referred to as the index. Because many memory lines map to the same cache location, the cache must also store an address tag to signify which memory line is currently stored at that cache location.
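The role of the index and the address tag may be illustrated with a minimal software model of a direct-mapped cache. The class DirectMappedCache, its methods, and the use of a memory line number as the lookup key are hypothetical names and simplifications chosen for this sketch; an actual cache performs the comparison in hardware.

```python
class DirectMappedCache:
    """Toy direct-mapped cache holding 2**k lines (illustrative only)."""

    def __init__(self, k):
        self.k = k
        self.tags = [None] * (1 << k)   # None marks an empty location
        self.lines = [None] * (1 << k)

    def _split(self, line_number):
        index = line_number & ((1 << self.k) - 1)  # k low order bits
        tag = line_number >> self.k                # remaining high order bits
        return index, tag

    def fill(self, line_number, data):
        """Store a memory line, replacing whatever occupied its location."""
        index, tag = self._split(line_number)
        self.tags[index] = tag
        self.lines[index] = data

    def lookup(self, line_number):
        """Return (hit, data); a hit requires the stored tag to match."""
        index, tag = self._split(line_number)
        if self.tags[index] == tag:
            return True, self.lines[index]
        return False, None


cache = DirectMappedCache(k=2)   # four cache locations
cache.fill(5, "A")               # line 5 maps to index 1 with tag 1
hit_5 = cache.lookup(5)          # same index, same tag
miss_1 = cache.lookup(1)         # same index, different tag
```

Memory lines 1 and 5 compete for the same location in this sketch, so only the stored tag distinguishes a hit on one from a miss on the other.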
Returning to the exemplary eight word per line cache, assume that the cache contains 4096 (2^12) cache lines. This configuration will result in a cache size of 128 KB (2^12 lines * 2^3 words per line * 2^2 bytes per word = 2^17 bytes). With 2^12 cache lines, the twelve low order address bits will be used to decide which location in the cache a memory line will be stored at. The 32-bit address space of the memory can accommodate 2^32 bytes (4 GB), or 2^27 cache lines. This means that there are 32,768 (2^27 / 2^12 = 2^15) memory lines that map to each cache location. A tag field must thus be included for each cache location to determine which of the 2^15 memory lines is currently stored.
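The arithmetic of the exemplary configuration can be checked with a short calculation. The constant and variable names below are chosen for this sketch only; the values are those of the exemplary cache described above.

```python
BYTES_PER_WORD = 1 << 2    # 4 bytes per word
WORDS_PER_LINE = 1 << 3    # 8 words per cache line
CACHE_LINES = 1 << 12      # 4096 cache lines
ADDRESS_BITS = 32

bytes_per_line = BYTES_PER_WORD * WORDS_PER_LINE          # 32 bytes
cache_bytes = CACHE_LINES * bytes_per_line                # 2**17 bytes = 128 KB
memory_lines = (1 << ADDRESS_BITS) // bytes_per_line      # 2**27 memory lines
lines_per_location = memory_lines // CACHE_LINES          # 2**15 = 32,768
tag_bits = lines_per_location.bit_length() - 1            # bits needed for the tag

print(cache_bytes // 1024, "KB;", lines_per_location,
      "memory lines per cache location;", tag_bits, "tag bits")
```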
The five least significant bits in the address select a byte from a cache line. Three bits select a word from the cache line, and two bits select a byte within the word. Twelve bits form the index to select one of the 2^12 cache lines from the cache. The fifteen-bit address tag allows the complete 32-bit address to be formed. These fields are depicted graphically in FIG. 1B.
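For illustration, the field layout described above (two byte-select bits, three word-select bits, a twelve-bit index, and a fifteen-bit tag) may be expressed as shifts and masks. The function names split_address and join_address and the sample address are hypothetical and serve only this sketch.

```python
BYTE_BITS, WORD_BITS, INDEX_BITS, TAG_BITS = 2, 3, 12, 15


def split_address(address):
    """Split a 32-bit address into (tag, index, word, byte) fields."""
    byte = address & ((1 << BYTE_BITS) - 1)
    word = (address >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    index = (address >> (BYTE_BITS + WORD_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = (address >> (BYTE_BITS + WORD_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, word, byte


def join_address(tag, index, word, byte):
    """Rebuild the complete 32-bit address from its four fields."""
    return ((tag << (BYTE_BITS + WORD_BITS + INDEX_BITS))
            | (index << (BYTE_BITS + WORD_BITS))
            | (word << BYTE_BITS)
            | byte)


fields = split_address(0x12345678)   # arbitrary sample address
rebuilt = join_address(*fields)
```

For the sample address 0x12345678, this sketch yields tag 0x91A, index 0x2B3, word 6, and byte 0, and joining the fields reproduces the original address.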
Many computer systems currently allow the use of more memory than is physically available through the use of virtual memory. At its essence, virtual memory allows individual program processes to run in an address space that is larger than is physically available. The process simply addresses memory as if it were the only process running. This is a virtual address space unconstrained by the size of main memory or the presence of other processes. The process can access virtual memory starting at 0x0, regardless of what region of physical memory is actually allocated to the process. A combination of operating system software and physical hardware translates between virtual addresses and the physical domain. If more virtual address space is in use than exists in physical main memory, the operating system will have to manage which virtual address ranges are stored in memory, and which are located on a secondary storage medium such as magnetic disk.
Level one (L1) caches are often located on the same die as the processing core. This allows very fast communication between the two and permits the L1 cache to run at processor speed. A level two (L2) cache located off chip requires more than a single processor cycle to access data and is referred to as a multi-cycle cache. Main memory is even slower, requiring tens of cycles to access. In order for the L1 cache to operate at processor speed, the L1 cache typically uses the virtual addressing scheme of the processor. This avoids the overhead of virtual-physical translation in this critical path.
While the L1 cache is examined to determine if it contains the requested address, the virtual address is translated to a physical address. If the L1 cache does not contain the requested address, the L2 cache is consulted using the translated physical address. The L2 cache can then communicate with the bus and main memory using physical addresses.
Cache coherence is a key concern when using caches. A device performing direct memory access (DMA) accesses main memory directly, bypassing the processor and its caches. Data that has been cached in the L1 or L2 caches may have been changed by the processor since being read from main memory. The data subsequently read from main memory by the DMA device would therefore be outdated. This is the essence of the problem of cache coherence.
One technique for enforcing cache coherence is to implement a write-through architecture. Any change made to cached data is immediately propagated to any lower level caches and also to main memory. The disadvantage to this approach is that writing through to memory uses precious time, and may be unnecessary if further changes are going to be made prior to data being needed in main memory.
Most current cache configurations instead use write-back mode. In write-back mode, a change made to the contents of a cache is not propagated through to memory until the cache is specifically instructed to do so. A piece of data that has been changed in a cache but not yet propagated through to the next level of cache or to main memory is referred to as dirty. The cache location can be “cleaned” by directing the cache to write the data back to the cache below or to the memory, thereby making the piece of data clean, or coherent. This may happen at regular intervals, when the memory is available, or when the processor determines that a certain location of memory will need the updated value.
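The difference between the write-through and write-back modes may be sketched with a single cache placed in front of a dictionary that stands in for main memory. The class PolicyCache and its members are hypothetical and greatly simplified; a real hierarchy would propagate writes through each lower cache level as well.

```python
class PolicyCache:
    """Toy cache illustrating write-through versus write-back (illustrative only)."""

    def __init__(self, memory, write_through):
        self.memory = memory              # dict standing in for main memory
        self.write_through = write_through
        self.data = {}                    # cached address -> value
        self.dirty = set()                # addresses changed but not yet written back

    def write(self, address, value):
        self.data[address] = value
        if self.write_through:
            self.memory[address] = value  # propagate immediately
        else:
            self.dirty.add(address)       # defer until the location is cleaned

    def clean(self, address):
        """Write a dirty location back to memory, making it clean."""
        if address in self.dirty:
            self.memory[address] = self.data[address]
            self.dirty.discard(address)


wt_memory = {0x100: 1}
wt = PolicyCache(wt_memory, write_through=True)
wt.write(0x100, 2)                 # memory is updated at once

wb_memory = {0x100: 1}
wb = PolicyCache(wb_memory, write_through=False)
wb.write(0x100, 2)                 # memory still holds the old value
stale_value = wb_memory[0x100]
wb.clean(0x100)                    # now memory receives the change
```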
An analogous cache coherency problem occurs when data has been changed in main memory, and one of the caches now contains an outdated copy. The applicable lines in the cache thus need to be “flushed,” or invalidated. Flushing a cache line involves marking it as invalid. When the processor next requests that data from the cache, the cache misses and must retrieve the updated data from main memory.
In order to facilitate maintaining cache coherence, some processors have a register set that allows the processor to issue cache coherency commands. If the processor believes that a piece of data is needed in main memory, the processor can issue a clean command. The clean command may be targeted to a specific cache line, or to some portion of the entire cache. The processor can also issue flush commands. This tells the cache to invalidate or flush a certain region of memory, a certain cache line, or the entire cache.
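As a minimal sketch of the clean and flush commands, the following model lets software target either specific addresses or the entire cache. The class WriteBackCache, its method names, and the convention that omitting the target means "entire cache" are assumptions of this sketch; they do not describe the register interface of any particular processor.

```python
class WriteBackCache:
    """Toy write-back cache with clean and flush commands (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for main memory
        self.lines = {}        # address -> [value, dirty flag]

    def read(self, address):
        if address not in self.lines:                  # miss: fetch from memory
            self.lines[address] = [self.memory[address], False]
        return self.lines[address][0]

    def write(self, address, value):
        self.lines[address] = [value, True]            # mark as dirty

    def clean(self, addresses=None):
        """Write dirty data back; None targets the entire cache."""
        targets = list(self.lines) if addresses is None else list(addresses)
        for address in targets:
            entry = self.lines.get(address)
            if entry is not None and entry[1]:
                self.memory[address] = entry[0]
                entry[1] = False

    def flush(self, addresses=None):
        """Invalidate lines; None targets the entire cache.

        In this sketch a flushed dirty line loses its change unless it
        was cleaned first.
        """
        targets = list(self.lines) if addresses is None else list(addresses)
        for address in targets:
            self.lines.pop(address, None)


memory = {0x40: 10}
cache = WriteBackCache(memory)
cache.read(0x40)                   # line is now cached
memory[0x40] = 99                  # memory changed behind the cache, e.g. by a DMA device
stale = cache.read(0x40)           # cache still returns the outdated copy
cache.flush([0x40])                # invalidate the line
fresh = cache.read(0x40)           # miss, so the updated value is fetched
```

A clean command works in the opposite direction: after cache.write(0x40, 7), main memory retains its old value until cache.clean([0x40]) is issued.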