The present invention concerns a method for handling a cache miss in a multi-level cache system.
Most modern computer systems include a central processing unit (CPU) and a main memory. The speed at which the CPU can decode and execute instructions and operands depends upon the rate at which the instructions and operands can be transferred from main memory to the CPU. In an attempt to reduce the time required for the CPU to obtain instructions and operands from main memory, many computer systems include a cache memory between the CPU and main memory.
A cache memory is a small, high-speed buffer memory which temporarily holds those portions of the contents of main memory that are expected to be used by the CPU in the near future. The main purpose of a cache is to shorten the time necessary to perform memory accesses, whether for data or for instruction fetch. The information located in cache memory may be accessed in much less time than information located in main memory. Thus, a CPU with a cache memory needs to spend far less time waiting for instructions and operands to be fetched and/or stored.
A cache memory is made up of many blocks (also called lines) of one or more words of data. Each block has associated with it an address tag that uniquely identifies which block of main memory it is a copy of. Each time the processor makes a memory reference, an address tag comparison is made to see if a copy of the requested data resides in the cache. If the desired memory block is not in the cache, the block is retrieved from the main memory, stored in the cache and supplied to the processor. When a new cacheline X is to be brought into a cache, if there is already another cacheline Y in the target cache entry, that resident cacheline Y becomes a "victim cacheline." To make room for the new cacheline X, the victim cacheline Y must be removed from the cache. If the victim cacheline Y has been modified since it was brought into the cache it is called "dirty" and it must be written back to main memory; if it has not been modified it is called "clean" and it may simply be discarded. In a two-level cache system, there can be "level one victim cache lines" and "level two victim cache lines."
In addition to using a cache to retrieve data from main memory, the CPU may also write data into the cache instead of directly to the main memory. When the processor desires to write data to the memory, the cache makes an address tag comparison to see if the data block into which data is to be written resides in the cache. If the data block exists in the cache, the data is written into the data block in the cache and a data "dirty bit" for the data block is set. The dirty bit indicates that data in the data block has been modified, and thus before the data block is deleted from the cache the modified data must be written back into main memory. If the data block into which data is to be written does not exist in the cache, the data block must be fetched into the cache or the data written directly into the main memory.
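The read and write behavior described above can be illustrated with a minimal sketch of a direct-mapped, write-back cache. The names (`WriteBackCache`, `Line`), the sizes (`LINE_WORDS`, `NUM_LINES`), the word-addressed memory, and the choice to fetch the block on a write miss are all assumptions made for illustration; they are not taken from any particular system.

```python
from dataclasses import dataclass, field

LINE_WORDS = 4   # words per cache line (assumed for illustration)
NUM_LINES = 8    # number of cache entries (assumed for illustration)


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    words: list = field(default_factory=lambda: [0] * LINE_WORDS)


class WriteBackCache:
    """Direct-mapped, write-back cache that fetches the block on a write miss."""

    def __init__(self, memory):
        self.memory = memory                       # word-addressed list of words
        self.lines = [Line() for _ in range(NUM_LINES)]
        self.writebacks = 0                        # dirty victims written to memory

    def _split(self, addr):
        """Split a word address into (tag, entry index, word offset)."""
        block = addr // LINE_WORDS
        return block // NUM_LINES, block % NUM_LINES, addr % LINE_WORDS

    def _fill(self, tag, index):
        """Bring a block into the entry, evicting the resident victim if any."""
        line = self.lines[index]
        if line.valid and line.dirty:
            # Dirty victim: it must be written back before being replaced.
            base = (line.tag * NUM_LINES + index) * LINE_WORDS
            self.memory[base:base + LINE_WORDS] = line.words
            self.writebacks += 1
        # A clean (or invalid) victim is simply discarded.
        base = (tag * NUM_LINES + index) * LINE_WORDS
        line.words = self.memory[base:base + LINE_WORDS]
        line.tag, line.valid, line.dirty = tag, True, False
        return line

    def _lookup(self, addr):
        tag, index, offset = self._split(addr)
        line = self.lines[index]
        if not (line.valid and line.tag == tag):   # address tag comparison
            line = self._fill(tag, index)
        return line, offset

    def read(self, addr):
        line, offset = self._lookup(addr)
        return line.words[offset]

    def write(self, addr, value):
        line, offset = self._lookup(addr)
        line.words[offset] = value
        line.dirty = True                          # main memory copy is now stale
```

In this sketch, a write leaves main memory unchanged until a later reference maps to the same entry with a different tag; only then is the dirty victim written back. The alternative mentioned above, writing the data directly into main memory on a write miss, is not modeled.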
In some applications two cache memories are used. A first level cache memory typically has a subset of the data in a second level cache memory. Similarly, the second level cache memory typically has a subset of the data in the main memory. Generally, the first level cache is small in size relative to the second level cache. The first level cache has a fast access time which is typically one processor cycle. The second level cache has a somewhat slower access time of, for example, two to three cycles. The second level cacheline size is a tradeoff: larger lines reduce the memory access overhead incurred per word transferred, but they also waste cycles fetching data that is never used. Typically, the second level cacheline size is two to sixteen words. The first level cacheline size typically ranges from one word to the size of the cacheline for the second level cache.
Accessed data is first searched for in the first level cache memory. If there is a miss in the first level cache memory, the accessed data is searched for in the second level cache memory. If there is a miss in the second level cache memory, the data is fetched from the main memory.
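The search order can be sketched as a small function. This is an illustrative simplification only: the function name `servicing_level` is hypothetical, and each cache is modeled as a plain set of the addresses it currently holds, ignoring line sizes and replacement.

```python
def servicing_level(addr, l1_contents, l2_contents):
    """Return which level supplies the data for addr.

    l1_contents and l2_contents are sets of addresses currently cached
    (a simplification that ignores lines, tags, and capacity).
    """
    if addr in l1_contents:      # searched first
        return "L1"
    if addr in l2_contents:      # searched only after a first level miss
        return "L2"
    return "memory"              # miss in both levels


# Example: the first level cache holds a subset of the second level cache.
l2 = {0x100, 0x104, 0x108}
l1 = {0x100}
level_a = servicing_level(0x100, l1, l2)   # found in the first level cache
level_b = servicing_level(0x104, l1, l2)   # found in the second level cache
level_c = servicing_level(0x200, l1, l2)   # fetched from main memory
```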
In one arrangement of a two level cache for a processor system, the first level cache is a full subset of the second level cache. The first level cache is also smaller and closer to the processor than the second level cache. For these reasons, a first level cache generally has a lower access latency than either its companion second level cache or main memory, and can therefore offer improved performance.
In normal operation, any of several different actions may be necessary to satisfy a memory reference. A memory reference is generally a load or store instruction. A hit is a memory reference for which the data for the desired memory address is present in the cache being checked (first level or second level). A miss is a memory reference for which the data for the desired memory address is not present in the cache being checked (first level or second level). In the first, simplest, and fastest case, the reference hits in the first level cache. There is then a zero cycle penalty, and the reference completes without delay.
The next fastest case is a first level cache miss that happens to hit in the second level cache. This causes a sequence of operations to be performed to fill a single first level cache line with data from the second level cache. Data is subsequently supplied to the processor or the store completes. This is a medium speed operation, and the processor will be frozen while it waits for the memory reference to be satisfied.
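Because a first level cache line may be smaller than a second level cache line, the fill copies only the portion of the second level line that contains the referenced address. The following is a minimal sketch of that selection, assuming word addressing, an 8-word second level line, a 2-word first level line, and line bases aligned to the line size; the constants and the function name `l1_fill_from_l2` are hypothetical.

```python
L2_LINE_WORDS = 8   # assumed second level line size, in words
L1_LINE_WORDS = 2   # assumed first level line size; must divide L2_LINE_WORDS


def l1_fill_from_l2(l2_line, l2_line_base, addr):
    """Return (l1_line_base, words) for the first level line containing addr.

    l2_line      -- list of L2_LINE_WORDS words held in the second level line
    l2_line_base -- word address of the first word of that line (aligned)
    addr         -- word address that missed in the first level cache
    """
    if not (l2_line_base <= addr < l2_line_base + L2_LINE_WORDS):
        raise ValueError("addr is not covered by this second level line")
    l1_base = addr - (addr % L1_LINE_WORDS)        # align down to a first level line
    start = l1_base - l2_line_base                 # offset within the second level line
    return l1_base, l2_line[start:start + L1_LINE_WORDS]


# Example: a second level line covering word addresses 16..23.
base, words = l1_fill_from_l2([10, 11, 12, 13, 14, 15, 16, 17], 16, 21)
```

In the example, address 21 falls in the first level line that starts at word address 20, so only the fifth and sixth words of the second level line are copied.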
The slowest operation is when the memory reference misses both the first level and second level caches. In this instance, a long sequence of operations is initiated to bring the relevant line from main memory into the second level cache. When this data is returned from memory and copied into the second level cache, the first level cache is again referenced, resulting, this time, in a first level cache miss that now hits in the second level cache. This causes the relevant portion of the second level cache line to be written into the first level cache, and subsequently the requested data is supplied to the processor, or the store completes.
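The sequence for a reference that misses both caches can be traced with a short sketch. Each cache is modeled here as a dictionary from address to data, which ignores line sizes and replacement; the function name `reference` and the step labels are hypothetical and chosen only to mirror the description above.

```python
def reference(addr, l1, l2, memory):
    """Satisfy one memory reference and return (data, steps).

    l1, l2, and memory are dicts mapping address -> data. The loop models
    the first level cache being referenced again after the second level
    cache has been filled from main memory.
    """
    steps = []
    while True:
        if addr in l1:
            steps.append("L1 hit")
            return l1[addr], steps
        steps.append("L1 miss")
        if addr in l2:
            steps.append("L2 hit")
            l1[addr] = l2[addr]                # fill the first level cache
            steps.append("fill L1 from L2")
            return l1[addr], steps
        steps.append("L2 miss")
        l2[addr] = memory[addr]                # long operation: fetch from memory
        steps.append("fill L2 from memory")
        # Loop back: the first level cache is referenced again.


l1_cache, l2_cache, main_memory = {}, {}, {8: 42}
data, steps = reference(8, l1_cache, l2_cache, main_memory)
```

For the cold reference in the example, the trace contains two first level misses: the original one, and the repeated reference that then hits in the second level cache.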
One extension is to fill the addressed first level cache line at the same time as the second level cache line is being written. This avoids the penalty of the first level cache miss, leaving only the performance penalty of the second level cache miss. For general information on multilevel cache systems, see for example: J. L. Baer, W. H. Wang, Multilevel Cache Hierarchies: Organizations, Protocols and Performance, Journal of Parallel and Distributed Computing, Vol. 6, 1989, pp. 451-476; W. H. Wang, J. L. Baer and H. Levy, Organization and Performance of a Two-Level Virtual-Real Cache Hierarchy, Proceedings of the 16th Annual International Symposium on Computer Architecture, 1989, pp. 140-148; and J. L. Baer, W. H. Wang, On the Inclusion Properties for Multi-Level Cache Hierarchies, Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988, pp. 73-80.