The present invention concerns a method for decreasing the time penalty resulting from a cache miss in a multi-level cache system.
Most modern computer systems include a central processing unit (CPU) and a main memory. The speed at which the CPU can decode and execute instructions and operands depends upon the rate at which the instructions and operands can be transferred from main memory to the CPU. In an attempt to reduce the time required for the CPU to obtain instructions and operands from main memory, many computer systems include a cache memory between the CPU and main memory.
A cache memory is a small, high-speed buffer memory which is used to hold temporarily those portions of the contents of main memory which it is believed will be used in the near future by the CPU. The main purpose of a cache memory is to shorten the time necessary to perform memory accesses, either for data or instruction fetch. The information located in cache memory may be accessed in much less time than information located in main memory. Thus, a CPU with a cache memory needs to spend far less time waiting for instructions and operands to be fetched and/or stored.
A cache memory is made up of many blocks of one or more words of data. Each block has associated with it an address tag that uniquely identifies which block of main memory it is a copy of. Each time the processor makes a memory reference, an address tag comparison is made to see if a copy of the requested data resides in the cache memory. If the desired memory block is not in the cache memory, the block is retrieved from the main memory, stored in the cache memory and supplied to the processor.
In addition to using a cache memory to retrieve data from main memory, the CPU may also write data into the cache memory instead of directly to the main memory. When the processor desires to write data to the memory, the cache memory makes an address tag comparison to see if the data block into which data is to be written resides in the cache memory. If the data block exists in the cache memory, the data is written into the data block in the cache memory. In many systems a data "dirty bit" for the data block is then set. The dirty bit indicates that data in the data block is dirty (i.e., has been modified), and thus before the data block is deleted from the cache memory the modified data must be written into main memory. If the data block into which data is to be written does not exist in the cache memory, the data block must be fetched into the cache memory or the data written directly into the main memory. A data block which is overwritten or copied out of cache memory when new data is placed in the cache memory is called a victim block or a victim line.
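The write handling and dirty bit described above can be illustrated with a minimal sketch. It assumes a direct mapped organization and a policy of fetching the block on a write miss (the text equally allows writing directly to main memory instead). The class name, method names, and the dictionary standing in for main memory are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of a write-back cache with dirty bits.
# Names, sizes, and the fetch-on-write-miss policy are illustrative assumptions.

class WriteBackCache:
    def __init__(self, num_lines, main_memory):
        self.num_lines = num_lines
        self.memory = main_memory            # dict: block address -> data
        self.lines = [None] * num_lines      # each line: dict(tag, data, dirty) or None

    def _split(self, block_addr):
        # Low-order part selects the line (index); the remainder is the tag.
        return block_addr % self.num_lines, block_addr // self.num_lines

    def _fill(self, block_addr):
        """Fetch a block into the cache, writing back a dirty victim first."""
        index, tag = self._split(block_addr)
        victim = self.lines[index]
        if victim is not None and victim["dirty"]:
            victim_addr = victim["tag"] * self.num_lines + index
            self.memory[victim_addr] = victim["data"]   # modified data goes to main memory
        self.lines[index] = {"tag": tag,
                             "data": self.memory.get(block_addr, 0),
                             "dirty": False}
        return self.lines[index]

    def write(self, block_addr, data):
        index, tag = self._split(block_addr)
        line = self.lines[index]
        if line is None or line["tag"] != tag:   # write miss: fetch the block first
            line = self._fill(block_addr)
        line["data"] = data
        line["dirty"] = True                     # block now differs from main memory

    def read(self, block_addr):
        index, tag = self._split(block_addr)
        line = self.lines[index]
        if line is None or line["tag"] != tag:   # read miss: fetch, possibly evicting a victim
            line = self._fill(block_addr)
        return line["data"]
```

In this sketch main memory is updated only when a dirty victim block is displaced, which is the behavior the dirty bit exists to guarantee.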
In some applications a second cache memory is added in series between the first cache memory and the main memory. The first cache memory typically has a subset of the data in the second cache memory. Similarly, the second cache memory typically has a subset of the data in the main memory. Accessed data is first searched for in the first cache memory. If there is a miss in the first cache memory, the accessed data is searched for in the second cache memory. If there is a miss in the second cache memory, the data is fetched from the main memory.
In one arrangement of a two level cache for a processor system, the first level cache is a proper subset of the second level cache. What is meant by a proper subset is that all entries in the first level cache are also in the second level cache, while the second level cache has additional entries that are not in the first level cache. The first level cache is also smaller and closer to the processor than the second level cache. Because it is smaller and closer to the processor, a first level cache can, in general, offer improved performance: it has a smaller access latency than its companion second level cache or main memory.
In normal operation there are several different actions that may be necessary to satisfy a memory reference. A memory reference is generally a load or store instruction. First, simplest, and fastest, the reference might hit in the first level cache. A hit is defined to be a memory reference where the data for a desired memory address is present in the cache being checked (first level or second level). A miss is defined to be a memory reference where the data for a desired memory address is not present in the cache being checked (first level or second level). When there is a hit in the first level cache, the reference is completed with a zero cycle penalty.
The next fastest case is a first level cache miss that happens to hit in the second level cache. This causes a sequence of operations to be performed to fill a single first level cache line with the appropriate sixteen byte quantity from the second level cache. Data is subsequently supplied to the processor or the store completes. This is a medium speed operation, and the processor will be frozen while it waits for the memory reference to be satisfied.
The slowest operation is when the memory reference misses both the first level and second level caches. In this instance, a long sequence of operations is initiated to bring the relevant line from main memory into the second level cache. When this data is returned from memory and copied into the second level cache, the first level cache is again referenced, resulting, this time, in a first level cache miss that now hits in the second level cache. This causes the relevant portion of the second level cache line to be written into the first level cache, and subsequently the requested data is supplied to the processor, or the store completes.
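The three cases just described (first level hit, first level miss with second level hit, and a miss in both levels) can be summarized in a short sketch. The function name and the use of plain sets to model the caches are assumptions made for illustration; capacity, eviction, and cycle counts are deliberately left out.

```python
# Hypothetical model of the three cases for a two-level cache.
# Each cache is modeled as a set of line addresses; timing is omitted.

def access(line_addr, l1, l2):
    """Return which case satisfied the reference, filling caches on the way."""
    if line_addr in l1:
        return "L1 hit"                  # fastest case: no penalty
    if line_addr in l2:
        l1.add(line_addr)                # fill the first level line from the second level
        return "L1 miss, L2 hit"         # medium speed case
    l2.add(line_addr)                    # bring the line from main memory into the second level
    l1.add(line_addr)                    # the retried reference now hits in L2 and fills L1
    return "L1 miss, L2 miss"            # slowest case
```

Because every fill of the first level is preceded by the line being present in the second level, the sketch keeps the first level contents a subset of the second level contents.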
Typically, cache memories are direct mapped. That is, an index is used to access one or more entries in the cache. The tag for the entry is then compared with the tag portion of the address to determine whether a match has occurred.
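As an illustration of the index and tag comparison in a direct mapped cache, the following sketch splits an address into tag, index, and byte offset. The sixteen byte line size echoes the first level fill quantity mentioned earlier; the line count of 256 and all function names are assumptions chosen only for the example.

```python
# Hypothetical address split for a direct mapped cache.
LINE_BYTES = 16    # bytes per line
NUM_LINES = 256    # assumed number of lines, for illustration only

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4 bits select a byte within the line
INDEX_BITS = NUM_LINES.bit_length() - 1     # 8 bits select the line

def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def direct_mapped_hit(tags, addr):
    """tags[index] holds the stored tag for that line, or None if the line is empty."""
    tag, index, _ = split_address(addr)
    return tags[index] == tag
```

Two addresses that share an index but differ in tag compete for the same line, which is the source of conflict misses in this organization.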
In a multi-way set-associative cache, a single index is used to simultaneously access a plurality of data random access memories (RAMs). A data RAM may be implemented by one or more physical random access memory integrated circuits. A set is a collection of all lines addressed by a single cache index. The number of data RAMs addressed by a single cache index indicates the way number of a cache. For example, if in a cache a single cache index is used to access data from two data RAMs, the cache is a two-way set-associative cache. Similarly, if in a cache a single cache index is used to access data from four data RAMs, the cache is a four-way set-associative cache.
When a multi-way access is made, a tag comparison is made for each data RAM. If a tag comparison indicates the desired data block is in a particular data RAM the operation is performed on/with data from that particular data RAM.
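A multi-way lookup of this kind might be sketched as follows for a two-way set-associative cache. Each inner list stands in for one data RAM; the set count, the way count, and the function names are hypothetical.

```python
# Hypothetical two-way set-associative lookup: one index selects a set,
# and the tag is compared against the entry in every way of that set.
NUM_SETS = 4   # assumed
WAYS = 2       # assumed

def make_cache():
    # cache[way][index] is a (tag, data) pair or None; each way models one data RAM.
    return [[None] * NUM_SETS for _ in range(WAYS)]

def set_assoc_lookup(cache, block_addr):
    """Return (way, data) on a hit, or None on a miss."""
    index = block_addr % NUM_SETS
    tag = block_addr // NUM_SETS
    for way in range(WAYS):              # hardware would compare all ways in parallel
        entry = cache[way][index]
        if entry is not None and entry[0] == tag:
            return way, entry[1]
    return None
```

The loop is sequential only because this is software; the text describes the data RAMs being accessed simultaneously.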
In a fully associative cache, no index is used. When an access is made to a fully associative cache, a tag comparison is made for each cache line within the fully associative cache. If a tag comparison indicates the desired data line is in the cache, the operation is performed on/with data from that particular data line.
For a general discussion of cache systems, see for example, David A. Patterson, John L. Hennessy, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., San Mateo, Calif., 1990, pp. 404 through 423 and 454 through 464.
In one prior art system, a fully associative victim cache receives victim lines from a larger direct mapped cache. When there is a miss in the direct mapped cache, the requested line may sometimes be found in the fully associative victim cache. See, Norman P. Jouppi, Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers, Proceedings 17th ISCA, May 1990, pp. 364-373, Seattle, Wash.
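The interaction between a direct mapped cache and a fully associative victim cache can be sketched roughly as follows. This is not the cited design itself: the swap on a victim cache hit, the oldest-first replacement of victim entries, and all names are assumptions made to keep the example small.

```python
# Hypothetical sketch of a direct mapped cache backed by a small
# fully associative victim cache. Policies here are illustrative assumptions.
from collections import OrderedDict

class VictimCachedDirectMapped:
    def __init__(self, num_lines, victim_entries):
        self.num_lines = num_lines
        self.lines = [None] * num_lines      # block address held at each index, or None
        self.victims = OrderedDict()         # fully associative: keyed by full block address
        self.victim_entries = victim_entries

    def _evict_to_victim_cache(self, block_addr):
        self.victims[block_addr] = True
        self.victims.move_to_end(block_addr)
        if len(self.victims) > self.victim_entries:
            self.victims.popitem(last=False)  # drop the oldest victim line

    def access(self, block_addr):
        index = block_addr % self.num_lines
        resident = self.lines[index]
        if resident == block_addr:
            return "hit"
        # Miss in the direct mapped cache: search every victim entry (no index is used).
        found = block_addr in self.victims
        if found:
            del self.victims[block_addr]
        if resident is not None:
            self._evict_to_victim_cache(resident)   # displaced line becomes a victim line
        self.lines[index] = block_addr
        return "victim hit" if found else "miss"
```

With two blocks that map to the same index, alternating references miss in the direct mapped cache every time but are satisfied from the victim cache after the first pass, which is the situation the victim cache is meant to help.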