1. Field of the Invention
The present invention relates to data processing systems utilizing cache memories, and more particularly to multi-level cache memories.
2. Art Background
Caches are used in various forms to reduce the effective time required by a processor to access instructions or data that are stored in main memory. The theory of a cache is that a system attains a higher speed by using a small portion of very fast memory as a cache along with a larger amount of slower main memory. The cache memory is usually placed operationally between the data processing unit or units and the main memory. When the processor needs to access main memory, it looks first to the cache memory to see if the information required is available in the cache. When data and/or instructions are first called from main memory, the information is stored in cache as part of a block of information (known as a cache line) that is taken from consecutive locations of main memory. During subsequent memory accesses to the same addresses, the processor interacts with the fast cache memory rather than main memory. Statistically, when information is accessed from a particular block in main memory, subsequent accesses most likely will call for information from within the same block. This locality of reference property results in a substantial decrease in average memory access time.
FIG. 1 is a simplified block diagram of a cache 100. The cache includes a set of cache lines 102. Each cache line 102 is capable of storing a block of data 104 from consecutive addresses in main memory. Each cache line 102 is associated with a tag 106, which represents the block address of the line. A valid bit 108 indicates that the cache line contains valid data. A dirty bit 110 indicates cache coherency, i.e., whether or not the data in the cache accurately reflects the data maintained at the same address in main memory or other memory units. The reading and writing of data in the cache is controlled by a cache access logic circuit 112.
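As an informal illustration of the cache line organization described with reference to FIG. 1, the following sketch models a cache line in software. It is not part of any actual hardware design; the names `CacheLine`, `make_cache`, and `LINE_SIZE`, and the 16-byte line size, are assumptions chosen only for the example.

```python
from dataclasses import dataclass, field

LINE_SIZE = 16  # bytes per cache line (assumed value, not from the text)


@dataclass
class CacheLine:
    tag: int = 0          # block address of the line (tag 106)
    valid: bool = False   # valid bit 108: line holds valid data
    dirty: bool = False   # dirty bit 110: line differs from main memory
    # block of data 104, copied from consecutive main memory addresses
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))


def make_cache(num_lines):
    """Build a cache 100 as a list of empty (invalid) cache lines 102."""
    return [CacheLine() for _ in range(num_lines)]
```

Each line starts out invalid and clean; the cache access logic would set the valid and dirty bits as lines are allocated and modified.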
The use of cache memory in the context of a computer system is illustrated in FIG. 2. A first CPU1 200 interacts with a cache L1 202 over an internal CPU bus 204. The cache L1 202 interacts with main memory 206 over a system bus 208. A second processor CPU2 210 also may access memory 206 over the system bus 208.
When the CPU1 200 attempts to access the main memory 206, the address issued by the CPU1 200 is presented to the cache access logic in the cache L1 202. The cache access logic compares the relevant part (the tag field) of the physical address containing the block address to addresses it currently stores in the tag array of the cache 202. If there is a match, i.e., a cache hit, then the data found at the referenced address is returned to the processor 200 if the memory access is a read operation. If the processor 200 is attempting to write to memory, then the processor 200 writes the data to the cache line that resulted in a cache hit. If, however, the address fails to match any of the tag addresses, i.e., a cache miss occurs, then the cache access logic of the cache 202 causes the main memory data block containing the information at the addressed location to be copied into the cache 202. In conjunction with this operation, the cache access logic also sets the corresponding valid bit to indicate that the cache line has been allocated.
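The read path just described (tag comparison, hit, miss, line fill, valid bit) can be sketched as follows. The text does not specify how addresses map to cache lines, so this example assumes a direct-mapped organization; the class name `DirectMappedCache`, the line size, and the line count are likewise illustrative assumptions.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)
NUM_LINES = 4    # number of cache lines (assumed)


class DirectMappedCache:
    """Toy read-only model of a cache; direct mapping is an assumption."""

    def __init__(self, memory):
        self.memory = memory                  # bytearray standing in for main memory 206
        self.tags = [0] * NUM_LINES           # tag array
        self.valid = [False] * NUM_LINES      # valid bits
        self.data = [bytearray(LINE_SIZE) for _ in range(NUM_LINES)]

    def read(self, address):
        """Return (byte, hit) for the given physical address."""
        block = address // LINE_SIZE          # block address in main memory
        index = block % NUM_LINES             # which cache line the block maps to
        tag = block // NUM_LINES              # remaining address bits, stored as the tag
        offset = address % LINE_SIZE          # byte position within the line
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Cache miss: copy the whole block from main memory and mark it valid.
            base = block * LINE_SIZE
            self.data[index][:] = self.memory[base:base + LINE_SIZE]
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset], hit
```

In this sketch, the first read of an address misses and fills the line, and a later read of a neighboring address in the same block hits, which is the locality of reference effect described above.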
Computer systems implement a number of policies to update main memory when a write operation from the processor 200 changes the contents of the cache 202. Under a write through policy, when the processor 200 writes to the cache 202, the corresponding data is also updated in the main memory 206. Under a write back policy, the data in main memory is updated only when the cache line containing the modified data is forced out of the cache 202 or when an external processor such as processor 210 needs to access the data. A cache line may be forced out of the cache 202, for example, if it is the least recently used (LRU) cache line. By its very nature, the write back policy results in less traffic on the system bus 208 between the cache 202 and the memory 206.
Under the write back policy, the processor 200 updates data in cache 202 without immediately updating data found at the same address in memory 206. This results in inconsistency or incoherency between the cache 202 and main memory 206. When this happens, the cache access logic sets the dirty bit in the corresponding cache line. The cache 202 "snoops" the address lines from the system bus 208 to determine whether any device is attempting to access data found in the cache 202. If so, the cache access logic causes the cache 202 to write the data into main memory 206 so that it is available to the second processor 210. When doing so, the cache access logic also resets the dirty bit of the corresponding cache line to indicate that the data in the cache 202 is now consistent with that in memory 206.
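The difference between the write through and write back policies, together with the dirty bit and snooping behavior, can be illustrated with the simplified model below. It assumes one value per cache line and an unbounded cache so that only the update policy is visible; `PolicyCache`, `bus_writes`, and the `write_through` flag are hypothetical names used for the example.

```python
class PolicyCache:
    """Toy cache that counts writes to main memory under either policy."""

    def __init__(self, memory, write_through=False):
        self.memory = memory              # dict: address -> value (main memory 206)
        self.write_through = write_through
        self.lines = {}                   # address -> [value, dirty bit]
        self.bus_writes = 0               # writes sent over the system bus 208

    def write(self, address, value):
        if self.write_through:
            # Write through: update main memory on every processor write.
            self.lines[address] = [value, False]
            self.memory[address] = value
            self.bus_writes += 1
        else:
            # Write back: update only the cache and mark the line dirty.
            self.lines[address] = [value, True]

    def snoop(self, address):
        """Another device accesses `address`; write back the line if dirty."""
        line = self.lines.get(address)
        if line is not None and line[1]:
            self.memory[address] = line[0]
            self.bus_writes += 1
            line[1] = False               # cache and memory are consistent again
```

Two consecutive writes to the same address cost two bus writes under write through, but only one under write back, and that one occurs only when the line is snooped (or, in a fuller model, forced out of the cache).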
Caches may be arranged in a multi-level configuration as shown in FIG. 3. Here, a second level cache L2 203 is interposed between the first level cache L1 202 and the system bus 208. The L2 cache 203 is slower but larger than the L1 cache 202. Typically, the L2 cache 203 is used under an inclusion policy which dictates that all the contents of the L1 cache 202 are maintained in the L2 cache 203. In general, in a multi-level cache system containing more than two caches, all the contents of the level N cache are stored in the level N+1 cache.
The advantage of duplicating information in the L2 cache is that the L2 cache 203 can maintain multiprocessor coherency without involving the first level cache L1 202. For example, L2 cache 203 performs the snooping of memory access requests over system bus 208, which frees the first level cache 202 to be dedicated to interactions with the first processor 200. If the second processor 210 performs a write to memory, the L2 cache 203 may detect a hit and either update its corresponding cache line or invalidate that line by resetting its valid bit. Further, the L2 cache 203 may pass the address on to the L1 cache 202. If there is a hit in the L1 cache 202, then the corresponding cache line can be invalidated or updated with an invalid signal or data, respectively, passed on by the L2 cache 203. Some systems always update data and some always invalidate entries according to the particular memory management policy used by the system. For a read operation by the second processor 210, when it attempts to access a memory location that has been cached (in the dirty state) in the second level cache 203, the L2 cache 203 provides the data to main memory 206. From the above description, it can be seen that the interposition of the second level cache 203 provides some isolation of the L1 cache 202 from the traffic on the system bus 208.
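The isolation that an inclusive L2 cache provides can be sketched as follows. The model tracks only which block addresses each cache holds and assumes the invalidate (rather than update) policy on an external write; the class `InclusiveHierarchy`, the `l1_probes` counter, and the invalidation of L1 when L2 evicts a block are assumptions made to keep the inclusion property intact in the example.

```python
class InclusiveHierarchy:
    """Toy two-level hierarchy in which L2 always contains everything in L1."""

    def __init__(self):
        self.l1 = set()       # block addresses held in the L1 cache 202
        self.l2 = set()       # block addresses held in the L2 cache 203
        self.l1_probes = 0    # times a snoop had to disturb L1

    def fill(self, block):
        # A block brought in from memory is stored at both levels.
        self.l2.add(block)
        self.l1.add(block)

    def evict_l1(self, block):
        # L1 may drop a block freely; L2 still holds it.
        self.l1.discard(block)

    def evict_l2(self, block):
        # Assumed: removing a block from L2 also removes it from L1,
        # otherwise inclusion would be violated.
        self.l2.discard(block)
        self.l1.discard(block)

    def snoop_external_write(self, block):
        """Another processor writes `block`; return True on an L2 hit."""
        if block not in self.l2:
            return False      # by inclusion, L1 cannot hold it: L1 is not probed
        self.l2.discard(block)
        self.l1_probes += 1   # address passed on to L1 only after an L2 hit
        self.l1.discard(block)
        return True
```

Snoops that miss in L2 never reach L1, so the first level cache remains available to the first processor for most bus traffic.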
The size of a level N+1 cache is typically much greater than that of a level N cache. In that case, the cost of storing the contents of a level N cache twice is minimal. On the other hand, the overhead becomes unacceptably high if the next level cache is only two to four times larger. Perhaps the most important situation in which this may occur is when the second level cache is designed to be placed on the same chip die as the first level cache. By doing so, the width of the data path between the two caches can be drastically increased to a data path on the order of 128-256 lines. However, a disadvantage of two-level caching under the inclusion policy is that if the ratio in size between first level and second level caches is small, much of the second level cache will consist of data that is already in the primary cache. Then, most misses in the primary cache will also miss in the second level cache. In this situation, adding a second level cache can hurt more than it helps: the delay it adds between a first level cache miss and the off-chip access outweighs the benefit of a reduced off-chip miss rate.
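The trade-off described above can be made concrete with a small calculation. The function names and the latency and miss-rate figures used in the usage note are illustrative assumptions, not measurements; the penalty formula is the usual simplification in which every first level miss pays the L2 lookup time and L2 misses additionally pay the off-chip access time.

```python
def duplicated_fraction(l1_size, l2_size):
    """Fraction of an inclusive L2 occupied by copies of L1 contents."""
    return l1_size / l2_size


def l1_miss_penalty(l2_latency, l2_local_miss_rate, memory_latency):
    """Average cycles to service an L1 miss when an L2 sits in the path.

    Every L1 miss pays the L2 lookup; the fraction that also misses
    in L2 pays the off-chip memory access on top of it.
    """
    return l2_latency + l2_local_miss_rate * memory_latency


def l2_helps(l2_latency, l2_local_miss_rate, memory_latency):
    """True if the L2 lowers the average L1 miss penalty below memory latency."""
    return l1_miss_penalty(l2_latency, l2_local_miss_rate, memory_latency) < memory_latency
```

For example, with an L2 twice the size of L1, `duplicated_fraction(8, 16)` shows that half of the L2 merely duplicates L1. If, as a result, 95 percent of first level misses also miss in an assumed 10-cycle L2 in front of a 100-cycle memory, `l2_helps(10, 0.95, 100)` is false: the L2 adds more delay than it removes.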
It is therefore desired to increase the efficiency of a multi-level caching configuration, especially in the case when the sizes of caches on adjacent levels are relatively similar.