This invention relates in general to the field of microprocessor caches, and more particularly to virtual set caches.
Modern microprocessors include data caches for caching within the microprocessor the most recently accessed data to avoid having to load the data from physical memory or to store the data to physical memory, since accessing physical memory takes an order of magnitude longer than accessing the data cache. For efficiency reasons, data caches do not cache data on a byte granularity basis. Instead, data caches typically cache data on a cache line granularity basis. A common cache line size is 32 bytes.
Data caches are smaller than physical memory. Consequently, when a data cache caches a line of data, it must also save the address of the data in order to later determine whether it has the data when a new instruction executes that accesses data in memory. The saved address of the data is referred to as a tag. When the new instruction accesses a memory address, the data cache compares the new memory address with the addresses, or tags, it has stored to see if a match occurs. If the new address matches one of the tags, then the data is in the cache, and the cache provides the data to the requesting portion of the microprocessor, rather than the microprocessor fetching the data from memory. The condition where the data is in the data cache is commonly referred to as a cache hit.
Data caches store hundreds of tags for hundreds of cache lines cached in the data cache. Comparing a new address with the hundreds of tags stored in the cache would take too long and make the cache too slow. Therefore, caches are arranged as arrays of sets. Each set includes a cache line, or more often, multiple cache lines. A common arrangement for caches is to have four cache lines in a set. Each of the four cache lines is said to be in a different cache way. A cache with four cache lines in a set is commonly referred to as a four-way set associative cache. Typically, when a new cache line is to be stored into a set, the least recently used cache line of the set is chosen for replacement by the new cache line.
By arranging the cache as an array of sets, the time required to compare the new address with the addresses stored in the cache is reduced to an acceptable amount as follows. When a cache line is stored into the cache, the cache does not allow the cache line to be stored into any arbitrary one of the sets in the array. Instead, the set into which the cache line may be stored is limited based on the address of the cache line. The lower order bits of the new address are used to select only one of the sets in the array. The address bits used to select one of the sets from the array are referred to as the index. Since the cache is smaller than the physical memory, only the lower order bits are needed for the index. That is, since the number of cache lines stored in the cache is much smaller than the number of cache lines stored in memory, fewer address bits are needed to index the cache than to index physical memory. Once a set in the cache is selected by the index, the cache need only compare the tags of the cache lines in the selected set with the new address to determine whether a cache hit has occurred.
The number of address bits needed for the index depends upon the number of sets in the array. For example, if the cache has 512 sets, then nine address bits are needed to index the array of sets. Which of the address bits are used for the index depends upon the size of a cache line. For example, if the cache line size is 32 bytes, the lower 5 bits of the address are not used, since those bits are only used to select a byte within the cache line. Hence, for a cache with 512 sets of 32-byte cache lines, address bits 13:5 may be used as the index.
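The address decomposition described above can be sketched as follows. This is a minimal illustration of the example cache in the text (32-byte lines, hence 5 offset bits, and 512 sets, hence 9 index bits); the function name is illustrative, not from the specification.

```python
LINE_OFFSET_BITS = 5   # 32-byte cache line -> bits 4:0 select a byte in the line
INDEX_BITS = 9         # 512 sets -> bits 13:5 select a set

def split_address(addr):
    """Split a 32-bit address into tag, set index, and byte offset."""
    offset = addr & ((1 << LINE_OFFSET_BITS) - 1)                  # bits 4:0
    index = (addr >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)   # bits 13:5
    tag = addr >> (LINE_OFFSET_BITS + INDEX_BITS)                  # bits 31:14
    return tag, index, offset
```

For example, the address 0x12345678 yields byte offset 0x18 within its line and set index 179 out of 512.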
Modern microprocessors also support the notion of virtual memory. In a virtual memory system, program instructions access data using virtual addresses. The virtual addresses are rarely the same as the physical address of the data, i.e., the address of the location in physical memory where the data is stored. The physical address is used on the processor bus to access physical memory. Furthermore, the data specified by the virtual memory address may not even be present in physical memory at the time the program instruction accesses the data. Instead, the data may be present in secondary storage, typically on a disk drive.
The operating system manages the swapping of the data between disk storage and physical memory as necessary to execute program instructions. The operating system also manages the assignment of virtual addresses to physical addresses, and maintains translation tables used by the microprocessor to translate virtual addresses into physical addresses. Modern microprocessors employ a translation lookaside buffer (TLB), which caches the physical address translations of the most recently accessed virtual address to avoid having to access the translation tables to perform the translations.
Typical virtual memory systems are paging memory systems. In a paging memory system, physical memory is divided into pages, typically of 4 KB each. Consequently, only the upper bits of the virtual address need be translated to the physical address, and the lower bits of the virtual address are untranslated. That is, the lower bits are the same as the physical address bits, and serve as a physical byte offset from the base address of the physical page. The base address of the physical page is translated from the upper bits of the virtual address. For example, in a paging system with 4 KB pages, the lower 12 bits of the virtual address, i.e., bits 11:0, are untranslated, and are physical address bits. Accordingly, if the virtual address is 32 bits, the upper 20 bits of the virtual address, i.e., bits 31:12, are translated based on the translation tables, and are cached in the TLB.
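The translation just described can be sketched as follows for 4 KB pages. The `page_table` dictionary here stands in for the TLB and translation tables and is purely illustrative.

```python
PAGE_OFFSET_BITS = 12  # 4 KB pages -> bits 11:0 are untranslated

def translate(vaddr, page_table):
    """Translate a 32-bit virtual address to a physical address."""
    vpn = vaddr >> PAGE_OFFSET_BITS                  # virtual page number, bits 31:12
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)   # page offset, bits 11:0, unchanged
    ppn = page_table[vpn]                            # lookup stands in for TLB/tables
    return (ppn << PAGE_OFFSET_BITS) | offset
```

Note that only the upper 20 bits change; the low 12 bits pass through untranslated, which is the property the index discussion below turns on.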
One side effect of a virtual memory system is that two different programs may access the same physical location in memory using two different virtual addresses. Consequently, caches ensure data coherency by using the physical address to keep track of the cached data. That is, the tags are physical addresses. Additionally, physical addresses should be used for the index. However, using physical addresses for the index may be detrimental to performance for the reason now described.
The desire for larger caches continues, and the increase in integration densities of microprocessor integrated circuits has enabled modern microprocessors to employ relatively large caches. Borrowing from the examples above, assume a 64 KB four-way set associative cache with 32-byte cache lines in a paging system with 4 KB pages. Each set comprises 128 bytes of data in the four cache lines of the set. This results in 512 sets in the array. As was seen from the example above, the index would be address bits 13:5. However, we also observe that address bits 13:12 are translated address bits, i.e., virtual address bits, not physical address bits.
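The set-count arithmetic in the example above can be verified directly: a 64 KB four-way cache with 32-byte lines holds 128 bytes per set, giving 512 sets and a 9-bit index.

```python
CACHE_BYTES = 64 * 1024   # 64 KB cache
WAYS = 4                  # four-way set associative
LINE_BYTES = 32           # 32-byte cache lines

bytes_per_set = WAYS * LINE_BYTES            # 128 bytes in each set
sets = CACHE_BYTES // bytes_per_set          # 512 sets
index_bits = sets.bit_length() - 1           # log2(512) = 9 index bits
```

With 9 index bits starting above the 5-bit line offset, the index spans address bits 13:5, of which bits 13:12 lie above the 12-bit page offset and are therefore virtual.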
One solution is to wait for the TLB to translate virtual address bits 13:12 and use the translated physical address bits 13:12 as the upper two bits of the index. However, this solution has the performance disadvantage that it now takes longer to index the cache to obtain or store data since we must wait for the TLB to perform its translation in order to use physical address bits 13:12 to index the cache. Potential consequences are that either the cycle time of the microprocessor must be increased, or another stage must be added to the microprocessor to accommodate the additional TLB lookup time to avoid lengthening the cycle time.
To avoid the performance penalty associated with waiting for the TLB to provide the translated physical address bits needed for the index, the microprocessor may use some of the virtual address bits in the index, such as virtual address bits 13:12 in the example above. A cache that uses some virtual address bits for its index is referred to as a virtual set cache. The cache is a virtual set cache because it is no longer deterministic as to which set in the cache array a given cache line may be stored in. Rather, the cache line may be stored in one of multiple sets since the virtual address bits used in the index may have multiple values to refer to the same physical cache line. The multiple sets that the cache line may be stored in are referred to as virtual sets. Using the example cache above, a cache line having physical address bits 13:12 with a value of 01 could be accessed with four different virtual addresses. That is, not only could the cache line be accessed with virtual address bit values 13:12 of 01, but also with values of 00, 10, and 11. Hence, the cache line could be stored in any one of four different virtual sets in the cache. The set selected by the physical address bits is referred to as the physical set.
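The four virtual sets of the example can be enumerated as follows: index bits 11:5 are untranslated and fixed by the physical address, while virtual bits 13:12 may take any of four values. This is a sketch under the example's parameters; the function name is illustrative.

```python
def virtual_sets(addr):
    """Return the four set numbers a line at this address may occupy,
    for a 512-set cache indexed by bits 13:5 with 4 KB pages."""
    untranslated = (addr >> 5) & 0x7F                    # physical index bits 11:5
    return [(v << 7) | untranslated for v in range(4)]   # bits 13:12 = 00, 01, 10, 11
```

For instance, an address whose bits 11:5 equal 51 may reside in set 51, 179, 307, or 435; only one of these is the physical set.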
A negative consequence of this aspect of virtual set caches is that they may incur what is referred to as a virtual set miss. A virtual set miss occurs when an instruction accesses data that is present in the cache, but because part of the index is virtual, the index selects one of the virtual sets other than the virtual set in which the cache line containing the data resides, i.e., other than the physical set. A virtual set miss generated by a store operation is a virtual set store miss.
The present inventors have examined code traces and observed that the Windows 98® operating system frequently executes two instructions within approximately 200 instructions of one another that store to the same physical memory address using two different virtual addresses. These instructions represent a situation in which a virtual set store miss would occur in a virtual set cache. Therefore, what is needed is a virtual set cache that does not incur a virtual set store miss penalty.
The present invention provides a virtual set cache that avoids incurring a virtual set store miss penalty. Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide a virtual set cache. The virtual set cache includes an array of 2**N sets of cache lines and associated tags. The virtual set cache also includes an index, coupled to the array, which includes M untranslated physical address bits of a store address for selecting 2**(N-M) virtual sets of the 2**N sets. The virtual set cache also includes a plurality of comparators, coupled to the array, which compare the associated tags in the 2**(N-M) virtual sets with a plurality of translated physical address bits of the store address. The virtual set cache also includes a hit signal, coupled to the comparators, which indicates a hit in the virtual set cache if one of the associated tags in the 2**(N-M) virtual sets matches the plurality of translated physical address bits of the store address.
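The lookup described above can be sketched as follows, assuming N index bits of which M are untranslated: the M physical bits select 2**(N-M) candidate virtual sets, and the tags in every candidate are compared against the translated physical address bits. The data structures and names here are illustrative, not the claimed hardware.

```python
N, M = 9, 7  # 512 sets; 7 untranslated index bits (bits 11:5 of the example)

def lookup(tags, untranslated_index, physical_tag):
    """tags: list of 2**N per-set tag lists (one tag per way).
    Compare tags in all 2**(N-M) virtual sets; report hit and matching set."""
    for v in range(1 << (N - M)):
        set_num = (v << M) | untranslated_index  # candidate virtual set
        if physical_tag in tags[set_num]:        # comparators, one per way
            return True, set_num                 # hit; remember matching virtual set
    return False, None
```

A hit is signaled whenever any candidate set holds a matching tag, even if that set is not the one the virtual index bits would have selected.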
In another aspect, it is a feature of the present invention to provide a microprocessor. The microprocessor includes a translation lookaside buffer (TLB), that receives a virtual page number of a store operation and provides a physical page address of the virtual page number. The microprocessor also includes a virtual set cache, coupled to the TLB, that receives a physical cache line offset of the store operation. The physical cache line offset selects a plurality of virtual sets comprised in the virtual set cache. The virtual set cache queries which one of the plurality of virtual sets contains a cache line specified by the store operation based on the physical page address provided by the TLB. The microprocessor also includes an address register, coupled to the virtual set cache, which stores a matching virtual set number that specifies the one of the plurality of virtual sets specified by the store operation. The microprocessor updates the cache line based on the matching virtual set number stored in the address register, if the virtual set cache indicates a portion of the virtual page number used to index the virtual set cache would have generated a virtual set store miss.
In another aspect, it is a feature of the present invention to provide a method for storing data specified by an address of a store instruction into a virtual set cache. The method includes indexing into the cache using untranslated physical address bits of the address of the store instruction to select a plurality of virtual sets of cache lines and associated tags. The method also includes translating virtual address bits of the address of the store instruction into translated physical address bits, and comparing the associated tags with the translated physical address bits. The method also includes saving a matching virtual set number of a matching one of the plurality of virtual sets based on the comparing. The method also includes indexing into the cache using the matching virtual set number to update the matching one of the plurality of virtual sets, if the matching one of the plurality of virtual sets is not a same virtual set as specified by the virtual address bits of the store instruction.
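The store method above can be sketched as follows for the four-virtual-set example: find the matching virtual set by tag comparison and update the line in that set, even when it is not the set the virtual address bits would have selected. This is a software illustration of the claimed flow, not the hardware itself; all names are illustrative.

```python
def store(tags, data, untranslated_index, physical_tag, value):
    """Update the cache line in whichever of the four virtual sets holds it.
    Returns the matching virtual set number, or None on a true cache miss."""
    for v in range(4):                          # the 2**(N-M) = 4 virtual sets
        set_num = (v << 7) | untranslated_index
        if physical_tag in tags[set_num]:
            # Matching virtual set found: save its number and index the cache
            # with it to update the line, avoiding a virtual set store miss.
            data[set_num] = value
            return set_num
    return None                                 # line not cached at all
```

When the returned set number differs from the set selected by the virtual index bits, the saved matching set number is what permits the update to proceed without the store miss penalty.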
An advantage of the present invention is that it avoids the virtual set store miss penalty without requiring a large amount of additional logic.
Other features and advantages of the present invention will become apparent upon study of the remaining portions of the specification and drawings.