1. Field of the Invention
The present invention relates to processors having multi-level memory systems, and specifically to aliasing in the presence of virtually-indexed caches.
2. Discussion of the Prior Art
Using multi-level memory systems is a general technique for exploiting locality of reference. The basic idea is to organize a small amount of fast access memory and a large amount of slower access memory so that most of the accesses go to the small, fast memory. The average access time of such a memory system may be only slightly greater than that of the small, fast memory, while its effective size is that of the large memory.
A common form of multi-level memory system is cache memory, or lookaside buffer memory. As illustrated in FIG. 1, a cache memory 10 is a relatively small, specialized memory device placed between the processor 11 and main memory 12. The cache memory 10 holds copies of words from main memory 12 that are likely to be accessed by the processor. The cache 10 is faster than main memory 12; thus if frequently accessed locations are found in the cache 10, the cache hit rate will be high and the average memory access time will be small. The strategy followed by the cache 10 is to hold words located near other words recently used by the processor. The locality of reference exploited by this strategy is the propensity of memory accesses, over short periods of time, to cluster in small regions of memory.
Cache memory locations are redundant, in the sense that each is used to provide a more accessible copy of information also stored in slower main memory 12. Thus the total addressable memory size, as seen by the programmer, is not increased by the presence of a cache 10. Rather, the cache 10 provides, in a program transparent way, an improvement in the average access time to locations in the same address space.
As the cache 10 is much smaller than main memory 12, only a minority of the main memory locations can be cached at any one time. Consequently, in the general case each location in the cache 10, or cache line, conceptually has two parts--a tag field and a contents field. When a read or write operation is requested, the desired address is compared with the tag field of certain lines of the cache. If a match is found, the contents field of the cache line containing the matching tag is read or written. This is known as a cache hit, and there is no need to access main memory. If no match is found, a cache miss has occurred, and the slower main memory 12 must be accessed.
Modern processors support virtual address space, which is conceptually distinct from physical address space. A virtual address is a label that the processor uses to specify a memory location. The processor is not concerned with where that memory location actually resides in the physical memory, so long as the processor is able to access the location with the virtual address. A processor architecture specification defines a certain virtual address space which must be supported. The operating system that manages the computer system has flexibility as to how that virtual address space is mapped to physical memory. Thus, there is a translation that must occur from virtual to physical address.
FIG. 2 illustrates a typical virtual to physical translation. Both virtual and physical memory are conceptually separated into pages. The virtual address 20 of a memory location consists of a virtual page number 21 and a page offset index 22. The virtual page number 21 indicates on which page of memory the specified memory location resides. The page offset index 22 represents where the desired memory location is within the boundaries of its page; thus, the page offset index 22 dictates the location relative to the beginning of the page. Physical addresses are constructed in the same fashion: a physical page number 23 and a page offset index 24 define a physical memory address 25. By convention, most implementations of a virtual address space index memory within each page in the same manner for virtual and physical addresses. Thus, the page offset indexes 22 and 24 are the same for the virtual address 20 and the corresponding physical address 25 of a memory location. Therefore, if the page size is 2^p, the low order p bits of the virtual and corresponding physical address of a memory location are equal and represent the page offset index.
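The split and translation described above can be sketched as follows (a hypothetical Python model; the page size p = 12 and the page table contents are illustrative assumptions, not part of any particular architecture):

```python
P = 12                       # assumed page size of 2^p bytes (p = 12)
PAGE_SIZE = 1 << P

def split_virtual_address(va):
    """Split a virtual address into (virtual page number, page offset).
    The low order p bits are the page offset and are identical in the
    corresponding physical address."""
    return va >> P, va & (PAGE_SIZE - 1)

def translate(va, page_table):
    """Translate the virtual page number through a (hypothetical)
    page table, then reattach the untranslated page offset."""
    vpn, offset = split_virtual_address(va)
    ppn = page_table[vpn]        # only the page number is translated
    return (ppn << P) | offset
```

For example, with a table mapping virtual page 0x5 to physical page 0x9, virtual address 0x5ABC translates to physical address 0x9ABC; the offset 0xABC passes through unchanged.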
In the process of translating from a virtual address 20 to a physical address 25, since the virtual page offset 22 is the same as the physical page offset 24, the page offset portion of the virtual address does not need translation. However, the physical page number 23 is not the same as the virtual page number 21, and thus the virtual page number 21 must be translated. A memory management unit 26, which typically is implemented in a combination of hardware (translation lookaside buffers) and software (table walking), performs this virtual to physical page translation.
FIG. 3 illustrates the operation of a non-associative or direct mapped cache. The direct mapped cache uses the low-order bits 30 of the incoming memory address 31 to dictate the address within the cache to examine for a hit. Thus, a memory location A can only reside in the cache line whose address within the cache is represented by the low k bits 30 of the address of A. While constraining a memory location to be cached in only one specific line within the cache is a serious limitation, it allows the use of cheaper and denser standard Random Access Memory (RAM) rather than associative memory. The constraint implies, however, that memory locations which share the same k low-order address bits 30 also share the same cache line. If each cache line only provides for the storage of the contents of one memory location, then two memory locations with the same low order k address bits 30 cannot be cached simultaneously, since they contend for the same cache line. Another important and attractive characteristic of a direct-mapped cache is that it tends to operate faster, given that it is simpler and requires less circuitry.
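The direct mapped indexing just described can be sketched as follows (Python; the index width k = 10 and the example addresses are assumptions chosen for illustration):

```python
K = 10                        # assumed: the cache has 2^k lines (k = 10)
NUM_LINES = 1 << K

def cache_index(addr):
    """A direct mapped cache uses the low order k bits of the
    address to select the one line where it may reside."""
    return addr & (NUM_LINES - 1)

def cache_tag(addr):
    """The remaining high order bits form the tag stored in the line."""
    return addr >> K

# Two addresses sharing the same low order k bits contend for the
# same cache line and cannot be cached simultaneously.
assert cache_index(0x1234) == cache_index(0x5234)
assert cache_tag(0x1234) != cache_tag(0x5234)
```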
Most processor architecture specifications support aliases among the virtual addresses. Two virtual addresses which refer to the same physical address are aliases of each other. The low order p bits of the virtual address of any given memory location represent the page offset from the beginning of the virtual page. The page offset index for any given virtual address is equal to the page offset index of the physical address. Therefore, if two addresses are aliases of each other, their low order p bits must be equal, since they both refer to the same physical address. However, the bits which represent the virtual page numbers of the two aliases are different.
For the purposes of this discussion, the possible cache organizations are: virtually or physically indexed, virtually or physically tagged, and direct-mapped or set-associative. In a processor which supports virtual addresses, some of the caches in that processor may be virtually indexed direct mapped caches, while other caches are physically indexed direct mapped caches. A virtually indexed direct mapped cache uses the virtual address to provide the cache line mapping, while a physically indexed direct mapped cache uses the physical address to provide the cache line mapping. Thus, in a virtually indexed cache, if the number of cache lines is 2^k, then the low order k bits of the virtual address are used to map to the cache line for that virtual address.
If the number of cache locations is equal to or smaller than the number of locations in a page, there is no distinction between a virtually indexed and a physically indexed cache, since the low order p bits of the physical address and its corresponding virtual address are the same. In that case, when k is less than or equal to p, all aliases will map to the same cache line, since the low order k bits used to map to the cache line are all equal. However, if the cache size is larger than 2^p, virtually indexed caches and physically indexed caches will produce different cache line mappings for the same memory location, since some of the virtual page bits or physical page bits are used to derive the cache line. Moreover, in a virtually indexed cache, aliases may map to different cache lines within the same cache. When the page size is less than the cache size, one or more of the low order virtual page number bits are used to provide the cache line mapping; since the virtual page numbers of the aliases are different, those bits may differ, and the aliases may therefore map to different cache lines.
For example, if a virtually indexed cache has 2^14 lines, and the physical page size is only 2^12, then any given physical memory location could have as many as four aliases which all map to different cache lines. This is a consequence of the fact that two of the virtual page number bits must be used to provide the mapping into the cache. Since those two bits can take four possible values, the aliases could map to any of four different locations depending upon their low order two virtual page number bits.
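The four-alias example above can be checked with a short sketch (Python; the specific page offset and virtual page numbers are hypothetical):

```python
K, P = 14, 12                 # 2^14 cache lines, 2^12-byte pages

def virtual_line(va):
    """A virtually indexed cache maps an address to the line
    selected by its low order k bits."""
    return va & ((1 << K) - 1)

# Four aliases of one physical location: identical page offsets, but
# virtual page numbers differing in their low order (k - p) = 2 bits.
offset = 0xABC
aliases = [(vpn << P) | offset for vpn in (0, 1, 2, 3)]

# Each alias lands on a distinct cache line.
assert len({virtual_line(va) for va in aliases}) == 4
```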
When two or more virtual addresses map to the same physical address, data written to one of those virtual addresses should be visible upon subsequently reading from any of the aliases. If all the aliases map to the same cache line, virtually indexed caches behave as expected: data written to any one of the aliases will be written to the same cache line. However, significant data inconsistency problems occur if the aliases map to different cache lines. If two virtual aliases map to different cache lines of a virtually indexed cache, each alias behaves as if it were its own independent variable. For example, if X and Y are virtual address aliases, and a store to X is performed, that data would not be visible during a subsequent read of Y if Y were also resident in the cache. Furthermore, the value in Y remains unchanged by the write to X, whereas it should have been overwritten. When X and Y are removed from the cache, each will be written into the physical memory location to which they both refer. Assuming that the cache is a write back cache (rather than write through), the value which exists in physical main memory after both X and Y are removed from the cache will depend upon which virtual address was removed from the cache last. Thus, if Y is removed last, the data written to X is destroyed while the stale data at Y incorrectly occupies the physical location in main memory.
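The inconsistency scenario can be illustrated with a toy write back model (Python; the line indexing, addresses, and stored values are all hypothetical):

```python
# Toy write back cache: line index -> value; aliases X and Y map to
# different lines of a virtually indexed cache.
cache = {}
memory = {0x9ABC: "old"}      # the shared physical location

def line_of(va):              # assumed: low 14 bits index, 4 KB pages
    return va & 0x3FFF

X, Y = 0x0ABC, 0x1ABC         # virtual aliases of physical 0x9ABC
cache[line_of(Y)] = "old"     # Y is already resident in the cache
cache[line_of(X)] = "new"     # a store to X fills a *different* line

# A subsequent read of Y does not see the data written to X.
assert cache[line_of(Y)] == "old"

# On write back, eviction order decides what survives: if Y is
# removed last, its stale value overwrites the data written to X.
memory[0x9ABC] = cache.pop(line_of(X))   # X evicted first
memory[0x9ABC] = cache.pop(line_of(Y))   # Y evicted last
assert memory[0x9ABC] == "old"           # the write to X is lost
```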
In order to maintain data consistency, one prior art approach required that all of the virtual aliases be made non-cacheable; not even one of the aliases could remain in a cache. A serious performance degradation results from this prior art approach, since any references to any of the aliases cannot derive any benefit from the first-level or second-level caches, and must access main memory, which is the slowest level of memory.
FIG. 4 shows the structure of the cache and address translation mechanisms. The operating system maintains translation information in an arbitrary data structure, called the software translation table 40. Translation lookaside buffers provide quick hardware translations, and are essentially caches of the large and complex software translation table. For each cache, a translation lookaside buffer (TLB) 41 exists which acts as an independent cache of the software translation table 40. For more frequently accessed virtual pages, the TLB provides a one-cycle translation. The term "TLB hit" means that the desired translation is present in the on-chip TLB. The term "TLB miss" means that the desired translation is not present in the on-chip TLB.
On a TLB miss the memory management unit immediately traps to software for TLB miss processing. The TLB miss handler software routine has the option of filling the TLB by any means available. Some memory management units include an intermediate translation device, such as the translation storage buffer 42 shown in FIG. 4. The translation storage buffer acts like a second-level cache of address translations. While the TLBs are small and fast, the software translation table is likely to be large and complex. Thus, when the TLB miss handler operating system software routine is invoked after a TLB miss, a significant number of cycles may occur before the virtual to physical translation can be retrieved and processing can continue.
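The translation hierarchy described above can be sketched as follows (Python; the dictionary-based structures and their contents are assumptions for illustration only):

```python
tlb = {}                          # small, fast: vpn -> ppn
tsb = {0x5: 0x9}                  # translation storage buffer (2nd level)
software_table = {0x5: 0x9, 0x7: 0x2}   # large, authoritative structure

def translate_vpn(vpn):
    """Return the physical page number for a virtual page number,
    filling the TLB on a miss (models the trap to the miss handler)."""
    if vpn in tlb:                # TLB hit: one-cycle translation
        return tlb[vpn]
    ppn = tsb.get(vpn)            # TLB miss: consult the TSB first
    if ppn is None:
        ppn = software_table[vpn] # TSB miss: walk the software table
    tlb[vpn] = ppn                # fill the TLB for subsequent accesses
    return ppn
```

A first reference to virtual page 0x7 misses in both the TLB and the TSB and falls through to the software table; afterwards the translation is resident in the TLB and subsequent references hit immediately.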
A second prior art approach involves allowing one virtual alias to stay in the caches, but requiring the others to be excluded from caches and from the TLB. Allowing one alias to remain cacheable offers some benefits over the previous approach of making all aliases non-cacheable. If one alias remains cacheable, then repeated accesses to that alias still derive benefit from the quickness of the cache and TLB entry for that address.
For example, if X and Y are aliases, X can remain in the cache and its translation can remain in the TLB. However, Y must be made non-cacheable. If X is referenced repeatedly without any references to Y, then some benefit of the cache is retained. However, when Y is referenced, a TLB miss will occur. To ensure correct operation, before putting Y into the cache or including Y's translation in the TLB, the operating system must remove X from the cache and remove X's translation from the TLB. If X's translation were not removed from the TLB, no TLB miss would occur the next time a reference to X occurred. When X was again referenced, without software intervention, it would again be entered in the cache, unacceptably coexistent with Y.
The concept of cacheable address attributes is known in the prior art. A cacheable address attribute is a logical designation associated with a portion of memory which indicates whether data from that portion of memory can or cannot be placed into a cache. Cacheable address attributes are typically used to disallow the caching of memory mapped input or output locations. Most processor architectures support interfaces with input/output devices as being mapped to a certain part of the physical address space. For example, the mouse in a computer system may be mapped to a certain memory location. To keep up with what is happening with the mouse, the processor will periodically read from the address to which the mouse is mapped as if it were just another memory location. Similarly, when the processor wants to write data to a disk drive, for example, it writes to a special memory location dedicated to that output device. Caching those input/output locations is not permitted. A write to the output device is intended to reach the output terminals, not merely an internal cache which is only accessible to the processor. Similarly, a read from an input device is intended to read from the input device itself, not merely a cached version of what previously had been existent on the terminals of the input device.
An important problem with the approach of making aliases which violate the alias restriction non-cacheable is the possibility of having to deal with non-cacheable accesses to main memory. These accesses complicate many specialized instructions, such as "compare and swap" (CAS) and "partial store" (PST). It is very difficult to make these instructions work properly if they have to access non-cacheable main memory. Hence, there is a great benefit in developing a strategy where main memory can always be cacheable, at least in physically indexed caches. The performance benefits which result from such a strategy are an added incentive.