The present invention relates to computer architectures and in particular to an improved computer architecture providing for a memory cache that allows access of cache contents by virtual addresses rather than physical addresses.
Cache memories are used to minimize the time required for a processor to access memory data by providing a relatively compact, quickly accessed memory structure close to the processor. Portions of the main memory are loaded into the cache memory with the expectation that temporally proximate memory accesses will tend to cluster in the loaded portion (locality of reference), thus allowing the cache memory, once loaded, to serve multiple memory accesses by the processor before needing to be reloaded. Often multiple levels of cache (e.g., L1, L2, L3) may be used to optimize the trade-offs between rapid access and limited storage inherent in the cache structure.
To access data in a cache memory, a set of bits from a first set of address bits for the data, generally called the index bits, is used to index into the cache and select a line at an indexed entry. A set of bits from a second set of address bits, generally called the tag bits, is then compared against the set of tag bits stored for the selected entry, and a hit is declared if the two sets of bits match.
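The index-and-tag lookup described above can be illustrated with a minimal sketch of a direct-mapped cache. The field widths (`OFFSET_BITS`, `INDEX_BITS`) and the names `split_address` and `DirectMappedCache` are assumptions chosen only for illustration; they are not taken from this disclosure, and a real cache would also store data and may be set associative.

```python
# Hypothetical field widths, chosen only for illustration.
OFFSET_BITS = 6   # 64-byte cache lines
INDEX_BITS = 7    # 128 entries


def split_address(addr):
    """Split an address into its (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tracks only the stored tags; the cached data itself is omitted."""

    def __init__(self):
        # One stored tag per entry; None marks an invalid (empty) entry.
        self.tags = [None] * (1 << INDEX_BITS)

    def access(self, addr):
        """Return True on a hit; on a miss, fill the entry and return False."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return True           # index selected the entry, tags match
        self.tags[index] = tag    # miss: replace whatever was there
        return False
```

Two addresses that share index bits but differ in tag bits select the same entry and evict one another, which is why the tag comparison is needed to declare a hit.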
Programs running on modern processors normally access (read or write) memory using virtual addresses that differ from the physical address of the data in memory. The use of virtual addresses greatly simplifies running multiple programs by allowing them to view a continuous memory space unaffected by allocation to other processes, by allowing physical memory to be allocated only to active virtual addresses, and by preventing the corruption of the memory space of one process by the operation of another process, as is known in the art.
The use of virtual address space, despite its advantages, increases the delay in accessing memory by requiring a translation from the virtual address space used by the programs to the physical address space required by the computer memory. Normally this translation is done by means of page tables having entries that map each page of the virtual address space to a page of the physical address space. A page table entry may also contain access permissions for data in the corresponding virtual page. The page tables may be augmented by a translation lookaside buffer (TLB) that serves to cache recently accessed entries from the page table to speed up the process of translating the addresses and checking the requisite access permissions. The TLB may be optimized for low access latency and for low miss rate by employing a fast, highly associative structure.
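As a rough illustration of the translation path just described, the following sketch models a single-level page table with a TLB in front of it. The page size, the `Translator` class, and the representation of a page table entry as a `(physical_page, writable)` pair are all assumptions made for this example; real page tables are multi-level and real TLBs have bounded capacity.

```python
PAGE_BITS = 12  # hypothetical 4 KiB pages


class Translator:
    """Single-level page table fronted by an unbounded TLB (illustrative only)."""

    def __init__(self, page_table):
        # page_table maps virtual page number -> (physical page number, writable)
        self.page_table = page_table
        self.tlb = {}        # caches recently used page table entries
        self.tlb_hits = 0
        self.walks = 0       # number of page table lookups performed

    def translate(self, vaddr, write=False):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        entry = self.tlb.get(vpn)
        if entry is not None:
            self.tlb_hits += 1
        else:
            self.walks += 1
            entry = self.page_table[vpn]   # a KeyError stands in for a page fault
            self.tlb[vpn] = entry
        ppn, writable = entry
        if write and not writable:
            raise PermissionError("write to read-only page")
        # The page offset passes through translation unchanged.
        return (ppn << PAGE_BITS) | offset
```

A second access to the same virtual page is served from the TLB without consulting the page table, which is the latency saving the TLB is intended to provide.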
In a system with virtual memory, the cache memory is nevertheless normally accessed by physical memory addresses, that is, the first and second sets of address bits (e.g., index and tag) used to access the data in the cache are both part of the same physical address. While the latency of the address translation using the TLB and page tables is tolerable for the main memory, it is more burdensome when used with a cache memory, which is intended to provide frequent and rapid access that can be significantly slowed by address translation. Furthermore, translation with a highly associative TLB consumes considerable energy.
Ideally, the cache could be accessed directly using a virtual address from a program, that is, where the first and second sets of bits (e.g., index and tag) used to access the data in the cache are both parts of the same virtual address. The use of a virtual address to access cached data can obviate the latency and energy overhead resulting from TLB lookups; however, it is complicated by the possibility of synonyms, that is, a group of distinct virtual addresses mapped to the same physical address. This aliasing (or overlapping) is both possible and desirable for efficiently managing data in (physical or main) memory, for example, to share information across different processes with distinct virtual address spaces. In such cases, the use of virtual addressing for the cache could permit multiple different cache entries to map to the same physical address (synonym virtual addresses), that is, to hold the same data under distinct virtual addresses. Allowing such duplicates reduces the effective cache capacity and also presents a consistency problem among the duplicates. For example, if one process updates data associated with the common physical address using the first cache location, then a second process cannot read the up-to-date value for the common physical address using the second cache location.
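The consistency problem can be made concrete with a small sketch of a virtually addressed write-back cache that takes no account of synonyms. The class `NaiveVirtualCache`, the address values, and the dictionary-based memory are hypothetical and serve only to show the stale read; they do not represent any particular design.

```python
class NaiveVirtualCache:
    """Write-back cache keyed by virtual address, with no synonym handling."""

    def __init__(self, memory, translate):
        self.memory = memory        # physical address -> value
        self.translate = translate  # function: virtual address -> physical address
        self.lines = {}             # virtual address -> cached value

    def read(self, vaddr):
        if vaddr not in self.lines:
            # Miss: translate and fetch from physical memory.
            self.lines[vaddr] = self.memory[self.translate(vaddr)]
        return self.lines[vaddr]

    def write(self, vaddr, value):
        # Write-back policy: only the cached copy is updated.
        self.lines[vaddr] = value


# Hypothetical mapping: two virtual addresses are synonyms for physical 0x9000.
page_map = {0x1000: 0x9000, 0x5000: 0x9000}
memory = {0x9000: "old"}
cache = NaiveVirtualCache(memory, page_map.__getitem__)

cache.read(0x1000)            # first process caches the data
cache.read(0x5000)            # second process caches a duplicate copy
cache.write(0x1000, "new")    # first process updates its own copy
stale = cache.read(0x5000)    # second process still reads "old"
```

The cache now holds two entries for one physical location, and only one of them reflects the update.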
One solution to this problem, which is inherent in caches using virtual addresses for cache access, is to prohibit virtual address aliasing (or overlapping) in the physical address domain. For example, one can prevent data from a single physical address from being cached with different virtual addresses in a cache. This solution greatly decreases the ability of the cache to exploit locality of reference to the data. Similarly, one can prevent data from the same physical page from being cached with different virtual addresses in a cache, especially if the data could be written to. Alternatively, one can employ a single global virtual address space that eliminates the occurrence of synonyms altogether. Each of these solutions places large demands on software, which greatly limits its practical utility.
Another commonly used solution is what is referred to in the prior art as a virtually-indexed physically-tagged (VIPT) cache. Here a first set of bits used to index into the cache to select an entry is part of a virtual address, and a second set of bits used to compare against the tag bits of the selected entry is part of a physical address corresponding to the virtual address. This solution exploits the observation that some low-order bits of a virtual address (the page offset bits) do not change as a result of the address translation. Thus these low-order bits of the virtual address (which are the same as those of the corresponding physical address) can be used to index into the cache and start the access of the data and tag bits residing in the corresponding entry. In parallel, the TLB is accessed to obtain the physical address. When both operations have completed, the physical address bits obtained from the TLB access are compared with the physical address bits stored in the cache tags, and a hit is declared if they match. This approach may decrease the latency of the cache access, since the TLB is accessed in parallel with, rather than prior to, the cache access. However, the TLB access energy is still expended, and thus the energy benefits of a cache accessed with virtual addresses are not obtained. Moreover, the limits placed on the number of bits used to index the cache (these bits should not change as a result of the address translation) restrict the organization of the cache, potentially requiring a higher degree of associativity than may be desirable to achieve energy efficiency.
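The restriction on index bits can be expressed as simple arithmetic: if the index bits and line offset bits together must fit within the page offset, then the capacity of one way (cache size divided by associativity) cannot exceed the page size. The sketch below computes the resulting lower bound on associativity. The function name and the example sizes are assumptions for illustration only, and practical designs typically round the result up to a power of two.

```python
def min_vipt_associativity(cache_bytes, page_bytes):
    """Smallest number of ways for which index + line-offset bits
    stay within the page offset, i.e. cache_bytes / ways <= page_bytes."""
    # Ceiling division without floating point.
    ways = -(-cache_bytes // page_bytes)
    return max(1, ways)


# Hypothetical configurations with 4 KiB pages.
ways_32k = min_vipt_associativity(32 * 1024, 4 * 1024)
ways_64k = min_vipt_associativity(64 * 1024, 4 * 1024)
```

Under these assumed sizes, doubling the cache capacity doubles the minimum associativity, which illustrates why a VIPT organization can force more ways than would otherwise be chosen for energy efficiency.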
What is desirable is a cache that can be accessed with a virtual address, so that a first set of address bits used to index into the cache and select an entry and a second set of address bits used to compare against the selected tag bits and to declare a hit are both parts of the same virtual address. Thus a data access that hits in the cache can be completed solely using a virtual address, without the need to access the TLB or perform a virtual-to-physical address translation. Such a cache would have significant access latency and/or energy consumption advantages over designs that employ a physical address to complete the cache access. Moreover, it is also desirable that the operation of such a cache be transparent to software and that no requirements be placed on software to ensure its correct operation.