This application is related to, and hereby incorporates by reference, the following U.S. patent applications:
Multiprocessor Cache Coherence System And Method in Which Processor Nodes And Input/output Nodes Are Equal Participants, Ser. No. 09/878,984, filed Jun. 11, 2001;
Scalable Multiprocessor System And Cache Coherence Method, Ser. No. 09/878,982, filed Jun. 11, 2001;
System and Method for Daisy Chaining Cache Invalidation Requests in a Shared-memory Multiprocessor System, Ser. No. 09/878,985, filed Jun. 11, 2001;
Cache Coherence Protocol Engine And Method For Processing Memory Transaction in Distinct Address Subsets During Interleaved Time Periods in a Multiprocessor System, Ser. No. 09/878,983, filed Jun. 11, 2001;
System And Method For Generating Cache Coherence Directory Entries And Error Correction Codes in a Multiprocessor System, Ser. No. 09/972,477, filed Oct. 5, 2001, which claims priority on U.S. provisional patent application 60/238,330, filed Oct. 5, 2000, which is also hereby incorporated by reference in its entirety.
The present invention relates generally to the design of cache memories in computer central processor units (CPUs), and particularly to the organization of two-level CPU caching systems in which the first-level cache is virtually indexed.
The present invention is applicable to both single processor and multi-processor computer systems, but will be described primarily in the context of a distributed multi-processor system.
An "index position" within a cache identifies one or more cache lines within the cache. The number of cache lines stored at each index position is called the associativity of the cache. A direct mapped cache has an associativity of one. A two-way associative cache has an associativity of two, and thus has two cache lines at each index position of the cache.
A "memory line," also often called a cache line, is the basic unit of storage that is stored in a memory cache. A memory line or cache line is also the basic unit of storage that is tracked by the cache coherence logic in multi-processor computer systems. A memory line of data will often be called a "memory line" while it is stored in main memory or is in transit between system nodes, and the same data may also be called a cache line while it is stored in a memory cache.
When a first-level (L1) cache is virtually indexed, the "cache index bits" within the virtual address supplied by a processor are used to retrieve a tag from a tag array within the cache. Virtual indexing of a first-level (L1) cache allows the lookup of the L1 cache tag to proceed concurrently with the translation of the requested virtual memory address into a physical memory address, sometimes herein called the targeted physical memory address. The virtual to physical address translation is performed by a translation look-aside buffer ("TLB"). The tag from the cache is then compared to the targeted physical memory address obtained from the TLB, and if there is a match and the cache state for the cache line is not "invalid" (which together indicate a cache hit), the data from the cache that corresponds to the tag is sent to the processor. If there is a miss, meaning that the retrieved tag did not match the physical address obtained from the TLB, the requested cache line of data must be obtained from a second-level cache or main memory.
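The lookup sequence described above can be illustrated with a minimal sketch. It assumes a direct-mapped cache and models the tag array and the TLB as plain dictionaries; the function name `l1_lookup`, the parameter names, and the bit widths in the usage lines are hypothetical and are not taken from this specification. In hardware the tag fetch and the TLB translation proceed concurrently, whereas the sketch simply performs them one after the other.

```python
def l1_lookup(vaddr, tlb, tags, *, page_bits, line_bits, index_bits):
    """Virtually indexed, physically tagged lookup (illustrative sketch only).

    tlb  : dict mapping virtual page number -> physical page number
    tags : dict mapping L1 index -> {"tag": physical line address,
                                      "state": str, "data": bytes}
    Returns the cached data on a hit, or None on a miss.
    """
    # The cache index bits come from the VIRTUAL address.
    index = (vaddr >> line_bits) & ((1 << index_bits) - 1)
    entry = tags.get(index)

    # Translation by the TLB (concurrent with the tag fetch in hardware).
    ppage = tlb[vaddr >> page_bits]
    paddr = (ppage << page_bits) | (vaddr & ((1 << page_bits) - 1))

    # Hit only if the tag matches the translated address and the line is not invalid.
    if entry and entry["state"] != "invalid" and entry["tag"] == (paddr >> line_bits):
        return entry["data"]
    return None  # miss: the line must come from the L2 cache or main memory


# Hypothetical example: 8 KB pages, 64-byte lines, 512 index positions.
tlb = {0x4: 0x91}
tags = {5: {"tag": 0x4885, "state": "valid", "data": b"line"}}
hit = l1_lookup(0x8140, tlb, tags, page_bits=13, line_bits=6, index_bits=9)
```

For simplicity the stored tag here is the full physical line address; a real tag array would store only the address bits not implied by the index.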
While virtual indexing speeds up the lookup of a cache, it also may give rise to the possibility of synonyms. Synonyms are cache lines at different cache indices that map to the same physical memory address, and therefore refer to the same data entry. Synonyms may arise when a physical memory address is shared between two or more different programs or different parts of the same program, which may access it with two or more different virtual addresses. If the size of the cache divided by its associativity is greater than the size of the memory pages used in the system, a memory line at any given physical memory address can be stored at more than one index position within the cache. More specifically, the number N of cache line index positions at which any memory line may be found within the cache is equal to:
$$N = \frac{\text{cache size}}{\text{associativity} \times \text{page size}}$$
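As a worked illustration of this relationship, the following sketch evaluates N for a given cache geometry. The function name and the example sizes are hypothetical, all sizes are assumed to be powers of two, and the clamp to a minimum of one is an added assumption covering the case in which the cache size divided by the associativity does not exceed the page size, so that no synonyms can exist.

```python
def synonym_positions(cache_size: int, associativity: int, page_size: int) -> int:
    """Number N of L1 index positions at which one memory line may be found.

    Sizes are in bytes and assumed to be powers of two.
    """
    bytes_per_way = cache_size // associativity
    # If one way fits within a page, every index bit lies in the page offset,
    # so each memory line has exactly one possible index position.
    return max(1, bytes_per_way // page_size)


# Hypothetical configuration: 64 KB two-way cache with 8 KB pages.
n = synonym_positions(64 * 1024, 2, 8 * 1024)
print(n)  # 4
```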
Having more than one cache index position correspond to the same physical memory address can give rise to a memory coherence problem if the data entry for one virtual memory address is changed without changing the data for another virtual memory address that maps to the same physical memory address. It is therefore necessary to either prevent synonyms from occurring or else to detect and resolve synonyms before they give rise to a memory coherence problem.
In addition, in the context of a shared memory multi-processor computer system with multiple first-level caches, it is also necessary to ensure that the cache coherence logic handling a request for a particular physical memory address be able to find any and all copies of the corresponding memory line, including those in first-level caches, even though there may be multiple L1 cache index positions at which the identified memory line may be stored within any particular L1 cache.
Since synonyms are only possible if the size of the first-level cache divided by its associativity is larger than the size of the system's memory pages, synonyms may be avoided by decreasing the size of the cache, increasing associativity, or increasing the size of the memory pages. Unfortunately, decreasing the size of the first-level cache reduces system performance, because it increases the number of cache misses. Increasing associativity greatly increases the complexity, and thus cost, of the L1 caches, and may also reduce system performance by increasing the time required to retrieve a cache line from the L1 cache. Increasing the size of the system's memory pages is often not practical, because memory pages are the basic unit of memory used for many tasks, including memory allocation to processes, disk transfers and virtual memory management.
Alternatively, synonyms may be avoided at the system or kernel software level by restricting the assignment of virtual addresses by increasing the number of least significant address bits of the virtual addresses that must match the corresponding physical address. As a result of this restricted allocation of virtual addresses, all virtual addresses that correspond to a particular physical address will always have the same L1 cache index. This last method of avoiding synonyms places a burden on system software policies and on the usage of virtual address spaces.
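The software restriction described above can be sketched as a simple check that a proposed virtual address agrees with its physical address in every bit that contributes to the L1 cache index. The function name and the example addresses are hypothetical, and the sketch assumes power-of-two sizes; it is one way such a policy might be expressed, not a description of any particular operating system.

```python
def mapping_avoids_synonyms(vaddr: int, paddr: int,
                            cache_size: int, associativity: int) -> bool:
    """True if vaddr matches paddr in all bits covered by the L1 index
    and line offset, i.e. the low log2(cache_size / associativity) bits."""
    span = cache_size // associativity  # bytes addressed by index + offset bits
    return (vaddr % span) == (paddr % span)


# Hypothetical 64 KB two-way cache: the low 15 address bits must match.
ok = mapping_avoids_synonyms(0x0004D000, 0x12345000, 64 * 1024, 2)
bad = mapping_avoids_synonyms(0x7000F000, 0x12345000, 64 * 1024, 2)
print(ok, bad)  # True False
```

If every mapping is required to pass this check, any two virtual addresses that map to the same physical address share the same low-order bits and therefore the same L1 cache index.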
A possible method of resolving the problem of L1 cache synonyms that was considered by the inventors, but rejected for reasons described next, is to build logic into the L1 cache for detecting synonyms and resolving them. When an L1 cache miss occurs, the logic would search for a synonym within the L1 cache and abort the miss if a synonym is found. The cache line would then be copied from the location where the synonym was found to the location where the miss occurred, and the cache line at the original location would be invalidated. The main disadvantage of this method is that it would cause the first-level cache to be kept busy after every cache miss, while the first-level cache is searched for synonyms. Most of the time a synonym will not be found, however, because synonyms are rare in practice. Searching the first-level cache for synonyms after every miss reduces system performance by increasing the amount of time between cache requests by the processor coupled to the L1 cache, and potentially reduces system performance by delaying the resolution of other subsequent L1 cache accesses. In addition, in multiprocessor systems, this technique may reduce system performance by decreasing the amount of time that the cache is available for responding to cache coherence protocol requests. The impact on system performance may be especially severe for processor cores that aggressively exploit instruction level parallelism and which therefore could tolerate the latency of a first-level cache miss. An additional disadvantage of searching the first-level cache for synonyms after a miss occurs is that it delays the initiation of the search of the second-level cache for the tag, unless the search of the second-level cache is done concurrently with the search of the first-level cache for synonyms. 
Searching the first-level and second-level caches concurrently requires a complex handshake between the first-level and second-level caches, because it introduces the need to abort the second-level search if a synonym is found in the first-level cache. The handshake is particularly difficult when several first-level caches share a single second-level cache, as is the case in some single chip multiprocessor ("CMP") systems.
L1 cache synonyms in a two-level cache memory system are detected and resolved using duplicate tags and detection logic in the second-level (L2) cache, rather than in the first-level (L1) cache. Duplicate copies of all the first-level cache tags and state ("Dtags") are maintained in the second-level cache. When a miss occurs in the first-level cache, an L1 miss message is sent to the L2 cache. The Dtags in the L2 cache that correspond to all possible synonym locations in the first-level cache are searched for a synonym of the requested cache line. By definition, a synonym of a requested cache line has the same physical address as the requested cache line, but is stored at a different cache index of the L1 cache than the index corresponding to the specified virtual address of the cache line.
The L1 cache index is typically determined by a predefined subset of the cache line's virtual address bits. Furthermore one or more of the most significant bits of the L1 cache index are herein called the "vpn bits" of the virtual address. The number of vpn bits, M, is equal to:
$$M = \operatorname{ceiling}\left(\log_2\left(\frac{\text{cache size}}{\text{associativity} \times \text{page size}}\right)\right)$$
where the "ceiling" function rounds up the value to which the ceiling function is applied to the closest integer if that value is not already an integer. The vpn bits of a virtual address identify which one of the possible N synonym cache index positions in the L1 cache corresponds to the virtual address. The other N-1 synonym cache index positions have the identical cache index value, except for the M most significant bits thereof.
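The relationship between M, the vpn bits, and the N-1 other synonym index positions can be illustrated with the sketch below. The function names, the 9-bit index width, and the example index value are hypothetical; the clamp of M to zero when no synonyms are possible is likewise an added assumption.

```python
import math


def vpn_bit_count(cache_size: int, associativity: int, page_size: int) -> int:
    """M = ceiling(log2(cache size / (associativity x page size))), floored at 0."""
    ratio = cache_size / (associativity * page_size)
    return max(0, math.ceil(math.log2(ratio)))


def other_synonym_indices(index: int, index_bits: int, m: int) -> list:
    """The other index positions: same low-order index bits, different vpn bits."""
    low_bits = index_bits - m
    low = index & ((1 << low_bits) - 1)   # bits fixed by the page offset
    own_vpn = index >> low_bits           # the M most significant index bits
    return [(v << low_bits) | low for v in range(1 << m) if v != own_vpn]


# Hypothetical 64 KB two-way cache, 8 KB pages, 512 index positions (9 index bits).
m = vpn_bit_count(64 * 1024, 2, 8 * 1024)
others = other_synonym_indices(0b10_0000101, index_bits=9, m=m)
print(m, others)  # 2 [5, 133, 389]
```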
The L1 cache miss request to the L2 cache specifies both the physical address of the requested cache line and the vpn bits of the specified virtual address so that the second-level cache can search the appropriate N-1 Dtag entries for a synonym having a tag matching the requested physical memory address but a different L1 cache index.
If a synonym is found, the second-level cache aborts the miss and notifies the first-level cache where the requested cache block can be found in the first-level cache. The first-level cache then copies the cache line from the location where the synonym was found to the location where the miss occurred, and it invalidates the cache line at the original location. Finally, the Dtags in the second-level cache are updated to reflect the changes made in the first-level cache.
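The miss handling described in the preceding paragraphs can be sketched as a small software model. It assumes a direct-mapped L1 cache, uses the full physical line address as the tag, and does not model eviction of whatever line previously occupied the target location or the ordinary L2 lookup that follows when no synonym exists. The class name `L1WithDtags`, its methods, and the example values are hypothetical illustrations rather than the structure of any actual embodiment.

```python
class L1WithDtags:
    """Toy model: a direct-mapped L1 plus the duplicate tags (Dtags) kept in L2."""

    def __init__(self, index_bits: int, vpn_bits: int):
        self.low_bits = index_bits - vpn_bits  # index bits taken from the page offset
        self.vpn_bits = vpn_bits
        self.l1 = {}     # L1 index -> (physical line address, data)
        self.dtags = {}  # L1 index -> physical line address (copy held by L2)

    def index_of(self, pline: int, vpn: int) -> int:
        """Combine the vpn bits with the page-offset part of the line address."""
        low = pline & ((1 << self.low_bits) - 1)
        return (vpn << self.low_bits) | low

    def fill(self, pline: int, vpn: int, data: bytes) -> None:
        """Install a line in L1 and record it in the Dtags."""
        idx = self.index_of(pline, vpn)
        self.l1[idx] = (pline, data)
        self.dtags[idx] = pline

    def handle_l1_miss(self, pline: int, vpn: int):
        """L2-side handling of a miss message carrying the physical line
        address and the vpn bits. Returns the synonym's L1 index, or None."""
        target = self.index_of(pline, vpn)
        for other_vpn in range(1 << self.vpn_bits):
            if other_vpn == vpn:
                continue  # only the other N-1 positions can hold a synonym
            src = self.index_of(pline, other_vpn)
            if self.dtags.get(src) == pline:
                # Synonym found: the miss is aborted, L1 copies the line to the
                # target location and invalidates the original location.
                self.l1[target] = self.l1.pop(src)
                # The Dtags are then updated to mirror the L1 change.
                del self.dtags[src]
                self.dtags[target] = pline
                return src
        return None  # no synonym: a normal L2 lookup would proceed (not modeled)


cache = L1WithDtags(index_bits=9, vpn_bits=2)
cache.fill(pline=0x48D05, vpn=2, data=b"payload")
src = cache.handle_l1_miss(pline=0x48D05, vpn=0)
print(src, cache.dtags)  # 261 {5: 298245}
```

Only the Dtags are consulted during the search, so in this model the L1 cache itself is touched only when a synonym has actually been found.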