This invention relates generally to cache memory hierarchies, and more particularly to handling cache synonyms in cache memory hierarchies.
A cache memory, or cache, is a high speed memory positioned between a processor and main storage, to hold recently accessed main storage data. Whenever data in storage is accessed, it is first determined whether or not the data is in the cache and, if so, it is accessed from the cache. If the data is not in the cache, then the data is obtained from the main storage and the data is also stored in the cache, usually replacing other data which had been stored in the cache memory.
A cache hierarchy may exist, where multiple levels of cache exist between the processor and main storage. As one gets farther away from the processor, each cache gets larger, slower and cheaper. The cache closest to the processor is called the L1 cache, the next-closest cache is called the L2 cache, and so on. One processor may have multiple L1 caches, such as one L1 cache for data/operands and one L1 cache for instructions. One L2 cache may be connected to multiple L1 caches, where the L1 caches are either for the same processor, or for multiple processors in a multi-processor (mp) system.
In a virtual memory system, a memory access issued by an instruction is usually a va (virtual address, or logical address, or effective address) known to the associated program. The ra (real address, or absolute address, or physical address) in main memory associated with a va can be determined through the translation process. The translation process is a multi-cycle multi-step process that involves table lookups to get the ra.
To speed up the translation, a tlb (translation lookaside buffer, also known as dlat or erat) is used. A tlb holds the va and corresponding ra for recent translations. Depending on architectural requirements, the tlb may need more fields than just the va and corresponding ra.
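As a rough illustration of the tlb described above, the following sketch places a small lookup table in front of a slower translation step. The 4 KB page size, the dictionary standing in for the translation tables, and the names `Tlb`, `translate`, and `page_table` are assumptions made for the example, not details from this description.

```python
PAGE_BITS = 12  # assumption: 4 KB pages, so the low 12 bits are not translated


class Tlb:
    """Hypothetical tlb model: remembers recent va-page to ra-page translations."""

    def __init__(self, page_table):
        # page_table stands in for the multi-step table lookup (assumed a dict)
        self.page_table = page_table
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        vpn = va >> PAGE_BITS                  # translated portion of the va
        offset = va & ((1 << PAGE_BITS) - 1)   # untranslated byte offset
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            # slow path: the full translation process, modeled as one lookup
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << PAGE_BITS) | offset


tlb = Tlb({0x12: 0x7A})
first = tlb.translate(0x12345)    # miss: goes through the translation tables
second = tlb.translate(0x12FF0)   # hit: same page, served from the tlb
```

A real tlb has a fixed number of entries and, as noted above, may carry more fields than the va and ra; both are omitted here for brevity.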
The portion of an address that is subject to translation identifies a page. A cache has a corresponding directory array which holds the addresses of the data currently in the cache. Each address corresponds to a unit of storage called a line. The address that is stored within a directory array entry is called a tag.
When a fetch request is sent from the core (processor core) to the L1 cache, the fetch's address is compared against the directory, to see if the corresponding data is in the cache. The range of address bits that are used to address the directory is called a directory index or congruence class. A congruence class value may read out data for one or more lines, depending on whether the directory/cache is direct mapped (one way set associative) or greater than one way set associative. A direct mapped cache accesses only one line per congruence class, whereas, for example, a four way set associative cache accesses four lines per congruence class. For associativity greater than one, each of the lines being read in parallel is called a set (or setid, or way, or compartment, where setid means the identification or label or name given to each set).
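The directory lookup described above can be sketched as follows. The line size, number of congruence classes, and associativity are illustrative assumptions, and `split_address`, `make_directory`, and `lookup` are hypothetical names. Hardware performs the per-setid compares in parallel; the loop here only models the result.

```python
LINE_BITS = 8    # assumption: 256-byte lines
INDEX_BITS = 6   # assumption: 64 congruence classes
WAYS = 4         # assumption: four way set associative


def split_address(addr):
    """Split an address into (tag, congruence class, byte offset within the line)."""
    offset = addr & ((1 << LINE_BITS) - 1)
    index = (addr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (LINE_BITS + INDEX_BITS)
    return tag, index, offset


def make_directory():
    """One row per congruence class, one tag slot per setid (None means invalid)."""
    return [[None] * WAYS for _ in range(1 << INDEX_BITS)]


def lookup(directory, addr):
    """Return the setid that holds the line, or None on a miss."""
    tag, index, _ = split_address(addr)
    for setid, stored_tag in enumerate(directory[index]):
        if stored_tag == tag:
            return setid
    return None


directory = make_directory()
tag, index, offset = split_address(0x12345)
directory[index][2] = tag              # pretend the line was installed in setid 2
hit = lookup(directory, 0x12345)       # same line: hit in setid 2
miss = lookup(directory, 0x52345)      # same congruence class, different tag: miss
```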
For associativity greater than one, when fetch data is returned from the next level of cache hierarchy, output from an lru array determines which of the setid's the data should be written in. Lru means least-recently-used. The idea is to put the data in a setid that hasn't been referenced recently, to help performance. There are various approaches for lru algorithms. If the setid where the fetch data will be written already has a valid line of data in it, then when that line is written over, that is called lru-ing out the line.
For associativity greater than one, the directory compare results (one compare per setid) are used to multiplexer-down the cache output, to select the setid of interest. These cache multiplexer controls are called the late selects. Because accessing the tlb and directory arrays and then waiting to use their compare results as late selects to multiplexer-down the cache output can possibly lengthen a processor pipeline or cycle time, sometimes another array (in addition to the directory array) is used to create the late selects. This array can be called a set predict array. One approach for a set predict array is to structure it like a directory, with multiple setid's and compares, but only implement a subset of tag bits. Another approach for a set predict array is to not have any compares, but instead use the array output directly as the late selects. If a set predict array is used, its result must be compared to the result from the directory, to verify that the set predict array predicted correctly.
When data for a particular fetch request is returned from an L1 cache read to the core, or data for a store request is written into the L1 cache from the core, the amount of data written/read is usually less than a line, with possibilities such as a hw (halfword), wd (word), dw (doubleword), qw (quadword) or ow (octword).
For caches over a certain size, the cache and directory index includes bits that are subject to translation. The invention only applies to this case. For such a case, the pair of arrays either use va bits or ra bits for those bits. If va bits are used, then the possibility of synonyms exists.
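Whether the index includes translated bits depends on the cache size, the associativity, and the page size. The sketch below computes how many index bits lie above the page offset, assuming all three sizes are powers of two. The function name `translated_index_bits` and the example sizes are assumptions for illustration only.

```python
def translated_index_bits(cache_bytes, ways, page_bytes):
    """Count the index bits that are subject to translation.

    Assumes cache_bytes, ways and page_bytes are all powers of two.
    """
    bytes_per_way = cache_bytes // ways
    # The line offset and index bits together address one way of the cache.
    index_top = bytes_per_way.bit_length() - 1   # log2 for a power of two
    page_bits = page_bytes.bit_length() - 1      # untranslated low-order bits
    return max(0, index_top - page_bits)


small = translated_index_bits(16 * 1024, 4, 4096)    # 4 KB per way
large = translated_index_bits(64 * 1024, 4, 4096)    # 16 KB per way
```

In this example, the 16 KB cache is indexed entirely by untranslated bits, while the 64 KB cache needs two translated bits in its index.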
In general, a synonym (or alias) occurs when two different va's map to the same ra. The subclass of synonyms that applies to the invention consists of cases where the subset of virtual address bits used to index the L1 cache has different values for the two synonyms. When the terms ‘synonym’ or ‘cache synonym’ are used in this description, they will be referring to this subclass. The terms ‘synonym bits’ or ‘va syn’ refer to this subset of virtual address bits.
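To make the definition concrete, the sketch below checks whether two va's form a cache synonym in the sense used here: they translate to the same ra but differ in the synonym bits. The bit positions, the dictionary used as a translation table, and all function names are assumptions chosen for the example.

```python
PAGE_BITS = 12   # assumption: 4 KB pages
LINE_BITS = 8    # assumption: 256-byte lines
INDEX_BITS = 6   # assumption: index uses address bits 8..13, so bits 12..13 are translated
SYN_BITS = LINE_BITS + INDEX_BITS - PAGE_BITS   # two synonym bits in this layout


def translate(page_table, va):
    """Hypothetical translation: page_table maps va page number to ra page number."""
    offset = va & ((1 << PAGE_BITS) - 1)
    return (page_table[va >> PAGE_BITS] << PAGE_BITS) | offset


def synonym_bits(va):
    """The translated va bits that take part in indexing the L1 cache."""
    return (va >> PAGE_BITS) & ((1 << SYN_BITS) - 1)


def cache_index(va):
    """Congruence class selected by a virtual-address-indexed cache."""
    return (va >> LINE_BITS) & ((1 << INDEX_BITS) - 1)


def is_cache_synonym(page_table, va_a, va_b):
    """True when both va's map to the same ra but differ in the synonym bits."""
    same_ra = translate(page_table, va_a) == translate(page_table, va_b)
    return same_ra and synonym_bits(va_a) != synonym_bits(va_b)


# Three va pages that all map to the same ra page.
page_table = {0x11: 0x40, 0x22: 0x40, 0x15: 0x40}
conflict = is_cache_synonym(page_table, 0x11080, 0x22080)
harmless = is_cache_synonym(page_table, 0x11080, 0x15080)
```

Here 0x11080 and 0x22080 select different congruence classes even though they name the same real line, which is the case this description calls a synonym. 0x11080 and 0x15080 are aliases too, but they share the same synonym bits and therefore index the same congruence class.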
For a directory that is virtual-address-indexed, the tag field in the directory array may be either a va or an ra. For an ra tag directory, the directory output must be compared against the ra output from the tlb. For a va tag directory, the directory output can be compared directly to the va from the core. A va tag directory acts like a combination tlb and ra tag directory. If architecture requires the tlb to compare on more fields than just the va, in order to know whether a given translation is valid, then these other fields may also be needed in a va tag directory. Normally, a given va maps to only one corresponding ra at a time. However, bad programming could result in one va mapping to multiple ra's at the same time. If this happens, a va tag directory could use a translation that is different from the one in the tlb. Depending on architectural requirements, detection of this case may be used to: detect an error, attempt to clean up this unusual condition, or take no special action.
Cache coherency involves making sure that the storage image across the cache hierarchy is consistent. One approach to cache coherency involves exclusivity. A line of data can only be stored to when it is held exclusive by one L1 cache. However, a line can be held read-only by several caches. In a cache hierarchy, a given level of cache can track exclusive/read only ownership of the lines in the caches one level below, as long as the caches one level below only contain a subset of the lines that are in the cache level doing the tracking.
When an L1 cache sends a fetch request to L2, command codepoints say whether the fetch is for read-only, exclusive, or cex (conditional-exclusive) ownership of the line. Cex means the line may or may not be returned with exclusive ownership.
If, for example, one L1 cache sends a fetch exclusive to the L2 cache, and the L2 cache's directory indicates that another L1 cache connected to that L2 currently has that line, the L2 sends an xi (cross-interrogate) invalidate to that other L1 cache. The other L1 cache searches its directory for the xi. If the line is in the directory, then it is invalidated.
As another example, if one L1 cache sends a fetch read-only to the L2, and the L2 cache's directory indicates that another L1 cache currently has that line exclusive, then the L2 sends an xi demote to that other L1 cache. The other L1 cache searches its directory for the xi. If the line is in the directory, then the exclusive bit in that L1 directory is turned off, but the L1 directory's valid bit remains on.
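The two xi types described above differ only in what they do to the directory entry. The following sketch models an L1 directory as a flat mapping from line address to status bits, which is a simplification (no congruence classes or setid's). `DirEntry`, `L1Directory`, and the method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    valid: bool = False
    exclusive: bool = False


class L1Directory:
    """Hypothetical flat model of an L1 directory, keyed by line address."""

    def __init__(self):
        self.lines = {}

    def install(self, line_addr, exclusive):
        self.lines[line_addr] = DirEntry(valid=True, exclusive=exclusive)

    def xi_invalidate(self, line_addr):
        """Invalidate xi: the line, if present, is removed from this L1."""
        entry = self.lines.get(line_addr)
        if entry is not None and entry.valid:
            entry.valid = False
            entry.exclusive = False
            return True
        return False   # line not in the directory: nothing to do

    def xi_demote(self, line_addr):
        """Demote xi: exclusive bit is turned off, valid bit stays on."""
        entry = self.lines.get(line_addr)
        if entry is not None and entry.valid:
            entry.exclusive = False
            return True
        return False


l1 = L1Directory()
l1.install(0x100, exclusive=True)
demoted = l1.xi_demote(0x100)
after_demote = (l1.lines[0x100].valid, l1.lines[0x100].exclusive)
invalidated = l1.xi_invalidate(0x100)
absent = l1.xi_invalidate(0x999)
```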
In terms of how stores are implemented for the subset of L1 caches that are stored-to, there are two main approaches. For a store-thru or write-thru cache, when store data is written into the cache, the store data is also forwarded to the next-higher level of cache hierarchy around the same time, with the granularity of data usually being less than a line, for example a hw, wd, dw, qw, or ow. For a store-in or write-back cache, the store data isn't sent to the next cache level immediately. Instead, the data only gets sent when the line is about to be lru'ed-out of the cache, or the next level of cache hierarchy is requesting that data. For example, if the L2 cache sent a store-in L1 cache a demote xi, then at that point, the L1 cache would send the data to the L2 cache. The data transfer would typically be a multi-cycle transfer for the full line, regardless of how much of the line was stored-to. For a store-in cache, the cache directory includes a status bit that says whether the line was stored-to, to know when such a data transfer to the next cache level is needed.
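The difference between the two store policies can be sketched as below. The model is deliberately coarse: each cache is a dictionary keyed by line address, a whole line is treated as one value, and the classes `NextLevel`, `StoreThruCache`, and `StoreInCache` are hypothetical names introduced for the example.

```python
class NextLevel:
    """Stand-in for the next level of cache hierarchy; records what it receives."""

    def __init__(self):
        self.writes = []

    def write(self, addr, data):
        self.writes.append((addr, data))


class StoreThruCache:
    def __init__(self, next_level):
        self.lines = {}
        self.next_level = next_level

    def store(self, addr, data):
        self.lines[addr] = data
        self.next_level.write(addr, data)   # forwarded at around the same time


class StoreInCache:
    def __init__(self, next_level):
        self.lines = {}
        self.stored_to = set()              # models the stored-to status bit
        self.next_level = next_level

    def store(self, addr, data):
        self.lines[addr] = data
        self.stored_to.add(addr)            # nothing is sent yet

    def cast_out(self, addr):
        """Called when the line is lru'ed-out or the next level requests it."""
        if addr in self.stored_to:
            self.next_level.write(addr, self.lines[addr])
            self.stored_to.discard(addr)


l2_a = NextLevel()
thru = StoreThruCache(l2_a)
thru.store(0x100, "v1")

l2_b = NextLevel()
back = StoreInCache(l2_b)
back.store(0x100, "v1")
back.store(0x100, "v2")
writes_before = len(l2_b.writes)
back.cast_out(0x100)     # for example, in response to a demote xi
back.cast_out(0x100)     # line no longer marked stored-to: no second transfer
```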
When an L1 cache that is stored-to receives a demote or invalidate xi for a line that the L1 cache currently has exclusive, and a store-thru L1 cache is working on storing to that line, or a store-in cache is working on storing to that line or has stored to that line, the L1 cache cannot give up exclusivity of that line until the store data has been sent to the next level of cache hierarchy. One approach for this case is for the L1 cache to delay telling the next level of cache hierarchy that the xi is done, until the stores have been sent.
Another approach for this case is for the L1 cache to reject the xi to the next level of cache hierarchy, and have the xi be repeatedly sent to the L1 cache until it is no longer rejected.
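The reject-and-retry approach can be sketched as a simple loop. The counter of pending stores, the one-store-per-retry drain rate, the retry limit, and the names `L1Cache` and `send_xi_with_retry` are assumptions made to keep the example small, not details from this description.

```python
class L1Cache:
    """Hypothetical L1 model that rejects an xi while stores are still pending."""

    def __init__(self, pending_stores):
        self.pending_stores = pending_stores

    def drain_one_store(self):
        # Models one store being sent to the next level of cache hierarchy.
        if self.pending_stores > 0:
            self.pending_stores -= 1

    def handle_xi(self):
        return "reject" if self.pending_stores > 0 else "done"


def send_xi_with_retry(l1, max_attempts=10):
    """Resend the xi until the L1 accepts it; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        if l1.handle_xi() == "done":
            return attempt
        l1.drain_one_store()   # assumption: one store drains between retries
    raise RuntimeError("xi still rejected after max_attempts")


busy_attempts = send_xi_with_retry(L1Cache(pending_stores=3))
idle_attempts = send_xi_with_retry(L1Cache(pending_stores=0))
```

Under the delay approach described earlier, the same L1 would instead accept the xi once and hold its response until `pending_stores` reaches zero.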
It would be desirable/advantageous to be able to resolve synonym conflicts while maintaining cache coherency in a cache hierarchy.