This invention relates generally to maintaining storage coherency in a system with multi-level private caches and, more particularly, to a method, system, and computer program product for cross-invalidation handling in a multi-level private cache.
In a multiprocessing system where a strongly consistent memory image is required, as in z/Architecture™ implemented by IBM® System z processors, memory usage among different processors is managed using cache coherency ownership schemes. These schemes usually involve an indication of which processor currently has the “exclusive” right to update a cache line. In one such protocol, when a processor requests the right to update a line, it first checks its local (L1) cache for the line's ownership state. If needed, it then sends an “exclusive ownership request” to the higher-level cache controller, which tracks which processor, if any, currently owns the line exclusively, and which processors, if any, currently hold the line read-only. The cache controller then sends a “cross invalidate” or “ownership change” request either to the processor that currently owns the line exclusively, so that it releases its exclusive rights, or to the processors that have read-only access, so that they stop using the line. Once the owning processor has responded that it has released exclusive ownership, the requesting processor is given exclusive rights to the line. A processor that has relinquished exclusive rights to a line must re-request exclusive ownership of that line from the higher-level memory controller before it can perform any further updates to it. Similarly, once a “read-only” processor has received an invalidate request, it needs to ensure that its use of the line remains consistent with the coherency requirement.
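The ownership tracking described above can be illustrated with a toy directory model. This is a minimal sketch, not an IBM implementation: it assumes a single directory, XI responses that complete instantly, and hypothetical names (`Directory`, `LineState`, `request_exclusive`, `request_read_only`, `xi_log`) chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class LineState:
    """Directory entry for one cache line in the higher-level cache controller."""
    exclusive_owner: Optional[int] = None            # processor with exclusive rights
    readers: Set[int] = field(default_factory=set)   # processors with read-only copies


class Directory:
    """Toy ownership directory that cross-invalidates holders before granting rights."""

    def __init__(self) -> None:
        self.lines: Dict[int, LineState] = {}
        self.xi_log: List[Tuple[int, int]] = []      # (target processor, line) per XI sent

    def _state(self, line: int) -> LineState:
        return self.lines.setdefault(line, LineState())

    def request_exclusive(self, cpu: int, line: int) -> None:
        st = self._state(line)
        # Ask the current exclusive owner (if another processor) to release its rights.
        if st.exclusive_owner is not None and st.exclusive_owner != cpu:
            self.xi_log.append((st.exclusive_owner, line))
        # Tell every other read-only holder to stop using the line.
        for reader in sorted(st.readers - {cpu}):
            self.xi_log.append((reader, line))
        # Simplification: all XI responses are assumed to arrive before the grant.
        st.exclusive_owner = cpu
        st.readers = set()

    def request_read_only(self, cpu: int, line: int) -> None:
        st = self._state(line)
        if st.exclusive_owner == cpu:
            return  # exclusive rights already include read access
        if st.exclusive_owner is not None:
            # Simplification: the exclusive owner is invalidated rather than demoted.
            self.xi_log.append((st.exclusive_owner, line))
            st.exclusive_owner = None
        st.readers.add(cpu)
```

For instance, after processors 1 and 2 both read line `0xA`, a call to `request_exclusive(3, 0xA)` records XIs to processors 1 and 2 in `xi_log` before processor 3 becomes the exclusive owner; a later `request_exclusive(1, 0xA)` then records an XI to processor 3.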
A typical example:
Processor #1          Processor #2
(1.1) change A        (2.1) inspect B
(1.2) change B        (2.2) inspect A
If processor #2 observes the new value of B stored by processor #1, it must also observe the new value of A stored by processor #1, because processor #1 performed the store to A logically before the store to B.
This means that by the time B's new value is received in processor #2, every cross-invalidate (XI) previously sent to processor #2 must have taken effect; otherwise, processor #2 could still observe A's old value.
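The ordering requirement in this example can be checked with a small litmus-test sketch. It enumerates every interleaving of the two processors' operations under a sequentially consistent toy model (a simplification, not the formal z/Architecture memory model) and collects the (B, A) values that processor #2 observes, with 0 meaning old and 1 meaning new. The flag `stale_a_in_l1` is a hypothetical knob that models processor #2 reading A from an L1 copy whose XI has not yet taken effect.

```python
from itertools import combinations
from typing import List, Set, Tuple

P1 = [("store", "A"), ("store", "B")]   # (1.1) change A, (1.2) change B
P2 = [("load", "B"), ("load", "A")]     # (2.1) inspect B, (2.2) inspect A


def interleavings(a: list, b: list) -> List[list]:
    """All merges of a and b that preserve each processor's program order."""
    n = len(a) + len(b)
    result = []
    for slots in combinations(range(n), len(a)):
        ia, ib, merged = iter(a), iter(b), []
        for i in range(n):
            merged.append(next(ia) if i in slots else next(ib))
        result.append(merged)
    return result


def outcomes(stale_a_in_l1: bool = False) -> Set[Tuple[int, int]]:
    """Return the set of (value seen for B, value seen for A) on processor #2."""
    seen = set()
    for order in interleavings(P1, P2):
        memory = {"A": 0, "B": 0}
        loaded = {}
        for op, var in order:
            if op == "store":
                memory[var] = 1            # processor #1 writes the new value
            elif var == "A" and stale_a_in_l1:
                loaded[var] = 0            # stale L1 hit: the pending XI has not taken effect
            else:
                loaded[var] = memory[var]  # coherent read of the current value
        seen.add((loaded["B"], loaded["A"]))
    return seen
```

With coherent reads, the outcome (new B, old A), i.e. `(1, 0)`, never appears; once processor #2 is allowed to hit a stale copy of A, that forbidden outcome becomes possible, which is exactly the situation the pending XIs must prevent.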
In a system that has a hierarchy of private caches between the level-1 cache and the storage controller, e.g., a level-2 cache, it is desirable to look up an XI in the level-2 cache first and to interrupt the level-1 cache for XI processing only if the affected line still exists in the level-2 cache. Most of the time, the line has already been replaced by another line. (As is known to those skilled in the art, and as will become relevant in the following discussion, this filtering relies on the level-2 cache maintaining a superset of the level-1 cache contents, i.e., the level-1 cache holds a subset of the level-2 cache.)
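The XI filtering described above can be sketched as follows, assuming an inclusive hierarchy in which every line in the L1 is also present in the L2. The class and method names (`PrivateHierarchy`, `fill`, `evict_from_l2`, `handle_xi`) are hypothetical, and caches are modeled as plain sets of line numbers.

```python
from typing import Set


class PrivateHierarchy:
    """Toy inclusive L1/L2 pair: every line in L1 is also present in L2."""

    def __init__(self) -> None:
        self.l1: Set[int] = set()
        self.l2: Set[int] = set()
        self.l1_interrupts = 0   # number of XIs that had to be forwarded to the L1

    def fill(self, line: int) -> None:
        # Inclusion: a line installed in L1 is also installed in L2.
        self.l2.add(line)
        self.l1.add(line)

    def evict_from_l2(self, line: int) -> None:
        # Replacing a line in L2 also removes it from L1, keeping L1 a subset of L2.
        self.l2.discard(line)
        self.l1.discard(line)

    def handle_xi(self, line: int) -> bool:
        """Process an XI; return True if the L1 had to be interrupted."""
        if line not in self.l2:
            return False         # filtered in L2: by inclusion, the line cannot be in L1
        self.l2.discard(line)
        self.l1.discard(line)    # forward the XI to the L1
        self.l1_interrupts += 1
        return True
```

Because a line that has already been replaced in the L2 cannot be in the L1, the common case (an XI for a long-gone line) is resolved entirely in the L2 without disturbing the L1.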
On the other hand, if the storage controller also contains, e.g., a level-3 cache, then for a request that misses in both the level-1 and level-2 caches it is desirable to forward the returned data to the level-1 cache, and thus to the processor, as soon as possible, even bypassing the level-2 cache.
In a multiprocessor system with private L1/L2 and shared L3 caches, where the L3 is the intended cache coherency manager (storage controller), when processor P1 wants to store to line X while processor P2 already has ownership of it, a cross-invalidate (XI) is sent from the L3 cache to the L2 cache of processor P2 to prevent P2 from using potentially out-of-date data.
Due to physical latency and other aspects of the microarchitecture, the XI might not reach the L1 cache of processor P2 in time to invalidate the line there and maintain cache coherency. This can be caused by communication delays, or by the XI still waiting for priority in the L2. What is needed is a way to ensure that processor P2 does not use the old data for line X after it has used the new data for line Y, where another processor stored to line Y later than to line X (as in the example above). The problem can arise if processor P2 misses line Y and receives the miss data from the L3 cache (possibly bypassing the L2) before the XI for line X has been processed through the L2. Since line X already resides in P2's L1 cache, P2 can access it without knowing that it is out of date.
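The race can be reduced to a comparison of two arrival times. This is a minimal sketch under stated assumptions: the field names of the hypothetical `Timing` record are illustrative, all latencies are in abstract cycles, and processor P2 is assumed to inspect line X immediately after it can use line Y's data.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    xi_sent: int         # cycle at which the L3 sends the XI for line X to P2's L2
    xi_l2_delay: int     # cycles the XI waits for priority and processing in P2's L2
    y_data_sent: int     # cycle at which the L3 returns miss data for line Y
    bypass_latency: int  # cycles for Y's data to reach the L1 when it bypasses the L2


def p2_reads_stale_x(t: Timing) -> bool:
    """True if P2 can use Y's new value and then hit the stale copy of X in its L1."""
    xi_effective_in_l1 = t.xi_sent + t.xi_l2_delay
    y_usable_in_l1 = t.y_data_sent + t.bypass_latency
    # P2 reads X right after using Y; the read is stale if the XI has not yet landed.
    return y_usable_in_l1 < xi_effective_in_l1
```

Even when the L3 sends the XI for X before the miss data for Y, a long enough wait in the L2 combined with a fast bypass path for Y lets P2 observe new Y and then old X.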
Moreover, the higher-level caches are sometimes built as a parallel-pipe design, in which two or more concurrently operating caches (pipes) each handle the cache operations for one partition of the address space, selected by address bits. A simple example is a 2-way parallel design in which one pipe handles odd cache lines and the other handles even cache lines. This further aggravates the race between XI handling and the use of cache-miss data, because an XI may come from one pipe while a cache miss is answered from another, so the two may arrive in either order.
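A 2-way address partition and its effect on arrival order can be sketched as below. The 256-byte line size, the per-pipe delays, and the names `pipe_for` and `deliver` are assumptions made for illustration only; the sketch simply shows that an XI and a miss response routed through different pipes can reach the processor out of send order.

```python
from typing import List, Sequence, Tuple

LINE_SIZE = 256   # assumed cache-line size in bytes (hypothetical for this sketch)
NUM_PIPES = 2     # 2-way parallel design: even lines in pipe 0, odd lines in pipe 1

Message = Tuple[int, str, int]   # (send cycle, kind, byte address)


def pipe_for(address: int) -> int:
    """Select the pipe from the low-order bit of the cache-line number."""
    line_number = address // LINE_SIZE
    return line_number % NUM_PIPES


def deliver(messages: Sequence[Message], pipe_delays: Sequence[int]) -> List[Message]:
    """Return messages ordered by arrival cycle; each pipe adds its own delay."""
    arrivals = []
    for send, kind, addr in messages:
        arrivals.append((send + pipe_delays[pipe_for(addr)], kind, addr))
    return sorted(arrivals)
```

With line X at address `0x1000` (pipe 0) and line Y at `0x1100` (pipe 1), an XI for X sent at cycle 0 and miss data for Y sent at cycle 1 arrive in reverse order whenever pipe 0 is more heavily loaded, e.g. with `pipe_delays=[5, 1]`.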