This invention relates generally to multi-processor environments, and more particularly to handling shared cache lines to allow forward progress among processors in a multi-processor environment.
In a multiprocessing system where a consistent memory usage model is required, memory usage among different processors is managed using cache coherency ownership schemes. The schemes usually involve various ownership states for a cache line. The states include read-only (commonly known as shared) and exclusive (where a certain processor has the sole and explicit update rights to the cache line, sometimes known as store access).
For one such protocol used for a strongly-ordered memory consistency model, as in IBM's z/Architecture implemented by IBM System z processors, when a processor is requesting rights to update a line, e.g., when it is executing a “Store” instruction, the processor checks its local cache (L1) for the line's ownership state. If the processor discovers that the line is either currently shared or is not in its cache at all, the processor sends an “exclusive ownership request” to a storage controller (SC), which serves as a central coherency manager.
The SC tracks which processor, if any, currently owns the line exclusively. If deemed necessary, the SC will then send a specific “cross interrogate” (XI) or “ownership change” request to another processor which currently owns that line to release its exclusive rights. The XI is usually called an “exclusive XI”. Once the processor that currently owns the line has responded to the XI and confirmed that exclusive ownership is released, the requesting processor is given exclusive update rights to the requested line.
It is also possible that the SC may find that one or more processors currently have the requested line in read-only (or shared) state. The SC informs those processors through the XI interface that the line is about to be changed. The L1 logic of each such processor ensures that data which currently exists in its cache is no longer consumed.
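The request handling described above can be sketched as a small model. This is only an illustration under stated assumptions: the class names `Processor` and `StorageController`, the method names, and the dictionary-based bookkeeping are hypothetical and are not taken from any actual SC or L1 design.

```python
class Processor:
    """Hypothetical model of one processor's L1 ownership state."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # line address -> "shared" or "exclusive"

    def receive_xi(self, line):
        # Drop the line so its data is no longer consumed, then acknowledge.
        # (This simple model never rejects an XI.)
        self.state.pop(line, None)
        return True


class StorageController:
    """Hypothetical central coherency manager (SC)."""

    def __init__(self):
        self.owner = {}    # line -> processor holding it exclusively
        self.sharers = {}  # line -> set of processors holding it read-only

    def grant_read_only(self, requester, line):
        # Setup helper for the sketch: only handles lines with no
        # exclusive owner; the other case is outside this illustration.
        if line in self.owner:
            raise ValueError("line is exclusively owned; not modeled here")
        self.sharers.setdefault(line, set()).add(requester)
        requester.state[line] = "shared"

    def request_exclusive(self, requester, line):
        # Send an exclusive XI to the current exclusive owner, if any.
        owner = self.owner.get(line)
        if owner is not None and owner is not requester:
            if not owner.receive_xi(line):
                return False  # owner rejected; requester must wait
        # Inform every read-only holder that the line is about to change.
        for holder in self.sharers.pop(line, set()):
            if holder is not requester:
                holder.receive_xi(line)
        # Grant exclusive update rights to the requester.
        self.owner[line] = requester
        requester.state[line] = "exclusive"
        return True
```

In this sketch a store by one processor to a line held read-only or exclusively by others removes the line from those processors' caches before the requester is recorded as the exclusive owner.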
In a large SMP (Symmetric Multi-Processing) system, it is common that various processes running on different processors, or different threads within a processor, update or use the same cache lines, at similar times. When a process running on one processor references or updates a line that is currently owned exclusively by another processor, the owning processor must acknowledge the exclusive XI and relinquish exclusive ownership before the first processor can access that line.
In some implementations a processor may reject an exclusive XI request and retain exclusive access to that line, in which case the SC reprioritizes its pending requesters and resends the exclusive XI at a later time. In this case, it is important that the owning processor not retain exclusive access to that line indefinitely; otherwise the other processors are never given rights to update or use the line and cannot make forward progress, a condition known as a “live-lock.” A live-lock can arise from a variety of conditions in the owning processor, including a long stream of updates to the line or a prefetch mechanism that continually anticipates a need for exclusive access to the line.
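The reject-and-resend behavior, and how it can degenerate into a live-lock, can be illustrated with a short sketch. The function `resend_exclusive_xi`, the two owner stand-ins, and the `max_attempts` cap are hypothetical; the cap exists only so the demonstration terminates and is not a feature attributed to any real SC.

```python
def resend_exclusive_xi(respond, max_attempts):
    """Resend an exclusive XI until it is acknowledged or attempts run out.

    `respond` is a hypothetical callable standing in for the owning
    processor: it returns True to acknowledge and False to reject.
    Returns the number of XIs sent, or None if none was acknowledged.
    """
    for attempt in range(1, max_attempts + 1):
        if respond():
            return attempt
    return None


# An owner with three updates left rejects three XIs, then releases.
remaining = [3]

def finite_owner():
    if remaining[0] > 0:
        remaining[0] -= 1
        return False
    return True


# An owner that always has another update pending never releases the
# line, so the requester never makes forward progress (a live-lock).
def greedy_owner():
    return False
```

With `finite_owner` the requester is eventually served, whereas with `greedy_owner` every resend is rejected no matter how many attempts are allowed.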
In some prior processor designs, a processor is prevented from creating such a live-lock situation by requiring that it give up exclusive rights to a line as soon as possible after rejecting an exclusive XI, delaying this only until any pending updates are communicated to the memory controller (including any local caches).
In particular, live-lock is avoided by having an internal mechanism in the processor's cache control logic, which actively invalidates the line that had been the subject of a rejected XI as soon as possible. The mechanism may work as follows: when an exclusive XI is rejected, the address of the XI is saved in a register (“XI-save”); at the same time a record is made of all pending instructions within the processor. Any new instructions from this point on that request exclusive access to the same line as in the XI-save register are rejected. Once all instructions which were pending at the time of the XI reject have been completed, the processor invalidates the cache line corresponding to the address in the XI-save register. Following the invalidation, the XI-save register is reset and no longer inhibits access to the line by subsequent instructions; the next such access will miss the cache (since the line has been invalidated) and cause a new request to be sent to the SC. By actively invalidating the line, the owning processor guarantees that the repeated XI invalidate from the SC will be honored (not rejected). Even though this processor might be re-requesting the same line after the XI-save invalidation, the priority inside the SC ensures that the processor which had requested the line earlier gets access to the line first.
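The XI-save sequence described above can be sketched as follows. This is a simplified model under stated assumptions: the class `L1Control`, its method names, the use of integer instruction ids, and the rule that an XI is rejected whenever any instruction is pending are all hypothetical choices made for illustration.

```python
class L1Control:
    """Hypothetical model of cache control logic with an XI-save register."""

    def __init__(self):
        self.exclusive = set()  # lines currently held exclusively
        self.pending = set()    # ids of in-flight instructions
        self.xi_save = None     # address of the rejected XI, if any
        self.snapshot = set()   # instructions pending at the time of reject

    def issue(self, instr_id, line):
        """Try to start a store; return "ok", "rejected", or "miss"."""
        if self.xi_save == line:
            return "rejected"   # line is blocked until it is invalidated
        if line not in self.exclusive:
            return "miss"       # would cause a new request to the SC
        self.pending.add(instr_id)
        return "ok"

    def receive_exclusive_xi(self, line):
        """Return True to acknowledge the XI, False to reject it."""
        if line not in self.exclusive:
            return True         # nothing to release
        if self.pending:
            # Assumed reject rule: work is in flight, so keep the line
            # for now and remember the XI address and pending work.
            if self.xi_save is None:
                self.xi_save = line
                self.snapshot = set(self.pending)
            return False
        self.exclusive.discard(line)
        return True

    def complete(self, instr_id):
        """Retire an instruction; invalidate once the snapshot drains."""
        self.pending.discard(instr_id)
        self.snapshot.discard(instr_id)
        if self.xi_save is not None and not self.snapshot:
            self.exclusive.discard(self.xi_save)  # active invalidation
            self.xi_save = None                   # reset the register
```

In this model, once the instructions recorded at reject time have completed, the line is gone from the cache, so the resent XI is acknowledged and any later store to the line misses.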
This traditional design allows forward progress in all processors, but may not yield optimal performance. In particular, if a program on a processor currently owning exclusive rights to a line is in the midst of a (short but not seemingly endless) sequence of updates to that line when it receives the invalidation request, it will immediately need to re-acquire exclusive ownership of that line. Because of latency involved in transferring ownership among processors, this results in all of the processors involved in the contention spending extra time waiting. In addition, the resulting traffic on the multiprocessor coherence fabric can impact other processors in the SMP system.
This mechanism also has the drawback of requiring complex control sequencing, with significant inter-dependencies between the processor, its local cache and SC designs to ensure correct operation in all cases. Thus, a simpler and more flexible design that can avoid a live-lock is desired.