1. Field of the Invention
This invention relates to computer systems and, in particular, to systems which allow software to pre-fetch data from memory and alter cache state.
2. Description of Background
In a multiprocessing system where a consistent memory usage model is required, memory usage among different processors is managed using cache coherency ownership schemes. These schemes usually involve various ownership states for a cache line. Preferably, these states include read-only (commonly known as shared or fetch access ownership) and exclusive (where a certain processor has the sole and explicit update rights to the cache line, sometimes known as store access ownership).
For one such protocol used for a strongly-ordered memory consistency model, as in IBM's z/Architecture® implemented by IBM System z processors, when a processor is requesting rights to update a line, e.g. when it is executing a “Store” instruction, it will check its local cache (L1) for the line's ownership state. If the processor finds out that the line is either currently shared or is not in its cache at all, it will then send an “exclusive ownership request” to the storage controller (SC), which serves as a central coherency manager. The IBM® z/Architecture® is described in the z/Architecture Principles of Operation SA22-7832-05 published April, 2007 by IBM and is incorporated by reference herein in its entirety.
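By way of illustration only, the processor-side check described above may be sketched as follows. The names LineState, needs_exclusive_request and execute_store, the dictionary used to model the L1 cache, and the 256-byte line size are assumptions made for this sketch and are not taken from the z/Architecture documentation.

```python
from enum import Enum


class LineState(Enum):
    INVALID = "invalid"      # line is not in the local cache at all
    SHARED = "shared"        # read-only (fetch access) ownership
    EXCLUSIVE = "exclusive"  # store access ownership


def needs_exclusive_request(l1_state: LineState) -> bool:
    """A store may proceed locally only if the line is already held exclusive."""
    return l1_state is not LineState.EXCLUSIVE


def execute_store(l1: dict, address: int, line_size: int = 256) -> str:
    """Model the L1 lookup performed when a Store instruction executes.

    l1 maps a line index to its LineState; line_size is an assumed value.
    """
    line = address // line_size
    state = l1.get(line, LineState.INVALID)
    if needs_exclusive_request(state):
        return "send exclusive ownership request to SC"
    return "store locally"
```

In this sketch, a store to a line that is shared or absent results in a request to the storage controller, while a store to a line already held exclusive completes without one.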
U.S. patent application Ser. No. 11/954,374 “METHOD AND APPARATUS FOR ACTIVE SOFTWARE DISOWN OF CACHE LINES EXCLUSIVE RIGHTS” by IBM filed concurrently with the present application is incorporated by reference herein in its entirety.
U.S. Pat. No. 5,623,632 “System and method for improving multilevel cache performance in a multiprocessing system” from IBM, filed May 15, 1995, incorporated herein by reference, describes a multiprocessor system having a plurality of bus devices coupled to a storage device via a bus, wherein the plurality of bus devices have a snoop capability, and wherein the plurality of bus devices have first and second caches, and wherein the plurality of bus devices utilize a modified MESI data coherency protocol. The system provides for reading of a data portion from the storage device into one of the plurality of bus devices, wherein the first cache associated with the one of the plurality of bus devices associates a special exclusive state with the data portion, and wherein the second cache associated with the one of the plurality of bus devices associates an exclusive state with the data portion. A bus device initiating a write-back operation with respect to the data portion determines whether there are any pending snoops in the second cache, and changes the special exclusive state to a modified state if there are no pending snoops in the second cache. If there is a pending snoop in the second cache, the addresses of the pending snoop and the data portion are compared. The special exclusive state is changed to a modified state if the addresses are different. The special exclusive state indicates that a data portion is held in the primary cache in a shared state and that the data portion is held in the secondary cache in an exclusive state.
In one embodiment, a storage controller (SC) tracks which processor, if any, currently owns a line exclusively. If deemed necessary, the storage controller (SC) will then send a “cross interrogate” (XI) or “ownership change” request to another processor which currently owns that line, asking it to release its exclusive rights. In this embodiment, a cross interrogate (XI) is referred to as a “cross invalidate” since the action may invalidate the line in the other processor's cache. Once the current owning processor has responded to the XI and confirmed that the exclusive ownership is released, the requesting processor will then be given exclusive update rights to the line requested.
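The ownership tracking and cross interrogate exchange described above may be sketched, purely as an example, in the following form. The StorageController class, its request_exclusive method and the xi_log list are hypothetical names introduced for this sketch, and the owning processor is assumed to acknowledge the XI immediately.

```python
class StorageController:
    """Toy model of an SC acting as the central coherency manager."""

    def __init__(self):
        self.exclusive_owner = {}  # line -> processor currently owning it exclusively
        self.xi_log = []           # (processor, line) pairs that received an XI

    def request_exclusive(self, requester, line):
        """Grant exclusive ownership of a line; return True if an XI was needed."""
        owner = self.exclusive_owner.get(line)
        xi_needed = owner is not None and owner != requester
        if xi_needed:
            # Ask the current owner to release its exclusive rights.
            # Assumption: the owner acknowledges at once and releases the line.
            self.xi_log.append((owner, line))
        # Only after the release (if any) is the requester given update rights.
        self.exclusive_owner[line] = requester
        return xi_needed
```

A request for a line that no other processor owns exclusively is granted without an XI, whereas a request for a line owned by a different processor first records an XI to that owner.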
In a large SMP (Symmetric Multi-Processing) system, it is common that various processes running on different processors update the same cache lines, but at different times. When a line is updated by one process, and then another process starts up, an update of the same line by the other process will encounter the delays required for XI acknowledgement while exclusive ownership is exchanged from one processor to another. These delays amount to a significant performance degradation as the number of processes that reuse the same cache lines goes up.
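To make the cost concrete, the following sketch counts how many XI exchanges a single line incurs for a given order of updates. The function name count_cross_interrogates and the processor labels are assumptions for illustration, and each change of exclusive owner is assumed to cost exactly one XI.

```python
def count_cross_interrogates(updaters):
    """Count XIs for one cache line, given the processors updating it in order."""
    owner = None
    xi_count = 0
    for processor in updaters:
        if owner is not None and owner != processor:
            # Ownership must move from the previous owner, costing one XI.
            xi_count += 1
        owner = processor
    return xi_count
```

Under these assumptions, repeated updates by one processor incur no XI, while updates that alternate between two processors incur an XI on every update after the first.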
A program application would, of course, know whether a particular data object (cache line) it had stored to would be needed again in the near future by the program. Such a program may desire to release the cache line associated with the store in order to improve performance in a multi-processor environment; however, prior to the present invention, this was not possible.