The present invention relates to data processing and, more particularly, to improving data processing system performance by decreasing the data handoff interval in a multiprocessor data processing system based on an early indication of a systemwide coherence response.
A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data, and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of shared memory in the multiprocessor computer system and which generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level vertical cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Because multiple processor cores may request write access to the same memory block (e.g., cache line or sector) and because cached memory blocks that are modified are not immediately synchronized with system memory, the cache hierarchies of multiprocessor computer systems typically implement a cache coherence protocol to ensure at least a minimum required level of coherence among the various processor cores' “views” of the contents of system memory. The minimum required level of coherence is determined by the selected memory consistency model, which defines rules for the apparent ordering and visibility of updates to the distributed shared memory. In all memory consistency models in the continuum between weak consistency models and strong consistency models, cache coherency requires, at a minimum, that after a processing unit accesses a copy of a memory block and subsequently accesses an updated copy of the memory block, the processing unit cannot again access the old (“stale”) copy of the memory block.
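By way of illustration only, the minimum coherence requirement stated above can be expressed as a check over the sequence of copies that a single processing unit observes for one memory block. The following is a minimal sketch, assuming that each update to the memory block is tagged with a monotonically increasing version number; the version numbers and the function name `observes_stale_copy` are hypothetical and are introduced here solely for explanation.

```python
from typing import Iterable


def observes_stale_copy(observed_versions: Iterable[int]) -> bool:
    """Return True if a processing unit accessed an older copy of a memory
    block after it had already accessed a newer copy.

    `observed_versions` is the sequence of (hypothetical) version numbers
    seen by one processing unit for one memory block, in access order.
    """
    newest_seen = None
    for version in observed_versions:
        if newest_seen is not None and version < newest_seen:
            # An older copy was accessed after a newer one: a stale access.
            return True
        newest_seen = version
    return False


# Re-reading the same copy, or moving to a newer copy, is permitted.
print(observes_stale_copy([1, 1, 2, 2]))
# Returning to version 1 after accessing version 2 is a violation.
print(observes_stale_copy([1, 2, 1]))
```

The check concerns only a single processing unit and a single memory block; stronger consistency models impose additional ordering rules that this sketch does not attempt to capture.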
A cache coherence protocol typically defines a set of coherence states stored in association with cached copies of memory blocks, as well as the events triggering transitions between the coherence states and the coherence states to which transitions are made. Coherence protocols can generally be classified as directory-based or snoop-based protocols. In directory-based coherence protocols, a common central directory maintains coherence by controlling accesses to memory blocks by the caches and by updating or invalidating copies of the memory blocks held in the various caches. Snoop-based coherence protocols, on the other hand, implement a distributed design paradigm in which each cache maintains a private directory of its contents, monitors (“snoops”) the system interconnect for memory access requests targeting memory blocks held in the cache, and responds to the memory access requests by updating its private directory, and if required, by transmitting coherence message(s) and/or its copy of the memory block.
The cache states of the coherence protocol can include, for example, those of the well-known MESI (Modified, Exclusive, Shared, Invalid) protocol or a variant thereof. The MESI protocol allows a cache line of data to be associated with one of four states: “M” (Modified), “E” (Exclusive), “S” (Shared), or “I” (Invalid). The Modified state indicates that a memory block is valid only in the cache holding the Modified memory block and that the memory block is not consistent with system memory. The Exclusive state indicates that the associated memory block is consistent with system memory and that the associated cache is the only cache in the data processing system that holds the associated memory block. The Shared state indicates that the associated memory block is resident in the associated cache and possibly one or more other caches and that all of the copies of the memory block are consistent with system memory. Finally, the Invalid state indicates that the data and address tag associated with a coherency granule are both invalid.
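The meaning of the four MESI states described above can be summarized in a short sketch. The code below is illustrative only and is not taken from any particular implementation; the names `Mesi`, `consistent_with_memory`, and `states_are_compatible` are hypothetical. It encodes only the properties stated in the preceding paragraph: a Modified or Exclusive copy is the sole valid copy in the system, and only Exclusive and Shared copies are consistent with system memory.

```python
from enum import Enum
from typing import Sequence


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def consistent_with_memory(state: Mesi) -> bool:
    """Exclusive and Shared copies match system memory; a Modified copy does not."""
    return state in (Mesi.EXCLUSIVE, Mesi.SHARED)


def states_are_compatible(states: Sequence[Mesi]) -> bool:
    """Check the states held by all caches for one memory block.

    If any cache holds the block Modified or Exclusive, that cache must
    hold the only valid copy, so every other cache must be Invalid.
    """
    sole_holders = [s for s in states if s in (Mesi.MODIFIED, Mesi.EXCLUSIVE)]
    if not sole_holders:
        return True  # Any mix of Shared and Invalid copies is permitted.
    valid_copies = [s for s in states if s is not Mesi.INVALID]
    return len(valid_copies) == 1


# One Modified copy with all other caches Invalid is a legal combination.
print(states_are_compatible([Mesi.MODIFIED, Mesi.INVALID, Mesi.INVALID]))
# A Modified copy cannot coexist with a Shared copy in another cache.
print(states_are_compatible([Mesi.MODIFIED, Mesi.SHARED]))
```

Variants of the MESI protocol add further states, which this sketch does not model.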
In snoop-based coherence protocols, it is common for caches to respond to a request snooped on the interconnect by providing an individual coherence response. These individual coherence responses are then combined or otherwise processed to determine a final systemwide coherence response for the request, which can indicate, for example, whether the request will be permitted to succeed or will have to be retried, a data source responsible for supplying to the requesting cache a target cache line of data identified in the request, a coherence state of the target cache line at one or more caches following the request, etc. In a conventional data processing system employing a snoop-based coherence protocol, a cache line of data can be sourced (intervened) via the system interconnect from a cache in the vertical cache hierarchy supporting one processor core to a cache in a different vertical cache hierarchy supporting another processor core. The minimum handoff interval for such a transfer is the time between when a cache issues a request and when that cache receives the systemwide coherence response.
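The combining of individual coherence responses into a systemwide coherence response, and the resulting minimum handoff interval, can be sketched as follows. This is a simplified illustration under stated assumptions and does not describe the response logic of any actual system: the response types (`NULL`, `SHARED`, `INTERVENE`, `RETRY`), the priority rule (any retry forces the request to be retried; otherwise an intervening cache, if one exists, is the data source; otherwise system memory is), the cache identifiers, and the cycle counts are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Individual(Enum):
    """Hypothetical individual coherence responses from snooping caches."""
    NULL = 0        # snooper does not hold the target cache line
    SHARED = 1      # snooper holds a copy but will not supply data
    INTERVENE = 2   # snooper can supply the target cache line
    RETRY = 3       # snooper cannot service the request at this time


@dataclass
class Systemwide:
    """Hypothetical systemwide coherence response."""
    success: bool
    data_source: Optional[str]  # None when the request must be retried


def combine(responses: List[Tuple[str, Individual]]) -> Systemwide:
    """Combine (cache_id, response) pairs under the assumed priority rule."""
    if any(resp is Individual.RETRY for _, resp in responses):
        return Systemwide(success=False, data_source=None)
    for cache_id, resp in responses:
        if resp is Individual.INTERVENE:
            return Systemwide(success=True, data_source=cache_id)
    return Systemwide(success=True, data_source="system_memory")


def min_handoff_interval(request_issued: int, response_received: int) -> int:
    """Conventional minimum handoff interval: the time from request issue
    to receipt of the systemwide coherence response by the requester."""
    return response_received - request_issued


result = combine([("cache_0", Individual.NULL), ("cache_1", Individual.INTERVENE)])
print(result.success, result.data_source)
print(min_handoff_interval(request_issued=100, response_received=160))
```

In this sketch the requesting cache cannot receive the intervened cache line any sooner than the interval returned by `min_handoff_interval`, which is the conventional limitation described above.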