1. Technical Field
The present invention relates in general to data processing and, in particular, to memory access in a data processing system. Still more particularly, the present invention relates to a data processing system and method of communication that reduce latency of write transactions subject to retry.
2. Description of the Related Art
A generalized data processing system architecture includes a system memory, a plurality of snoopers, and an interconnect coupling the plurality of snoopers to the system memory to permit read and write access. In many data processing system implementations, at least one of the snoopers, for example, a processor, has one or more associated caches for storing data and/or instructions (hereinafter, both referred to as data) at relatively low access latency as compared to the system memory. For example, access by a processor to an associated cache may take on the order of ones or tens of processor cycles, while access to the system memory via the interconnect may require hundreds of processor cycles.
In data processing system implementations in which snoopers cache data, it is essential for proper operation that a single view of the contents of memory be provided to all of the snoopers, that is, that a coherent memory hierarchy be maintained. A coherent memory hierarchy is maintained through the implementation of a cache coherency protocol that specifies the caching behavior implemented by the snoopers and a communication protocol that specifies the snoop responses snoopers are required to provide to memory access requests snooped on the interconnect.
According to a typical communication protocol, each snooper provides a snoop response to each memory access request snooped on the interconnect. For example, if a snooper receives a request for cached data, the snooper provides a Shared snoop response if the data are cached non-exclusively and are coherent with corresponding data in the system memory. Similarly, the snooper provides a Modified snoop response if the snooper's cache holds a copy of the requested data that is modified with respect to corresponding data in the system memory. After all of the snoopers have provided a snoop response, the coherency responses of the snoopers are compiled to create a so-called "combined response" that determines the manner in which the memory access request will be serviced by the data processing system.
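By way of illustration only, the compilation of individual snoop responses into a combined response may be sketched as follows. The sketch assumes a simple priority ordering (Retry over Modified over Shared over a hypothetical Null response for a snooper holding no copy); the ordering, the Null response, and the names `SnoopResponse` and `combine` are assumptions made for the example and are not taken from any particular protocol.

```python
from enum import IntEnum


class SnoopResponse(IntEnum):
    # Higher value = higher priority in the combined response.
    # NULL and this priority ordering are assumptions for illustration.
    NULL = 0      # hypothetical: snooper holds no copy of the data
    SHARED = 1    # data cached non-exclusively, coherent with memory
    MODIFIED = 2  # cached copy is modified relative to system memory
    RETRY = 3     # snooper could not process the request in time


def combine(responses):
    """Compile individual snoop responses into one combined response.

    Returns the highest-priority response seen, so a single Retry
    yields a Retry combined response.
    """
    return max(responses, default=SnoopResponse.NULL)


if __name__ == "__main__":
    seen = [SnoopResponse.SHARED, SnoopResponse.NULL, SnoopResponse.RETRY]
    print(combine(seen).name)
```

Under this assumed ordering, any single Retry snoop response dominates the result, which mirrors the general behavior in which one Retry yields a Retry combined response.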
Occasionally, a snooper may not be able to process a snooped memory access request in a timely manner. For example, the snooper may lack sufficient resources (e.g., queues) to check the cache directory for the address specified by the memory access request. In such cases, the snooper provides a Retry snoop response to indicate the inability to process the transaction. If any of the snoopers provides a Retry snoop response to a snooped memory access request, the combined response for the request is generally also Retry, meaning that the transaction cannot be completed at the current time. Thus, to obtain service for the memory access request, the requesting snooper must again transmit the memory access request on the interconnect, in hopes that the condition causing the Retry has been resolved (e.g., a queue has become available). In general, the requesting snooper continues retrying the request until the request is ultimately serviced.
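The conventional request/Retry behavior described above may be sketched, purely as an illustration, as a loop in which the requesting snooper reissues the request until the combined response is no longer Retry. The function names, the string-valued responses, the `max_attempts` bound, and the toy interconnect that returns Retry for a fixed number of attempts are all hypothetical and serve only to make the example runnable.

```python
def issue_until_serviced(send_request, max_attempts=1000):
    """Reissue a request until its combined response is not "Retry".

    send_request is a callable returning the combined response for one
    attempt. Returns (response, attempts_used). The max_attempts bound
    is an assumption added so the sketch always terminates.
    """
    for attempt in range(1, max_attempts + 1):
        response = send_request()
        if response != "Retry":
            return response, attempt
    raise RuntimeError("request not serviced within max_attempts")


def make_busy_interconnect(busy_for):
    """Hypothetical interconnect: some snooper lacks a free queue for
    the first busy_for attempts, then the request is serviced."""
    remaining = busy_for

    def send_request():
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return "Retry"
        return "Shared"

    return send_request


if __name__ == "__main__":
    response, attempts = issue_until_serviced(make_busy_interconnect(3))
    print(response, attempts)
```

Each pass through the loop corresponds to one full transmission of the request on the interconnect, so the latency seen by the requester grows with the number of Retry combined responses received.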
In data processing systems having a small number of snoopers, the request/Retry methodology outlined above works reasonably well in that the probability that any particular transaction will receive a Retry combined response is relatively low. However, as the number of snoopers scales (e.g., in large symmetric multiprocessor (SMP) systems), the probability that a request will receive a Retry combined response concomitantly increases. Thus, in large-scale cache coherent data processing systems, memory access requests may be subject to unacceptably large latency, thereby diminishing overall system performance.
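The scaling effect can be made concrete with a simplified model. Assume, for illustration only, that each of n snoopers independently provides a Retry snoop response with the same probability p, and that successive attempts are independent. The probability of a Retry combined response is then 1 - (1 - p)^n, and the expected number of attempts before the request is serviced is 1 / (1 - p)^n. Real systems need not satisfy these independence assumptions, and the function names below are hypothetical.

```python
def retry_probability(p, n):
    """Probability that at least one of n snoopers responds Retry,
    assuming each does so independently with probability p."""
    return 1.0 - (1.0 - p) ** n


def expected_attempts(p, n):
    """Expected number of transmissions until the request is serviced,
    assuming attempts are independent (geometric distribution)."""
    return 1.0 / ((1.0 - p) ** n)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, retry_probability(0.01, n), expected_attempts(0.01, n))
```

Even with a small per-snooper Retry probability, both quantities grow steadily with the number of snoopers under this model, which is consistent with the latency concern noted for large SMP systems.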
The present invention appreciates that in the conventional request/Retry scenario described above, the delay (or latency) in servicing a memory access request can advantageously be reduced by modifying the behavior of snoopers in the event of a Retry combined response.
In accordance with the present invention, a data processing system includes a plurality of snoopers coupled to an interconnect. In response to a memory access request transmitted on the interconnect by one of the snoopers receiving a Retry response, a determination is made whether or not the Retry response was caused by a target snooper that will service the memory access request. If not, the target snooper services the memory access request in spite of the Retry response. In a preferred embodiment in which the memory access request is a write request and the target snooper is a memory controller, stale data cached by at least one snooper in association with the address are also invalidated by a snooper, such as the memory controller, transmitting at least one address-only kill transaction on the interconnect. Advantageously, the address-only kill transaction can be issued concurrently with or following servicing of the write request, so that the write request does not incur latency by waiting until all stale copies of the data have been invalidated.
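The decision described in the preceding paragraph may be sketched as follows for the write-request case. This is a minimal illustration under stated assumptions, not a description of any actual embodiment: the snooper names, the dictionary of per-snooper responses, the action tuples, and the function name `handle_write` are all hypothetical, and the sketch simply records the address-only kill after the write even though the two may proceed concurrently.

```python
def handle_write(address, snoop_responses, target="memory_controller"):
    """Decide how a write request is handled given per-snooper responses.

    snoop_responses maps a (hypothetical) snooper name to its response
    string. Returns a list of (action, address) tuples.
    """
    retriers = [name for name, resp in snoop_responses.items()
                if resp == "Retry"]

    if not retriers:
        # No Retry at all: the request is serviced normally.
        return [("service_write", address)]

    if target in retriers:
        # The target snooper itself cannot accept the request,
        # so the requester must reissue it.
        return [("reissue_request", address)]

    # Retry came only from non-target snoopers: the target services the
    # write anyway, and stale cached copies are invalidated by an
    # address-only kill that does not hold up the write.
    return [("service_write", address), ("address_only_kill", address)]


if __name__ == "__main__":
    responses = {"memory_controller": "Null", "cpu0": "Retry", "cpu1": "Shared"}
    print(handle_write(0x1000, responses))
```

The point of the sketch is the third branch: a Retry attributable only to a snooper other than the target does not force the requester to retransmit the write.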
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.