1. Field of the Invention
The present invention relates to a data processing apparatus and to a method of managing the coherency of cached data.
2. Description of the Prior Art
It is known to provide multiple processing elements within a data processing system, for example multiple processor cores, or a mixture of processor cores and other components such as a graphics processing unit, a direct memory access (DMA) controller, an input/output agent, etc. It is also known to provide various of those processing elements with their own dedicated cache structures, so as to increase speed of data access for those processing elements, and hence improve the overall performance of the data processing system. Processing elements with their own dedicated cache structures will be referred to herein as caching nodes.
However, when a data processing system has multiple such caching nodes, this complicates the issue of data coherency. In particular, it will be appreciated that if a particular caching node performs a write operation with regard to a data value held in its local cache, that data value will be updated locally within the cache, but may not necessarily also be updated at the same time in any lower level of the memory hierarchy, such as a shared level of cache or shared memory. As an example, if the data value in question relates to a write-back region of memory, then the updated data value in the cache will only be stored back to the lower level of the memory hierarchy when that data value is subsequently evicted from the local cache.
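By way of illustration only, the write-back behaviour described above can be sketched as follows. The class name `WriteBackCache`, the two-line capacity, the least-recently-used eviction policy and the dictionary used to model shared memory are all assumptions made for this sketch and are not taken from any particular system.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy local cache for a write-back memory region (hypothetical model)."""

    def __init__(self, memory, capacity=2):
        self.memory = memory        # shared lower level, modelled as a dict
        self.capacity = capacity    # assumed number of cache lines
        self.lines = OrderedDict()  # address -> (value, dirty flag)

    def write(self, addr, value):
        # Only the local copy is updated; the line is marked dirty.
        self.lines[addr] = (value, True)
        self.lines.move_to_end(addr)
        self._evict_if_needed()

    def read(self, addr):
        if addr not in self.lines:
            # Miss: fill the line from the lower level as a clean copy.
            self.lines[addr] = (self.memory[addr], False)
            self._evict_if_needed()
        self.lines.move_to_end(addr)
        return self.lines[addr][0]

    def _evict_if_needed(self):
        # Evict least recently used lines; dirty lines are written back.
        while len(self.lines) > self.capacity:
            addr, (value, dirty) = self.lines.popitem(last=False)
            if dirty:
                self.memory[addr] = value


memory = {0x10: 1, 0x20: 2, 0x30: 3}
cache = WriteBackCache(memory, capacity=2)

cache.write(0x10, 99)
before_eviction = memory[0x10]   # shared memory still holds the old value

cache.read(0x20)
cache.read(0x30)                 # forces eviction of the dirty line 0x10
after_eviction = memory[0x10]    # the update has now been written back
```

Between the write and the eviction, any other caching node reading address `0x10` directly from the shared memory would observe the stale value, which is the coherency problem at issue.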
Since the data may be shared with other caching nodes, it is important to ensure that those caching nodes will access the up-to-date data when seeking to access the associated address in shared memory. To ensure that this happens, it is known to employ a cache coherency protocol within the data processing system to ensure that if a particular caching node updates a data value held in its local cache, that up-to-date data will be made available to any other caching node subsequently requesting access to that data.
The use of such cache coherency protocols can also give rise to power consumption benefits by avoiding the need for accesses to lower levels of the memory hierarchy in situations where data required by a caching node can be found within one of the local caches of another caching node, and hence accessed without needing to access those lower levels of the memory hierarchy.
In accordance with a typical cache coherency protocol, certain accesses performed by a caching node (or certain cache maintenance operations) will require a coherency operation to be performed. Whether a coherency operation is required is often determined by a centralised coherency manager. When it is determined that a coherency operation is required, the coherency manager will cause a snoop request to be sent to the other caching nodes (or at least an identified subset of the caching nodes) identifying the type of access taking place and the address being accessed. This will cause those other caching nodes to perform certain coherency actions defined by the cache coherency protocol, and typically results in certain information being fed back as snoop response data to the coherency manager. By such a technique, the coherency of the data held in the various local caches is maintained, ensuring that each caching node accesses up-to-date data. One such cache coherency protocol is the “Modified, Owned, Exclusive, Shared, Invalid” (MOESI) cache coherency protocol.
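By way of illustration only, the exchange of snoop requests and snoop responses described above can be sketched as follows. The sketch assumes a much simplified set of MOESI state transitions, assumes that the requesting node does not already hold the line, and uses hypothetical names (`CachingNode`, `CoherencyManager`, `SnoopResponse`); it is not intended to represent any particular implementation of the protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


@dataclass
class SnoopResponse:
    node_id: int
    had_copy: bool
    data: Optional[int] = None  # supplied only by a node holding dirty data


class CachingNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.lines = {}  # address -> (State, value)

    def snoop(self, addr, is_write):
        """Perform the (simplified) coherency action for one snoop request."""
        state, value = self.lines.get(addr, (State.I, None))
        if state is State.I:
            return SnoopResponse(self.node_id, False)
        supplies_data = state in (State.M, State.O)
        if is_write:
            self.lines[addr] = (State.I, None)   # another node takes ownership
        elif state is State.M:
            self.lines[addr] = (State.O, value)  # keep dirty data, now shared
        elif state is State.E:
            self.lines[addr] = (State.S, value)
        return SnoopResponse(self.node_id, True, value if supplies_data else None)


class CoherencyManager:
    def __init__(self, nodes, memory):
        self.nodes = nodes
        self.memory = memory  # lower level of the hierarchy, modelled as a dict

    def access(self, requester, addr, is_write, new_value=None):
        # Send a snoop request to every other caching node, collect responses.
        responses = [n.snoop(addr, is_write)
                     for n in self.nodes if n is not requester]
        supplied = [r.data for r in responses if r.data is not None]
        value = supplied[0] if supplied else self.memory[addr]
        if is_write:
            requester.lines[addr] = (State.M, new_value)
        else:
            shared = any(r.had_copy for r in responses)
            requester.lines[addr] = (State.S if shared else State.E, value)
        return responses


memory = {0x40: 7}
nodes = [CachingNode(i) for i in range(3)]
manager = CoherencyManager(nodes, memory)

manager.access(nodes[0], 0x40, is_write=True, new_value=8)
responses = manager.access(nodes[1], 0x40, is_write=False)
```

In this sketch the second access obtains the up-to-date value from the local cache of node 0 rather than from the stale copy in memory, and the manager receives one snoop response from each of the other caching nodes.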
As the number of caching nodes increases within modern data processing systems, it is becoming ever more important to provide efficient mechanisms for performing the required snoop operations. Various types of interconnect structure have been considered for coupling the various caching nodes with the coherency manager so as to allow snoop requests to be efficiently passed to the required caching nodes, and to allow snoop responses to be returned to the coherency manager. Recently, research has been undertaken into the use of ring-based interconnect structures for providing coherency between multiple caching nodes. Examples of documents discussing the use of such ring-based interconnect structures include the article “Cache Coherence on a Slotted Ring” by L A Barroso et al, published in ICPP '91, the article “Coherence Ordering for Ring-based Chip Multiprocessors” by M Marty et al, published in the proceedings of the 39th Annual IEEE/ACM Symposium on Microarchitecture, 2006, and the article “Cache Coherent Architecture for Large Scale Multiprocessors” by P Mannava et al, published in the proceedings of the Fifth Workshop on Scalable Shared Memory Multiprocessors, International Symposium on Computer Architecture, 1995. The use of ring-based interconnect structures is also discussed in the “IEEE Standard for Scalable Coherent Interface (SCI)” published as IEEE Standard 1596-1992.
When using such ring-based interconnect structures, one issue that arises is the amount of traffic passing around the ring, which will include both snoop requests and snoop responses. It is known to use a single broadcast snoop request instead of multiple directed snoop requests in order to reduce the amount of snoop request traffic required. However, reducing the amount of snoop response traffic is more complex. The snoop response traffic is particularly problematic, since for each snoop request there will typically be multiple separate snoop responses, one from each of the caching nodes subjected to the snoop request. Outside of the area of ring-based interconnect structures, various schemes have been proposed in the literature where trees are embedded into the network topology to aggregate snoop responses on their way back to the snoop originator. However, such tree-based schemes do not lend themselves to use within a ring-based interconnect structure.
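By way of illustration only, the relative amounts of snoop request and snoop response traffic can be estimated with the following sketch. It assumes a unidirectional ring in which the coherency manager occupies stop 0 and the caching nodes occupy the remaining stops, and it measures traffic as the number of ring links traversed; the function names and this traffic model are assumptions made purely for the purposes of the example.

```python
def directed_request_hops(n_stops):
    """One directed snoop request per caching node, each sent from stop 0."""
    return sum(distance for distance in range(1, n_stops))


def broadcast_request_hops(n_stops):
    """A single broadcast snoop request passing every caching node once."""
    return n_stops - 1


def separate_response_hops(n_stops):
    """One separate snoop response per caching node, each continuing
    around the ring in the same direction until it reaches stop 0."""
    return sum(n_stops - distance for distance in range(1, n_stops))


n_stops = 8  # assumed: one coherency manager plus seven caching nodes
directed = directed_request_hops(n_stops)
broadcast = broadcast_request_hops(n_stops)
responses = separate_response_hops(n_stops)
```

Under these assumptions, with eight stops the directed snoop requests traverse 28 links in total and the single broadcast snoop request traverses 7, whereas the separate snoop responses still traverse 28 links, so that the snoop response traffic dominates once broadcast snoop requests are used.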
Accordingly, it would be desirable to provide a technique for reducing the amount of snoop response traffic within a ring-based interconnect.