Cache coherency is used to maintain the consistency of data in a distributed shared memory system. A number of agents, each usually comprising one or more caches, are connected through a fabric or a central cache coherency controller. This allows the agents to take advantage of the performance benefit of caches while still providing, among various agents, a consistent view of data within a shared physical address space.
Cache coherency protocols are usually based on acquiring and relinquishing permissions on sets of data, typically called cache lines containing a fixed amount of data (e.g. 32 or 64 bytes). Typical permissions are:
None: the cache line is not in the agent and the agent has no permission to read or write the data.
Readable: the cache line is in the agent and the agent has permission to read the cache line content stored locally. Multiple agents can simultaneously have read permission on a cache line (i.e. multiple readers).
Readable and writable: the cache line is in the agent and the agent has permission to write (and typically read) the cache line content. Only one agent can have write permission on a cache line, and no other agent can have read permission at the same time.
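The three permissions and the single-writer, multiple-reader rule can be sketched in a few lines of Python. This is an illustrative sketch only; the names `Permission` and `permissions_are_consistent`, and the use of a dictionary keyed by agent name, are assumptions made for the example and are not part of any particular protocol.

```python
from enum import Enum


class Permission(Enum):
    NONE = 0        # line not in the agent; no read or write permission
    READ = 1        # line in the agent; may be read locally
    READ_WRITE = 2  # line in the agent; may be written (and typically read)


def permissions_are_consistent(holders):
    """Check the permission rule for a single cache line.

    `holders` maps a (hypothetical) agent name to its Permission on the line.
    """
    writers = [a for a, p in holders.items() if p is Permission.READ_WRITE]
    readers = [a for a, p in holders.items() if p is Permission.READ]
    if len(writers) > 1:
        return False  # only one agent can have write permission
    if writers and readers:
        return False  # a writer excludes every reader
    return True       # any number of readers, or one lone writer
```

For example, two agents holding `READ` on the same line pass the check, while one agent holding `READ_WRITE` alongside another holding `READ` does not.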
There is usually a backing store for all cache lines (e.g. a DRAM). The backing store is the location where the data is stored when it is not in any of the caches. Data is constantly being updated or changed in the system. Thus, at any point in time, the data in the backing store may not be up to date with respect to the latest copy of a cache line, which may be in an agent. Because of this, cache lines inside agents often include an indication of whether the cache line is clean (i.e. it has the same value as in the backing store) or dirty (i.e. it needs to be written back to the backing store eventually since it is the most up-to-date version).
The permission and “dirtiness” of a cache line in an agent is referred to as the “state” of the cache line. The most common set of coherency states is called MESI (Modified-Exclusive-Shared-Invalid), where Shared corresponds to the read permission (and the cache line being clean) and both Modified and Exclusive give read/write permissions, but in the Exclusive state, the line is clean, while in the Modified state, the line is dirty and must be eventually written back. In that state set, shared cache lines are always clean. There are more complex versions like MOESI (Modified-Owned-Exclusive-Shared-Invalid) where cache lines with read permission are allowed to be dirty. Other protocols may have separate read and write permissions. Many cache coherency state sets and protocols exist.
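As a hedged illustration, the MESI state set described above can be written as a small lookup table that pairs each state with its permission and dirtiness. The names `Mesi`, `MESI_PROPERTIES`, and `needs_writeback` are hypothetical and chosen only for this sketch.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# (can_read, can_write, dirty) for each state, following the description above
MESI_PROPERTIES = {
    Mesi.MODIFIED:  (True,  True,  True),   # read/write permission, dirty
    Mesi.EXCLUSIVE: (True,  True,  False),  # read/write permission, clean
    Mesi.SHARED:    (True,  False, False),  # read permission only, clean
    Mesi.INVALID:   (False, False, False),  # no permission
}


def needs_writeback(state):
    """Return True if a line in this state must eventually be written back."""
    return MESI_PROPERTIES[state][2]
```

In this table only the Modified state is dirty, which matches the statement that shared cache lines are always clean under MESI. A MOESI table would add an Owned row with read permission and dirty data.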
In the general case, when an agent needs a permission on a cache line that it does not have, it must interact with other agents directly or through a cache coherency controller to acquire the permission. In the simplest “snoop-based” protocols, the other agents must be “snooped” to make sure that the permission requested by the agent is consistent with the permissions already owned by the other agents. For instance, if an agent requests read permission and no other agent has write permission, the read permission can be granted. However, if an agent already has write permission, that permission must be removed from that agent first before it is granted to the original requester.
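The read-permission example above can be sketched as follows. This is a simplified model under stated assumptions: permissions are plain strings, the function name `request_read` is hypothetical, and a snooped writer is assumed to drop to no permission (a real protocol might instead downgrade it to read permission and would also have to handle any dirty data).

```python
def request_read(holders, requester):
    """Grant read permission on one cache line to `requester`.

    `holders` maps agent name -> "none", "read", or "write" and is updated
    in place. Returns the agents whose write permission had to be removed.
    """
    revoked = []
    for agent in holders:
        if agent == requester:
            continue
        if holders[agent] == "write":
            # Assumption: the writer loses all permission on the line.
            holders[agent] = "none"
            revoked.append(agent)
    # Only now is it safe to grant the read permission.
    holders[requester] = "read"
    return revoked
```

If no other agent holds write permission, the returned list is empty and existing readers keep their permission.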
In some systems, the agent directly places snoop requests (also known as snoops) on a bus and all agents (or at least all other agents) respond to the snoop requests. In other systems, the agent places a permission request to a coherency controller, which in turn will snoop the other agents (and possibly the requesting agent itself).
In directory-based protocols, directories of permissions acquired by agents are maintained and snoops are sent only when permissions need to change in an agent. Snoop filters may also be used to reduce the number of snoops sent to agents. A snoop filter keeps track of the content of the agents and does not send a snoop to an agent if it knows that the agent does not need to change its permissions.
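A minimal sketch of the snoop-filter idea is shown below. It assumes the filter is told about every fill and eviction and tracks only which agents hold a line, not their exact permissions; the class and method names are hypothetical.

```python
class SnoopFilter:
    """Track which agents may hold each cache line (illustrative only)."""

    def __init__(self):
        self._holders = {}  # line address -> set of agent names

    def record_fill(self, address, agent):
        # The agent has acquired the line.
        self._holders.setdefault(address, set()).add(agent)

    def record_evict(self, address, agent):
        # The agent no longer holds the line.
        self._holders.get(address, set()).discard(agent)

    def snoop_targets(self, address, requester):
        # Snoop only agents known to hold the line, never the requester.
        return sorted(self._holders.get(address, set()) - {requester})
```

With this structure, a request for a line that no other agent holds produces an empty target list, so no snoops are sent.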
Data and permissions interact in cache coherency protocols, but the way they interact varies. Agents usually place requests for both permission and data simultaneously, though not necessarily. For instance, in one case an agent that wants to place data in its cache for reading purposes and has neither the data nor the permission can place a read request including both the request for permission and for the data itself. However, in another case an agent that already has the data and read permission but needs write permission may place an “upgrade” request to write permission, but does not need data.
Likewise, responses to snoop requests can include an acknowledgement that the permission change has happened, but can also optionally contain data. The snooped agent may be sending the data as a courtesy. Alternatively, the snooped agent may be sending dirty data that has to be kept to be eventually written back to the backing store.
Agents can hold permission without data. For instance, an agent that wants to write a full cache line may not request data with the write permission, as it knows it will not use it (the agent will overwrite the data completely). In some systems, holding partial data is permitted (in quanta of sectors, bytes, or other units). This is useful to limit data transfers, though it makes the cache coherency protocol more complex.
Many cache coherency protocols provide two related ways for data to leave an agent. One is through the snoop response path, providing data as a response to a snoop. The other is a spontaneous write path (often called write back or evict path) where the agent can send the data out when it does not want to keep it anymore. In some protocols, the snoop response and write back paths are shared.
Fully coherent agents are capable of both owning permissions for cache lines and receiving snoop requests to check and possibly change their permissions, triggered by a request from another agent. A common type of fully coherent agent is a microprocessor with a coherent cache. As the microprocessor needs to do reads and writes, it acquires the appropriate permissions, and potentially data, and puts them in its cache. Many modern microprocessors have multiple levels of caches inside. Many modern microprocessors contain multiple microprocessor cores, each with its own cache, and often a shared second-level cache. Other types of agents may be fully coherent such as DSPs, GPUs and various types of multimedia agents comprising a cache.
In contrast, I/O coherent (also called one-way coherent) agents do not use a coherent cache, but they need to operate on a consistent copy of the data with respect to the fully coherent agents. As a consequence, their read and write requests may trigger coherency actions (snoops) to fully coherent agents. In most cases, this is done by having either a special bridge or the central coherency controller issue the appropriate coherency action and sequence the actual reads or writes to the backing store if necessary. In the case of a small bridge, that bridge may act as a fully coherent agent holding permissions for a small amount of time. In the case of the central coherency controller, it tracks the reads and writes, and prevents other agents from accessing cache lines that are being processed on behalf of the I/O coherent agent.
Referring now to FIG. 1, cache coherent system 100 includes central coherency controller 102 where the requests from I/O agent 106 and coherent agents 108 and 110 trigger coherency resolution logic 104 to send snoops to coherent agents 108 and 110. In general, fully coherent agents use the full extent of the cache coherency protocol and interactions between fully coherent agents can be extremely complex. On the other hand, interactions triggered by I/O coherent agents, such as agent 106, are simpler as the number of combinations of requests and cache line states is limited, and there is no interaction the other way as I/O coherent agents are not snooped.
As a consequence, a system with one fully coherent agent and one or more I/O agents is fairly simple, while a system, such as the system 100, with two or more fully coherent agents is much more complex. A more complex system has a higher risk of bugs, requires more area, and has longer latency to respond to requests.
Referring now to FIG. 2, a process 200 starts at step 202 with a coherent requestor sending a coherent request to a coherency controller. At step 204, the coherency controller receives the coherent request and determines if data is part of the coherent request. The process 200 continues to step 206 if data is needed as part of the coherent request. At step 208, the coherency controller sends snoops to all coherent agents. At step 210, the snooped coherent agents send responses and data, if the data is valid. At step 212, the coherency controller collects the snoop responses to determine which response had data. If the snoop response(s) had no data, then the process moves to step 214. At step 216 the coherency controller sends a read to memory, at step 218 the memory provides the requested data to the coherent requestor, and the coherent requestor completes the transaction at step 224. On the other hand, if it is determined at step 212 that a snoop response includes data, then the process moves to step 220, and at step 222 the coherency controller forwards the snoop data to the coherent requestor and the coherent requestor completes the transaction at step 224.
If at step 204 it is determined that the request does not need data, then the process 200 continues to step 236. At step 238, the coherency controller sends snoops to all coherent agents. At step 240, the snooped coherent agents send responses and data, if the data is dirty. At step 242, the coherency controller collects the snoop responses to determine which response had dirty data. If a snoop response had dirty data, then the process moves to step 244. At step 246 the coherency controller writes data to memory. At step 248 the coherency controller transmits a coherent response with no data to the coherent requestor. At step 250, the coherent requestor receives the response and completes the transaction. If at step 242 it is determined that the snoop response has no data, then the process 200 continues to step 252 and then to step 248 as noted above.
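The two branches of process 200 can be summarized in a short sketch. It is a simplified model, not the controller of FIG. 2: the snoops are assumed to have already completed, each snoop response is modeled as either a data payload or `None`, memory is a plain dictionary, and the first response carrying data is used. The function and parameter names are hypothetical.

```python
def handle_request(needs_data, snoop_responses, memory, address):
    """Resolve one coherent request after all snoop responses are collected.

    snoop_responses: one entry per snooped agent, a data payload or None.
    memory: dict mapping line address -> data (stand-in for the backing store).
    Returns the data sent to the requestor, or None for a dataless response.
    """
    with_data = [d for d in snoop_responses if d is not None]

    if needs_data:
        if with_data:
            # Steps 220-222: forward the snoop data to the requestor.
            return with_data[0]
        # Steps 214-218: no snoop data, so read the line from memory.
        return memory[address]

    if with_data:
        # Steps 244-246: dirty snoop data is written to memory.
        memory[address] = with_data[0]
    # Step 248: coherent response with no data.
    return None
```

In both branches every coherent agent is snooped before the controller can respond, which is the behavior the description above attributes to process 200.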
Therefore, what is needed is a simpler coherency controller that supports two or more fully-coherent agents and one or more I/O coherent agents.