Cache coherency is used to maintain the consistency of data in a shared memory system. A number of initiators, at least one comprising one or more caches, are connected together through a fabric or a central cache coherency controller. This allows the initiators to take advantage of the performance benefit of caches while still providing a consistent view of data across initiators.
Cache coherency protocols are usually based on acquiring and releasing permissions on sets of data (e.g. 32 or 64 bytes), typically called cache lines. Typical permissions are:
None: the cache line is not in the initiator and the initiator has no permission to read or write the data.
Readable: the cache line is in the initiator and the initiator has permission to read the cache line content stored locally. Multiple initiators can simultaneously have read permission on a cache line (i.e. multiple readers).
Readable and writable: the cache line is in the initiator and the initiator has permission to write (and typically read) the cache line content. Only one initiator can have write permission on a cache line, and no other initiator can have read permission at the same time.
There is usually a backing store for all cache lines (e.g. a DRAM), which is a target of the fabric or coherency controller. The backing store is the location where the data is stored when it is not in any of the caches. At any point in time, the data in the backing store may not be up to date with respect to the latest copy of a cache line, which may be in an initiator. Because of this, cache lines inside initiators often include an indication of whether the cache line is clean (i.e. it has the same value as in the backing store) or dirty (i.e. it needs to be written back to the backing store at some point as it is the most up-to-date version).
The permission and “dirtiness” of a cache line in an initiator are referred to as the “state” of the cache line. The most common set of coherency states is called MESI (Modified-Exclusive-Shared-Invalid), where Shared corresponds to the read permission (and the cache line being clean) and both Modified and Exclusive give read/write permissions. In the Exclusive state, the cache line is clean, while in the Modified state, the cache line is dirty and must eventually be written back. In that state set, shared cache lines are always clean.
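The mapping from MESI states to permission and dirtiness described above can be sketched as follows. This is a minimal illustration only; the names Mesi, can_read, can_write and is_dirty are hypothetical and do not come from any particular protocol specification.

```python
from enum import Enum


class Mesi(Enum):
    """The four MESI states of a cache line in one initiator."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def can_read(state: Mesi) -> bool:
    # Every state except Invalid carries read permission.
    return state is not Mesi.INVALID


def can_write(state: Mesi) -> bool:
    # Modified and Exclusive both give read/write permission.
    return state in (Mesi.MODIFIED, Mesi.EXCLUSIVE)


def is_dirty(state: Mesi) -> bool:
    # Only Modified lines are dirty; Exclusive and Shared lines are clean.
    return state is Mesi.MODIFIED
```

Under this sketch a Shared line is readable, not writable and clean, which matches the statement that shared cache lines are always clean in MESI.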
There are more complex sets of coherency states like MOESI (Modified-Owned-Exclusive-Shared-Invalid) where cache lines with read permission are allowed to be dirty.
Other protocols may have separate read and write permissions. Many cache coherency state sets and protocols exist.
In the general case, when an initiator needs a permission on a cache line that it does not have, it must interact with other initiators directly or through a cache coherency controller to acquire the permission. In the simplest “snoop-based” protocols, the other initiators must be “snooped” to make sure that the permission requested by the initiator is consistent with the permissions already owned by the other initiators. For instance, if a first initiator requests read permission and no other initiator has write permission, the read permission can be granted. However, if a second initiator already has write permission, that permission must be removed from the second initiator before read permission is granted to the original requester.
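The permission check performed by snooping can be sketched for a single cache line as follows. This is a simplified model under stated assumptions, not a real protocol: data transfer is ignored, every other initiator is snooped on every request, and an initiator losing write permission because of a read request is assumed to be downgraded to read permission rather than to none. The names Perm, Initiator and request are hypothetical.

```python
from enum import Enum


class Perm(Enum):
    NONE = 0
    READ = 1
    WRITE = 2


class Initiator:
    """Holds the permission of one initiator on a single cache line."""

    def __init__(self, name: str):
        self.name = name
        self.perm = Perm.NONE

    def snoop(self, wanted: Perm) -> bool:
        """Handle a snoop caused by another initiator's request.

        Returns True if this initiator had to give up a permission.
        """
        if wanted is Perm.WRITE and self.perm is not Perm.NONE:
            self.perm = Perm.NONE   # a writer tolerates no other holder
            return True
        if wanted is Perm.READ and self.perm is Perm.WRITE:
            self.perm = Perm.READ   # assumption: downgrade instead of invalidate
            return True
        return False


def request(requester: Initiator, wanted: Perm, initiators: list) -> int:
    """Snoop all other initiators, then grant. Returns the snoop count."""
    snoops = 0
    for other in initiators:
        if other is not requester:
            other.snoop(wanted)
            snoops += 1
    requester.perm = wanted
    return snoops
```

For example, with three initiators, a read request from one of them sends two snoops; a later write request from another removes read permission from every other holder before the write permission is granted.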
In some systems, initiators directly issue snoop requests on a bus and all initiators (or at least all other initiators) respond to the snoop requests. In other systems, the initiators issue permission requests to a coherency controller, which in turn snoops the other initiators (and possibly the requesting initiator itself).
In “directory-based” protocols, directories of permissions acquired by initiators are maintained and snoop requests are sent only when permissions need to change in an initiator.
Snoop filters may also be used to reduce the number of snoop requests sent to initiators. A snoop filter keeps a coarse view of the content of the initiators and does not send a snoop request to an initiator if the snoop filter knows that the initiator does not need to change its permissions.
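A snoop filter of this kind can be sketched as a table recording which initiators may hold each cache line. This is an illustrative model only: the class SnoopFilter and its methods are hypothetical, the filter is assumed to track presence but not the exact permission, and it is assumed never to be told about evictions. It may therefore name an initiator that no longer holds the line, but it never omits one that does.

```python
from collections import defaultdict


class SnoopFilter:
    """Coarse record of which initiators may hold each cache line."""

    def __init__(self):
        # Maps a cache line address to the set of possible holders.
        self.may_hold = defaultdict(set)

    def record_fill(self, addr: int, initiator: str) -> None:
        """Note that an initiator has obtained the line at addr."""
        self.may_hold[addr].add(initiator)

    def targets(self, addr: int, requester: str) -> list:
        """Return the initiators that still need to be snooped.

        Initiators known not to hold the line are skipped entirely.
        """
        holders = self.may_hold.get(addr, set())
        return sorted(holders - {requester})
```

A request for a line that no initiator has ever filled produces an empty target list, so no snoop requests are sent for it.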
Data and permissions interact in cache coherency protocols, but the way they interact varies. Initiators usually issue requests for both permission and data simultaneously, but not always. For instance, an initiator that wants to place data in its cache for reading purposes and has neither the data nor the permission can issue a read request including both the request for permission and for the data itself. However, an initiator that already has the data and read permission but needs write permission may issue an “upgrade” request to write permission, but does not need data.
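The choice between a request that carries data and one that does not can be sketched as follows. This is a hypothetical illustration: the Request type, the build_request function and the string permission names are assumptions made for the example, and real protocols define their own request types.

```python
from dataclasses import dataclass
from typing import Optional

# Ordering of permissions from weakest to strongest.
RANK = {"none": 0, "read": 1, "write": 2}


@dataclass(frozen=True)
class Request:
    permission: str   # permission being requested: "read" or "write"
    with_data: bool   # whether the data must be returned as well


def build_request(held: str, has_data: bool, want: str) -> Optional[Request]:
    """Decide what an initiator must request for one cache line.

    Returns None when the permission already held is sufficient.
    """
    if RANK[held] >= RANK[want]:
        return None
    # Data is requested only when the initiator does not already have it,
    # so a holder of a readable copy issues a permission-only upgrade.
    return Request(permission=want, with_data=not has_data)
```

Here build_request("none", False, "read") yields a read request for both permission and data, while build_request("read", True, "write") yields the upgrade case: write permission without data.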
Likewise, responses to snoop requests can include acknowledgments that the permission change has happened, but can also optionally contain data. The snooped initiator may be sending the data as a courtesy. Alternatively, the snooped initiator may be sending dirty data that has to be kept to be eventually written back to the backing store.
Initiators with caches can hold permission without data. For instance, an initiator that wants to write a full cache line may decide not to request data with the write permission, as it knows it will not use the data (it will overwrite it completely). In some systems, holding partial data is permitted (per sector, per byte, and so on). This is useful to limit data transfers but it makes the cache coherency protocol more complex.
Many cache coherency protocols provide two related ways for data to leave an initiator. One is through the snoop response channel, providing data as a response to a snoop. The other is a spontaneous write channel (often called write back or evict channel) where the initiator can send the data out when it does not want to keep it anymore. In some protocols, the snoop response channel and write back channel are shared.
Fully coherent initiators are capable of both owning permissions for cache lines and receiving snoop requests to check and possibly change their permissions, as triggered by requests from another initiator. A common type of fully coherent initiator is a microprocessor with a coherent cache. As the microprocessor needs to do reads and writes, it acquires the appropriate permissions and potentially data and puts them in its cache. Many modern microprocessors have multiple levels of caches inside. Many modern microprocessors contain multiple microprocessor cores, each with a cache and often a shared second-level cache. Many other types of initiators may be fully coherent such as DSPs, GPUs and various types of multimedia initiators comprising a cache.
In contrast, I/O coherent (also called one-way coherent) initiators do not use a coherent cache, but they need to operate on a consistent copy of the data with respect to the fully coherent initiators. As a consequence, their read and write requests may trigger snoops to fully coherent initiators. In most cases, this is done by having either a special bridge or the central coherency controller issue the appropriate coherency action and sequence the actual reads or writes to the backing store if necessary. In the case of a small bridge, that bridge may act as a fully coherent initiator holding permissions for a small amount of time. In the case of the central coherency controller, it tracks the reads and writes, and prevents other initiators from accessing cache lines that are being processed on behalf of the I/O coherent initiator.
In cache coherency systems, when data is requested by an initiator, it can be provided either by one of the other initiators (if they have a cache containing the data) or by the backing store target (or another cache on the way to the backing store). It is normally advantageous to try to obtain data from another initiator as this reduces the throughput and therefore power needed from the backing store, which is often a bottleneck in the system (e.g. when the backing store is external DRAM). Because of this, cache coherency systems are often tuned to obtain data from other initiators as much as possible.
However, in some cases, the other initiators may not have enough throughput to provide data on a continuous basis. Usually, the data request is done in conjunction with a snoop request or is implicit within a snoop request. While initiators can typically handle a large number of snoop requests when no data needs to be transferred (for instance, they can handle one snoop request per cycle when they do not have the snooped data), they can handle far fewer snoop requests when they have to provide data (for instance, one snoop request every four cycles). This is particularly a problem when the required snoop data throughput is high or the initiators with coherent caches are running at a lower-than-peak clock frequency.
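The effect of clock frequency on snoop data throughput can be illustrated with a short calculation. The one-data-snoop-every-four-cycles rate is the example figure from the text; the 64-byte cache line, the 1 GHz and 500 MHz clocks, and the 10 GB/s demand are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def snoop_data_bandwidth(line_bytes: int, clock_hz: int,
                         cycles_per_data_snoop: int) -> float:
    """Bytes per second an initiator can supply through snoop responses."""
    return line_bytes * clock_hz / cycles_per_data_snoop


def can_supply(required_bytes_per_s: float, line_bytes: int, clock_hz: int,
               cycles_per_data_snoop: int) -> bool:
    """True if the initiator can keep up with the required data rate."""
    available = snoop_data_bandwidth(line_bytes, clock_hz,
                                     cycles_per_data_snoop)
    return available >= required_bytes_per_s


# Assumed figures: 64-byte lines, one data snoop every four cycles.
peak = snoop_data_bandwidth(64, 1_000_000_000, 4)   # at 1 GHz
slow = snoop_data_bandwidth(64, 500_000_000, 4)     # at 500 MHz
```

With these assumed figures the initiator can supply 16 GB/s at 1 GHz but only 8 GB/s at 500 MHz, so a 10 GB/s demand that is sustainable at the peak clock is no longer sustainable at the reduced clock.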
Therefore, what is needed is a coherency system where initiators with coherent caches may be asked to provide data when they have enough throughput to do so, and asked not to provide data, if allowed by the protocol, when they do not have enough available throughput to do so.