Most computer systems employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost, higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches.
The minimum amount of memory that can be transferred between a cache and a next lower level of the memory hierarchy is called a line, or block, or page. The present patent document uses the term “line,” but the invention is equally applicable to systems employing blocks or pages. It is common to use some of the bits of the line address to index the cache, and the remaining bits of each physical address are stored, along with the data, as the tag.
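By way of illustration only, the following Python sketch shows one way an address might be divided into a tag, an index, and an offset within a line. The line size, the number of indexed entries, and the names `split_address` and `same_line` are assumptions chosen for the example and are not taken from any particular system.

```python
from typing import Tuple

LINE_BYTES = 64   # assumed line size in bytes (power of two)
NUM_SETS = 256    # assumed number of indexed cache entries (power of two)

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # bits selecting a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1     # bits of the line address used as index


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a physical address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)       # position of the byte inside the line
    line_address = addr >> OFFSET_BITS     # address of the line itself
    index = line_address & (NUM_SETS - 1)  # low bits of the line address index the cache
    tag = line_address >> INDEX_BITS       # remaining bits are stored as the tag
    return tag, index, offset


def same_line(a: int, b: int) -> bool:
    """Two addresses fall in the same line if tag and index both match."""
    return split_address(a)[:2] == split_address(b)[:2]
```

Under these assumed sizes, two addresses that differ only in their low six bits map to the same line, so they share one tag and one index.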
Many computer systems employ multiple processors, each of which may have multiple levels of caches. Some caches may be shared by multiple processors. All processors and caches may share a common main memory. A particular line may simultaneously exist in memory and in the cache hierarchies for multiple processors. All copies of a line in the caches must be identical, a property called coherency. The protocols for maintaining coherence for multiple processors are called cache coherence protocols. If a processor changes the contents of a line, only the one changed copy is then valid, and all other copies in caches and memory must then be updated or invalidated. If the copy of a line in memory is not identical to a copy in a cache, then the line in memory is referred to as a “stale” line, and the line in the cache is referred to as being “dirty”. The “most current copy” of a modified line is the most recently modified copy. If the line is cached but not modified, the most current copy is any copy, including the copy in memory. If the line is not cached, the most current copy of a line is in memory.
Cache coherence protocols commonly place each cached line into one of multiple states. One common approach uses three possible states for each line in a cache. Before any lines are placed into the cache, all entries are at a default state called “invalid”. When an uncached physical line is placed into the cache, the state of the entry in the cache is changed from invalid to “shared”. If a line is modified in a cache, it may also be immediately modified in memory (called write through). Alternatively, a cache may write a modified line to memory only when the modified line in the cache is invalidated or replaced (called write back). For a write-back cache, when a line in the cache is modified, the state of the entry in the cache is changed to “modified”. The three-state assignment just described is sometimes called an MSI protocol, referring to the first letter of each of the three states.
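The three-state assignment for a write-back cache can be sketched as a small transition table. The sketch below is a simplified illustration, not a complete protocol: the event names (`fill`, `local_write`, `invalidate`, `replace`) and the function `next_state` are hypothetical, and the coherency transactions needed to invalidate other copies before a write are omitted.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


# (current state, event) -> (next state, write line back to memory?)
# Event names are assumptions made for this sketch.
TRANSITIONS = {
    (State.INVALID, "fill"): (State.SHARED, False),          # line placed into the cache
    (State.SHARED, "local_write"): (State.MODIFIED, False),  # line modified in the cache
    (State.MODIFIED, "local_write"): (State.MODIFIED, False),
    (State.SHARED, "invalidate"): (State.INVALID, False),
    (State.SHARED, "replace"): (State.INVALID, False),
    (State.MODIFIED, "invalidate"): (State.INVALID, True),   # write back on invalidate
    (State.MODIFIED, "replace"): (State.INVALID, True),      # write back on replace
}


def next_state(state: State, event: str) -> Tuple[State, bool]:
    """Return (next_state, write_back) for a write-back cache entry."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError("event %r not modeled for state %s" % (event, state.name))
```

In this sketch only a modified entry triggers a write to memory, and only at the moment it is invalidated or replaced, which is the write-back behavior described above.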
A common variation adds one additional state. In the variation, when a physical line is copied into the cache, if no copy of the line exists in any other cache, the line is placed in an “exclusive” state. The word “exclusive” means that exactly one cache hierarchy has a copy of the line. If a line is in an “exclusive” state in a cache hierarchy for a first processor, and if a second processor requests the same line, the line will then be copied into two cache hierarchies, and the state of the entry in each cache is set to “shared”. This four-state assignment just described is sometimes called a MESI protocol, referring to the first letter of each of the four states. There are many other variations.
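The choice between the “exclusive” and “shared” states when a line is copied into a cache can be illustrated as follows. This is a minimal sketch under stated assumptions: each cache hierarchy is modeled as a dictionary from line address to a state letter, the function name `fill_line` is hypothetical, and the case in which another cache holds the line in the modified state is deliberately left out.

```python
from typing import Dict

Cache = Dict[int, str]  # line address -> state letter "M", "E", "S" or "I"


def fill_line(caches: Dict[str, Cache], requester: str, line: int) -> str:
    """Copy a line into the requester's cache and return the state it receives."""
    # Find every other cache hierarchy that already holds a valid copy.
    holders = [name for name, cache in caches.items()
               if name != requester and cache.get(line, "I") != "I"]
    for name in holders:
        if caches[name][line] == "M":
            # Retrieving a modified copy is outside the scope of this sketch.
            raise NotImplementedError("modified holders are not modeled here")
        caches[name][line] = "S"  # an exclusive copy becomes shared
    # Exclusive only if no other cache hierarchy has a copy of the line.
    caches[requester][line] = "S" if holders else "E"
    return caches[requester][line]
```

For example, with two empty caches `p0` and `p1`, a first call for `p0` leaves the line exclusive in `p0`, and a later call for `p1` on the same line leaves both entries shared.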
A cache “owns” a line if the cache has permission to modify the line without issuing any further coherency transactions. If a cache owns a line, the line is potentially dirty (modifiable), and may be actually dirty (modified). There can only be one “owner” of a line. If the current owner of a line has modified the line, the most current copy of a line is always obtained from the current owner. If a line has not been modified, the most current copy of the line is any copy, including the copy in memory. For any cache coherence protocol, the most current copy of a cache line must be retrieved from the current owner, if any, and a copy of the data must be delivered to the requester. If the line is to be modified, ownership must be acquired by the requester, and any shared copies must be invalidated.
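The ownership rules for a request to modify a line can be summarized in a short sketch. The function `request_for_write` and its arguments are hypothetical; caches are identified by integers, and the set of holders is assumed to include the current owner, if there is one.

```python
from typing import List, Optional, Set, Tuple


def request_for_write(
    holders: Set[int], owner: Optional[int], requester: int
) -> Tuple[str, List[int], Set[int], int]:
    """Model a request to modify a line.

    Returns (data source, caches invalidated, new holders, new owner).
    """
    # The most current copy comes from the current owner, if any, else memory.
    source = "memory" if owner is None else "cache %d" % owner
    # Every other copy must be invalidated before the line is modified.
    invalidated = sorted(holders - {requester})
    # The requester becomes the only holder and the single owner.
    return source, invalidated, {requester}, requester
```

After the call there is exactly one holder and one owner, consistent with the rule that a line can have only one owner.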
There are three common approaches to determine the location of the owner of a line, with many variations and hybrids. In one approach, called a snooping protocol, or snoop-based protocol, the owner is unknown, and all caches must be interrogated (snooped) to determine the location of the most current copy of the requested line. All requests for access to a cache line, by any device in the system, are forwarded to all caches in the system. Eventually, the most current copy of a line is located and a copy is provided to the requester.
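A snoop-based read can be sketched as a broadcast to every cache other than the requester. In the sketch below the data layout (each cache maps a line address to a `(state, data)` pair) and the name `snoop_read` are assumptions made for illustration; the returned count simply shows that every other cache is interrogated on every request.

```python
from typing import Dict, Tuple

CacheEntry = Tuple[str, int]       # (state letter, data)
Cache = Dict[int, CacheEntry]      # line address -> entry


def snoop_read(
    caches: Dict[int, Cache], memory: Dict[int, int], requester: int, line: int
) -> Tuple[int, int]:
    """Return (most current data, number of caches snooped)."""
    snooped = 0
    data = None
    for cache_id, cache in caches.items():
        if cache_id == requester:
            continue
        snooped += 1                       # the request is forwarded to every cache
        entry = cache.get(line)
        if entry is not None and entry[0] == "M":
            data = entry[1]                # a modified copy is the most current copy
    if data is None:
        data = memory[line]                # otherwise the copy in memory is current
    return data, snooped
```

The number of caches snooped grows with the number of caches in the system, regardless of whether any of them holds the line.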
In a second approach, called a directory-based protocol, memory is provided to maintain information about the state of every line in the memory system. For example, for every line in memory, a directory may include a bit for each cache hierarchy to indicate whether that cache hierarchy has a copy of the line, and a bit to indicate whether that cache hierarchy has ownership. For every request for access to a cache line, the directory must be consulted to determine the owner, and then the most current copy of the line is retrieved and delivered to the requester. Typically, tags and status bits for a directory are stored in main memory, so that a request for state information requires a main-memory access and has the latency of main memory.
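The example directory organization, with one presence bit and one ownership bit per cache hierarchy, might be modeled as shown below. This is only a sketch: the names `DirectoryEntry`, `record_copy`, and `find_owner` are hypothetical, and cache hierarchies are assumed to be numbered from zero.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DirectoryEntry:
    present: int = 0  # bit i set: cache hierarchy i has a copy of the line
    owner: int = 0    # bit i set: cache hierarchy i has ownership of the line


directory: Dict[int, DirectoryEntry] = {}  # one entry per line in memory


def record_copy(line: int, cache_id: int, owns: bool = False) -> None:
    """Note that a cache hierarchy holds a copy, optionally with ownership."""
    entry = directory.setdefault(line, DirectoryEntry())
    entry.present |= 1 << cache_id
    if owns:
        entry.owner = 1 << cache_id  # at most one owner per line


def find_owner(line: int) -> Optional[int]:
    """Consult the directory; return the owning cache hierarchy, or None."""
    entry = directory.get(line)
    if entry is None or entry.owner == 0:
        return None
    return entry.owner.bit_length() - 1  # index of the single set bit
```

A request consults `find_owner` first and then contacts only the indicated cache hierarchy, rather than interrogating all caches.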
A third approach is a global coherency filter, which has a tag for every line in the cache system. A coherency filter is a snoop system with a second set of tags, stored centrally, for all caches in the system. A request for a cache line is forwarded to the central filter, rather than to all the caches. The tags for a coherency filter are typically stored in a small high-speed memory.
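A coherency filter can be pictured as a central table holding a second copy of the tags for all caches. The following sketch assumes a simple mapping from line address to the set of caches holding that line; the class `CoherencyFilter` and its method names are hypothetical.

```python
from typing import Dict, Set


class CoherencyFilter:
    """Central second set of tags for every line held in any cache."""

    def __init__(self) -> None:
        self.tags: Dict[int, Set[int]] = {}  # line address -> caches holding it

    def record_fill(self, cache_id: int, line: int) -> None:
        self.tags.setdefault(line, set()).add(cache_id)

    def record_evict(self, cache_id: int, line: int) -> None:
        holders = self.tags.get(line)
        if holders is None:
            return
        holders.discard(cache_id)
        if not holders:
            del self.tags[line]  # only lines present in some cache keep a tag

    def lookup(self, line: int) -> Set[int]:
        """Return the caches that hold the line; no other cache is consulted."""
        return set(self.tags.get(line, set()))
```

Because the table holds entries only for lines currently in some cache, it can be much smaller than a directory covering every line in memory.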
For relatively small systems, with one bus or with only a few buses, snoop-based protocols provide the best performance. However, snoop-based systems increase bus traffic, and for large systems, snoop traffic can limit overall performance. Directory-based systems increase the time required to retrieve a line (latency), but require less bus traffic than snoop-based systems. For large multiple bus systems, where bus traffic may be more important than latency, directory-based systems typically provide the best overall performance. Many computer systems use some sort of hybrid of snoop-based and directory-based protocols. For example, for a multiple bus system, snoop-based protocols may be used for coherency on each local bus, and directory-based protocols may be used for coherency across buses.
There is an ongoing need for improved cache coherence protocols, particularly for large multiple bus multiprocessor systems.