1. Field of the Invention
The present invention relates to techniques for improving the performance of computer systems. More specifically, the present invention relates to a method and an apparatus for maintaining coherence between cache lines in a computer system by using dynamic privatization.
2. Related Art
Computer systems often use a coherence protocol to ensure that copies of cache lines remain consistent. One such coherence protocol is the “MESI” protocol. MESI is an acronym for the states in which a cache line can be held in a processor and/or in memory: “modified” (M), “exclusive” (E), “shared” (S), and “invalid” (I). (Note that although we describe systems that use the MESI protocol, other coherence protocols include some or all of the MESI states and operate in a similar manner.)
A processor that contains a copy of a given cache line in the M state holds a current, valid copy of the cache line. For such a cache line, the copy of the cache line in memory is stale and no other processor holds a copy. Moreover, a processor that holds a cache line in the M state has both read and write permission for the cache line, so the processor can freely read from and write to the cache line.
A processor that contains a copy of a cache line in the E state holds a current, valid copy of the cache line. For such a cache line, the copy in memory is also a current, valid copy of the cache line. However, no other processor holds a copy of the cache line (i.e., the cache line is “privately” held). A processor that holds a cache line in the E state has read-only permission for the cache line, so the processor can freely read from the cache line, but cannot write to the cache line. In addition, a cache line in the E state can typically be silently evicted from the processor without requesting permission.
A processor that contains a copy of a cache line in the S state holds a current, valid copy of the cache line. The copy in memory is also a current, valid copy of the cache line. Additionally, one or more other processors in the system may also hold copies of the cache line in the S state. Note that a processor that holds a cache line in the S state has read-only permission for the cache line, so the processor can freely read from the cache line, but cannot write to the cache line. Moreover, a cache line in the S state can typically be silently evicted from the processor without requesting permission.
A processor that contains a copy of a cache line in the I state does not contain a valid copy of the cache line. However, valid copies of the cache line may exist in memory or in another processor. Moreover, a processor that holds a cache line in the I state has no read or write permission for the cache line, so the processor can neither read from nor write to the cache line.
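The four state descriptions above reduce to a small per-state table plus one invariant for a single cache line: a valid copy in M or E must be the only valid copy, while any number of processors may hold the line in S. The following is a minimal Python sketch under that reading. The names `State`, `Props`, `PROPS`, and `is_coherent` are hypothetical and are not part of the described apparatus. The sketch assumes that evicting an M line requires a write-back. The text does not say this explicitly, but it follows from its statement that memory is stale for an M line. E is modeled as read-only, as in the text; writes to an E line go through the silent upgrade discussed below.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


@dataclass(frozen=True)
class Props:
    can_read: bool      # processor may read its copy
    can_write: bool     # processor may write its copy
    sole_holder: bool   # no other processor may hold a valid copy
    silent_evict: bool  # copy may be dropped without requesting permission


# One row per state, following the state descriptions in the text.
PROPS = {
    State.M: Props(can_read=True, can_write=True, sole_holder=True,
                   silent_evict=False),  # assumption: memory is stale, so M data needs a write-back
    State.E: Props(can_read=True, can_write=False, sole_holder=True,
                   silent_evict=True),   # read-only until upgraded to M
    State.S: Props(can_read=True, can_write=False, sole_holder=False,
                   silent_evict=True),
    State.I: Props(can_read=False, can_write=False, sole_holder=False,
                   silent_evict=False),  # no copy held, so nothing to evict
}


def is_coherent(states):
    """Check one line's states across all processors (hypothetical helper).

    If any processor holds the line in M or E, that must be the only
    valid copy; otherwise any number of processors may hold it in S.
    """
    valid = [s for s in states if s is not State.I]
    if any(PROPS[s].sole_holder for s in valid):
        return len(valid) == 1
    return True


print(is_coherent([State.S, State.S, State.I]))  # True
print(is_coherent([State.E, State.S, State.I]))  # False
```

Keeping the per-state properties in one table makes the asymmetry between E and S easy to see. The two states differ only in whether another processor may also hold the line.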
In a directory-based computer system that uses the MESI protocol, when a cache line in the E state is to be written, the cache line can be upgraded locally from the E state to the M state without making a request to the directory (i.e., the cache line can be “silently” upgraded). This is beneficial because it eliminates the latency and bandwidth required to request the upgrade from the directory.
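The silent upgrade can be illustrated by counting the requests a processor sends to the directory when it writes. This is a minimal sketch that models only the local state transition. `Cache`, `write`, and `directory_requests` are hypothetical names. Invalidating other sharers on an S-to-M upgrade is deliberately left out.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


class Cache:
    """Hypothetical per-processor cache that counts requests sent to the directory."""

    def __init__(self):
        self.lines = {}              # address -> State
        self.directory_requests = 0

    def state(self, addr):
        return self.lines.get(addr, State.I)

    def write(self, addr):
        s = self.state(addr)
        if s is State.E:
            # Silent upgrade: the line is privately held, so no request is needed.
            self.lines[addr] = State.M
        elif s in (State.S, State.I):
            # Write permission must be obtained from the directory
            # (invalidating other sharers is outside this sketch).
            self.directory_requests += 1
            self.lines[addr] = State.M
        # State.M: already writable, nothing to do.


cache = Cache()
cache.lines[0x40] = State.E
cache.lines[0x80] = State.S
cache.write(0x40)   # silent E -> M upgrade
cache.write(0x80)   # S -> M needs the directory
print(cache.directory_requests)  # 1
```

Only the write to the S line costs a directory round trip. That request latency and bandwidth is exactly what the E state saves.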
However, supporting the E state in such a system can adversely affect performance when a processor R requests a line that is held in the E state by a different processor S. In that case, the directory must send a request to processor S, and processor S must then supply the line from its cache, either to the directory or directly to processor R. Furthermore, some protocols reduce directory bandwidth by not updating the directory when a line in the E state is evicted. In such protocols, processor S may deny (i.e., not-acknowledge or “NACK”) the request to forward the line to processor R because processor S has already evicted the line. This results in “four-hop” latency (i.e., a “four-hop miss”) for processor R's request. Moreover, in a computer system that includes a large cache, holding cache lines that are accessed by multiple processors in the E state can be inefficient, because the coherence protocol overhead of accessing such lines is particularly high.
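To make the hop counts concrete, the sketch below lists the messages on the critical path of processor R's read request in a simple directory protocol. The model rests on one assumption that the text does not state. A NACK from processor S is assumed to return to the directory, which then supplies the line from memory. This is possible because memory holds a current copy of any line that was in the E state. `Outcome` and `read_miss_path` are hypothetical names. The `FROM_MEMORY` case, in which no other processor privately holds the line, is included only as a baseline.

```python
from enum import Enum


class Outcome(Enum):
    FROM_MEMORY = 1     # no private owner; directory replies from memory
    FORWARDED = 2       # owner S sends the line directly to R
    VIA_DIRECTORY = 3   # owner S returns the line to the directory
    NACKED = 4          # S silently evicted the E line and NACKs


def read_miss_path(outcome):
    """Return the messages on the critical path of R's read miss (hypothetical model)."""
    path = ["R -> directory"]
    if outcome is Outcome.FROM_MEMORY:
        path.append("directory -> R")
    elif outcome is Outcome.FORWARDED:
        path += ["directory -> S", "S -> R"]
    elif outcome is Outcome.VIA_DIRECTORY:
        path += ["directory -> S", "S -> directory (data)", "directory -> R"]
    else:
        # Assumption: the NACK goes back to the directory, and memory still
        # holds a current copy of the evicted E line, so the directory replies.
        path += ["directory -> S", "S -> directory (NACK)", "directory -> R"]
    return path


for outcome in Outcome:
    print(outcome.name, len(read_miss_path(outcome)))
```

Under these assumptions, a line held privately elsewhere costs R at least three hops instead of two. A silently evicted E line costs four hops, even though memory could have supplied the data directly.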