1. Field of the Invention
This invention is related to processors and coherent systems including processors.
2. Description of the Related Art
Computer systems have generally implemented one or more levels of cache to reduce memory latency. The caches are smaller, higher speed memories than the memory in the main memory system. Typically, caches store recently-used data. For example, caches are often implemented for processor access, and store data recently read/written by the processors in the computer systems. Caches are sometimes implemented for other high speed devices in the computer system as well. In addition to storing recently-used data, caches can be used to store prefetched data that is expected to be used by the processor (or other device).
Caches store copies of data that is also stored in main memory. In multiprocessor systems, and even in single processor systems in which other devices access main memory but do not access a given cache, the issue of cache coherence arises. That is, a given data producer can write a copy of data in the cache, and the update to main memory's copy is delayed. In write-through caches, a write operation is dispatched to memory in response to the write to the cache line, but the write is delayed in time. In a writeback cache, writes are made in the cache and not reflected in memory until the updated cache block is replaced in the cache (and is written back to main memory in response to the replacement).
Because the updates have not been made to main memory at the time the updates are made in cache, a given data consumer can read the copy of data in main memory and obtain “stale” data (data that has not yet been updated). A cached copy in a cache other than the one to which a data producer is coupled can also have stale data. Additionally, if multiple data producers are writing the same memory locations, different data consumers could observe the writes in different orders.
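By way of illustration only, the stale-data scenario described above can be sketched with a toy writeback cache. The class name `WritebackCache`, the address `0x40`, and the stored values are hypothetical and chosen solely for this example; the sketch models no particular hardware.

```python
class WritebackCache:
    """Toy writeback cache: writes stay local until the block is evicted."""

    def __init__(self, memory):
        self.memory = memory  # shared dict standing in for main memory
        self.lines = {}       # address -> (value, dirty flag)

    def write(self, addr, value):
        # Update only the cached copy; main memory is not touched yet.
        self.lines[addr] = (value, True)

    def read(self, addr):
        # On a miss, fill from main memory; on a hit, return the cached copy.
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def evict(self, addr):
        # Replacement: a dirty block is written back to main memory.
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.memory[addr] = value


memory = {0x40: 1}
producer = WritebackCache(memory)
other = WritebackCache(memory)

other.read(0x40)                      # the other cache now holds the value 1
producer.write(0x40, 2)               # update is made in the producer's cache only

stale_from_memory = memory[0x40]      # a consumer reading main memory sees the old value
stale_from_other_cache = other.read(0x40)

producer.evict(0x40)                  # the writeback finally updates main memory
after_writeback = memory[0x40]
still_stale_in_other = other.read(0x40)
```

With no coherence mechanism in place, the second cache continues to return the old value even after the writeback, which is the situation a coherence scheme is intended to prevent.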
Cache coherence solves these problems by ensuring that various copies of the same data (from the same memory location) can be maintained while avoiding “stale data”, and by establishing a “global” order of reads/writes to the memory locations by different producers/consumers. If a read follows a write in the global order, the data read reflects the write.
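As a minimal sketch of the global-order property, the following replays a single agreed sequence of reads and writes and records what each read returns. The function name `replay`, the agent labels, and the tuple layout are assumptions made for this example only.

```python
def replay(global_order, initial=0):
    """Apply operations in one global order and return the values each read observes.

    Each operation is a tuple (agent, op, addr, value), where op is
    "write" or "read" and value is ignored for reads.
    """
    memory = {}
    observed = []
    for agent, op, addr, value in global_order:
        if op == "write":
            memory[addr] = value
        else:
            # A read reflects the latest write that precedes it in the global order.
            observed.append((agent, memory.get(addr, initial)))
    return observed


order = [
    ("P0", "write", 0x40, 1),
    ("P1", "write", 0x40, 2),
    ("C0", "read", 0x40, None),
    ("C1", "read", 0x40, None),
]
reads = replay(order)
```

Because both consumers are placed after both writes in the same order, both observe the value written last, and no consumer can see the two writes in the opposite order.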
Cache coherence schemes create an overhead on memory read/write operations. Typically, caches will track a state of their copies according to the coherence scheme. For example, the popular Modified, Exclusive, Shared, Invalid (MESI) scheme includes a modified state (the copy is modified with respect to main memory and other copies); an exclusive state (the copy is the only copy other than main memory); a shared state (there may be one or more other copies besides the main memory copy); and the invalid state (the copy is not valid). The MOESI scheme adds an Owned state in which the cache is responsible for providing the data for a request (either by writing back to main memory before the data is provided to the requester, or by directly providing the data to the requester), but there may be other copies in other caches. Thus, the overhead of the cache coherence scheme includes communications among the caches to maintain/update the coherence state. These communications can increase the latency of the memory read/write operations.
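The MESI states described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not a definitive protocol: the event names, the `other_copies` flag, and the function `next_state` are hypothetical, and actual implementations differ in their transitions and in how data is supplied.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Event(Enum):
    LOCAL_READ = 1    # the cache's own processor reads the block
    LOCAL_WRITE = 2   # the cache's own processor writes the block
    SNOOP_READ = 3    # another agent reads the block
    SNOOP_WRITE = 4   # another agent writes the block


def next_state(state, event, other_copies=False):
    """Return the next state of one cached copy (simplified MESI sketch)."""
    if event is Event.LOCAL_WRITE:
        # Other copies are assumed to be invalidated via communication with other caches.
        return State.MODIFIED
    if event is Event.LOCAL_READ:
        if state is State.INVALID:
            # A fill is shared if another cache holds a copy, exclusive otherwise.
            return State.SHARED if other_copies else State.EXCLUSIVE
        return state
    if event is Event.SNOOP_READ:
        # A valid copy is downgraded to shared; a modified copy is assumed
        # to supply or write back its data as part of the downgrade.
        return State.INVALID if state is State.INVALID else State.SHARED
    if event is Event.SNOOP_WRITE:
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


s = next_state(State.INVALID, Event.LOCAL_READ)   # fill with no other copies
s = next_state(s, Event.LOCAL_WRITE)              # silent upgrade on write
s = next_state(s, Event.SNOOP_READ)               # another agent reads the block
```

Each snoop-driven transition corresponds to a message between caches, which is the source of the communication overhead noted above.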
The overhead is dependent on the structure of the computer system. More specifically, the overhead depends on the form of interconnect between the various caches and data producers/consumers. In a shared bus system, snooping is often implemented to maintain coherence. A given memory request transmitted on the bus is captured by other caches, which check if a copy of the requested data is stored in the cache. The caches can update the state of their copies (and provide data, if the cache has the most up-to-date copy). Generally, in a snooping system, the snoopers provide a response in the response phase of the transaction. A source for the data can be determined from the response (e.g. the main memory system or a cache with a more up-to-date copy). Because the snoop response is used to determine the source of the data for a memory transaction, the data transfer is delayed until the snoop response, and thus memory latency can be increased in cases in which the data could otherwise be provided prior to the snoop response (e.g. due to a cache hit).
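The dependence of the data transfer on the snoop response can be sketched as follows. The function names, the cache labels, and the cycle counts are hypothetical and serve only to illustrate the latency effect; they do not describe any particular bus protocol.

```python
def pick_data_source(snoop_responses):
    """Choose the data source from snoop responses.

    snoop_responses maps a cache name to its state letter ('M', 'E', 'S', 'I').
    A cache holding a modified copy supplies the data; otherwise main memory does.
    """
    for cache, state in snoop_responses.items():
        if state == "M":
            return cache
    return "main_memory"


def data_transfer_cycle(data_ready_cycle, snoop_response_cycle):
    # The transfer cannot start before the snoop response confirms the source,
    # even if the data itself was available earlier.
    return max(data_ready_cycle, snoop_response_cycle)


source = pick_data_source({"cache_a": "I", "cache_b": "M"})
fallback = pick_data_source({"cache_a": "S", "cache_b": "I"})

# Assumed timings: data is ready at cycle 4, but the snoop response arrives at cycle 10.
transfer_at = data_transfer_cycle(data_ready_cycle=4, snoop_response_cycle=10)
added_latency = transfer_at - 4
```

In this sketch the six added cycles are spent waiting solely for the response phase, which is the latency penalty described above.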