The present invention relates to an improved cache coherency scheme in a multi-processor system.
FIG. 1 illustrates a typical multi-processor system having a plurality of agents 10-50. The plurality of agents 10-50 are in communication with each other over a common external bus 60. An "agent" may be anything that communicates over the external bus, including microprocessors, input/output devices, memory systems and special-purpose chipsets or digital signal processors. The agents 10-50 communicate over the external bus 60 using a pre-defined protocol. Typically, one of the agents, such as 50, is a memory storing data. During operation, other agents 10-40 may share the same data. Cache coherency systems ensure that each agent operates on the most current copy of data available.
One such cache coherency system is the MESI (pronounced "messy") system. The MESI system defines four states for data. One of the four states is applied to each copy of data stored in an agent's internal cache(s). The MESI states are:
Invalid: Although the agent may have cached a copy of data, the copy is unavailable to the agent. When the agent requires the data, the agent must fetch the data from external memory 50 or from another cache.
Shared: A cached copy is valid and possesses the same value as is stored in external memory 50. The agent may only read the data. Copies of the data may be stored in the caches of other agents. An agent 10 may not modify data having a shared state without first performing an external bus transaction to ensure that the agent has exclusive control over the copy of data.
Exclusive: The cached copy is valid and may possess the same value as is stored in external memory 50. When an agent 10 caches data having an exclusive state, it may read and write (modify) the data without an external cache coherency check. The data must be invalid in all other agents. The agent that stores data in an exclusive state is guaranteed to have the most up-to-date copy within the system somewhere in its cache hierarchy.
Modified: The cached copy is valid and dirty. It may be more current than the copy stored in external memory 50. When an agent 10 caches data having a modified state, it may read and write (modify) the data without an external cache coherency check. The data must be invalid in all other agents. The agent that stores data in a modified state is guaranteed to have the most up-to-date copy within the system somewhere in its cache hierarchy.
Only one state may attach to a given copy of data. For example, a single copy of data may not be both modified and shared. However, as described below, a single agent may store copies of data in multiple caches. Some copies may be assigned a different state than other copies in a single agent. Further explanation of MESI principles may be found in the Pentium Pro-Family Developers Manual, Volume 1: Specifications, ISBN 1-55512-259-0 (1996).
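The permissions attached to each MESI state can be summarized in a short sketch. The Python below is illustrative only: the `Mesi` enum and the helper function names are hypothetical, and they simply encode the rules stated above for local reads, local writes and the state of other agents' copies.

```python
from enum import Enum


class Mesi(Enum):
    """The four MESI states; exactly one attaches to each cached copy."""
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"


def can_read(state: Mesi) -> bool:
    # Any valid copy may be read locally; an invalid copy must be
    # fetched again from external memory or from another cache.
    return state is not Mesi.INVALID


def can_write_without_bus_transaction(state: Mesi) -> bool:
    # Only exclusive or modified copies may be written without an
    # external coherency check; a shared copy needs a bus transaction
    # first to gain exclusive control.
    return state in (Mesi.EXCLUSIVE, Mesi.MODIFIED)


def others_must_be_invalid(state: Mesi) -> bool:
    # Holding an E or M copy implies every other agent's copy is invalid.
    return state in (Mesi.EXCLUSIVE, Mesi.MODIFIED)


if __name__ == "__main__":
    for s in Mesi:
        print(s.value, can_read(s), can_write_without_bus_transaction(s))
```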
FIG. 2 illustrates a multiple cache system that may be used in an agent 10. High performance agents often include multiple caches arranged in layers to reduce the impact of memory latency and bandwidth. The lowest layer cache ("LØ") typically has a small capacity but is designed to be very fast. One or more higher layer caches L1, L2 typically are larger than the LØ cache but are accessed at a lower frequency. The higher layer caches L1, L2, however, still operate at a much higher frequency than external memory 50. Copies of a single piece of data may be stored in multiple caches, and state information is stored with each copy. Further, the state of data may differ from one layer to another. For example, data may be read into an L1 cache as exclusive data and later be read into the LØ cache and modified. The L1 copy remains in an exclusive state even though the LØ copy is in a modified state.
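The per-layer states in the example above can be modeled with a minimal sketch, assuming each cache layer is a dictionary from an address to its own (state, value) copy. The `Agent` and `CacheLine` classes, the address 0x40 and the helper functions are hypothetical; "L0" in the code stands for the LØ cache.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CacheLine:
    state: str  # one of "I", "S", "E", "M"
    value: int


@dataclass
class Agent:
    # Hypothetical model: each layer holds its own copy of an address,
    # and each copy carries its own MESI state.
    layers: Dict[str, Dict[int, CacheLine]] = field(
        default_factory=lambda: {"L0": {}, "L1": {}}
    )


def read_into_l1(agent: Agent, addr: int, value: int) -> None:
    # The data is read into L1 as exclusive data.
    agent.layers["L1"][addr] = CacheLine("E", value)


def read_into_l0_and_modify(agent: Agent, addr: int, new_value: int) -> None:
    # The line is later read into L0 and modified there only; the L1
    # copy keeps its exclusive state and its old value.
    assert addr in agent.layers["L1"], "this sketch assumes an L1 hit"
    agent.layers["L0"][addr] = CacheLine("M", new_value)


agent = Agent()
read_into_l1(agent, 0x40, 5)
read_into_l0_and_modify(agent, 0x40, 6)
print(agent.layers["L1"][0x40], agent.layers["L0"][0x40])
```

The two copies of address 0x40 end up in different states (E in L1, M in L0), which is exactly the situation the following paragraphs show can lead to stale data.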
In a multi-layer cache system, the MESI system can cause cache coherency problems within an agent 10. The goal of cache coherency systems is to provide the most current copy of data to any agent that will use the data. Certain data eviction policies can cause an agent to obtain access to stale data.
The state of a copy of data determines how it is evicted from a cache. When a cache is full, new data may not be stored in the cache until old data is "evicted" from the cache. For old data stored in an invalid, exclusive or shared state, eviction occurs simply by writing the new data over the old data. This copy of the old data is lost, but the same, or possibly a more up-to-date, copy of the data is guaranteed to exist somewhere within the system. For modified data, however, eviction requires that the modified data be output to a higher layer cache or to memory before it can be overwritten in the first cache. A copy of data in a modified state could be the only current copy of data in the system; if it were overwritten, the most up-to-date copy of the data could be lost to the system.
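The state-dependent eviction rule can be sketched as follows. This is a hypothetical illustration, assuming a cache is a dictionary from address to `Line` and that `next_level` stands in for either a higher layer cache or memory; none of these names come from the described system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Line:
    state: str  # "I", "S", "E" or "M"
    value: int


def evict(cache: Dict[int, Line], addr: int,
          next_level: Dict[int, int]) -> Optional[int]:
    """Remove addr from cache, writing it back first if it is modified.

    Returns the written-back value, or None if the copy was simply dropped.
    """
    line = cache.pop(addr)
    if line.state == "M":
        # A modified copy may be the only current copy in the system,
        # so it is output to the next level before its slot is reused.
        next_level[addr] = line.value
        return line.value
    # I, S and E copies are dropped silently: an equal or more recent
    # copy is guaranteed to exist elsewhere in the system.
    return None


cache = {1: Line("M", 9), 2: Line("S", 3)}
memory = {1: 0, 2: 3}
evict(cache, 1, memory)
evict(cache, 2, memory)
```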
Under the MESI system, a first agent 10 interprets bus transactions from other agents in one of two ways. The request may be interpreted as a "Go-to-Invalid" snoop, which causes the agent 10 to mark all cached copies of the requested data as invalid. Alternatively, the request may be interpreted as a "Go-to-Shared" snoop, which causes the agent 10 to mark cached copies of valid data as shared. In either case, when a snoop implicates modified data, the first agent 10 provides the modified data to the requesting agent, such as a second agent 20, via an implicit writeback. FIG. 3 shows the MESI state changes that result from a Go-to-Shared snoop. The modified-to-shared transition may break cache coherency within the agent.
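The two snoop interpretations can be written as per-copy transition functions. This sketch follows the prose above rather than FIG. 3 itself, which is not reproduced here; the function names and the (new_state, implicit_writeback) return convention are assumptions made for illustration.

```python
from typing import Tuple


def go_to_shared(state: str) -> Tuple[str, bool]:
    """Return (new_state, implicit_writeback) for a Go-to-Shared snoop."""
    if state == "M":
        # Modified data is supplied to the requester by an implicit
        # writeback, and the local copy becomes shared.
        return "S", True
    if state in ("E", "S"):
        # Other valid copies are simply marked shared.
        return "S", False
    # Invalid copies stay invalid.
    return "I", False


def go_to_invalid(state: str) -> Tuple[str, bool]:
    """Return (new_state, implicit_writeback) for a Go-to-Invalid snoop."""
    # Every copy becomes invalid; a modified copy is written back first.
    return "I", state == "M"


for s in ("M", "E", "S", "I"):
    print(s, go_to_shared(s), go_to_invalid(s))
```

Note that `go_to_shared` is applied to each copy independently, so an agent holding the same data in two layers marks both copies shared; this is the source of the problem described next.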
An agent may obtain access to stale data as shown in the following example. An agent 10 reads in data, such as a counter, and modifies it. According to such a process, the data's initial value may be stored in the L1 cache in an exclusive state and the data's modified value may be stored in the LØ. The data is snooped by another agent 20 as part of a read request. The snoop is interpreted as a "Go-to-Shared" snoop in which the agent 10 marks all matching copies of the data as shared. Also, the current value of the modified copy is written back to the snooping agent 20 and memory 50 by an implicit writeback. Once the modified data is marked as shared, it is subject to the data eviction policies of ordinary shared data.
Later, the copy in the LØ cache may be overwritten without a writeback, because it is now shared. However, the stale copy in the L1 cache remains. If the agent 10 were to require the data again, it would obtain and use the stale data from the L1 cache. This breaks cache coherency within the agent 10.
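The counter example can be traced step by step in a small simulation. This is a sketch under simplifying assumptions: each layer is a dictionary mapping a name to a [state, value] pair, the counter starts at 0 and is incremented once, and the function name `stale_read_demo` is hypothetical.

```python
from typing import Tuple


def stale_read_demo() -> Tuple[int, int]:
    """Return (value the core reads, value held in memory)."""
    memory = {"counter": 0}
    l1 = {"counter": ["E", 0]}  # initial value, exclusive in L1
    l0 = {"counter": ["M", 1]}  # incremented value, modified in L0

    # Agent 20's read arrives as a Go-to-Shared snoop: the modified value
    # is written back implicitly and every matching copy becomes shared.
    memory["counter"] = l0["counter"][1]
    for layer in (l0, l1):
        layer["counter"][0] = "S"

    # The L0 copy is now ordinary shared data, so it may be overwritten
    # without a writeback.
    del l0["counter"]

    # The next access misses in L0 and hits the stale shared copy in L1.
    _state, value = l1["counter"]
    return value, memory["counter"]


print(stale_read_demo())
```

The core reads 0 while memory (and agent 20) hold 1: the shared state on the L1 copy makes the stale value look legitimate.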
Earlier processors have addressed this cache coherency issue. The Pentium® Pro processor, commercially available from Intel Corporation, solved this cache coherency issue by marking as invalid all copies of requested data except one. First, it identified all copies of data that matched the requested data and marked all of them as shared. Second, if modified data were present, it would go back and mark all stale copies as invalid. This two-step snoop state update also caused problems because it was not atomic: by marking stale data first as shared and then as invalid, it left a small window of time during which the processor core could possibly gain access to the stale data.
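The window opened by the two-step update can be made visible in a sketch that records the states between the two steps. It assumes the one copy that is kept is the modified one (the text says only "all copies ... except one"), and the function name and data layout are hypothetical; this is not a description of the actual Pentium® Pro hardware.

```python
from typing import Dict, List


def two_step_snoop(copies: Dict[str, List]) -> Dict[str, str]:
    """Apply a non-atomic two-step Go-to-Shared update in place.

    copies maps a cache layer name to a [state, value] pair. Returns the
    states a core could observe between step 1 and step 2.
    """
    # Assumption: the copy that survives is the modified one, if any.
    keeper = next((name for name, (s, _) in copies.items() if s == "M"), None)

    # Step 1: mark every valid matching copy as shared.
    for entry in copies.values():
        if entry[0] != "I":
            entry[0] = "S"

    # Window: stale copies are shared (valid, readable) at this point.
    window = {name: entry[0] for name, entry in copies.items()}

    # Step 2: if modified data was present, invalidate every other copy.
    if keeper is not None:
        for name, entry in copies.items():
            if name != keeper:
                entry[0] = "I"
    return window


copies = {"L0": ["M", 1], "L1": ["E", 0]}
print(two_step_snoop(copies), copies)
```

In the returned snapshot the stale L1 copy is briefly shared, so a core access landing in that window would read it; only after step 2 is it invalid.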
Accordingly, there is a need in the art for a cache coherency scheme in a multi-cache agent that prevents the agent from gaining access to stale data that may be stored in one or more of the agent's caches. Further, there is a need in the art for such a scheme that permits snoop state updates to be atomic.
Embodiments of the present invention provide a cache coherency scheme in which a copy of data may take one of five states including an invalid state, an exclusive state, a shared state, a modified state and a lazy update state. In the lazy update state, a copy of the data is protected against eviction but is considered shared with other agents.
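The two properties stated for the lazy update state can be encoded in a minimal sketch. Only those two properties come from the text; the "L" encoding, the function names, and the choice to group the modified state with the lazy update state as "protected" (by analogy with the writeback requirement for modified data described earlier) are assumptions, and the sketch does not attempt to model the transitions into or out of the lazy update state.

```python
from enum import Enum


class CacheState(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"
    LAZY_UPDATE = "L"  # hypothetical encoding for the fifth state


def protected_against_eviction(state: CacheState) -> bool:
    # A lazy update copy, like a modified copy, may not simply be
    # overwritten when its slot is needed (grouping is an assumption).
    return state in (CacheState.MODIFIED, CacheState.LAZY_UPDATE)


def state_seen_by_other_agents(state: CacheState) -> CacheState:
    # Toward other agents, a lazy update copy is considered shared.
    if state is CacheState.LAZY_UPDATE:
        return CacheState.SHARED
    return state


for s in CacheState:
    print(s.value, protected_against_eviction(s),
          state_seen_by_other_agents(s).value)
```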