1. Technical Field
The present invention relates in general to data processing systems, and more particularly to an improved multi-processor data processing system. Still more particularly, the present invention relates to improved coherency management of a hierarchical cache system within a multi-processor data processing system.
2. Description of the Related Art
A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Because multiple processor cores may request write access to a same cache line of data and because modified cache lines are not immediately synchronized with system memory, the cache hierarchies of multiprocessor computer systems typically implement a cache coherency protocol to ensure at least a minimum level of coherence among the various processor cores' “views” of the contents of system memory. In particular, cache coherency requires, at a minimum, that after a processing unit accesses a copy of a memory block and subsequently accesses an updated copy of the memory block, the processing unit cannot again access the old copy of the memory block.
A cache coherency protocol typically defines a set of cache states stored in association with the cache lines stored at each level of the cache hierarchy, as well as a set of coherency messages utilized to communicate the cache state information between cache hierarchies. In a typical implementation, the cache state information takes the form of the well-known MESI (Modified, Exclusive, Shared, Invalid) protocol or a variant thereof, and the coherency messages indicate a protocol-defined coherency state transition in the cache hierarchy of the requestor and/or the recipients of a memory access request. The MESI protocol allows a cache line of data to be tagged with one of four states: “M” (Modified), “E” (Exclusive), “S” (Shared), or “I” (Invalid). The Modified state indicates that a coherency granule is valid only in the cache storing the modified coherency granule and that the value of the modified coherency granule has not been written to system memory. When a coherency granule is indicated as Exclusive, then, of all caches in the memory hierarchy, only that cache holds the coherency granule. The data in the Exclusive state is consistent with system memory, however. If a coherency granule is marked as Shared in a cache directory, the coherency granule is resident in the associated cache and in possibly one or more other caches in the memory hierarchy, and all of the copies of the coherency granule are consistent with system memory. Finally, the Invalid state indicates that the data and address tag associated with a coherency granule are both invalid.
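The four states described above can be summarized in a short sketch. It is illustrative only: the enum and the helper names (is_valid, is_sole_copy, is_consistent_with_memory) are hypothetical and are not taken from any particular implementation.

```python
from enum import Enum


class MESI(Enum):
    """The four coherency states a cache line (coherency granule) can be tagged with."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def is_valid(state):
    # Every state except Invalid denotes usable data and a usable address tag.
    return state is not MESI.INVALID


def is_sole_copy(state):
    # Modified and Exclusive both mean no other cache holds the granule.
    return state in (MESI.MODIFIED, MESI.EXCLUSIVE)


def is_consistent_with_memory(state):
    # Exclusive and Shared copies match system memory; a Modified copy has
    # not been written back. An Invalid line holds no data, so it is
    # reported here as not consistent.
    return state in (MESI.EXCLUSIVE, MESI.SHARED)
```

Under these assumptions, Modified and Exclusive differ only in whether the cached value has diverged from system memory, while Exclusive and Shared differ only in whether other caches may hold a copy.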
The state to which each coherency granule (e.g., cache line) is set is dependent upon both a previous state of the data within the cache line and the type of memory access request received from a requesting device (e.g., the processor). Accordingly, maintaining memory coherency in the system requires that the processors communicate messages across the system bus indicating their intention to read or write to memory locations. For example, when a processor desires to write data to a memory location, the processor must first inform all other processing elements of its intention to write data to the memory location and receive permission from all other processing elements to carry out the write operation. The permission messages received by the requesting processor indicate that all other cached copies of the contents of the memory location have been or will be invalidated, thereby guaranteeing that the other processors will not incorrectly access their stale local data.
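The write handshake described above can be sketched as follows. This is a simplified model under stated assumptions: each cache is a plain dictionary mapping an address to a state letter, the function name request_write is hypothetical, and the transfer or write-back of data held in another cache is omitted.

```python
def request_write(caches, requester, address):
    """Model a write request: invalidate other copies, then take the line as Modified.

    caches: dict mapping a cache name to a dict of {address: state letter}.
    Returns the names of the caches that granted permission.
    """
    grants = []
    for name, lines in caches.items():
        if name == requester:
            continue
        # Each other cache invalidates any valid copy it holds before
        # granting permission, so it can no longer read stale data.
        if lines.get(address, "I") != "I":
            lines[address] = "I"
        grants.append(name)

    # The write proceeds only once every other cache has responded.
    if len(grants) == len(caches) - 1:
        caches[requester][address] = "M"
    return grants
```

For example, if two caches hold an address as Shared and one of them requests write access, the requester ends up with the line in the Modified state and the other copy becomes Invalid.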
In some systems, the cache hierarchy includes at least two levels: a level one (L1), or upper level, cache and one or more levels of lower level caches, such as level two (L2) caches and level three (L3) caches (the L2 caches being upper level caches relative to the L3 caches). An L1 cache is usually a private cache associated with a particular processor core in an MP system. The processor core first attempts to access data in its L1 cache. If the requested data is not found in the L1 cache, the processor core then accesses one or more lower level caches (e.g., level two (L2) or level three (L3) caches) for the requested data. The lowest level cache (e.g., L3) is often shared among several processor cores.
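The lookup order described above can be sketched as a simple search from the upper level downward. The sketch assumes a three-level hierarchy in which each cache is modeled as a dictionary keyed by address; the function name lookup and the returned tuple are hypothetical, and line fills into upper levels after a miss are not modeled.

```python
def lookup(address, l1, l2, l3, memory):
    """Return (where the data was found, data), searching L1, then L2, then L3."""
    for level_name, cache in (("L1", l1), ("L2", l2), ("L3", l3)):
        if address in cache:
            # Hit: the search stops at the first level holding the address.
            return level_name, cache[address]
    # Miss at every cache level: fall back to system memory.
    return "memory", memory[address]
```

A request that misses in L1 but hits in L2 never consults L3 or system memory, which is the source of the latency benefit of the hierarchy.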
Typically, when a congruence class of an upper level cache becomes full, data lines are “evicted” or written to a lower level cache or out to system memory for storage. However, in any memory hierarchy, there may be several copies of the same data residing in the memory hierarchy at the same time. The policy of evicting lines to provide for more space in the upper level cache results in updates to lower level caches, including updates of coherency state information in the lower level cache directory.
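An eviction from a full congruence class, and the resulting update of the lower level cache directory, can be sketched as follows. Several details are assumptions made for illustration: a two-way congruence class, selection of the least recently installed line as the victim, and copying the victim's coherency state unchanged into the lower level directory, which is only one possible policy. The names WAYS and install are hypothetical.

```python
from collections import OrderedDict

WAYS = 2  # hypothetical number of lines per congruence class


def install(upper_set, lower_directory, address, state):
    """Install a line into one congruence class of the upper level cache.

    upper_set: OrderedDict of {address: state}, oldest installed line first.
    lower_directory: dict of {address: state} for the lower level cache.
    Returns the evicted address, or None if no eviction was needed.
    """
    victim = None
    if address not in upper_set and len(upper_set) >= WAYS:
        # The class is full: evict the oldest line and record its
        # coherency state in the lower level cache directory.
        victim, victim_state = upper_set.popitem(last=False)
        lower_directory[victim] = victim_state
    upper_set[address] = state
    upper_set.move_to_end(address)  # mark as most recently installed
    return victim
```

In this sketch, installing a third line into a two-way class evicts the first line installed, and the lower level directory then holds that line together with the state it carried in the upper level cache.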
Heretofore, cache coherency protocols have generally assumed that, to maintain cache coherency, coherency states from an upper level cache are copied into a lower level cache upon eviction of a cache line from the upper level cache. The present invention recognizes that performance enhancements to the data processing system can be achieved by intelligently defining the coherency states and coherency state transitions in the cache hierarchy when castouts are performed and for other data processing scenarios.