1. Technical Field
The present invention relates in general to data processing systems, and more particularly to an improved multi-processor data processing system. Still more particularly, the present invention relates to improved management of a hierarchical cache system within a multi-processor data processing system.
2. Description of the Related Art
A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Because multiple processor cores may request write access to the same cache line of data and because modified cache lines are not immediately synchronized with system memory, the cache hierarchies of multiprocessor computer systems typically implement a cache coherency protocol to ensure at least a minimum level of coherence among the various processor cores' “views” of the contents of system memory. In particular, cache coherency requires, at a minimum, that after a processing unit accesses a copy of a memory block and subsequently accesses an updated copy of the memory block, the processing unit cannot again access the old copy of the memory block.
A cache coherency protocol typically defines a set of cache states stored in association with the cache lines stored at each level of the cache hierarchy, as well as a set of coherency messages utilized to communicate the cache state information between cache hierarchies. In a typical implementation, the cache state information takes the form of the well-known MESI (Modified, Exclusive, Shared, Invalid) protocol or a variant thereof, and the coherency messages indicate a protocol-defined coherency state transition in the cache hierarchy of the requester and/or the recipients of a memory access request. The MESI protocol allows a cache line of data to be tagged with one of four states: “M” (modified), “E” (exclusive), “S” (shared), or “I” (invalid). The Modified state indicates that a coherency granule is valid only in the cache storing the modified coherency granule and that the value of the modified coherency granule has not been written to system memory. When a coherency granule is indicated as Exclusive, then, of all caches at that level of the memory hierarchy, only that cache holds the coherency granule. Unlike data in the Modified state, however, data in the Exclusive state is consistent with system memory. If a coherency granule is marked as Shared in a cache directory, the coherency granule is resident in the associated cache and in at least one other cache at the same level of the memory hierarchy, and all of the copies of the coherency granule are consistent with system memory. Finally, the Invalid state indicates that the data and address tag associated with a coherency granule are both invalid.
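The four states described above can be summarized in a short sketch. It is illustrative only: the names Mesi, is_valid, only_copy_at_level and consistent_with_memory are hypothetical and are not part of any particular implementation.

```python
from enum import Enum


class Mesi(Enum):
    """The four MESI coherency states, keyed by their one-letter tags."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def is_valid(state):
    # Every state except Invalid denotes usable data and a usable address tag.
    return state is not Mesi.INVALID


def only_copy_at_level(state):
    # Modified and Exclusive granules are held by no other cache
    # at the same level of the memory hierarchy.
    return state in (Mesi.MODIFIED, Mesi.EXCLUSIVE)


def consistent_with_memory(state):
    # Exclusive and Shared copies match system memory; a Modified copy
    # has not yet been written back. The question is moot for Invalid,
    # which this sketch reports as False.
    return state in (Mesi.EXCLUSIVE, Mesi.SHARED)
```

Under these definitions, Modified is the only valid state for which consistent_with_memory is false, which is why a Modified granule cannot simply be discarded.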
The state to which each coherency granule (e.g., cache line or sector) is set is dependent upon both a previous state of the data within the cache line and the type of memory access request received from a requesting device (e.g., the processor). Accordingly, maintaining memory coherency in the system requires that the processors communicate messages across the system bus indicating their intention to read from or write to memory locations. For example, when a processor desires to write data to a memory location, the processor must first inform all other processing elements of its intention to write data to the memory location and receive permission from all other processing elements to carry out the write operation. The permission messages received by the requesting processor indicate that all other cached copies of the contents of the memory location have been invalidated, thereby guaranteeing that the other processors will not access their stale local data.
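The write example above can be sketched as follows. This is a minimal model under stated assumptions: it tracks coherency states only (no data, and no write-back of a Modified copy held elsewhere), and the names Cache, snoop_write and write are hypothetical.

```python
class Cache:
    """Hypothetical per-processor cache that tracks only coherency states."""

    def __init__(self):
        self.state = {}  # address -> "M", "E", "S" or "I"

    def get(self, addr):
        # An address with no directory entry is treated as Invalid.
        return self.state.get(addr, "I")

    def snoop_write(self, addr):
        # Another processor has announced a write to addr:
        # invalidate the local copy, then grant permission.
        self.state[addr] = "I"
        return True


def write(caches, writer, addr):
    # Step 1: inform every other cache of the intention to write.
    others = [c for i, c in enumerate(caches) if i != writer]
    permissions = [c.snoop_write(addr) for c in others]
    # Step 2: proceed only once all other caches have granted permission,
    # which implies that their copies are already invalidated.
    if not all(permissions):
        raise RuntimeError("write not permitted")
    # Step 3: the writer now holds the only valid, modified copy.
    caches[writer].state[addr] = "M"
```

For instance, if two of three caches hold address 0x40 as Shared and the first one writes, the first ends in Modified and the other two in Invalid.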
In some systems, the cache hierarchy includes at least two levels. The level one (L1), or upper-level, cache is usually a private cache associated with a particular processor core in an MP system. The processor core first looks for data in the upper-level (L1) cache. If the requested data is not found in the upper-level cache, the processor core then accesses lower-level caches (e.g., level two (L2) or level three (L3) caches) for the requested data. The lowest level cache (e.g., L3) is often shared among several processor cores (L2 cache being an upper-level cache relative to L3 cache).
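The lookup order described above can be sketched as a simple search from the upper-level cache downward. The function name load and the dictionary-based caches are assumptions made for illustration; the sketch does not model filling upper levels on a miss.

```python
def load(addr, levels, memory):
    """Return (value, source) for addr, searching the upper levels first.

    levels is an ordered list of (name, cache) pairs, e.g. L1, then L2,
    then L3; each cache is modeled as a plain dict of address -> value.
    """
    for name, cache in levels:
        if addr in cache:
            return cache[addr], name
    # Missed in every cache level: fall back to system memory.
    return memory[addr], "memory"


# Usage: address 0x10 hits in L1, 0x20 is found only in L3,
# and 0x30 must come from system memory.
l1, l2, l3 = {0x10: "a"}, {}, {0x20: "b"}
memory = {0x10: "a", 0x20: "b", 0x30: "c"}
hierarchy = [("L1", l1), ("L2", l2), ("L3", l3)]
results = [load(a, hierarchy, memory) for a in (0x10, 0x20, 0x30)]
```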
Typically, when a congruence class of an upper-level cache becomes full, data lines are “evicted” or written to a lower-level cache or out to system memory for storage. However, in any memory hierarchy, there may be several copies of the same data residing in the memory hierarchy at the same time. The policy of evicting lines to provide for more space in the upper-level cache results in writes to lower-level caches, including updating coherency state information in the lower-level cache directory.
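Eviction from a full congruence class can be sketched as follows. Several details are assumptions of the sketch rather than statements from the text: the victim is chosen by least-recently-used order, the congruence class is selected by the line address modulo the number of classes, and the victim's coherency state is written unchanged into the lower-level directory. The class name SetAssociativeCache and its methods are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Hypothetical cache directory: line address -> coherency state."""

    def __init__(self, num_sets, ways, lower=None):
        self.num_sets = num_sets
        self.ways = ways
        self.lower = lower  # next lower-level cache, if any
        # One ordered map per congruence class, oldest entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def install(self, line_addr, state):
        cclass = self.sets[line_addr % self.num_sets]
        if line_addr in cclass:
            # Already present: mark as most recently used.
            cclass.move_to_end(line_addr)
        elif len(cclass) >= self.ways:
            # Congruence class is full: evict the oldest line (assumed LRU).
            victim, victim_state = cclass.popitem(last=False)
            if self.lower is not None:
                # The eviction causes a write to the lower-level cache,
                # including an update of its directory state.
                self.lower.install(victim, victim_state)
            # With no lower level, a real system would write modified
            # data out to system memory; that step is not modeled here.
        cclass[line_addr] = state

    def state_of(self, line_addr):
        return self.sets[line_addr % self.num_sets].get(line_addr, "I")


# Usage: a 2-way L1 backed by a 4-way L2. Lines 0, 2 and 4 all map to
# congruence class 0 of the L1, so installing line 4 evicts line 0.
l2 = SetAssociativeCache(num_sets=4, ways=4)
l1 = SetAssociativeCache(num_sets=2, ways=2, lower=l2)
l1.install(0, "M")
l1.install(2, "S")
l1.install(4, "E")
```

After the third install, line 0 is no longer present in the L1 and appears in the L2 directory with the state it held at eviction.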
Heretofore, cache coherency protocols have generally assumed that, to maintain cache coherency, coherency states from an upper-level cache are copied into a lower-level cache. The present invention recognizes that significant performance enhancements to the data processing system can be achieved by intelligently defining the protocols for coherency state transition in the cache hierarchy.