1. Field of the Invention
The present invention relates to the handling of bus operations in an L2 cache.
2. Background of the Related Art
A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Because multiple processor cores may request write access to the same cache line of data and because modified cache lines are not immediately synchronized with system memory, the cache hierarchies of multiprocessor computer systems typically implement a cache coherency protocol to ensure at least a minimum level of coherence among the various processor cores' “views” of the contents of system memory. In particular, cache coherency requires, at a minimum, that after a processing unit accesses a copy of a memory block and subsequently accesses an updated copy of the memory block, the processing unit cannot again access the old copy of the memory block.
A cache coherency protocol typically defines a set of cache states stored in association with the cache lines stored at each level of the cache hierarchy, as well as a set of coherency messages utilized to communicate the cache state information between cache hierarchies. In a typical implementation, the cache state information takes the form of the well-known MESI (Modified, Exclusive, Shared, Invalid) protocol, MOESI (Modified, Owned, Exclusive, Shared, Invalid) protocol, or a variant thereof, and the coherency messages indicate a protocol-defined coherency state transition in the cache hierarchy of the requestor and/or the recipients of a memory access request. The MOESI protocol allows a cache line of data to be tagged with one of five states: “M” (Modified), “O” (Owned), “E” (Exclusive), “S” (Shared), or “I” (Invalid). The Modified state indicates that a memory block is valid only in the cache holding the Modified memory block and that the memory block is not consistent with system memory. Only one cache can hold a memory block in the Owned state, although other caches may hold the same memory block in the Shared state. When a coherency state is indicated as Exclusive, then, of all caches at that level of the memory hierarchy, only that cache holds the memory block. Unlike a Modified memory block, however, the data of an Exclusive memory block is consistent with that of the corresponding location in system memory. If a memory block is marked as Shared in a cache directory, the memory block is resident in the associated cache and either is or was at some point in time in at least one other cache at the same level of the memory hierarchy, and all of the copies of the memory block are consistent with system memory. Finally, the Invalid state indicates that the data and address tag associated with a coherency state are both invalid.
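By way of illustration only, the coexistence rules described above for the five MOESI states can be sketched as a small compatibility table. The names `MoesiState`, `_ALLOWED_PEER_STATES`, and `may_coexist` are hypothetical and are introduced solely for this sketch; the table encodes only which pairs of states two caches at the same level may hold for the same memory block, and does not model consistency with system memory.

```python
from enum import Enum


class MoesiState(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


M, O, E, S, I = (MoesiState.MODIFIED, MoesiState.OWNED, MoesiState.EXCLUSIVE,
                 MoesiState.SHARED, MoesiState.INVALID)

# For a block held in the key state by one cache, the states that a second
# cache at the same level may hold for that same block (hypothetical table).
_ALLOWED_PEER_STATES = {
    M: {I},              # Modified: valid only in the holding cache
    E: {I},              # Exclusive: only that cache holds the block
    O: {S, I},           # Owned: a single owner, other caches may be Shared
    S: {S, O, I},        # Shared: other sharers and one owner may exist
    I: {M, O, E, S, I},  # Invalid places no constraint on other caches
}


def may_coexist(a: MoesiState, b: MoesiState) -> bool:
    """Return True if two caches may hold the same block in states a and b."""
    return b in _ALLOWED_PEER_STATES[a]
```

For example, `may_coexist(O, S)` is true, whereas `may_coexist(O, O)` and `may_coexist(M, S)` are false.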
The state to which each memory block (e.g., cache line or sector) is set is dependent upon both a previous state of the data within the cache line and the type of memory access request received from a requesting device (e.g., the processor). Accordingly, maintaining memory coherency in the system requires that the processors communicate messages via the system interconnect indicating their intention to read or write memory locations. For example, when a processor desires to write data to a memory location, the processor may first inform all other processing elements of its intention to write data to the memory location and receive permission from all other processing elements to carry out the write operation. The permission messages received by the requesting processor indicate that all other cached copies of the contents of the memory location have been invalidated, thereby guaranteeing that the other processors will not access their stale local data.
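The write example above may be sketched, under simplifying assumptions, as follows. The class `PeerCache` and the functions `snoop_write_intent` and `coherent_write` are hypothetical names used only for illustration. The sketch assumes a full-line write, so that any data held by another cache may simply be discarded, and it omits coherency states, write-back of modified data, and retry handling.

```python
class PeerCache:
    """A toy cache that maps an address to the data cached for it."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> data

    def snoop_write_intent(self, addr):
        """Invalidate any local copy of addr and grant permission."""
        self.lines.pop(addr, None)  # drop the copy that is about to go stale
        return True                 # permission message to the requester


def coherent_write(requester, peers, addr, value):
    """Write value at addr only after every peer has granted permission."""
    # Inform all other processing elements of the intention to write.
    acks = [peer.snoop_write_intent(addr) for peer in peers]
    if not all(acks):
        raise RuntimeError("write permission not granted by every peer")
    # Every other cached copy is now invalid, so the write may proceed.
    requester.lines[addr] = value
```

After `coherent_write` returns, no peer retains a copy of the written address, so a subsequent access by a peer must obtain the updated data rather than its stale local data.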
In some systems, the cache hierarchy includes multiple levels, with each lower level generally having successively longer access latency. Thus, a level one (L1) cache generally has lower access latency than a level two (L2) cache, which in turn has lower access latency than a level three (L3) cache. The level one (L1) or upper-level cache is usually a private cache associated with a particular processor core in a multiprocessor system. Because of the low access latencies of L1 caches, a processor core first attempts to service memory access requests in its L1 cache. If the requested data is not present in the L1 cache or is not associated with a coherency state permitting the memory access request to be serviced without further communication, the processor core then transmits the memory access request to one or more lower-level caches (e.g., level two (L2) or level three (L3) caches) for the requested data.
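The order in which the levels of such a hierarchy are consulted may be sketched as follows. The function `service_load` and its arguments are hypothetical; each cache level is modeled as a plain dictionary, and coherency states, access latencies, and the filling of upper levels on a miss are deliberately left out.

```python
def service_load(addr, levels, memory):
    """Return (name of the level that serviced the load, data).

    levels is an ordered list of (name, contents) pairs, upper level first,
    where contents maps an address to the data cached at that level.
    """
    for name, contents in levels:
        if addr in contents:
            return name, contents[addr]  # serviced at this level
    # No cache level holds the address, so system memory services the load.
    return "memory", memory[addr]
```

For instance, with `levels = [("L1", {}), ("L2", {0x40: 9})]`, a load of address `0x40` misses in the L1 cache and is serviced by the L2 cache.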
An L2 cache typically has a number of processor side handling machines to handle the demand operations (e.g., load, store, and fetch) that arrive from the processor(s) and thread(s). The processor side handling machines are often responsible for searching the L2 cache data array, returning data/instructions for the sought-after address, updating the L2 cache data array, and requesting data from memory or from the L3 cache if the sought-after address does not exist in the L2 cache.
To implement a cache coherency protocol, each cache may broadcast a desired command or request onto the bus, and have each cache “snoop” and respond to the command based on its line state. For example, consistent with the MOESI protocol, if Cache A is working on a Load and it does not have a copy of the desired cache line, then Cache A broadcasts a read request on the bus. Other caches and the memory controller receive the command, search their cache data array, and provide a response. If Cache B has the cache line in “Modified” state, then Cache B may provide a copy of the cache line to Cache A in a “Shared” state, while Cache B transitions its cache line to an “Owned” state.
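The Load example above may be sketched as follows. The class `SnoopingCache` and the function `load` are hypothetical names used only for illustration. Only the Modified-to-Owned transition is taken from the example; the handling of the other states, and the choice of an Exclusive state when no other cache holds the line, are assumptions typical of MOESI-family protocols and are marked as such in the comments.

```python
class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> [state, data]

    def state(self, addr):
        return self.lines.get(addr, ["I", None])[0]

    def snoop_read(self, addr):
        """Respond to a read request observed on the bus.

        Returns (holds_copy, data); data is not None only when this cache
        supplies the cache line to the requester.
        """
        line = self.lines.get(addr)
        if line is None or line[0] == "I":
            return False, None
        if line[0] == "M":
            line[0] = "O"         # from the example: Modified becomes Owned
            return True, line[1]
        if line[0] == "O":
            return True, line[1]  # assumption: the owner keeps supplying data
        if line[0] == "E":
            line[0] = "S"         # assumption: Exclusive is demoted to Shared
        return True, None         # assumption: sharers leave data to memory


def load(requester, peers, memory, addr):
    """Service a Load, broadcasting a read request on a local miss."""
    line = requester.lines.get(addr)
    if line is not None and line[0] != "I":
        return line[1]  # hit in the requester's own cache
    responses = [peer.snoop_read(addr) for peer in peers]
    supplied = [data for _, data in responses if data is not None]
    held_elsewhere = any(holds for holds, _ in responses)
    data = supplied[0] if supplied else memory[addr]
    # Assumption: install as Exclusive only if no other cache holds the line.
    requester.lines[addr] = ["S" if held_elsewhere else "E", data]
    return data
```

With Cache B holding an address in the Modified state, a call to `load` on behalf of Cache A returns the data held by Cache B, leaves Cache A in the Shared state and Cache B in the Owned state, and does not update system memory.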