The present invention relates generally to atomic operations, and particularly to preventing eviction of cache data while it is subject to an atomic operation.
FIG. 1 illustrates, in block diagram form, a typical prior art multi-processor System 30. System 30 includes a number of Processors, 32a, 32b, 32c, coupled via a shared Bus 35 to Main Memory 36. Each Processor 32 has its own non-blocking Cache 34, which is N-way set associative. Each cache index includes data and a tag to identify the memory address with which the data is associated. Additionally, coherency bits are associated with each item of data in the cache to indicate the cache coherency state of the data entry. According to the MOSI cache coherency protocol, each cache data entry can be in one of four states: M, O, S, or I. The I state indicates invalid data. The owned state, O, indicates that the data associated with a cache index is valid, has been modified from the version in memory, is owned by a particular cache and that another cache may have a shared copy of the data. The processor with a requested line in the O state responds with data upon request from other processors. The shared state, S, indicates that the data associated with a cache index is valid, and one or more other processors share a copy of the data. The modified state, M, indicates valid data that has been modified since it was read into cache and that no other processor has a copy of the data.
Cache coherency states help determine whether a cache access request is a miss or a hit. A cache hit occurs when one of the ways of a cache index includes a tag matching that of the requested address and the cache coherency state for that way does not indicate invalid data. A cache miss occurs when none of the tags of an index set matches that of the requested address or when the way with a matching tag contains invalid data. FIG. 2 illustrates how MOSI cache coherency states transition in response to various types of misses. The events causing transitions between MOSI states are indicated using the acronyms IST, ILD, FST and FLD. As used herein, "ILD" indicates an Internal Load; i.e., a load request from the processor associated with the cache. Similarly, "IST" indicates an Internal Store. "FLD" indicates that a Foreign Load caused the transition; i.e., a load request to the cache coming from a processor not associated with the cache, and "FST" indicates a Foreign Store.
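The hit and miss conditions described above can be illustrated with a minimal sketch. The names State, Way, and is_hit are hypothetical and chosen only for illustration; the sketch models a single index set as a list of ways and does not model the state transitions of FIG. 2.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """The four MOSI coherency states named in the text."""
    M = "modified"
    O = "owned"
    S = "shared"
    I = "invalid"


@dataclass
class Way:
    """One way of an index set (hypothetical layout: tag plus coherency state)."""
    tag: int
    state: State


def is_hit(ways, requested_tag):
    """Return True when some way has a matching tag and a non-invalid state.

    A miss is the complement: no tag matches, or the matching way is invalid.
    """
    return any(w.tag == requested_tag and w.state is not State.I for w in ways)


# Example index set of a 4-way set associative cache (values are made up).
index_set = [
    Way(tag=0x1A, state=State.M),
    Way(tag=0x2B, state=State.I),
    Way(tag=0x3C, state=State.S),
    Way(tag=0x4D, state=State.O),
]
```

With this example set, a request for tag 0x1A hits, while a request for tag 0x2B misses even though the tag matches, because that way holds invalid data.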
"Snooping" refers to the process by which a processor in a multi-processor system determines whether a foreign cache stores a desired item of data. As used herein, a snoop represents a potential, future request for an eviction, e.g., a FLD or a FST, on a particular address. Each snoop indicates the desired address and operation. Every snoop is broadcast to every Processor 32 within System 30, but only one Processor 32 responds to each snoop. The responding Processor 32 is the one associated with the Cache 34 storing the data associated with the desired address. Each Processor 32 within System 30 includes an External Interface Unit (EIU), which handles snoop responses.
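The broadcast-and-respond behavior can be sketched roughly as follows. Snoop, Processor, and broadcast are hypothetical names, and the sketch assumes, as a simplification not stated in the text, that each processor can be modeled by the set of addresses for which its cache would respond.

```python
from dataclasses import dataclass, field


@dataclass
class Snoop:
    """A snoop carries the desired address and operation ("FLD" or "FST")."""
    address: int
    op: str


@dataclass
class Processor:
    """Hypothetical model: a name plus the addresses its cache responds for."""
    name: str
    responding_addresses: set = field(default_factory=set)

    def filter_snoop(self, snoop):
        # Snoop filtering: does this snoop hit in the local cache?
        return snoop.address in self.responding_addresses


def broadcast(processors, snoop):
    """Deliver the snoop to every processor; return the names that report a hit."""
    return [p.name for p in processors if p.filter_snoop(snoop)]


# Example system with three processors (addresses are made up).
system = [
    Processor("32a", {0x100, 0x140}),
    Processor("32b", {0x200}),
    Processor("32c", set()),
]
```

In this example, a foreign load on address 0x200 reaches all three processors, but only the model of Processor 32b reports a hit.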
FIG. 3 illustrates, in block diagram form, EIU 40 and its coupling to Bus 35 and Cache 34. EIU 40 receives snoops from Bus 35. EIU 40 forwards each snoop on to Cache Controller 42, which stores the snoop in Request Queue 46 until it can be filtered. Snoop filtering involves determining whether a snoop hits or misses in Cache 34 and indicating that to EIU 40. Given the architecture of FIG. 3, the latency between receipt of a snoop by EIU 40 and a response to it can be quite long even under the best of circumstances. Snoop latency usually increases from its theoretical minimum in response to other pending cache access requests, such as a pending atomic operation. An atomic operation refers to a computational task that should be completed without interruption. Processors 32 typically implement atomic operations as two sub-operations on a single address, one sub-operation on the address following the other without interruption. One atomic operation, for example, is an atomic load, which is a load followed immediately and without interruption by a store to the same address. To protect the data associated with an atomic operation during the pendency of the atomic operation, some processors cease filtering snoops, even though most snoops are for addresses other than that associated with the pending atomic operation. Two factors necessitate this approach. First, Cache 34 includes a single data-and-tag read-write port, which, in response to a hit, permits modification of both a cache line's data and tag. Second, most processors respond to a snoop hit by immediately beginning data eviction. This is unacceptable during an atomic operation; therefore, all access to Cache 34 is halted during the pendency of the atomic operation. However, the pendency of the atomic operation may be so long that EIU 40 is forced to back throttle snoops. Other operations may also cause a processor to cease snoop filtering without regard to the addresses to be snooped.
Thus, a need exists for an improved apparatus and method for filtering snoops independent of other pending cache access requests.
The apparatus and method of the present invention protect cache data from eviction during an atomic operation. The apparatus includes a first request queue, a second request queue, and an atomic address block. The first request queue stores an entry for each cache access request. Each entry includes a first set of address bits and an atomic bit. The first set of address bits represents a first cache address associated with the cache access request and the atomic bit indicates whether the cache access request is associated with the atomic operation. The second request queue stores an entry for each cache eviction request. Each entry of the second request queue includes a second set of address bits indicating a second cache address associated with the cache eviction request. The atomic address block prevents eviction of a third cache address during the atomic operation on the third cache address. During a first clock cycle the atomic address block receives and analyzes a first set of signals representing a first entry of the first request queue to determine whether they represent the atomic operation. If so, the atomic address block sets a third set of address bits to a value representative of the first cache address. During a second clock cycle in which the atomic operation is being executed the atomic address block receives and analyzes a second set of signals representing the second set of address bits to determine whether the second set of address bits represent a same cache address as the third set of address bits. If so, the atomic address block stalls servicing of the second request queue, thus preventing eviction of data from the cache upon which an atomic operation is being performed.
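The two-cycle behavior of the atomic address block can be approximated in software by the following minimal sketch. It is a behavioral model under stated assumptions, not the hardware of the invention: the class and method names are hypothetical, the stored atomic_address field stands in for the third set of address bits, and the complete_atomic method is an assumption, since the summary does not describe how the stored address is released.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessEntry:
    """Entry of the first request queue: address bits plus an atomic bit."""
    address: int
    atomic: bool


@dataclass
class EvictionEntry:
    """Entry of the second request queue: address of the eviction request."""
    address: int


class AtomicAddressBlock:
    """Hypothetical behavioral model of the atomic address block."""

    def __init__(self):
        # Stands in for the "third set of address bits"; None means no
        # atomic operation is currently being tracked.
        self.atomic_address: Optional[int] = None

    def observe_access(self, entry: AccessEntry) -> None:
        # First clock cycle: if the entry's atomic bit is set, latch its address.
        if entry.atomic:
            self.atomic_address = entry.address

    def must_stall(self, eviction: EvictionEntry) -> bool:
        # Second clock cycle: stall the eviction queue only on an address match.
        return (self.atomic_address is not None
                and eviction.address == self.atomic_address)

    def complete_atomic(self) -> None:
        # Assumption: the tracked address is cleared when the atomic ends.
        self.atomic_address = None
```

In this model, an eviction request stalls only when its address matches the address latched for the pending atomic operation; eviction requests for other addresses are not stalled by the block.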