High-performance processing systems require fast memory access and low memory latency in order to process data quickly. Because system memory is slow to supply data to the processor, caches are used to keep data close to the processor, where it can be accessed more quickly. Larger caches generally give better overall system performance, but they can also introduce more latency and design complexity than smaller caches. Smaller caches are usually designed to give a processor a fast way to synchronize or communicate with other processors at the system application level, especially in networking or graphics environments.
Processors read data from and write data to memory via Loads and Stores. Over time, data from system memory fills the cache. The optimum condition is one in which most or all of the data the processor accesses is in the cache. This can happen if the application's data size is the same as or smaller than the cache size. In general, however, cache size is limited by design or technology, and the cache cannot hold all of the application's data. This becomes a problem when the processor accesses new data that is not in the cache and no cache space is available for it. The cache controller must then find an appropriate location in the cache for the new data when it arrives from memory. An LRU (Least Recently Used) algorithm is used in the cache controller to handle this situation. LRU determines which location is to be used for the new data based on the data access history. If LRU selects a line that is consistent with system memory, e.g. a line in the shared state, the new data simply overwrites that location. When LRU selects a line that is marked ‘Modified’, which means that its data is not consistent with system memory and is unique to this cache, the cache controller forces the ‘Modified’ data at this location to be written back to system memory. This action is called a ‘write back’ or ‘castout’, and the cache location that contains the write back data is called the ‘Victim Cache Line’.
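The victim selection described above can be illustrated with a minimal sketch. It assumes a small fully associative cache and tracks only a line's address and state; the names TinyCache, access, and write_backs are hypothetical and chosen for illustration only, and the sketch does not represent any particular cache controller design.

```python
from collections import OrderedDict

MODIFIED, SHARED = "M", "S"  # illustrative line states (assumption)


class TinyCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> state, least recently used first
        self.write_backs = []       # addresses cast out to system memory

    def access(self, addr, is_store=False):
        """Return True on a hit, False on a miss (the line is then reloaded)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            if is_store:
                self.lines[addr] = MODIFIED
            return True
        if len(self.lines) >= self.capacity:
            # No free space: LRU picks the victim cache line.
            victim, state = self.lines.popitem(last=False)
            if state == MODIFIED:
                # Modified data is inconsistent with memory: write back (castout).
                self.write_backs.append(victim)
            # A Shared victim is consistent with memory and is simply overwritten.
        self.lines[addr] = MODIFIED if is_store else SHARED
        return False


cache = TinyCache(capacity=2)
cache.access(0x100, is_store=True)  # miss, line becomes Modified
cache.access(0x200)                 # miss, line becomes Shared
cache.access(0x300)                 # miss, evicts Modified 0x100: write back
cache.access(0x400)                 # miss, evicts Shared 0x200: no write back
print([hex(a) for a in cache.write_backs])
```

In this sketch only the eviction of the Modified line produces a write back; the Shared line is replaced without any traffic to system memory.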
In a typical cache design, the LRU algorithm estimates future data reuse by the software and removes the least recently used data. However, LRU may make an incorrect selection, which can cause a future cache miss on the same data. The missed data must then be reloaded from main memory, which is another long-latency operation.
In addition to this long-latency write back and reload, another situation can degrade performance. A cache controller attempts to complete the write back operation expediently by sending the data to system memory via designated bus operations. If, during the write back operation, a bus snoop operation arrives whose address matches the write back address, the snoop operation is retried. In other words, until the write back data is in system memory, all subsequent snoops that hit on the same write back data are retried. Snoop operations are necessary on the system bus to maintain memory coherency between the multiprocessor caches and system memory.
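The retry behavior can be sketched as follows. This is a simplified model, assuming that a snoop is re-issued once per bus cycle and that the write back occupies a fixed number of cycles; WriteBackTracker, snoop, and count_retries are hypothetical names used only for illustration.

```python
class WriteBackTracker:
    """Hypothetical tracker of write backs that are still in flight on the bus."""

    def __init__(self):
        self.in_flight = set()

    def start(self, addr):
        self.in_flight.add(addr)

    def complete(self, addr):
        self.in_flight.discard(addr)

    def snoop(self, addr):
        # A snoop that hits an in-flight write back address is retried.
        return "RETRY" if addr in self.in_flight else "PROCEED"


def count_retries(latency_cycles, wb_addr, snoop_addr):
    """Count retries seen by a snooper that re-issues its snoop every cycle
    while the write back is in flight (assumed fixed latency)."""
    tracker = WriteBackTracker()
    tracker.start(wb_addr)
    retries = 0
    for _ in range(latency_cycles):
        if tracker.snoop(snoop_addr) == "RETRY":
            retries += 1
    tracker.complete(wb_addr)  # data is now in system memory
    return retries


print(count_retries(latency_cycles=8, wb_addr=0x100, snoop_addr=0x100))
print(count_retries(latency_cycles=8, wb_addr=0x100, snoop_addr=0x200))
```

Under these assumptions, the number of retries grows with the write back latency for a snoop that hits the write back address, while a snoop to a different address is unaffected.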
Since the write back operation is a long-latency bus operation, all snoop operations that hit on the write back address are retried while it is in progress. This degrades system performance and may sometimes create a live-lock situation. Hence, the more this long-latency write back operation can be avoided, the better the system performance will be.
An exemplary write back cache is implemented to provide a fast way for processors to access data, communicate, and synchronize between tasks with optimum performance. Even though the amount of data moving in and out of this cache is small, a mechanism to cancel the write back operation whenever possible is needed for better performance. Two types of operations create an empty space in the cache: a ‘snoop push’ and a ‘snoop kill’. One example of a snoop push occurs when another bus agent without a cache, e.g. an IO controller on the system bus, issues a store that hits on modified data in the cache. The cache controller retries this IO controller store request on the bus and pushes the latest copy of the modified data out to memory, so that the IO controller can then apply its update to the latest data in memory. A snoop push operation pushes modified data out to system memory and leaves the cache line either shared or invalid. A snoop kill operation, for example during a cache flush, invalidates an entry, which creates room in the cache for a subsequent cache miss reload. Therefore, since an empty space is created by either a snoop push or a snoop kill operation, the write back is not necessary for a concurrent cache miss reload.
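The decision described above can be sketched as a small function. This is only an illustration under stated assumptions: the snoop push or snoop kill is assumed to leave its cache slot invalid (and therefore free), and SnoopOp, ReloadPlan, and plan_reload are hypothetical names, not part of any described hardware interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SnoopOp(Enum):
    NONE = "none"
    PUSH = "snoop push"
    KILL = "snoop kill"


@dataclass
class ReloadPlan:
    target_slot: int         # cache slot that receives the reloaded data
    write_back_needed: bool  # whether the LRU victim must be cast out


def plan_reload(lru_victim_slot: int,
                victim_modified: bool,
                snoop_op: SnoopOp,
                freed_slot: Optional[int] = None) -> ReloadPlan:
    """Choose where a concurrent cache miss reload should go.

    freed_slot is the slot left invalid by the snoop push or snoop kill
    (assumption), or None if no slot was freed.
    """
    if snoop_op in (SnoopOp.PUSH, SnoopOp.KILL) and freed_slot is not None:
        # An empty slot exists, so the pending write back can be cancelled
        # and the LRU victim stays in the cache.
        return ReloadPlan(target_slot=freed_slot, write_back_needed=False)
    # No empty slot: fall back to the LRU victim, writing it back if Modified.
    return ReloadPlan(target_slot=lru_victim_slot,
                      write_back_needed=victim_modified)


print(plan_reload(3, True, SnoopOp.KILL, freed_slot=5))
print(plan_reload(3, True, SnoopOp.NONE))
```

In the first call the reload is steered to the slot freed by the snoop kill and no write back is issued; in the second call no slot has been freed, so the Modified victim must still be written back.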
System performance is improved with this mechanism, since the canceled write back in turn eliminates possible subsequent cache misses on the victim line and the snoop retries that could have hit on the victim during the write back. In addition, canceling long-latency bus operations such as a write back puts less strain on the bus, especially when a snoop push operation is occurring at the same time. Therefore, it is desirable to be able to cancel a pending write back operation if the snoop state machine is busy doing a snoop push or snoop kill.