Modern microprocessors have one or more internal cache memories to reduce average access time to microprocessor memory. Cache memories come in many different organizations and sizes, but generally have a data storage area and an address storage area. The data storage area is typically organized as cache lines, each holding a fixed number of bytes. In one embodiment, the cache line is 64 bytes. Caches may be specific to either instructions or data, or may be organized as a unified cache that stores both instructions and data. Cache memories are arranged hierarchically. In a microprocessor with Level 1 (L1) and Level 2 (L2) caches, the L1 cache is the fastest cache to access, and is the first cache memory consulted when looking for an instruction or data in the memory subsystem of the microprocessor. L2 caches are typically larger and slower than L1 caches.
Data is stored in a cache line of a lower level cache memory (e.g., an L1 cache) from system memory or a higher level cache memory (e.g., L2 cache in a microprocessor having L1 and L2 caches), usually in response to a cache miss. Cache misses occur when a read (load) or write (store) operation attempts to access the cache, but the address it is reading from or writing to is not in the cache. For a load instruction, the microprocessor will usually load data from the L2 cache or system memory (wherever the data is present in the fastest accessible form) into an available cache line in L1 cache. For a store instruction, the microprocessor will usually store data directly to the L1 cache if an available cache line is present. If an available cache line is not present, the microprocessor may evict data from an L1 cache line to a higher level L2 cache line, according to the cache line replacement policy being used by the cache memory. In one embodiment, the cache replacement policy is LRU (least recently used).
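By way of illustration only, the miss handling and LRU replacement described above can be sketched in software. The class name `TinyL1`, the use of a plain dictionary to stand in for the L2 cache, and the fully associative organization are assumptions made for this sketch; an actual microprocessor implements this behavior in circuitry and may differ in every detail.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line, as in the embodiment described above


class TinyL1:
    """Hypothetical fully associative L1 model with LRU replacement."""

    def __init__(self, capacity_lines, l2):
        self.lines = OrderedDict()  # line address -> data, least recent first
        self.capacity = capacity_lines
        self.l2 = l2                # assumption: a dict standing in for L2

    def access(self, address):
        """Return True on a hit, False on a miss (after filling the line)."""
        line = address - (address % LINE_SIZE)  # align to the line boundary
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            # No available line: evict the least recently used line to L2.
            victim, data = self.lines.popitem(last=False)
            self.l2[victim] = data
        # Fill from L2 if present there, otherwise from system memory.
        self.lines[line] = self.l2.get(line, "from system memory")
        return False
```

With a capacity of two lines, accessing addresses 0, 64, 8, and 128 in that order produces a miss, a miss, a hit (address 8 falls in the line at address 0), and a miss that evicts the line at address 64 to L2.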
If the replacement policy is free to choose any entry in the cache to hold the copy, the cache is fully associative. If each entry in main memory can go in just one place in the cache, the cache is direct mapped. Many microprocessor caches implement a compromise, and are described as set associative. In a 2-way set associative cache memory, any particular location in main memory can be cached in either of 2 cache lines in the cache. In a 4-way set associative cache memory, any particular location in main memory can be cached in any of 4 cache lines in the cache.
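The three organizations differ only in how many cache lines are candidates for a given memory address. The following sketch makes this concrete; the function name `candidate_lines`, the modulo indexing scheme, and the slot numbering are assumptions chosen for illustration and are not taken from any particular microprocessor.

```python
LINE_SIZE = 64  # bytes per cache line, as in the embodiment described above


def candidate_lines(address, num_lines, ways):
    """Return the cache line slots that may hold `address`.

    ways == 1          -> direct mapped (exactly one candidate)
    ways == num_lines  -> fully associative (every line is a candidate)
    otherwise          -> `ways`-way set associative
    """
    num_sets = num_lines // ways
    block = address // LINE_SIZE   # which line-sized block of memory
    set_index = block % num_sets   # assumed simple modulo indexing
    # Set s is assumed to own slots s*ways through s*ways + ways - 1.
    return [set_index * ways + w for w in range(ways)]
```

For an 8-line cache and address 0x1040, this yields one candidate slot when direct mapped, two when 2-way set associative, four when 4-way set associative, and all eight when fully associative.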
Cache lines are evicted from an L1 cache by microprocessor circuitry that selects an L1 cache line to evict, reads the cache line from L1 cache, writes the cache line to an available cache line in an L2 cache, and invalidates the cache line status in L1 cache. One protocol for cache line status is the MESI protocol, which is a widely used cache coherency and memory coherence protocol. MESI designates four possible states for each of the cache lines in the cache memory: Modified, Exclusive, Shared, or Invalid. A Modified cache line is present only in the current cache, and it has been modified from the value in main memory. The cache memory is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. An Exclusive cache line is present only in the current cache, but is up to date and matches main memory. A Shared cache line indicates that the cache line may be stored in other caches of the system. An Invalid cache state indicates that this cache line is invalid, and the contents do not represent a reliable data value. Evicted cache lines have an Invalid MESI status following eviction.
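As a minimal sketch of the eviction sequence and the MESI states just described, the steps of reading the line from L1, writing it to L2, and invalidating it in L1 may be modeled as follows. The function name `evict` and the dictionary representation of both caches are hypothetical; selection of the victim line is assumed to have been done by the caller, and the L2 line's own coherence state is deliberately not modeled.

```python
from enum import Enum


class MESI(Enum):
    MODIFIED = "M"   # only in this cache, differs from main memory
    EXCLUSIVE = "E"  # only in this cache, matches main memory
    SHARED = "S"     # may also be stored in other caches
    INVALID = "I"    # contents are not a reliable data value


def evict(l1, l2, line_addr):
    """Evict one line from L1 to L2.

    l1: dict mapping line address -> (MESI state, data)   (assumed model)
    l2: dict mapping line address -> data                  (assumed model)
    """
    _state, data = l1[line_addr]          # read the cache line from L1
    l2[line_addr] = data                  # write it to an available L2 line
    l1[line_addr] = (MESI.INVALID, None)  # invalidate the line status in L1
```

In real hardware these steps span multiple clock cycles rather than completing atomically as they do in this model.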
Cache line eviction from a lower level cache memory to a higher level cache memory usually takes multiple microprocessor clock cycles. Cache memories are often located relatively far apart in the microprocessor and the data payload of a single move is sometimes less than the size of a cache line. Often, there are other store or cache snoop operations that are directed to data in cache fully or partially within the same cache line as the line being evicted from cache. It is necessary for the store or snoop to know the state of the eviction process for the implicated cache line. If the store or snoop is allowed to continue without knowledge of the eviction operation, it is possible that data that has not yet been evicted will be overwritten by a store, or the cache line will be invalidated. Either will result in data corruption.
To solve this problem, microprocessors typically determine if a cache line is in the process of being evicted from a lower level cache to a higher level cache by comparing the address of store operations in the instruction pipeline directed to the cache, to the address of the evicted cache line. The address of the evicted cache line must be temporarily stored in the microprocessor until the store addresses have been compared. Addresses may either be compared serially or in parallel. Comparing many addresses in parallel requires many compare circuits and other logic to provide compared results to the microprocessor from all of the compare circuits. Comparing many addresses serially requires significant time for many store instructions, which slows cache eviction operations and cache performance. Therefore, what is needed is a way for a microprocessor to rapidly identify cache lines that are in the process of being evicted from a lower level cache to a higher level cache, without requiring the addition of a significant amount of hardware for address comparison.
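The conventional address comparison described above can be illustrated with a short software sketch of the serial case. The function names `line_of` and `stores_hitting_eviction` are hypothetical, and the model assumes that comparison is done at cache line granularity with 64-byte lines; it is intended only to show why the cost grows with the number of pending stores.

```python
LINE_SIZE = 64  # bytes per cache line, as in the embodiment described above


def line_of(address):
    """Return the address of the cache line containing `address`."""
    return address & ~(LINE_SIZE - 1)


def stores_hitting_eviction(evicting_address, pending_store_addresses):
    """Serially compare each pending store address to the evicting line.

    One comparison is performed per pending store, so the time required
    grows linearly with the number of stores in the pipeline.
    """
    evicting_line = line_of(evicting_address)
    conflicts = []
    for addr in pending_store_addresses:
        if line_of(addr) == evicting_line:
            conflicts.append(addr)
    return conflicts
```

A parallel implementation would perform all of these comparisons at once, which in hardware corresponds to one compare circuit per pending store plus logic to combine the results.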