In a typical processor, instructions are executed by an instruction pipeline that includes a branch prediction unit and a fetch unit collectively referred to as the front end of the pipeline, a decode unit, an execute/scheduler unit, and a load/store unit that includes a level one (L1) data cache that interfaces with a level two (L2) data cache. The fetch unit includes an L1 instruction cache that interfaces with an L2 instruction cache. One or more additional levels of data caches and/or instruction caches can also be implemented. A typical processor also includes a main memory having a physical address space that is organized into memory blocks. A typical cache includes a number of storage segments in which memory blocks from the main memory (or data blocks from another storage component) can be cached.
In a single-processor or multi-processor system that includes a multi-level cache hierarchy, multiple valid instances of an instruction or other data item can exist simultaneously in different storage locations. Each instance of a particular instruction or data item corresponds to the same physical address and therefore to the same memory block. Modifying one instance of an instruction or data item often renders invalid at least one other instance of that same instruction or data item in another location, which is typically a cache.
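As an illustration of how several cached instances can correspond to one memory block, the following minimal sketch maps physical addresses to block base addresses and lists which caches hold a given block. The 64-byte block size, the `Cache` class, and the helper names are assumptions made for this example only; they are not taken from any particular implementation.

```python
BLOCK_SIZE = 64  # assumed block size in bytes (a power of two); not stated in the text


def block_base(phys_addr: int) -> int:
    """Return the base physical address of the memory block containing phys_addr."""
    return phys_addr & ~(BLOCK_SIZE - 1)


class Cache:
    """Hypothetical cache model: maps a block base address to cached contents."""

    def __init__(self, name: str):
        self.name = name
        self.lines = {}

    def fill(self, phys_addr: int, data: bytes) -> None:
        # Caching happens at block granularity, so key on the block base.
        self.lines[block_base(phys_addr)] = data

    def holds(self, phys_addr: int) -> bool:
        return block_base(phys_addr) in self.lines


def holders(caches, phys_addr: int):
    """Names of all caches that currently hold the block containing phys_addr."""
    return [c.name for c in caches if c.holds(phys_addr)]
```

In this model, filling both an L1 instruction cache and an L2 cache from an address inside the same block leaves two instances that would both be affected by a later modification of that block.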
Two types of code for which modification is permitted during execution are self-modifying code (SMC), which is code that is modified within the same processor that is handling the modification, and cross-modifying code (CMC), which is code that is modified at a processor other than the one that is handling the modification. When an instance of an instruction is modified, the component at which the modification occurs and/or is discovered typically sends an invalidation probe to each cache at which a now-invalid instance of that instruction is (or at least potentially is) stored. An invalidation probe is a type of cache coherency probe, which is a message that is sent from a component in a processor to a cache in either the same or another processor to determine whether that cache is currently storing certain data and/or to indicate the state into which that cache should place that data (if present).
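The probe mechanism described above can be sketched as a small message type plus a receiving cache. This is only an illustrative model: the names `ProbeType`, `CoherencyProbe`, `Cache`, and `modify_instruction`, as well as the 64-byte block size, are hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ProbeType(Enum):
    QUERY = "query"            # asks whether the cache holds the data
    INVALIDATE = "invalidate"  # asks the cache to place the data in the invalid state


@dataclass(frozen=True)
class CoherencyProbe:
    phys_addr: int
    kind: ProbeType


class Cache:
    """Hypothetical cache that tracks which block numbers are currently valid."""

    def __init__(self, name: str, block_size: int = 64):  # block size is assumed
        self.name = name
        self.block_size = block_size
        self.valid_blocks = set()

    def _block(self, phys_addr: int) -> int:
        return phys_addr // self.block_size

    def fill(self, phys_addr: int) -> None:
        self.valid_blocks.add(self._block(phys_addr))

    def receive(self, probe: CoherencyProbe) -> bool:
        """Report whether the probed block is present; invalidate it if requested."""
        blk = self._block(probe.phys_addr)
        hit = blk in self.valid_blocks
        if hit and probe.kind is ProbeType.INVALIDATE:
            self.valid_blocks.discard(blk)
        return hit


def modify_instruction(phys_addr: int, other_caches):
    """Model a write to code: probe every cache that might hold a stale instance.

    Returns the names of the caches that actually held (and invalidated) the block.
    """
    probe = CoherencyProbe(phys_addr, ProbeType.INVALIDATE)
    return [c.name for c in other_caches if c.receive(probe)]
```

Note that the probe is sent to every cache that potentially holds the block, and only those that actually hold it report a hit.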
Upon receipt of an invalidation probe, a cache typically takes and/or triggers one or more actions to prevent, interrupt, and/or undo the execution of a now-invalid instruction. Examples of these actions include invalidating any cached instance of the memory block that contains the instruction, canceling any pending operation cache (op cache) builds for the physical address of the instruction, refetching the corresponding memory block (from, e.g., a higher level cache), and triggering a resync of the pipeline.
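The example actions listed above can be sketched as a single handler. This is a simplified model under stated assumptions: the `FrontEnd` class and its fields are hypothetical, all four actions are performed in a fixed order on every hit (the text lists them only as examples), op cache builds are canceled for any address within the affected block, and the higher level cache is modeled as a plain dictionary.

```python
class FrontEnd:
    """Hypothetical front end that reacts to an invalidation probe."""

    def __init__(self, block_size: int = 64):  # block size is assumed
        self.block_size = block_size
        self.icache = {}                 # block base address -> cached bytes
        self.pending_op_builds = set()   # physical addresses with op cache builds in flight
        self.log = []                    # record of the actions taken, in order

    def _base(self, phys_addr: int) -> int:
        return phys_addr - (phys_addr % self.block_size)

    def on_invalidation_probe(self, phys_addr: int, next_level: dict) -> bool:
        """Handle a probe; return True if a cached block was affected."""
        base = self._base(phys_addr)
        if base not in self.icache:
            return False  # nothing cached for this block, so no action is needed

        # 1. Invalidate the cached instance of the memory block.
        del self.icache[base]
        self.log.append("invalidate")

        # 2. Cancel pending op cache builds that fall within the same block (assumption).
        stale = {a for a in self.pending_op_builds if self._base(a) == base}
        if stale:
            self.pending_op_builds -= stale
            self.log.append("cancel_op_builds")

        # 3. Refetch the block from the next cache level (modeled as a dict).
        self.icache[base] = next_level[base]
        self.log.append("refetch")

        # 4. Trigger a pipeline resync so stale in-flight instructions are discarded.
        self.log.append("resync")
        return True
```

A probe for a block that is not cached returns early without invalidating, refetching, or resyncing, which is the behavior an implementation would want to reach as cheaply as possible.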
Current implementations handle invalidation probes in different ways, each of which wastes resources such as time, power, hardware (e.g., ports), and substrate surface area. Some are overinclusive in selecting the cached memory blocks on which they take responsive actions, which results in invalidating more cached memory blocks than necessary and triggering unnecessary resyncs. Others test every invalidation probe to determine conclusively whether it is directed to a physical address contained in a cached memory block before taking any responsive action. This avoids the overinclusiveness problem but comes at the cost of dedicated hardware and/or reduced throughput, because processing time, and often priority, is granted to dispositively evaluating every invalidation probe. There is a need for more efficient and effective handling of cache coherency probes, including invalidation probes.
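The trade-off between the two approaches can be illustrated with a toy comparison. The sketch below is an assumption-laden model, not a description of any actual implementation: the overinclusive policy is represented as a partial match on a set index, the exact policy as a full block-number comparison, and the block size and number of sets are arbitrary. The extra lookup cost of the exact policy (hardware ports, priority, throughput) is not modeled.

```python
BLOCK_SIZE = 64  # assumed block size in bytes
NUM_SETS = 4     # assumed number of cache sets


def block_number(phys_addr: int) -> int:
    return phys_addr // BLOCK_SIZE


def overinclusive_policy(cached_blocks: set, probe_addr: int) -> set:
    """Select every cached block that shares the probe's set index (partial match)."""
    target_set = block_number(probe_addr) % NUM_SETS
    return {b for b in cached_blocks if b % NUM_SETS == target_set}


def exact_policy(cached_blocks: set, probe_addr: int) -> set:
    """Select only the block that actually contains the probed address, if cached."""
    return {block_number(probe_addr)} & cached_blocks


def run(policy, cached_blocks, probe_addrs):
    """Apply a policy to a probe stream; return (blocks invalidated, resyncs triggered)."""
    cached = set(cached_blocks)
    invalidated = 0
    resyncs = 0
    for addr in probe_addrs:
        victims = policy(cached, addr)
        if victims:
            cached -= victims
            invalidated += len(victims)
            resyncs += 1  # assume each responsive action triggers one resync
    return invalidated, resyncs
```

With cached blocks `{0, 1, 4, 8}` and probes to blocks 12, 5, and 4 (in that order), the overinclusive policy in this model invalidates four blocks and triggers two resyncs, while the exact policy invalidates one block and triggers one resync.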