In modern processors, execution pipelines are often used to process instructions. To function correctly, a processor adheres to processor inclusion: any instruction line that has been delivered into an execution pipeline of the processor may later need to be re-delivered in an unmodified state. Therefore, a line may not be deallocated or evicted, in particular from an instruction cache, until no instructions from that instruction line are still being processed in the execution pipeline.
One way to adhere to processor inclusion is to serialize the execution pipeline, clearing it of instructions before a cache line is deallocated or evicted. However, this may limit the processing capacity of the processor by creating downtime in the execution pipeline.
A processor may employ an inclusion buffer, such as a victim cache, to avoid frequent pipeline serializations. A victim cache holds evicted lines until it can be determined that no instructions from an evicted instruction line are being processed in the execution pipeline. One way to make such a determination is to insert a special micro-operation into the execution pipeline when an entry is allocated into the victim cache. Design constraints may limit a victim cache to only a few entries (e.g., four or eight). If too many instruction lines are evicted from the instruction cache before a victim cache entry is deallocated, the victim cache can fill up, resulting in unwanted stalls for the execution pipeline.
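The victim cache behavior described above can be illustrated with a minimal software sketch. It is a toy model, not a hardware design: the class and method names are hypothetical, and it assumes that marker micro-operations retire in order, so that retirement of a marker implies every older instruction has left the pipeline.

```python
from collections import deque


class VictimCache:
    """Toy model of an inclusion buffer. All names are hypothetical."""

    def __init__(self, capacity=4):
        # The text mentions four or eight entries as typical limits.
        self.capacity = capacity
        self.entries = deque()  # (line_address, marker_id), oldest first
        self.next_marker = 0

    def evict_into(self, line_address):
        """Accept a line evicted from the instruction cache.

        Returns the id of the marker micro-operation to insert into the
        pipeline, or None if the buffer is full and the pipeline must stall.
        """
        if len(self.entries) >= self.capacity:
            return None
        marker = self.next_marker
        self.next_marker += 1
        self.entries.append((line_address, marker))
        return marker

    def marker_retired(self, marker):
        """Free every entry whose marker is at or before the retired one.

        Assumption: in-order retirement, so older instructions are gone.
        """
        while self.entries and self.entries[0][1] <= marker:
            self.entries.popleft()

    def holds(self, line_address):
        """Report whether the line is still held in the buffer."""
        return any(addr == line_address for addr, _ in self.entries)
```

In this sketch, a return value of None from evict_into corresponds to the stall condition: no further line can be evicted until an earlier marker retires and frees an entry.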
Another technique is the single-level in-use scheme described in U.S. Patent Application 2008/0065865. An in-use field for each entry of a storage unit (such as an instruction cache and/or an ITLB) may be utilized to determine whether that entry may be modified. For example, if the in-use bit indicates that the corresponding entry is unused, that entry may be removed or replaced without further latency. However, this single-level scheme (i.e., one in-use bit per entry) is imprecise because it does not track the retirement of instructions but simply relies on pipeline serializations, which mark all entries as not in-use. The in-use array starts with all zeros. As time passes, more and more entries are marked as in-use, making it harder to find entries that are not in-use and can be freely replaced. Eventually, most bits become marked as in-use even though some of the corresponding entries may not actually be in use, which results in pipeline serializations and a performance penalty.
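The imprecision of the single-level scheme can be seen in a minimal sketch. The class and method names below are hypothetical and are not taken from the referenced application; the model only captures the behavior described above, namely that bits are set on delivery and cleared only by a pipeline serialization.

```python
class InUseArray:
    """Toy model of one in-use bit per entry. All names are hypothetical."""

    def __init__(self, num_entries):
        # The in-use array starts with all zeros.
        self.bits = [False] * num_entries

    def mark_fetched(self, index):
        """Set the bit when the entry is delivered into the pipeline."""
        self.bits[index] = True

    def can_replace(self, index):
        """An entry whose bit is clear may be replaced without latency."""
        return not self.bits[index]

    def pick_victim(self):
        """Return the first replaceable index, or None if none exists.

        None models the case where a pipeline serialization is needed.
        """
        for index, in_use in enumerate(self.bits):
            if not in_use:
                return index
        return None

    def serialize(self):
        """A pipeline serialization marks all entries as not in-use."""
        self.bits = [False] * len(self.bits)
```

Because nothing in this model clears a bit when the corresponding instructions retire, the set bits only accumulate between serializations, which is the source of the false in-use indications.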
U.S. Pat. No. 7,925,834 describes another technique that adopts a filter mechanism (either in-use bits or an LRU-hint-based scheme) to further reduce the number of evictions into the victim cache. When the filtering efficiency of the in-use tracking mechanism is found to be low based on certain criteria (i.e., too many entries are falsely marked as in-use), all in-use bits in the filter mechanism are cleared, and a global marker may instead mark all instruction lines as in-use while the inserted micro-operation is in the execution pipeline. This approach may also limit the processing capacity of the processor, because all lines are falsely marked as in-use while the micro-operation is in the pipeline.
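A minimal sketch of the global-marker behavior follows. The names are hypothetical, and the reset criterion used here (a simple threshold on the number of set bits) is an assumption made only for illustration; the text above says only that the reset is based on certain criteria.

```python
class FilteredInUse:
    """Toy model of in-use bits with a global marker. Names are hypothetical."""

    def __init__(self, num_entries, threshold):
        self.bits = [False] * num_entries
        # Assumed criterion: reset once this many bits are set.
        self.threshold = threshold
        self.global_in_use = False

    def mark_fetched(self, index):
        """Set the bit on delivery and check the assumed reset criterion."""
        self.bits[index] = True
        if not self.global_in_use and sum(self.bits) >= self.threshold:
            self.reset()

    def reset(self):
        """Clear all bits; the global marker now covers every line.

        This is the point where the micro-operation would be inserted.
        """
        self.bits = [False] * len(self.bits)
        self.global_in_use = True

    def marker_retired(self):
        """The micro-operation has left the pipeline; drop the marker."""
        self.global_in_use = False

    def is_in_use(self, index):
        """A line counts as in-use if its bit or the global marker is set."""
        return self.global_in_use or self.bits[index]
```

While global_in_use is set, is_in_use returns True for every index, including lines that were never delivered, which models the false marking noted above.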