The invention relates generally to the operation of cache memory in a processor, and more particularly, to executing a cache replacement algorithm that mitigates the negative effects of speculative fetching in a cache memory.
Instruction-fetching within processors may be autonomous with respect to the actual processing of instructions. This is particularly true when instruction-fetching is driven by a branch prediction mechanism that records historical branch addresses within the code, and the historical target addresses for those branches. Such mechanisms have been referred to as branch history tables (BHTs), and more recently branch target buffers (BTBs).
When presented with an instruction address, a BHT provides the next instruction address that should be fetched. If a branch is found, the BHT also provides a specific indicator as to where the branch instruction is located within the current instruction-fetch group, together with the specific target address for the branch.
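By way of illustration only, the lookup described above may be sketched as follows. The sketch assumes a 16-byte instruction-fetch group, at most one recorded branch per group, and a next fetch address formed by aligning the target to the fetch width; the names `BranchHistoryTable`, `Prediction`, `record`, and `predict` are hypothetical and are not taken from any particular processor design.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FETCH_WIDTH = 16  # assumed bytes per instruction-fetch group (quad-word)


@dataclass
class Prediction:
    next_fetch_addr: int          # address of the next group to fetch
    branch_offset: Optional[int]  # byte offset of the branch in this group, if any
    target_addr: Optional[int]    # historical target of that branch, if any


class BranchHistoryTable:
    """Hypothetical BHT keyed by instruction-fetch-group address."""

    def __init__(self) -> None:
        # group address -> (branch address, historical target address)
        self._entries: Dict[int, Tuple[int, int]] = {}

    def record(self, branch_addr: int, target_addr: int) -> None:
        """Remember a taken branch and its target (one branch per group assumed)."""
        group = branch_addr - branch_addr % FETCH_WIDTH
        self._entries[group] = (branch_addr, target_addr)

    def predict(self, fetch_addr: int) -> Prediction:
        """Return the next fetch address, plus branch details when one is recorded."""
        group = fetch_addr - fetch_addr % FETCH_WIDTH
        entry = self._entries.get(group)
        if entry is None:
            # No branch recorded: fetching continues with the sequential group.
            return Prediction(group + FETCH_WIDTH, None, None)
        branch_addr, target_addr = entry
        next_group = target_addr - target_addr % FETCH_WIDTH
        return Prediction(next_group, branch_addr - group, target_addr)
```

In this sketch, a group without a recorded branch simply yields the sequential group address, while a group with a recorded branch redirects fetching to the group containing the target.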
Addresses used for instruction-fetching are aligned (i.e., have a granularity) based on the instruction-fetch width (e.g., double-word, quad-word or double-quad-word). In contrast, branch instructions and their target addresses are aligned based on the instruction width (e.g., word, halfword, or byte). Therefore, instruction-fetching is performed at a coarser granularity (i.e., higher bandwidth) than the actual processing of instructions. Further, instruction-fetch groups are fetched from a cache, which maintains data at an even coarser granularity (e.g., cache lines are typically 128, 256, or more bytes). Thus, each cache line contains multiple instruction-fetch groups, and each instruction-fetch group contains multiple instructions.
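The three granularities described above may be illustrated with a short sketch. The sizes used (halfword instruction alignment, a 16-byte instruction-fetch group, and a 128-byte cache line) are assumptions chosen from the examples in the text, and the helper names `align_down` and `locate` are hypothetical.

```python
INSTR_ALIGN = 2    # assumed instruction alignment in bytes (halfword)
FETCH_WIDTH = 16   # assumed instruction-fetch group size in bytes (quad-word)
LINE_SIZE = 128    # assumed cache line size in bytes


def align_down(addr: int, granularity: int) -> int:
    """Round addr down to a multiple of granularity (a power of two)."""
    return addr & ~(granularity - 1)


def locate(addr: int) -> dict:
    """Break an instruction address into its line, group, and slot positions."""
    line = align_down(addr, LINE_SIZE)
    group = align_down(addr, FETCH_WIDTH)
    return {
        "line_addr": line,                                   # containing cache line
        "group_addr": group,                                 # containing fetch group
        "group_index_in_line": (group - line) // FETCH_WIDTH,
        "instr_slot_in_group": (addr - group) // INSTR_ALIGN,
    }
```

Under these assumed sizes, each cache line holds eight instruction-fetch groups and each group holds eight halfword-aligned instruction slots.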
When a BHT outputs an instruction-fetch address, an attempt is made to fetch the associated instruction-fetch group (the group including the instruction-fetch address) from the level-one (L1) instruction cache. If the cache line containing the instruction-fetch group is resident in the L1 instruction cache, then the successful fetch attempt results in a “cache hit” and a copy of the instruction-fetch group is placed in an instruction buffer for eventual processing by the processor pipeline. If the cache line containing the instruction-fetch group is not resident in the L1 instruction cache, then the unsuccessful fetch attempt results in a “cache miss”, and the address of the instruction-fetch group is sent to the next higher level cache in the cache hierarchy (e.g., an L2 cache) for processing. Eventually, a copy of the cache line containing the instruction-fetch group will be moved into the L1 instruction cache, and the instruction-fetch group can then be obtained from the newly resident cache line.
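The hit and miss paths described above may be sketched as follows. This is a simplified model under stated assumptions: the L1 instruction cache is given unlimited capacity (so no line is displaced), the next cache level always supplies the requested line, and line contents are fabricated for the example. The class names `NextLevel` and `L1ICache` and their methods are hypothetical.

```python
LINE_SIZE = 128   # assumed cache line size in bytes
FETCH_WIDTH = 16  # assumed instruction-fetch group size in bytes


class NextLevel:
    """Stand-in for the next higher cache level (e.g., an L2 cache)."""

    def __init__(self) -> None:
        self.requests = []  # line addresses requested on L1 misses

    def read_line(self, line_addr: int) -> bytes:
        self.requests.append(line_addr)
        # Fabricated content: byte i of the line holds (line_addr + i) & 0xFF.
        return bytes((line_addr + i) & 0xFF for i in range(LINE_SIZE))


class L1ICache:
    """Simplified L1 instruction cache with unlimited capacity (an assumption)."""

    def __init__(self, next_level: NextLevel) -> None:
        self.lines = {}  # line address -> line contents
        self.next_level = next_level
        self.hits = 0
        self.misses = 0

    def fetch_group(self, fetch_addr: int) -> bytes:
        line_addr = fetch_addr - fetch_addr % LINE_SIZE
        group_addr = fetch_addr - fetch_addr % FETCH_WIDTH
        if line_addr in self.lines:
            self.hits += 1  # cache hit: the line is already resident
        else:
            self.misses += 1  # cache miss: bring the line in from the next level
            self.lines[line_addr] = self.next_level.read_line(line_addr)
        offset = group_addr - line_addr
        # The returned copy stands in for the group placed in the instruction buffer.
        return self.lines[line_addr][offset:offset + FETCH_WIDTH]
```

In this model, a first fetch to any group of a line produces a miss and a request to the next level, and later fetches to other groups in the same line are hits.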
When a cache miss occurs, a new cache line will be brought into the L1 instruction cache. The new cache line will displace another line in the L1 instruction cache. Sometimes, the contents of the displaced line are still needed by the processor. When this is so, another cache miss will inevitably be generated to re-fetch the displaced line. This new cache miss could have been avoided had the corresponding line not been displaced by the original miss.
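The cost of such a displacement may be illustrated with a small simulation. The sketch assumes a two-line cache with least-recently-used (LRU) replacement; the text above does not specify a replacement policy, so LRU is used here purely as an example, and the names `TinyCache` and `count_misses` are hypothetical.

```python
from collections import OrderedDict


class TinyCache:
    """Fully associative cache with LRU replacement (an assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> True, oldest entry first
        self.misses = 0

    def access(self, line_addr: int) -> bool:
        """Return True on a hit; on a miss, install the line and return False."""
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # displace the least recently used line
        self.lines[line_addr] = True
        return False


def count_misses(trace, capacity: int = 2) -> int:
    """Count the misses produced by accessing the line addresses in trace."""
    cache = TinyCache(capacity)
    for line_addr in trace:
        cache.access(line_addr)
    return cache.misses


A, B, C = 0x1000, 0x1080, 0x1100  # three distinct line addresses

needed_only = count_misses([A, B, A])         # A and B both fit: 2 misses
with_extra_line = count_misses([A, B, C, A])  # C displaces A: 4 misses
```

In the second trace, the miss on line C displaces line A, so the later access to A misses again. The trace therefore incurs two additional misses: one for C itself and one to re-fetch A.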
When the original cache miss is useful (meaning that the line that is brought in contains instructions that actually must be executed), then the subsequent cache miss is unavoidable. However, in the case of instruction-fetching, many fetches are speculative (meaning that it is not certain that the instruction-fetch group being fetched contains instructions that will be executed), particularly when the instruction addresses are generated by a branch prediction mechanism. It would be useful if there were a way to eliminate the replacement of resident, and possibly useful, cache lines by speculatively fetched cache lines that do not contain any instructions that actually have to be executed by the program.