Execution of applications in a computational environment typically involves fetching application instructions into an instruction cache associated with a processor of the computational environment. Many applications are too large, and/or the instruction cache is too small, for all of the application instructions to be held in cache at one time. Accordingly, techniques are used to determine which instructions to fetch into cache, which to remove from cache, etc. For example, some processors include hardware prefetch functionality that looks down an instruction stream as the application executes and attempts to identify and prefetch future instructions in the stream.
In many large commercial applications, it is difficult to accurately predict which instructions will be executed in which order at runtime. For example, typical applications include many branch points in the application code, so that execution of the application can proceed in many different ways. These difficulties tend to limit the effectiveness of traditional hardware prefetching techniques, which can result in a high rate of instructions not being available in cache when needed (“instruction cache misses”). Executing instructions that are not available in cache can involve retrieving the instructions from main memory or the like, which can appreciably slow down execution of the application (“an instruction cache miss penalty”). Instruction cache misses can also reduce the effectiveness of certain types of optimizations, such as out-of-order application execution.
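The effect of branch points on a simple prefetcher can be illustrated with a toy model. The following Python sketch is illustrative only: the least-recently-used replacement policy, the "next line" prefetcher, the four-line cache, and the name count_icache_misses are all assumptions made for this example and are not drawn from any particular processor.

```python
from collections import OrderedDict


def count_icache_misses(line_addresses, cache_lines=4, prefetch=True):
    """Count misses for a stream of instruction cache line addresses.

    Toy model (all assumptions): a fully associative LRU cache holding
    `cache_lines` lines, plus an optional prefetcher that always pulls in
    the line immediately after the one just fetched.
    """
    cache = OrderedDict()  # keys are line addresses, ordered oldest first
    misses = 0

    def bring_in(line):
        # Insert the line (or refresh its recency) and evict the oldest
        # line if the cache is over capacity.
        if line in cache:
            cache.move_to_end(line)
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)

    for line in line_addresses:
        if line not in cache:
            misses += 1  # instruction was not in cache when needed
        bring_in(line)
        if prefetch:
            bring_in(line + 1)  # guess that execution continues sequentially
    return misses


# Straight-line code: each fetch is the line after the previous one.
sequential_stream = list(range(16))
# Branch-heavy code: every fetch jumps to a distant, unpredicted line.
branchy_stream = [i * 10 for i in range(16)]

sequential_misses = count_icache_misses(sequential_stream)
branchy_misses = count_icache_misses(branchy_stream)
```

In this model the sequential stream misses only on its first line, while the branch-heavy stream misses on every fetch because the prefetched line is never the one executed next. Real hardware prefetchers and branch predictors are considerably more sophisticated, so the sketch shows only the direction of the effect, not its magnitude.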