Many different techniques have been developed to increase the speed at which processors execute instructions in a computing system. For example, a computing system may include multiple processors capable of executing instructions in parallel. As another example, a computing system may include one or more processors that are each capable of executing instructions in multiple independent “threads.”
A problem with conventional computing systems is that retrieving data from external memory is often slow relative to the processing speed of the processors in the computing systems. If a conventional computing system waits until an instruction is executed to retrieve data for that instruction, the processor executing the instruction typically waits or “stalls” until the needed data is retrieved from the external memory. This delay or latency slows the execution of the instructions in the computing system, which decreases the performance of the system.
Conventional computing systems often prefetch data in an attempt to reduce this latency. Prefetching data typically involves a computing system attempting to identify the data that an instruction will require and then retrieving that data before the instruction is executed. However, prefetching is routinely implemented only to improve performance. It often does not alter the functionality of a program being executed or the status of a processor executing the program.
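As an illustration only, conventional prefetching of the kind described above might be sketched in software as follows. The sketch assumes a GCC- or Clang-compatible compiler, which provides the `__builtin_prefetch` builtin. The function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical names chosen for this example and do not come from any particular system.

```c
#include <stddef.h>

/* Hypothetical lookahead distance, in array elements. A real value
 * would be tuned to the memory latency of the target system. */
#define PREFETCH_DISTANCE 16

/* Sums an array while hinting that elements needed a few iterations
 * from now should be fetched from memory ahead of time. */
long sum_with_prefetch(const long *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        /* Request a future element before the instruction that uses it
         * executes. Arguments: address, 0 = read access, 3 = high
         * temporal locality. The builtin is only a hint. */
        if (i + PREFETCH_DISTANCE < count) {
            __builtin_prefetch(&values[i + PREFETCH_DISTANCE], 0, 3);
        }

        /* The result is the same whether or not the hint is honored. */
        total += values[i];
    }

    return total;
}
```

In this sketch the prefetch request is purely a hint. Removing it may change how long the loop takes but not the value it returns, which is consistent with prefetching being used for performance improvement only.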