This invention pertains to prefetching data into a cache on a computer, and more particularly to prefetching data that is identified as critical.
When computers first became available, they ran programs by executing instructions using in-order execution. Before instruction number two could be executed, instruction number one had to complete. Since clock speeds were relatively slow, this was not a significant issue. The processor could not execute complicated instructions much faster than any other part of the computer could support the instruction. But modern processors are much more efficient than their ancestors were. Modern computers are capable of running at very high clock rates and may perform complicated instructions in very few clock cycles.
But while processor clock speeds have increased dramatically, improvements in other parts of the computer have been less significant. Specifically, at the high clock rates in modem processors, it may take thousands of clock cycles to access data from memory. In an in-order instruction processor, the processor must wait for a memory access to complete before it may continue with the next instruction. This may cause significant delay in program execution. To deal with this problem, processors began to run programs using out-of-order execution. While one complicated instruction is delayed (for example, due to a memory access), other instructions that do not depend on the delayed instruction may be executed. For out-of-order execution to work, the processor needs to be able to do several things. First, the processor determines whether a later instruction is dependent on the delayed instruction. For example, consider the situation where a value is loaded from memory into a register in the processor. If a later instruction adds the value in that register to another value in another register, this later instruction is dependent on the delayed instruction: it may not execute until after the load instruction completes. On the other hand, an add instruction that adds two registers that are totally unrelated to the load instruction may be executed while the value is loaded from memory, even though the exact instruction order suggests that this add instruction should not execute yet.
Second, the processor buffers any dependent instructions for later execution. If the processor detects that a later instruction is dependent on a delayed load instruction, the later instruction may not be executed out-of-order, and is buffered until after the load instruction completes.
Third, the processor renames registers. A register may be renamed when a later instruction that is not dependent on the delayed load instruction uses a register that is used by the delayed load instruction. In this case, the processor needs to be able to rename the register used by the later instruction so that the xe2x80x9cparallel executionxe2x80x9d of the load instruction and the later instruction does not create a conflict.
FIG. 1 shows how a processor in the prior art operates. Processor 105 receives instruction sequence 110. While a load instruction is pending, processor 105 examines later instructions to see if they are dependent on the delayed load instruction. If the later instruction is dependent on the delayed load instruction, the later instruction is buffered in buffer 115. Otherwise, the later instruction may be executed out-of-order, and joins executed instructions 120.
Two concerns may arise that limit the effectiveness of out-of-order execution. First, processor 105 may fill buffer 115 with dependent instructions. Once the buffer is full, processor 105 may not add any more instructions to buffer 115, and all later instructions have to wait until the delayed load instruction completes. Second, the program may include a branch instruction after the load instruction. Even with branch prediction, processor 105 may not execute the instructions without some way to reverse the process in case the branch prediction was incorrect. Typically, processor 105 will simply buffer the instructions rather than execute and risk having to rewind the program execution.
The problems with out-of-order execution are exacerbated by the possibility of multiple load instructions within a relatively small block of instructions. With multiple independent load instructions, if the load instructions are executed in their original order, the processor may be more inefficient than it needs to be.
Other problems related to load instruction delays have to do with caching. Cache lines containing data requested by load instructions may be removed from the cache even though other nearby data will be requested shortly. And cache lines containing data that may be loaded shortly may not be fetched into the cache in advance of their need.
The present invention addresses these and other problems associated with the prior art.