A processor is commonly considered to be the "brains" of a computer system. Increasing the speed of the processor will tend to increase the computational power of the computer, and many methods are employed to increase processor speed to create more powerful computers for consumers. A processor retrieves and manipulates data under the control of software to obtain a desired result, so increasing the processor's data retrieval speed is one way to achieve overall increased processing speed. One method for increasing a processor's data retrieval speed is called "prefetching."
A computer system includes different banks of memory locations arranged in a memory hierarchy. Memory locations that are more quickly accessible by the processor are typically considered to be "closer" to the processor in the memory hierarchy. Memory locations that take longer for the processor to access are considered to be "further" from the processor. Prefetching is a method in which data that is stored in one memory location of the memory hierarchy is transferred to a memory location that is closer to the processor. This transfer occurs before the data is actually needed by the processor. In this manner, the data can be more quickly retrieved by the processor when it is needed, thereby increasing the processor's overall processing speed.
For example, in many computer systems a first level of memory closest to the processor is called an L0 cache. An L0 cache is typically located on the same semiconductor substrate as the processor, making data retrieval from the L0 cache by the processor very simple and quick. The next level of memory further from the processor is an L1 cache. An L1 cache may be located in the same package as the processor, but on a separate semiconductor substrate, making data retrieval from the L1 cache somewhat more difficult and time consuming. L2, L3 and higher cache levels, when used, are memory banks located progressively further from the processor, each exhibiting a correspondingly longer delay in data retrieval. Main memory is typically further from the processor in the memory hierarchy than any cache, and the disk drive is located even further from the processor than the main memory.
A prefetch instruction instructs the processor to prefetch data from one memory location, such as the L1 cache, and to store it in a memory location that is closer to the processor, such as the L0 cache. A load instruction instructs the processor to load data from a memory location, such as the L0 cache, into a processor register for further manipulation of the data by the processor. Ideally, a load instruction is preceded by an associated prefetch instruction. The prefetch brings the data closer to the processor so that the load is more quickly executed. During or after writing the program code of a software application, a computer programmer inserts prefetch instructions ahead of load instructions (i.e., earlier in the program sequence) to improve the speed of the loads. Alternatively, intelligent compilers modify the program code in the same manner.
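The pairing of a prefetch with a later load can be illustrated with a minimal C sketch. It assumes a GCC- or Clang-compatible compiler, which provides the `__builtin_prefetch` hint; the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical and chosen only for illustration, and a suitable distance would depend on the particular processor and memory hierarchy.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead of the load to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sums an array, issuing a prefetch hint ahead of each corresponding load. */
long sum_with_prefetch(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Hint only: address, 0 = prefetch for read, 3 = high temporal locality. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i]; /* the load that the earlier prefetch is meant to speed up */
    }
    return total;
}
```

The prefetch is only a hint, so the function returns the same result whether or not the hardware acts on it; only the time taken by the loads may differ.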
Instructions in the program code for a processor that performs branch prediction are either speculative or non-speculative. Speculative instructions are instructions that reside after (i.e., later in the program sequence) a branch instruction and are executed, or not executed, depending on whether the branch is taken or not taken (assuming the program is not otherwise interrupted). A branch is taken if its condition is met, for example, if a variable is calculated to be a predetermined value. Non-speculative instructions are executed regardless of whether a previous branch is taken or not taken.
If a memory exception, such as a translation lookaside buffer (TLB) miss, occurs during a prefetch, the exception is ignored and the data is not stored in the new memory location closer to the processor. This is acceptable if a subsequent, speculative load instruction is never executed by the processor. If the load instruction is subsequently executed, however, the time-consuming exception must then be handled upon reaching the load instruction, and the data is loaded from the more remote memory location. The processor thus spends valuable time retrieving the data instead of manipulating it.
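The cost asymmetry described above can be sketched as a toy model in C. The latency constants and the function names `prefetch_succeeds` and `load_cost` are hypothetical, chosen only to make the three cases concrete; they do not describe any particular processor.

```c
#include <stdbool.h>

/* Hypothetical latencies in cycles; real values depend on the machine. */
enum { NEAR_HIT_CYCLES = 2, FAR_ACCESS_CYCLES = 20, TLB_MISS_CYCLES = 100 };

/* A prefetch that encounters a memory exception is silently dropped,
 * so it moves the data closer only when the translation is available. */
bool prefetch_succeeds(bool tlb_has_translation)
{
    return tlb_has_translation;
}

/* Cycles the later load spends waiting for its data. */
int load_cost(bool load_executed, bool tlb_has_translation)
{
    if (!load_executed) {
        return 0; /* speculative load never executed: the dropped prefetch cost nothing */
    }
    if (prefetch_succeeds(tlb_has_translation)) {
        return NEAR_HIT_CYCLES; /* data is already in the closer memory location */
    }
    /* The exception is handled at the load, then data comes from the remote location. */
    return TLB_MISS_CYCLES + FAR_ACCESS_CYCLES;
}
```

Under these assumed numbers, an executed load whose prefetch was dropped waits 120 cycles rather than 2, which is the delay the passage above describes.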
To remedy this situation when the load instruction is non-speculative, a programmer can replace the prefetch instruction with the load instruction itself, effectively moving the load earlier in the program sequence. The problem with this approach is that a processor register is occupied by the data before the data is needed. Tying up a register in this manner can reduce processing efficiency.
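The trade-off between hoisting the load and issuing a prefetch can be sketched in C as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, `__builtin_prefetch` assumes a GCC- or Clang-compatible compiler, and an optimizing compiler is free to reorder either version, so the source order only suggests the intended instruction order.

```c
#include <stddef.h>

/* Early load: the value is fetched up front, so any memory exception is
 * handled here, but `early` must be kept in a register (or spilled)
 * for the whole loop even though it is not used until the end. */
long sum_with_early_load(const long *p, const long *work, size_t n)
{
    long early = *p; /* non-speculative load moved earlier in the sequence */
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += work[i];
    }
    return acc + early;
}

/* Prefetch hint: no register is held across the loop, but the hint may be
 * dropped, in which case the final load pays the full retrieval cost. */
long sum_with_prefetch_hint(const long *p, const long *work, size_t n)
{
    __builtin_prefetch(p, 0, 3); /* read access, high temporal locality */
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += work[i];
    }
    return acc + *p; /* the load happens only when the data is needed */
}
```

Both functions compute the same result; they differ only in when the data is fetched and in how long a register is tied up holding it.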