The present invention relates to microprocessors and more particularly to methods and apparatus for optimizing prefetch performance.
Modern microprocessors typically implement instruction prefetching. Prefetching is a mechanism whereby the processor hardware attempts to load, or prefetch, instructions into an instruction cache from higher levels of cache or from memory. If the load into the instruction cache completes before the processor fetches the instruction, the cache miss and its associated performance penalty do not occur. Each prefetch operation attempts to load a number of instructions into the instruction cache. The number of instructions so loaded is typically equal to the number of instructions in a cache line. A cache line is defined to be the fundamental quantity of data that may be read from or written to a cache.
Instruction prefetches may be initiated programmatically via prefetch instructions, by the hardware, or by a combination of the two. The prefetches may attempt to load just a few instructions, or they may attempt to load a long sequence of instructions. A problem can occur when prefetching a long sequence of instructions. In particular, instructions may be prefetched that will never be executed due to a branch or other change in control flow. This situation can degrade performance for two reasons. First, every prefetch requires the use of processor and system resources, e.g., higher levels of caches, system busses, and memory units. While these resources are in use by a prefetch, they are unavailable for other uses, e.g., load or store operations. Second, when instructions are prefetched into the instruction cache, room must be made for them by overwriting existing instructions. These existing instructions may form part of the working set, i.e., they might be needed by the processor in the near future. Thus, overaggressive prefetching, which occurs when too many instructions have been prefetched into the instruction cache but not yet fetched by the processor, can cause resources to be wasted and useful instructions in the instruction cache to be replaced by ones that may never be used.
Thus, there exists a need for limiting the number of instructions prefetched ahead of the current instruction pointer, i.e., ahead of where the processor is fetching instructions. It would be desirable and of considerable advantage to provide a mechanism by which the processor may prefetch only a limited distance ahead of the instruction pointer. Such prefetching helps to hide the latency of the fetching process and prevents cache misses on instruction fetches, without running so far ahead that resources such as memory bandwidth are wasted and useful instructions in the instruction cache are replaced.
In representative embodiments, the present invention provides a method and apparatus for controlling the rate of instruction address prefetches by a microprocessor. Previous methods for prefetching have not concentrated on limiting the number of instructions prefetched ahead of where the processor is fetching instructions from the current instruction pointer, which can lead to wasted memory bandwidth and to the replacement of useful instructions in the instruction cache.
In a representative embodiment, the bits in a shift register are used to count the number of instruction addresses that have been prefetched. When an instruction prefetch address is issued to the processor, the prefetched address is added to a register and a logical one is shifted into the shift register from the left. Each prefetch issue to the processor will cause a cache line of instructions to be written into the instruction cache. When the last prefetched instruction on a cache line is fetched, a logical zero is shifted into the shift register from the right. When a logical one has been shifted into a preselected bit in the shift register, prefetching is temporarily suspended until the last instruction on a cache line is fetched by the processor, and a logical zero is shifted back into the preselected bit in the shift register. In summary, logical ones are shifted into the register from the left on prefetches and logical zeros are shifted into the register from the right when the instruction pointer crosses onto a new cache line. This mechanism will assure that, at most, “n” cache lines have been prefetched but not yet fetched by the processor. In other words, prefetches may be kept “n” cache lines in front of the instruction pointer by examining the n-th bit from the left.
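The shift-register behavior summarized above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the class name PrefetchThrottle, its method names, and the parameters width (register size in bits) and limit (the "n" of the preselected bit) are hypothetical, and the register that holds the prefetched addresses is omitted.

```python
class PrefetchThrottle:
    """Software model of the prefetch-limiting shift register (illustrative only)."""

    def __init__(self, width=8, limit=4):
        # width and limit are assumed parameters; limit plays the role of "n".
        if not 1 <= limit <= width:
            raise ValueError("limit must be between 1 and width")
        self.width = width
        self.limit = limit
        self.mask = (1 << width) - 1
        self.bits = 0  # all zeros: no cache lines prefetched ahead

    def can_prefetch(self):
        # Examine the n-th bit from the left (bit index width - limit).
        # A one there means n lines are already prefetched but not yet fetched.
        return ((self.bits >> (self.width - self.limit)) & 1) == 0

    def on_prefetch(self):
        # Issue a prefetch only if the preselected bit is still zero.
        if not self.can_prefetch():
            return False  # prefetching is temporarily suspended
        # Shift a logical one in from the left.
        self.bits = (self.bits >> 1) | (1 << (self.width - 1))
        return True

    def on_line_consumed(self):
        # The instruction pointer crossed onto a new cache line:
        # shift a logical zero in from the right.
        self.bits = (self.bits << 1) & self.mask

    def outstanding(self):
        # Number of leading ones equals the lines prefetched but not yet fetched.
        count = 0
        for i in range(self.width - 1, -1, -1):
            if (self.bits >> i) & 1:
                count += 1
            else:
                break
        return count
```

In this model, with limit set to 3, three calls to on_prefetch succeed and a fourth is refused until on_line_consumed is called, which mirrors the suspension and resumption of prefetching described above.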
A primary advantage of the embodiments as described in the present patent document over prior microprocessor prefetching techniques is that overaggressive prefetching is eliminated. Prefetching too many instructions into the instruction cache beyond those that have been fetched by the processor can cause resources, such as memory bandwidth, to be wasted and useful instructions in the instruction cache to be replaced by ones that may never be used. In the described embodiments, the number of prefetches in front of the current instruction pointer is tightly controlled. Embodiments of the present invention thereby conserve valuable system resources.
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.