1. Field of the Invention
This invention relates to microprocessors and, more particularly, to the management of resource allocation for speculatively fetched instructions following small backward branch instructions.
2. Description of the Relevant Art
Modern processor cores, or processors, are pipelined in order to increase throughput of instructions per clock cycle. However, the throughput may still be reduced due to certain events. One event is a stall, which may be caused by a branch misprediction, a cache miss, a data dependency, or another condition, wherein no useful work may be performed for a particular instruction during a clock cycle. Another event may be that resources, such as circuitry for an arithmetic logic unit (ALU) or for a load-store unit (LSU), may not be used for one or more clock cycles due to the type of instruction being executed in a particular clock cycle.
Different techniques are used to fill these unproductive cycles in a pipeline with useful work. Some examples include loop unrolling of instructions by a compiler, branch prediction mechanisms within a core and out-of-order execution within a core. An operating system may divide a software application into processes and further divide processes into threads. A thread is a sequence of instructions that may share memory and other resources with other threads and may execute in parallel with other threads. A processor core may be constructed to execute more than one thread per clock cycle in order to increase efficient use of the hardware resources and reduce the effect of stalls on overall throughput. A microprocessor may include multiple processor cores to further increase parallel execution of multiple instructions per clock cycle.
Further, due to spatial and temporal locality of memory line accesses of a cache, each processor core may prefetch one or more memory lines of an instruction cache (I-cache) during a fetch of a requested memory line. If the prefetched line hits in the first-level, or L1, I-cache, then during a subsequent access, the memory line may already be located in a fetch buffer within the processor or may shortly arrive at the processor due to the earlier speculative prefetch. Therefore, the latency to access the instructions from the memory hierarchy may be greatly reduced. Also, if the prefetched line misses in the L1 I-cache, the access of the L2 I-cache, and possibly lower levels of memory if needed, may begin earlier. Again, the latency to access instructions may be reduced.
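The benefit of prefetching the next sequential memory line may be illustrated with a minimal software sketch. The 64-byte line size, the unbounded fetch buffer, and the function name `count_fetch_buffer_hits` are assumptions made only for illustration and do not describe any particular processor.

```python
LINE_BYTES = 64  # assumed memory line size, for illustration only


def count_fetch_buffer_hits(fetch_addresses, line_bytes=LINE_BYTES):
    """Count fetches whose memory line was already fetched or prefetched.

    Each fetch of a requested line also speculatively prefetches the
    next sequential line (a simple next-line prefetch model).
    """
    fetch_buffer = set()  # lines already present; unbounded in this sketch
    hits = 0
    for addr in fetch_addresses:
        line = addr - (addr % line_bytes)  # align address to its memory line
        if line in fetch_buffer:
            hits += 1
        fetch_buffer.add(line)               # the requested line
        fetch_buffer.add(line + line_bytes)  # the speculative next-line prefetch
    return hits


# Straight-line code touching four consecutive memory lines.
sequential_hits = count_fetch_buffer_hits([0, 64, 128, 192])
```

In this model only the first fetch finds an empty buffer. Each later fetch finds its line already present because the preceding fetch prefetched it, which corresponds to the reduced access latency described above.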
A problem arises with the speculative prefetch of an I-cache memory line when the previously accessed memory line includes a small backward taken branch instruction. For example, a memory line may include multiple instructions. One of these instructions may be a backward taken branch instruction. Another instruction in the same memory line may be the target instruction for the branch instruction. Therefore, the address of the branch target memory line is the same as the address of the memory line that was already fetched. Until the branch condition is no longer satisfied and instruction flow breaks out of the loop, the speculatively fetched memory line is not needed. However, during each iteration of the loop, this speculative memory line is fetched and consumes hardware resources within the processor. No useful work will be performed for the instructions within this memory line during the loop iterations. Nevertheless, resources such as registers, buffer entries, ports, and buses are unnecessarily used by these instructions, which may greatly add to the latency of other instructions that are unable to use those resources.
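The condition described above may be sketched in software as follows. This is a simplified model under stated assumptions: a 64-byte memory line, one next-line prefetch issued per loop iteration, and hypothetical function names. It is not a description of any actual fetch unit.

```python
LINE_BYTES = 64  # assumed memory line size, for illustration only


def line_address(addr, line_bytes=LINE_BYTES):
    """Return the address of the memory line containing addr."""
    return addr - (addr % line_bytes)


def is_small_backward_taken_branch(branch_pc, target_pc, line_bytes=LINE_BYTES):
    """True when a taken branch jumps backward to a target in its own memory line."""
    same_line = line_address(branch_pc, line_bytes) == line_address(target_pc, line_bytes)
    return target_pc <= branch_pc and same_line


def count_wasted_prefetches(branch_pc, target_pc, iterations, line_bytes=LINE_BYTES):
    """Count next-line prefetches issued while the loop stays within one line.

    Assumes the branch is taken on every iteration except the last, at
    which point instruction flow falls through and leaves the loop.
    """
    if not is_small_backward_taken_branch(branch_pc, target_pc, line_bytes):
        return 0
    wasted = 0
    for iteration in range(iterations):
        branch_taken = iteration < iterations - 1
        if branch_taken:
            # The next fetch returns to the same line, so the
            # speculatively prefetched next line goes unused.
            wasted += 1
    return wasted


# A 100-iteration loop whose branch and target share one 64-byte line.
wasted = count_wasted_prefetches(branch_pc=0x1030, target_pc=0x1010, iterations=100)
```

Under these assumptions, every iteration except the last issues a prefetch whose instructions perform no useful work. Only the prefetch issued on the final iteration, when the branch is not taken, may be consumed by the instruction flow.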
In view of the above, an efficient method for the management of resource allocation for speculatively fetched instructions following small backward branch instructions is desired.