1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to mechanisms for fetching instruction bytes into an instruction cache from a main memory subsystem.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
Recently, the trend in microprocessor frequencies has been toward higher rates (i.e. shorter clock cycles). Frequencies in the range of 200-300 MHz are becoming common. At the same time, main memory access times have remained relatively constant. Therefore, the number of clock cycles which expire while the processor awaits data or instruction bytes from the main memory has been increasing. The increased number of clock cycles spent waiting for a memory response tends to decrease the potential performance of the microprocessor (e.g. if memory access times were shorter, the microprocessor might be able to achieve a greater performance level).
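The relationship between clock frequency and the number of clock cycles spent waiting can be illustrated with a short calculation. The following is a minimal sketch only: the function name stall_cycles and the 60 ns main memory access time are assumptions chosen for illustration and do not come from the description above.

```python
def stall_cycles(clock_mhz: int, memory_access_ns: int) -> int:
    """Return the number of whole clock cycles that expire during one
    main memory access (rounded up to the next full cycle).

    One clock cycle lasts 1000 / clock_mhz nanoseconds, so the count is
    memory_access_ns * clock_mhz / 1000, computed here in integers.
    """
    return (memory_access_ns * clock_mhz + 999) // 1000


# Assumed, illustrative access time; held constant while frequency rises.
ASSUMED_ACCESS_NS = 60

for mhz in (100, 200, 300):
    print(mhz, "MHz:", stall_cycles(mhz, ASSUMED_ACCESS_NS), "cycles")
```

Under this assumed access time, raising the frequency from 200 MHz to 300 MHz increases the wait from 12 to 18 clock cycles even though the memory itself is no slower.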
While memory latency is important for both instruction and data accesses, superscalar microprocessors are particularly sensitive to delays in receiving instructions. Speculative execution of instructions, often employed by superscalar microprocessors, may allow execution of instructions subsequent to an instruction which is stalled awaiting data from memory. Therefore, the number of instructions executed per clock cycle may remain high, even in the face of long main memory data accesses. However, if the instructions to be executed are not stored in an internal instruction cache of the microprocessor, the microprocessor must fetch the instructions from the main memory. The microprocessor must then await the return of the instructions in response to the fetch before proceeding with the execution of the instructions. Any instructions subsequent to the instructions being fetched from main memory, even if the subsequent instructions are stored in the instruction cache, cannot be fetched and/or dispatched until the main memory responds. Superscalar microprocessors are greatly affected by the wait for instructions since they attempt to execute multiple instructions concurrently and hence require a high instruction fetch bandwidth. Average instruction fetch bandwidth is decreased due to the lack of instructions provided while awaiting instructions fetched from memory.
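The effect on average instruction fetch bandwidth can be sketched with a simple model. The model below is an assumption made for illustration, not a method taken from this description: it supposes that the instruction cache supplies a fixed number of instructions on every cycle that hits, and that each miss provides no instructions for a fixed number of stall cycles. All names and numeric values are hypothetical.

```python
def average_fetch_bandwidth(peak_per_cycle: float,
                            hit_cycles_per_miss: int,
                            stall_cycles_per_miss: int) -> float:
    """Average instructions fetched per clock cycle under the assumed model.

    During hit_cycles_per_miss cycles the cache supplies peak_per_cycle
    instructions per cycle; the following stall_cycles_per_miss cycles
    supply none while main memory responds.
    """
    fetched = peak_per_cycle * hit_cycles_per_miss
    total_cycles = hit_cycles_per_miss + stall_cycles_per_miss
    return fetched / total_cycles


# Illustrative values only: 4 instructions per cycle at peak, one miss
# per 100 fetch cycles, and a 12-cycle wait per miss.
print(average_fetch_bandwidth(4, 100, 12))
```

In this model the average falls below the peak whenever the stall count is nonzero, and falls further as the wait for main memory grows.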
Because the wait incurred when fetching instructions from a main memory is long, it is important to store a cache line of instructions within the instruction cache when the cache line is fetched from main memory. However, a branch instruction may often exist within the first few instructions of the cache line of instructions being fetched. If the branch instruction is predicted taken, the remaining instructions within the cache line are not immediately needed. More importantly, instructions from another cache line containing the target of the branch instruction are immediately needed. Such other cache line may already be stored in the instruction cache. Nonetheless, the remaining instructions of the cache line of instructions being transferred from the main memory may be subsequently fetched by the microprocessor (either because the branch instruction is mispredicted or according to the continued execution of the program). A mechanism for fetching cache lines from main memory which balances the need to quickly locate and predict branch instructions and the need to store cache lines of instructions into the cache is desired.
As used herein, the term "cache line" refers to the smallest unit of memory manipulated by a cache. The bytes within the cache line are allocated space and deallocated space within the cache as a unit. Cache lines are typically aligned in memory, such that each byte within the line may be located by an offset which forms the least significant bits of the address. For cache lines having a number of bytes which is an integer power of two, the number of least significant bits of the address forming the offset is equal to the exponent of that power (e.g. a cache line of 32 bytes, or two to the fifth power, uses a five bit offset).
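The relationship between cache line size and offset width can be shown in a few lines of code. This is a minimal sketch under stated assumptions: the function names offset_bits and split_address, the 32 byte line size, and the example address are hypothetical and chosen only for illustration.

```python
def offset_bits(line_size: int) -> int:
    """Number of least significant address bits that form the offset.

    Valid only when line_size is a power of two; the result is the
    exponent of that power.
    """
    if line_size <= 0 or line_size & (line_size - 1) != 0:
        raise ValueError("line_size must be a power of two")
    return line_size.bit_length() - 1


def split_address(address: int, line_size: int) -> tuple[int, int]:
    """Split an address into (aligned cache line address, byte offset)."""
    mask = line_size - 1              # ones in the offset bit positions
    return address & ~mask, address & mask


# Assumed example: 32 byte cache lines and an arbitrary address.
line_address, offset = split_address(0x1234, 32)
print(offset_bits(32), hex(line_address), hex(offset))
```

For the assumed 32 byte line, address 0x1234 falls in the aligned cache line beginning at 0x1220, at byte offset 0x14 within that line.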