1. Field of the Invention
This invention is related to the field of processors and, more particularly, to instruction fetch mechanisms within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors which employ wider issue rates. A "wide issue" superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a "narrow issue" superscalar processor is capable of dispatching. During clock cycles in which the number of dispatchable instructions is greater than the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
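The effect of issue width on the average dispatch rate can be illustrated with a minimal sketch. The function name, the issue widths of 2 and 4, and the per-cycle counts of dispatchable instructions are all hypothetical values chosen for illustration. The sketch also assumes, as a simplification, that instructions not dispatched in a given clock cycle do not carry over to the next.

```python
def average_dispatch_rate(dispatchable_per_cycle, issue_width):
    """Average instructions dispatched per clock cycle.

    In each cycle the processor dispatches at most `issue_width`
    instructions, regardless of how many are dispatchable.
    Simplifying assumption: leftover instructions do not carry over.
    """
    dispatched = [min(n, issue_width) for n in dispatchable_per_cycle]
    return sum(dispatched) / len(dispatched)


# Hypothetical count of dispatchable instructions in five clock cycles.
ready = [1, 3, 6, 2, 5]

narrow = average_dispatch_rate(ready, issue_width=2)  # "narrow issue"
wide = average_dispatch_rate(ready, issue_width=4)    # "wide issue"
print(narrow, wide)
```

In this sketch the wide issue configuration gains only in the cycles where more than two instructions are dispatchable. In the remaining cycles both configurations dispatch the same number.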
In order to support wide issue rates, it is desirable for the superscalar processor to be capable of fetching a large number of instructions per clock cycle (on the average). For brevity, a processor capable of fetching a large number of instructions per clock cycle (on the average) will be referred to herein as having a "high fetch bandwidth". If the superscalar processor is unable to achieve a high fetch bandwidth, then the processor may be unable to take advantage of the wide issue hardware due to a lack of instructions being available for issue.
Several factors may impact the ability of a particular processor to achieve a high fetch bandwidth. For example, many code sequences have a high frequency of branch instructions, which may redirect the fetching of subsequent instructions within that code sequence to a branch target address specified by the branch instruction. Accordingly, the processor may identify the branch target address upon fetching the branch instruction. Subsequently, the next instructions within the code sequence may be fetched using the branch target address. Processors attempt to minimize the impact of branch instructions on the fetch bandwidth by employing highly accurate branch prediction mechanisms and by generating the subsequent fetch address (either branch target or sequential) as rapidly as possible.
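The choice between a branch target address and a sequential address can be sketched as follows. This is an illustration only, not a description of any particular processor: the function name, the 16-byte fetch block size, and the example addresses are assumptions, and the branch prediction is supplied as an input rather than modeled.

```python
def next_fetch_address(fetch_address, fetch_block_bytes,
                       predicted_taken, branch_target):
    """Select the address used for the subsequent instruction fetch.

    If the fetched instructions include a branch that is predicted
    taken, fetching is redirected to the branch target address.
    Otherwise fetching continues sequentially.
    """
    if predicted_taken:
        return branch_target
    return fetch_address + fetch_block_bytes


# Hypothetical values: a 16-byte fetch block at address 0x1000.
sequential = next_fetch_address(0x1000, 16, False, 0x2000)
redirected = next_fetch_address(0x1000, 16, True, 0x2000)
print(hex(sequential), hex(redirected))
```

The sooner the inputs to this selection are available, the sooner the subsequent fetch can begin, which is why rapid detection of branch instructions bears on fetch bandwidth.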
Another factor which may impact the ability of a particular processor to achieve a high fetch bandwidth is the hit rate and latency of an instruction cache employed by the processor. Processors typically include an instruction cache to reduce the latency of instruction fetches (as compared to fetching from main memory external to the processor). By providing low latency access to instructions, instruction caches may help achieve a high fetch bandwidth. Furthermore, the low latency of access to the instructions may allow branch instructions to be rapidly detected and corresponding branch target addresses to be rapidly generated for subsequent instruction fetches.
Modern processors have been attempting to achieve shorter clock cycle times in order to augment the performance gains which may be achieved with high issue rates. Unfortunately, the short clock cycle times being employed by modern processors tend to limit the size of an instruction cache which may be employed. Generally, larger instruction caches have a higher latency than smaller instruction caches. At some size, the instruction cache access time (i.e., the latency from presenting a fetch address to the instruction cache to receiving the corresponding instructions therefrom) may even exceed the desired clock cycle time. On the other hand, larger instruction caches typically achieve higher hit rates than smaller instruction caches.
Both high hit rates in the instruction cache and low latency access to the instruction cache are important to achieving high fetch bandwidth. If hit rates are low, then the average latency for instruction access may increase due to the more frequent main memory accesses required to fetch the desired instructions. Because larger instruction caches are capable of storing more instructions, they are more likely to be storing the desired instructions (once the instructions have been accessed for the first time) than smaller caches (which replace the instructions stored therein with other instructions within the code sequence more frequently). On the other hand, if the latency of each cache access is increased (due to the larger size of the instruction cache), the average latency for fetching instructions increases as well. As mentioned above, low average latency is important to achieving high fetch bandwidth by allowing more instructions to be fetched per clock cycle at a desired clock cycle time and by aiding in the more rapid detection and prediction of branch instructions. Accordingly, an instruction fetch structure which can achieve both high hit rates and low latency access is desired to achieve short clock cycle times as well as high fetch bandwidth.
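The tradeoff between hit rate and access latency can be made concrete with a simple average latency model. The sketch below assumes that every access pays the cache access latency and that each miss additionally pays a fixed main memory penalty. The function name and all of the numbers (hit rates, latencies in clock cycles, and the miss penalty) are hypothetical and chosen only for illustration.

```python
def average_fetch_latency(hit_rate, hit_latency, miss_penalty):
    """Average latency, in clock cycles, of an instruction cache access.

    Every access pays `hit_latency`; the fraction of accesses that
    miss (1 - hit_rate) additionally pays `miss_penalty` to fetch
    from main memory.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_latency + (1.0 - hit_rate) * miss_penalty


# Hypothetical configurations, each with a 20-cycle miss penalty.
small_cache = average_fetch_latency(0.90, hit_latency=1, miss_penalty=20)
large_cache = average_fetch_latency(0.98, hit_latency=2, miss_penalty=20)
# A structure with both the high hit rate and the low latency.
desired = average_fetch_latency(0.98, hit_latency=1, miss_penalty=20)
print(small_cache, large_cache, desired)
```

Under these assumed numbers the small cache averages roughly 3.0 cycles, because it misses more often, and the large cache averages roughly 2.4 cycles while adding a cycle to every hit. A structure combining the higher hit rate with the lower access latency would average roughly 1.4 cycles.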