1. Field of the Invention
The present invention is related to the field of processors and, more particularly, to branch prediction and fetch mechanisms within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors which employ wider issue rates. A "wide issue" superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a "narrow issue" superscalar processor is capable of dispatching. During clock cycles in which a number of dispatchable instructions is greater than the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
In order to support wide issue rates, it is desirable for the superscalar processor to be capable of fetching a large number of instructions per clock cycle (on the average). For brevity, a processor capable of fetching a large number of instructions per clock cycle (on the average) will be referred to herein as having a "high fetch bandwidth". If the superscalar processor is unable to achieve a high fetch bandwidth, then the processor may be unable to take advantage of the wide issue hardware due to a lack of instructions being available for issue.
Several factors may impact the ability of a particular processor to achieve a high fetch bandwidth. For example, many code sequences have a high frequency of branch instructions, which may redirect the fetching of subsequent instructions within that code sequence to a branch target address specified by the branch instruction. Accordingly, the processor may identify the branch target address after fetching the branch instruction. Subsequently, the next instructions within the code sequence may be fetched using the branch target address. Processors attempt to minimize the impact of branch instructions on the fetch bandwidth by employing highly accurate branch prediction mechanisms and by generating the subsequent fetch address (either branch target or sequential) as rapidly as possible.
As used herein, a branch instruction is an instruction which specifies the address of the next instructions to be fetched. The address may be the sequential address identifying the instruction immediately subsequent to the branch instruction within memory, or a branch target address identifying a different instruction stored elsewhere in memory. Unconditional branch instructions always select the branch target address, while conditional branch instructions select either the sequential address or the branch target address based upon a condition specified by the branch instruction. For example, the processor may include a set of condition codes which indicate the results of executing previous instructions, and the branch instruction may test one or more of the condition codes to determine if the branch selects the sequential address or the target address. A branch instruction is referred to as taken if the branch target address is selected via execution of the branch instruction, and not taken if the sequential address is selected. Similarly, if a conditional branch instruction is predicted via a branch prediction mechanism, the branch instruction is referred to as predicted taken if the branch target address is predicted to be selected upon execution of the branch instruction and is referred to as predicted not taken if the sequential address is predicted to be selected upon execution of the branch instruction.
Unfortunately, even if highly accurate branch prediction mechanisms are employed, fetch bandwidth may still suffer. Typically, a plurality of instructions are fetched by the processor, and a first branch instruction within the plurality of instructions is detected. Instructions subsequent to the first branch instruction are discarded if the branch instruction is predicted taken, and the branch target address is fetched. Accordingly, the number of instructions fetched during the clock cycle in which a branch instruction is fetched and predicted taken is limited to the number of instructions prior to and including the first branch instruction within the plurality of instructions being fetched. Since branch instructions are frequent in many code sequences, this limitation may be significant. Performance of the processor may be decreased if the limitation to the fetch bandwidth leads to a lack of instructions being available for dispatch. A method for increasing the achievable fetch bandwidth in the presence of predicted taken branch instructions is therefore desired.