1. Field of the Invention
The present invention relates to the art of microprocessors and, more particularly, to circuits and methods within the microprocessor for generating target addresses.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time. To the extent a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors that employ wider issue rates. A "wide issue" superscalar processor is capable of dispatching a larger number of instructions per clock cycle when compared to a "narrow issue" superscalar processor. During clock cycles in which the number of dispatchable instructions is greater than the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
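The comparison between wide issue and narrow issue processors may be illustrated with a minimal sketch. The function name `average_dispatch`, the per-cycle counts, and the issue widths of 2 and 4 are hypothetical values chosen for illustration only. The sketch assumes a simplified model in which the number of instructions dispatched in a clock cycle is the smaller of the number of dispatchable instructions and the issue width, and in which instructions not dispatched in one clock cycle do not carry over to the next.

```python
def average_dispatch(dispatchable_per_cycle, issue_width):
    """Average instructions dispatched per clock cycle.

    Simplifying assumption: each cycle dispatches at most `issue_width`
    instructions, and leftover instructions are not carried forward.
    """
    dispatched = [min(count, issue_width) for count in dispatchable_per_cycle]
    return sum(dispatched) / len(dispatched)


# Hypothetical number of dispatchable instructions in four clock cycles.
cycles = [1, 6, 3, 8]

narrow = average_dispatch(cycles, issue_width=2)  # narrow issue processor
wide = average_dispatch(cycles, issue_width=4)    # wide issue processor
```

Under these assumptions, the two processors differ only in the clock cycles in which more than two instructions are dispatchable, which is the case described above.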
To support wide issue rates, the superscalar processor should be capable of fetching a large number of instructions per clock cycle on average. A processor capable of fetching a large number of instructions per clock cycle will be referred to herein as having a "high fetch bandwidth." If the superscalar processor is unable to achieve a high fetch bandwidth, then the processor may be unable to utilize wide issue hardware if contained therein.
Several factors may impact the ability of a particular processor to achieve a high fetch bandwidth. For example, many code sequences have a high frequency of branch instructions, which may redirect the fetching of subsequent instructions within that code sequence to a target address specified by the branch instruction. Accordingly, the processor may identify the target address after fetching the branch instruction. The next instructions within the code sequence may be fetched using the branch target address. Processors attempt to minimize the impact of conditional branch instructions on the fetch bandwidth by employing highly accurate branch prediction mechanisms and by generating the subsequent fetch address (either target or sequential) as rapidly as possible.
As used herein, a branch instruction is an instruction that specifies, either directly or indirectly, the address of the next instruction to be fetched. The address may be the sequential address identifying the instruction immediately subsequent to the branch instruction within memory, or a target address identifying a different instruction stored elsewhere in memory. Unconditional branch instructions always select the target address, while conditional branch instructions select either the sequential address or the target address based upon a condition specified by the branch instruction. For example, the processor may include a set of condition codes which indicate the results of executing previous instructions, and the branch instruction may test one or more of the condition codes to determine if the branch selects the sequential address or the target address. A branch instruction is referred to as taken if the target address is selected via execution of the branch instruction, and not taken if the sequential address is selected. Similarly, if a conditional branch instruction is predicted via a branch prediction mechanism, the branch instruction is referred to as predicted taken if the target address is predicted to be selected upon execution of the branch instruction, and is referred to as predicted not taken if the sequential address is predicted to be selected upon execution of the branch instruction.
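The selection between the sequential address and the target address described above may be sketched as follows. The function `next_fetch_address`, its parameters, and the example addresses are hypothetical and are introduced for illustration only; the result of testing the condition codes is reduced here to a single boolean, `condition_met`.

```python
def next_fetch_address(sequential_address, target_address,
                       conditional, condition_met=False):
    """Return (address, taken) for a branch instruction.

    An unconditional branch always selects the target address. A
    conditional branch selects the target address only if its condition
    is met, and the sequential address otherwise.
    """
    taken = (not conditional) or condition_met
    address = target_address if taken else sequential_address
    return address, taken


# Hypothetical addresses, for illustration only.
unconditional = next_fetch_address(0x1004, 0x2000, conditional=False)
cond_taken = next_fetch_address(0x1004, 0x2000, conditional=True,
                                condition_met=True)
cond_not_taken = next_fetch_address(0x1004, 0x2000, conditional=True,
                                    condition_met=False)
```

The same selection applies to a predicted branch if `condition_met` is taken to be the output of the branch prediction mechanism rather than the result of executing the branch instruction.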
Unfortunately, even if highly accurate branch prediction mechanisms are used to predict branch instructions, fetch bandwidth may still suffer. Typically, a run of instructions is fetched by the processor, and a first branch instruction within the run of instructions is detected. If the branch instruction is predicted taken, fetched instructions subsequent to the first branch instruction are discarded and fetching continues at the target address. Accordingly, the number of instructions fetched during a clock cycle in which a branch instruction is fetched and predicted taken is limited to the number of instructions prior to and including the first branch instruction within the run of instructions being fetched. Since branch instructions are frequent in many code sequences, this limitation may be significant. Performance of the processor may be decreased if the limitation to the fetch bandwidth leads to a lack of instructions being available for dispatch.
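The limitation described above may be illustrated with a minimal sketch. The function `instructions_retained` and the representation of a run as a list of `(is_branch, predicted_taken)` pairs are hypothetical and are chosen for illustration only. The sketch considers only the first branch instruction within the run, as in the description above; the handling of any later branch instructions in the same run is not modeled.

```python
def instructions_retained(run):
    """Number of fetched instructions kept from one run.

    `run` is a list of (is_branch, predicted_taken) pairs in fetch order.
    If the first branch in the run is predicted taken, only the
    instructions prior to and including that branch are kept.
    Assumption: a first branch predicted not taken leaves the run intact.
    """
    for index, (is_branch, predicted_taken) in enumerate(run):
        if is_branch:
            return index + 1 if predicted_taken else len(run)
    return len(run)  # no branch instruction in the run


# Hypothetical run of eight instructions; the third is a branch.
NOT_BRANCH = (False, False)
run_taken = [NOT_BRANCH, NOT_BRANCH, (True, True)] + [NOT_BRANCH] * 5
run_not_taken = [NOT_BRANCH, NOT_BRANCH, (True, False)] + [NOT_BRANCH] * 5

kept_taken = instructions_retained(run_taken)
kept_not_taken = instructions_retained(run_not_taken)
```

In this example, a run of eight fetched instructions yields only three usable instructions when the branch is predicted taken, so five of the eight fetch slots in that clock cycle are lost.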