1. Field of the Invention
The present invention is related to the field of processors and, more particularly, to multiple branch history table access during a single clock.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. As used herein, the term clock cycle means a period of time allocated to a superscalar processing stage for accomplishing the function assigned to that stage. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors which employ wider issue rates. A “wide issue” superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a “narrow issue” superscalar processor is capable of dispatching. During clock cycles in which a number of dispatchable instructions is greater than the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
In order to support wide issue rates, it is desirable for the superscalar processor to be capable of fetching a large number of instructions per clock cycle (on the average). For brevity, a processor capable of fetching a large number of instructions per clock cycle (on the average) will be referred to herein as having a “high fetch bandwidth”. If the superscalar processor is unable to achieve a high fetch bandwidth, then the processor may be unable to take advantage of the wide issue hardware due to a lack of instructions being available for issue.
Several factors may impact the ability of a particular processor to achieve a high fetch bandwidth. For example, many code sequences have a high frequency of branch instructions, which may redirect the fetching of subsequent instructions within that code sequence to a branch target address specified by the branch instruction. Accordingly, the processor may identify the branch target address after fetching the branch instruction. Subsequently, the next instructions within the code sequence may be fetched using the branch target address. Processors attempt to minimize the impact of branch instructions on the fetch bandwidth by employing highly accurate branch prediction mechanisms and by generating the subsequent fetch address (either branch target or sequential) as rapidly as possible. There are several different branch prediction mechanisms currently in use within microprocessors. One branch prediction mechanism employs a branch history storage device for storing a multi-bit branch history value, each bit of which identifies the resolution of a previously predicted branch instruction. This multi-bit branch history value is used, alone or in combination with the instruction address of the branch instruction to be predicted, to index bimodal counters in a branch history table. The bimodal counters have four states, and branch instructions are predicted “taken” or “not taken” depending on the value of the bimodal counter read from the branch history table.
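For illustration only, the branch history table lookup described above may be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not a description of any particular circuit: the history width M, the use of an exclusive-OR to combine the branch history value with the instruction address, the convention that the two upper counter states predict taken, and the saturating update rule are all assumptions that do not appear in the text above, and every function name is hypothetical.

```python
# Illustrative software model of a history-indexed branch history table.
# Assumptions (not from the text): M = 4, XOR combining, states 2..3 = taken.
M = 4                      # number of branch history bits (assumed)
TABLE_SIZE = 1 << M        # one four-state counter per possible index


def bht_index(history, branch_addr=None):
    """Form a table index from the history alone or combined with an address."""
    mask = TABLE_SIZE - 1
    if branch_addr is None:
        return history & mask
    # XOR is one common way to combine the two values; it is an assumption here.
    return (history ^ branch_addr) & mask


def predict_taken(counter):
    """Map a four-state counter (0..3) to a taken / not taken prediction."""
    return counter >= 2


def update_counter(counter, taken):
    """Saturating update: move toward 3 when taken, toward 0 when not taken."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


def update_history(history, taken):
    """Shift the newest branch resolution into the least significant bit."""
    return ((history << 1) | int(taken)) & (TABLE_SIZE - 1)
```

In this model each prediction requires one table read, and the history must be updated before the next index can be formed, which is why a conventional arrangement of this kind yields one prediction per access.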
As used herein, a branch instruction is an instruction which specifies the address of the next instructions to be fetched. The address may be the sequential address identifying the instruction immediately subsequent to the branch instruction within memory, or a branch target address identifying a different instruction stored elsewhere in memory. Unconditional branch instructions always select the branch target address, while conditional branch instructions select either the sequential address or the branch target address based upon a condition specified by the branch instruction. For example, the processor may include a set of condition codes which indicate the results of executing previous instructions, and the branch instruction may test one or more of the condition codes to determine if the branch selects the sequential address or the target address. A branch instruction is referred to as taken if the branch target address is selected via execution of the branch instruction, and not taken if the sequential address is selected. Similarly, if a conditional branch instruction is predicted via a branch prediction mechanism, the branch instruction is referred to as predicted taken if the branch target address is predicted to be selected upon execution of the branch instruction and is referred to as predicted not taken if the sequential address is predicted to be selected upon execution of the branch instruction.
Typically, a plurality of instructions is fetched by the superscalar processor, the plurality containing at least two conditional branch instructions. In order to take advantage of a wide issue superscalar architecture, it is sometimes necessary to predict both fetched branch instructions in the same clock cycle. However, prior art branch prediction mechanisms are configured for only one branch prediction per clock cycle. In these prior art processors, two clock cycles may be needed, particularly when the first branch instruction is predicted as not taken or as taken to a target address just prior to the second branch instruction. The need for two clock cycles to predict the pair of branch instructions may have an adverse impact on processor performance. It would be desirable to sustain two branch predictions per clock cycle, especially since many of the pairs of conditional branch instructions fetched per clock cycle are predicted not taken or taken with a target address just prior to the second conditional branch instruction.
The problems outlined above are in large part solved by the present invention, which allows at least two branch instructions to be predicted in a single clock cycle. The present invention sustains the at least two branch instruction predictions by providing a circuit and method for multiple branch history table access in a single clock cycle. In accordance with the present invention, a circuit and method are provided for generating a first branch history table index which is used to access a branch history table. A first counter value is read from the branch history table in response to accessing the branch history table using the first branch history table index. Additionally, a second branch history table index is generated which is used for accessing the branch history table. In response to accessing the branch history table using the second branch history table index, a pair of counter values is read therefrom. One of the pair of counter values is selected as the second counter value, the selection being based upon the value of the first counter value. The first counter value is used to predict a first branch instruction, while the second counter value is used to predict a second branch instruction.
The first and second branch history table indexes are generated within one clock cycle. Moreover, the first and second counter values are provided in the same clock cycle. This allows the first and second branch instructions to be predicted in the one clock cycle.
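The two table accesses and the selection step summarized above may be sketched in software as follows. This is only an illustrative model under stated assumptions: the text does not say how the pair of counter values is laid out, so the sketch assumes the pair occupies two adjacent table entries, with the lower entry corresponding to a first branch predicted not taken and the upper entry to a first branch predicted taken. The threshold used to interpret a counter value and all names are likewise hypothetical.

```python
# Illustrative model of two predictions from one pass over the table.
# Assumptions (not from the text): adjacent-entry pairs, states 2..3 = taken.

def predict_taken(counter):
    """Interpret a four-state counter value (0..3) as a prediction."""
    return counter >= 2


def predict_two(table, first_index, second_index):
    """Return (first_prediction, second_prediction) for a pair of branches."""
    # First access: a single counter value for the first branch.
    first_counter = table[first_index]

    # Second access: a pair of counter values, one for each possible
    # prediction of the first branch (assumed to sit in adjacent entries).
    pair = (table[2 * second_index], table[2 * second_index + 1])

    # The first counter value selects which member of the pair is used.
    first_taken = predict_taken(first_counter)
    second_counter = pair[1] if first_taken else pair[0]

    return first_taken, predict_taken(second_counter)
```

Because both reads depend only on values available at the start of the access, neither has to wait for the other; only the final selection between the two candidate counter values depends on the first counter value.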
In accordance with another embodiment of the present invention, the first branch history table index is generated as a function of a first branch history value stored in a branch history storage device. The second branch history table index is generated as a function of a second branch history value where the second branch history value is formed from the (M-1) least significant bits of the first branch history value. The second branch history table index can be generated without updating the branch history storage device with the first branch prediction.
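The relationship between the two branch history values can be checked with a short sketch. It assumes, beyond what the text states, that the history is M bits wide with M = 4 and that the newest branch resolution is shifted into the least significant bit. Under those assumptions, the (M-1) least significant bits of the first branch history value are exactly the bits that survive one shift, so both possible updated histories are known before the first prediction is written back. All names are hypothetical.

```python
# Illustrative check that the second history value needs no history update.
# Assumptions (not from the text): M = 4, newest resolution enters at bit 0.
M = 4


def second_history_value(first_history, m=M):
    """Keep the (m-1) least significant bits of the first history value."""
    return first_history & ((1 << (m - 1)) - 1)


def updated_history(first_history, first_taken, m=M):
    """History as it would read after recording the first prediction."""
    return ((first_history << 1) | int(first_taken)) & ((1 << m) - 1)


def candidate_histories(first_history, m=M):
    """Both possible post-update histories, formed without any update."""
    base = second_history_value(first_history, m) << 1
    return base | 0, base | 1   # (first not taken, first taken)
```

For every M-bit history value in this model, the two candidates equal the histories that would result from recording a not taken or a taken first prediction, which is consistent with generating the second index without first updating the branch history storage device.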
One advantage of the present invention is that the first and second counter values can be obtained from the branch history table in a single clock cycle.
Another advantage of the present invention is that it enables the prediction of two branch instructions within one clock cycle.
Yet another advantage of the present invention is that it enables prediction of multiple branch instructions contained within a single instruction run provided by an I-cache.