1. Field of the Invention
This invention relates in general to microprocessors, and more particularly, to branch prediction structures in a microprocessor which use branch history information to increase performance of multi-cycle branch prediction structures.
2. Relevant Background
To improve overall instruction execution performance, modern processors, also called microprocessors, use techniques including pipelining, superscalar execution, speculative instruction execution, and out-of-order instruction issue to enable multiple instructions to be issued and executed each clock cycle. As used herein, the term processor includes complex instruction set computers (CISC), reduced instruction set computers (RISC), and hybrids thereof.
Superscalar processors achieve higher performance by executing many instructions simultaneously at high frequencies. As cycle times for high-performance processors decrease, the functions performed by the structures in the microprocessor need to be distributed over multiple cycles. However, information relating to each function must still be obtained from the structures every cycle, based on input that is updated every cycle. Accordingly, the coordination of data available at each cycle is critical to the proper operation of the processor. In particular, a processor which utilizes speculative instruction execution must be capable of accurately predicting the instructions to be speculatively executed, so that the number of branch misdirections is reduced and the processor's performance is not adversely affected by them.
FIG. 1 illustrates, as an example, five cycles of a microprocessor (shown as cycles 1 to 5), wherein three cycles are required to obtain or "fetch" a set of instructions known as a "fetch bundle". For each fetch bundle shown in FIG. 1, as an example, one cycle is required to generate the address, and two cycles are required to access the instruction cache (I$) to obtain the actual instructions of the bundle. As shown in FIG. 1, when the address generation step A.sub.1 is complete during cycle 1, the next cycle (cycle 2) can begin generating the address for obtaining fetch bundle "a". This address generation step is shown as A.sub.a. When the address generation step A.sub.a for fetch bundle "a" is complete at the end of cycle 2, the address generation step for fetch bundle "b" can commence on the next cycle (cycle 3). As used herein, "z", "a", and "b" are fetch addresses corresponding to their respective fetch bundles.
The process shown in FIG. 1 assumes that the step of address generation requires only a single cycle to determine the next fetch address. In this sense, FIG. 1 depicts an ideal situation at reduced cycle times in which, at the end of cycle 1, the predictive decision as to the next fetch address is complete, and that information can be used at the beginning of cycle 2 for the address generation for fetch bundle "a". At the end of cycle 2, the predictive decision as to the next fetch address for fetch bundle "b" is complete, and that information can be used to generate the address for fetch bundle "b" during cycle 3.
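The timing described for FIG. 1 can be sketched as a small schedule calculation. This is only an illustrative model, not part of the disclosed structure: the function name fetch_schedule and its parameters addr_gen_cycles and icache_cycles are hypothetical, the bundle labels "z", "a", and "b" are taken from the fetch addresses named above, and the model assumes that address generation for one bundle cannot start until the previous bundle's address generation has finished.

```python
def fetch_schedule(bundles, addr_gen_cycles=1, icache_cycles=2):
    """Return, per fetch bundle, the inclusive 1-based cycle range of each stage.

    Hypothetical model: address generation for a bundle starts only after
    the previous bundle's address generation has produced the next fetch
    address; the instruction cache access follows immediately afterwards.
    """
    schedule = {}
    start = 1
    for name in bundles:
        addr_gen_end = start + addr_gen_cycles - 1
        schedule[name] = {
            "addr_gen": (start, addr_gen_end),
            "icache": (addr_gen_end + 1, addr_gen_end + icache_cycles),
        }
        # The next address generation begins on the following cycle.
        start = addr_gen_end + 1
    return schedule


# Ideal case of FIG. 1: one cycle of address generation, two of I$ access.
ideal = fetch_schedule(["z", "a", "b"])
for name, stages in ideal.items():
    print(name, stages)
```

With the default values, the three bundles occupy cycles 1 to 3, 2 to 4, and 3 to 5 respectively, so a new fetch begins every cycle and all three complete within the five cycles shown. Calling fetch_schedule(["z", "a", "b"], addr_gen_cycles=2) under the same simplifying assumption shows a new fetch beginning only every other cycle, which is one way to picture the cost of a next-address decision that no longer fits in a single cycle.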
However, because of the increased frequency of operation and the greater complexity of pipelines within a processor, the decision as to the next fetch address can be complex and may be split over more than one cycle. Accordingly, what is needed is a method and apparatus for increasing the accuracy of multi-cycle, pipelined branch prediction structures.