The present invention relates to the field of microprocessor architecture. Specifically, the invention relates to a method and apparatus for performing multiple branch predictions per cycle.
Reduced instruction set computers, commonly referred to as RISC processors, are among the more common computer architectures in use today. In brief, RISC processors rely on simple, low-level instructions of uniform size. Instruction execution is divided into segments that are processed in a multistage pipeline. The pipeline is structured so that multiple instructions may be in process at any given instant. For example, a five-stage pipeline may include separate stages for fetching an instruction from memory (instruction fetch stage), decoding the instruction (decode stage), fetching the operands the instruction needs (operand fetch stage), executing the instruction (execution stage), and writing the results back to the appropriate register or memory location (write back stage). Since each stage can process one instruction and there are five stages, up to five instructions can be in process at once in such a pipeline.
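The overlap described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one clock cycle and nothing stalls; the names `STAGES` and `pipeline_schedule` are hypothetical and chosen only for this illustration.

```python
# Idealized five-stage pipeline: one cycle per stage, no stalls (assumption).
STAGES = ["IF", "ID", "OF", "EX", "WB"]  # fetch, decode, operand fetch, execute, write back


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction index."""
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, name in enumerate(stages):
            # The instruction that entered the pipeline `depth` cycles ago
            # is now in the stage at position `depth`.
            instr = cycle - depth
            if 0 <= instr < num_instructions:
                occupancy[name] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(8)):
        print(cycle, occupancy)
```

Once the pipeline has filled, every cycle shows five different instructions in flight, one per stage, which is why the idealized throughput approaches one instruction per clock cycle.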
Thus, such a RISC computer can theoretically achieve performance equivalent to executing one instruction per clock cycle. To achieve higher performance, however, more than one instruction must be processed in each stage. This higher level of performance can be achieved by superscalar processors. Superscalar processors are generally based on RISC architecture and incorporate multiple instruction pipelines. For example, one superscalar processor, the UltraSPARC manufactured by Sun Microsystems, includes six separate instruction pipelines: two for floating point calculations/graphics operations, two for integer calculations, one for branch operations, and one for memory operations. Theoretically, a superscalar processor having six separate pipelines can process up to six instructions per clock cycle.
Branch instructions are one factor limiting how many instructions can be processed per clock cycle in RISC, superscalar, and other processors that employ instruction pipelines. When a processor executes code containing a branch instruction, the earliest the processor could possibly recognize that the branch is to be taken is at the instruction decode stage. At this point, however, the next instruction has already been fetched and possibly other actions have been taken. Thus, the fetched instruction and the results of those other actions must be discarded, and a new instruction (the branch target) must be fetched. This problem is compounded because branches are common. Studies have shown that branch instructions generally occur about once every five to ten instructions.
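A rough sense of the cost can be had from a simple cycles-per-instruction estimate. The sketch below is only a first-order model: it assumes a base rate of one instruction per cycle and a fixed penalty for each taken branch. The function name `effective_cpi`, the one-cycle penalty, and the 60% taken fraction are illustrative assumptions, not figures from the text; only the branch frequencies (one in five to one in ten) come from the discussion above.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, base_cpi=1.0):
    """First-order estimate of cycles per instruction with no branch prediction.

    branch_fraction: fraction of instructions that are branches
    taken_fraction:  fraction of those branches that are taken (assumed value)
    penalty_cycles:  cycles lost discarding wrongly fetched work (assumed value)
    """
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


if __name__ == "__main__":
    # Branch frequencies from the text: one branch every 5 to 10 instructions.
    for instructions_per_branch in (5, 10):
        cpi = effective_cpi(1 / instructions_per_branch, 0.6, 1)
        print(f"1 branch per {instructions_per_branch}: CPI ~ {cpi:.2f}")
```

In a deeper or wider pipeline the penalty per taken branch covers more discarded work, so the same branch frequency costs proportionally more.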
One way designers have addressed the branch problem is to implement elaborate schemes that predict whether a branch is likely to be taken and, when it is, fetch the instruction at the branch target address rather than the next sequential instruction. One such method is described in Tse-Yu Yeh's Ph.D. dissertation, "Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors." A drawback of this method, however, is that only one branch instruction is predicted per fetch cycle. While this may be acceptable for a microprocessor with a limited number of pipelines, as the number of pipelines increases, so does the chance that multiple branch instructions are processed in one fetch cycle.
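The growth in that chance can be estimated with a short calculation. The sketch below assumes, purely for illustration, that each instruction in a fetch group is a branch independently with the same probability; real code clusters branches unevenly, so the numbers are indicative only. The function name `prob_multiple_branches` is hypothetical.

```python
def prob_multiple_branches(fetch_width, branch_prob):
    """Probability that a fetch group of `fetch_width` instructions holds
    two or more branches, assuming independent instructions (a simplification)."""
    p_none = (1 - branch_prob) ** fetch_width
    p_exactly_one = fetch_width * branch_prob * (1 - branch_prob) ** (fetch_width - 1)
    return 1 - p_none - p_exactly_one


if __name__ == "__main__":
    # One branch every five instructions, the upper frequency cited above.
    for width in (1, 2, 4, 6):
        print(f"width {width}: {prob_multiple_branches(width, 0.2):.3f}")
```

Under these assumptions, with one branch per five instructions, a two-wide fetch group contains multiple branches about 4% of the time, while a six-wide group does so about 34% of the time, which is the situation a one-prediction-per-cycle scheme cannot cover.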