1. Field of the Invention
The embodiments herein relate to computer processors, and more specifically, to a computer processor and data processing method incorporating an instruction pipeline with conditional branch direction prediction for fast access to branch target instructions.
2. Description of the Related Art
As mentioned above, computer processors often incorporate an instruction pipeline to increase instruction throughput by dividing the processor into separate stages. In an exemplary five-stage pipelined processor, the stages may include an instruction fetch stage (also referred to herein as an instruction cache read (ICRD) stage), an instruction decode stage, an instruction execution stage, a memory access stage and a write-back stage. During the instruction fetch stage, an instruction comprising a given number of bits (e.g., 32 bits) is fetched from a specific address of an instruction cache (I-cache). During the instruction decode stage, the instruction bits are passed through combinational logic to produce control signals. During the instruction execution stage, the operation specified by the control signals is performed. During the memory access stage, data, if any, required for execution of the instruction is read (e.g., from a data cache (D-cache)). During the write-back stage, the results of executing the instruction are written into a register file. Ideally, in such a five-stage pipelined processor, each instruction enters the pipeline and spends one clock cycle at each stage, so that a single instruction takes five cycles to pass through the pipeline. Additionally, when one instruction has been fetched and moves to the decode stage, the next instruction in the sequence can be fetched, and so on, so that the stages operate on different instructions in the same cycle. However, branch instructions, and conditional branch instructions in particular, introduce a temporary uncertainty into the pipeline, which can result in one or more stages remaining idle in a given cycle, thereby causing delay.
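The ideal timing described above can be illustrated with a short sketch. It assumes exactly one cycle per stage and no stalls; the names STAGES, stage_schedule and total_cycles are hypothetical and are used here for illustration only.

```python
# Illustrative model of the ideal five-stage pipeline described above.
# Assumption: every instruction spends exactly one cycle in each stage
# and a new instruction is fetched every cycle (no stalls).
STAGES = ["fetch (ICRD)", "decode", "execute", "memory access", "write-back"]


def stage_schedule(num_instructions):
    """Map each instruction index to the cycle (1-based) it occupies each stage."""
    schedule = {}
    for i in range(num_instructions):
        # Instruction i is fetched in cycle i + 1 and advances one stage per cycle.
        schedule[i] = {stage: i + s + 1 for s, stage in enumerate(STAGES)}
    return schedule


def total_cycles(num_instructions):
    """Cycles needed for all instructions to leave the write-back stage."""
    if num_instructions == 0:
        return 0
    return num_instructions + len(STAGES) - 1
```

Under these assumptions a single instruction takes five cycles, while four consecutive instructions complete in eight cycles rather than twenty, because the stages overlap.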
Specifically, when a sequence of instructions being processed through a pipeline includes a conditional branch instruction, the fetch address (i.e., the address in the I-cache) of the instruction following the conditional branch instruction may not be known until after the conditional branch instruction has actually been decoded and executed. Waiting until then to fetch the next instruction in the sequence can result in a large stall (i.e., a stall of multiple cycles) and, thereby, in a missed opportunity to fetch other instructions in the sequence and avoid delay.
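The cost of such a stall can be estimated with a simple model. The sketch below assumes, for illustration only, that the outcome of a conditional branch becomes known at the end of the execute stage (the third stage), so that the following fetch is delayed by two cycles; the names NUM_STAGES, RESOLVE_STAGE and cycles_with_branch_stalls are hypothetical.

```python
# Illustrative stall model for a five-stage pipeline without prediction.
NUM_STAGES = 5
# Assumption: a conditional branch is resolved at the end of the execute
# stage, i.e. the 3rd stage, so the next fetch starts 3 cycles after the
# fetch of the branch instead of 1 cycle after it.
RESOLVE_STAGE = 3


def cycles_with_branch_stalls(is_branch):
    """Total cycles for a sequence; is_branch[i] is True for a conditional branch."""
    if not is_branch:
        return 0
    fetch_cycle = 1  # cycle in which the current instruction is fetched
    for branch in is_branch[:-1]:
        # The next fetch waits for the branch to resolve, otherwise it
        # follows in the very next cycle.
        fetch_cycle += RESOLVE_STAGE if branch else 1
    # The last instruction still needs to pass through the remaining stages.
    return fetch_cycle + NUM_STAGES - 1
```

In this model, four instructions without a branch complete in eight cycles, whereas the same sequence with a conditional branch in the second position completes in ten cycles, i.e., with a two-cycle stall.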
Consequently, modern computer processors often incorporate an instruction steering stage, wherein a direction predictor is used to determine whether an instruction is a conditional branch instruction and, if so, whether that conditional branch instruction will be taken or not taken. If the conditional branch instruction is predicted not taken, the next instruction in the sequence will be found at the next sequential address in the I-cache. If the conditional branch instruction is predicted taken, the next instruction in the sequence will be found at some other address in the I-cache. Furthermore, the instruction steering stage will also often incorporate a branch instruction target address cache (BTAC) to further predict the target address of the conditional branch instruction when that conditional branch instruction is predicted taken. The accuracy and timing of instruction steering, including branch instruction direction prediction and, if applicable, target address prediction, are extremely important to avoid performance penalties resulting from a delayed and/or an incorrect instruction fetch. While various instruction steering techniques are known, there remains a need for improvement.
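By way of illustration only, the interaction between a direction predictor and a BTAC can be sketched as follows. The sketch assumes a two-bit saturating counter per branch address as the direction predictor, which is one commonly known scheme and is not necessarily the scheme of any embodiment herein. It also assumes fixed 4-byte instructions, and it treats any address without a counter entry as weakly not taken. The class and method names are hypothetical.

```python
# Illustrative instruction steering sketch: direction predictor plus BTAC.
INSTRUCTION_BYTES = 4  # assumption: fixed-width 32-bit instructions


class SteeringSketch:
    def __init__(self):
        # Assumption: one 2-bit saturating counter (0..3) per branch address;
        # values 2 and 3 mean "predict taken".
        self.counters = {}
        # BTAC modeled as: branch address -> most recently observed target.
        self.btac = {}

    def predict_next_fetch(self, pc):
        """Return the predicted fetch address for the instruction after pc."""
        predicted_taken = self.counters.get(pc, 1) >= 2
        if predicted_taken and pc in self.btac:
            return self.btac[pc]  # predicted taken: fetch from cached target
        return pc + INSTRUCTION_BYTES  # otherwise fetch sequentially

    def update(self, pc, taken, target=None):
        """Train the predictor once the branch at pc has actually resolved."""
        counter = self.counters.get(pc, 1)
        self.counters[pc] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        if taken and target is not None:
            self.btac[pc] = target
```

In this sketch, a branch at address 0x100 that has not yet been observed is steered to 0x104. After one resolved taken outcome with target 0x200, the same branch is steered to 0x200, and a subsequent not-taken outcome returns the prediction to the sequential address.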