A control hazard, also called a branch hazard, is a significant cause of lost efficiency in processor pipelining. When processing a branch instruction, a conventional processor often does not know where to fetch the next instruction and may have to wait until the branch instruction finishes, leaving the pipeline behind the branch instruction empty. FIG. 1 shows a conventional pipelining structure, and Table 1 shows pipeline stages with respect to a branch instruction.
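TABLE 1: Pipeline stages with a branch instruction (branch taken)

| Sequence | | | | | | | |
|---|---|---|---|---|---|---|---|
| i | | IF | ID | EX | MEM | WB | |
| i + 1 | | | IF | stall | stall | stall | |
| target | | | | IF | ID | EX | MEM |
| target + 1 | | | | | IF | ID | EX |
| target + 2 | | | | | | IF | ID |
| Instr. Addr. | i | i + 1 | target | target + 1 | target + 2 | target + 3 | target + 4 |
| Instr. Fetched | | i | i + 1 | target | target + 1 | target + 2 | target + 3 |
| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 | 7 |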
Considering FIG. 1 and Table 1 together, the columns in Table 1 represent clock cycles in the pipeline, and the rows represent instructions in sequence. The instruction address (Instr. Addr.) is the address provided to an instruction memory for fetching instructions, and the output of the instruction memory (Instr. Fetched) is then provided to a decoder for decoding the fetched instruction. The pipeline includes instruction fetch (IF), instruction decode (ID), execution (EX), memory access (MEM), and write back (WB) stages. A stall means the pipeline is stopped or empty.
Table 1 shows a branch instruction, indicated by ‘i’, whose address is provided to the instruction memory at clock cycle ‘1’. Further, ‘i+1’ indicates the instruction following the branch instruction, ‘target’ indicates the branch target instruction, and ‘target+1’, ‘target+2’, ‘target+3’, and ‘target+4’ indicate the instructions following the branch target instruction in sequence.
As shown in Table 1, at clock cycle ‘2’, the processor fetches the branch instruction ‘i’. At clock cycle ‘3’, the processor fetches instruction ‘i+1’ and decodes the branch instruction ‘i’. It is assumed that, at the end of the decoding stage of the branch instruction, the branch target address is calculated and the branch decision is made. If the decision is to take the branch, the branch target address is saved as the next address used to fetch the next instruction. At clock cycle ‘4’, the branch target instruction is fetched, and it is subsequently decoded and executed. From here on, the pipeline processes the instructions following the branch target instruction. In this scenario, however, the already-fetched instruction ‘i+1’ following the branch instruction should not be executed, so the pipeline stalls with respect to instruction ‘i+1’. Thus, when the branch is taken, a one-clock-cycle stall is introduced into the pipeline, which may significantly degrade pipeline performance.
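The fetch order described above can be illustrated with a minimal sketch. It is not part of the disclosed pipeline; it assumes only that one instruction is fetched per clock cycle and that the branch outcome is known at the end of the decode stage. The names `fetch_trace`, `stall_cycles`, `resolve_stage_index`, and `tail` are hypothetical and chosen for illustration only.

```python
def fetch_trace(branch_taken, resolve_stage_index=1, tail=3):
    """List (instruction, status) pairs in fetch order around one branch.

    resolve_stage_index is the stage (0 = IF, 1 = ID, ...) at whose end the
    branch outcome is known; 1 matches the decode-stage assumption above.
    All names here are illustrative, not taken from the source text.
    """
    trace = [("i", "executed")]
    # One sequential instruction is fetched per cycle until the branch resolves.
    for k in range(1, resolve_stage_index + 1):
        status = "squashed" if branch_taken else "executed"
        trace.append((f"i+{k}", status))
    # After resolution, fetching continues along the chosen path.
    for k in range(tail):
        if branch_taken:
            name = "target" if k == 0 else f"target+{k}"
        else:
            name = f"i+{resolve_stage_index + 1 + k}"
        trace.append((name, "executed"))
    return trace


def stall_cycles(trace):
    """Count fetch slots spent on instructions that are never executed."""
    return sum(1 for _, status in trace if status == "squashed")


if __name__ == "__main__":
    taken = fetch_trace(branch_taken=True)
    for name, status in taken:
        print(f"{name:10s} {status}")
    print("stall cycles:", stall_cycles(taken))
```

Under these assumptions the taken-branch trace contains exactly one squashed instruction, ‘i+1’, which corresponds to the one-clock-cycle stall in Table 1. Passing a larger `resolve_stage_index` models a hypothetical pipeline that resolves the branch later and therefore discards more fetched instructions.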