Branching (both conditional and unconditional) causes a change of flow. The change of flow includes flushing pipeline stages of the processor, and the penalty associated with the flushing increases with the depth of the pipeline. To reduce this penalty, many processors perform branch prediction.
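The relation between pipeline depth and flush penalty can be illustrated with a rough cost model. The following is a minimal sketch, not taken from the text above: it assumes that one instruction enters the pipeline per cycle and that a mispredicted branch discards every younger instruction fetched before the branch reached its resolving stage. The function names and parameters (`flush_penalty_cycles`, `effective_cpi`, `resolve_stage`) are hypothetical.

```python
def flush_penalty_cycles(resolve_stage: int) -> int:
    """Cycles of wrong-path work discarded on a misprediction.

    Assumption: stages are numbered from 1 (fetch) and one instruction
    enters the pipeline per cycle, so a branch resolved at stage N has
    N - 1 younger instructions behind it that must be flushed.
    """
    if resolve_stage < 1:
        raise ValueError("resolve_stage must be at least 1")
    return resolve_stage - 1


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, resolve_stage: int) -> float:
    """Average cycles per instruction once flush penalties are included."""
    penalty = flush_penalty_cycles(resolve_stage)
    return base_cpi + branch_fraction * mispredict_rate * penalty


if __name__ == "__main__":
    # Illustrative numbers only: 20% branches, 10% of them mispredicted.
    for depth in (5, 11, 21):
        print(depth, round(effective_cpi(1.0, 0.2, 0.1, depth), 3))
```

Under these assumptions the penalty grows linearly with the stage at which branches are resolved, which is why deeper pipelines depend more heavily on accurate prediction.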
Branch prediction attempts to estimate whether a condition associated with a conditional branch will be fulfilled. For an unconditional branch instruction the prediction is trivial, because the outcome is included in the unconditional branch instruction itself. A branch prediction unit generates predicted target addresses. A predicted target address is a speculative target address if it is associated with an unresolved conditional branch instruction.
Instructions that are located at the speculative target address (and at addresses that follow the speculative target address) are fetched into the pipeline stages. The correctness of the branch prediction (that is, of the speculative target address) is checked (resolved) at the last pipeline stages, after multiple instructions have already been processed by one or more pipeline stages.
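The speculate-then-resolve sequence described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any particular processor: the `Prediction` record, its field names, and the `resolve` helper are hypothetical, and the in-flight instructions are represented only by their addresses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    """Hypothetical record kept for an unresolved conditional branch."""
    branch_pc: int         # address of the branch instruction
    predicted_taken: bool  # the predictor's guess for the condition
    target: int            # address used if the branch is taken
    fall_through: int      # address used if the branch is not taken

    @property
    def speculative_target(self) -> int:
        # Address from which instructions are fetched before resolution.
        return self.target if self.predicted_taken else self.fall_through


def resolve(pred: Prediction, actually_taken: bool,
            in_flight: List[int]) -> Tuple[Optional[int], List[int]]:
    """Check the prediction once the condition is known.

    Returns (redirect_address, flushed_addresses). A correct prediction
    needs no redirect and flushes nothing; a wrong one redirects the
    fetch unit and discards everything fetched speculatively.
    """
    correct = pred.target if actually_taken else pred.fall_through
    if correct == pred.speculative_target:
        return None, []
    return correct, list(in_flight)
```

For example, with `pred = Prediction(0x100, True, 0x200, 0x104)` and speculative fetches at `0x200` and `0x204`, calling `resolve(pred, False, [0x200, 0x204])` redirects fetching to `0x104` and flushes both speculative instructions.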
A conditional branch instruction can be responsive to one or more conditions. Multiple conditional branches can be dependent upon the same condition. After the condition is resolved its state (for example—true or false) can be flagged by a condition flag.
A pipeline stage that stores (and additionally or alternatively processes) a branch instruction can send an instruction fetch request to the fetch unit. If, at a certain point in time, multiple pipeline stages store branch instructions, then the fetch unit can receive multiple instruction fetch requests. Some of these instruction fetch requests can be responsive to unconditional branch instructions while other instruction fetch requests can be responsive to conditional branch instructions.
If multiple conditional branch instructions are associated with the same condition, then a single condition flag can be accessed by multiple hardware components. These multiple accesses can cause fan-out problems and result in a reduction of the operational frequency of the processor.
The number of accesses to the condition flag can be reduced by stalling the propagation of all but a single branch instruction through the pipeline stages, but this stalling reduces the throughput of the processor.
The following code can be executed by introducing multiple stalls between its commands. Specifically, multiple (for example, five) stalls are introduced between code lines I3 and I4, multiple stalls are introduced between code lines I4 and I5, and multiple stalls are introduced between code lines I5 and I6.

    I1  move (R4),D0    multiply D4,D5,D1
    I2  cmpeq D0,D1     multiply D5,D6,D2
    I3  jt_I7           cmpeq D2,D3         multiply D6,D7,D3
    I4  jf_I9           cmpeq D6,D7         add D2,D3,D4
    I5  jt_I1           move (R4),D0
    I6  jmp_I2          move (R5),D1
    I7  add D1,D2,D3
    I8  move (R5),D9    inc D1
    I9  move (R6),D8    inc D2
Alternatively, when this code propagates through the pipeline stages, four instruction fetch requests can be sent to the fetch unit.
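The count of four fetch requests, and the cost of the stalling alternative, can be checked against the listing with a short script. This is a sketch under assumptions that the text does not state explicitly: each code line is taken to group operations issued together, `jt_` and `jf_` are read as jump-if-true and jump-if-false (conditional), `jmp_` is read as an unconditional jump, and five stalls are assumed in every gap between consecutive branch lines (the text gives five only as an example). The helper names are hypothetical.

```python
# Transcription of code lines I1..I9; each entry lists the operations on that line.
LISTING = {
    "I1": ["move (R4),D0", "multiply D4,D5,D1"],
    "I2": ["cmpeq D0,D1", "multiply D5,D6,D2"],
    "I3": ["jt_I7", "cmpeq D2,D3", "multiply D6,D7,D3"],
    "I4": ["jf_I9", "cmpeq D6,D7", "add D2,D3,D4"],
    "I5": ["jt_I1", "move (R4),D0"],
    "I6": ["jmp_I2", "move (R5),D1"],
    "I7": ["add D1,D2,D3"],
    "I8": ["move (R5),D9", "inc D1"],
    "I9": ["move (R6),D8", "inc D2"],
}

# Assumed reading of the branch mnemonics (not defined in the text).
BRANCH_KINDS = {"jt": "conditional", "jf": "conditional", "jmp": "unconditional"}


def fetch_requests(listing):
    """Return (code line, branch kind, target line) for every branch operation."""
    requests = []
    for label, ops in listing.items():
        for op in ops:
            mnemonic, sep, target = op.partition("_")
            if sep and mnemonic in BRANCH_KINDS:
                requests.append((label, BRANCH_KINDS[mnemonic], target))
    return requests


def total_stalls(requests, stalls_per_gap=5):
    """Stall cycles if every gap between consecutive branches is padded."""
    return max(len(requests) - 1, 0) * stalls_per_gap


if __name__ == "__main__":
    reqs = fetch_requests(LISTING)
    for label, kind, target in reqs:
        print(f"{label}: {kind} branch to {target}")
    print("fetch requests:", len(reqs))
    print("stall cycles at five per gap:", total_stalls(reqs))
```

Under these assumptions the script finds branches on code lines I3, I4, I5 and I6 (three conditional, one unconditional), matching the four fetch requests mentioned above, and the stalling alternative costs fifteen cycles across the three gaps.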