Before an instruction can actually be executed by a processor, a data item may first have to be loaded from cache or memory; this is known as a load miss. A load miss occurs, for example, when the data item required by the instruction is not yet loaded in internal memory locations in the processor. When such a load miss occurs, some techniques provide that the processor, instead of idly waiting, may speculatively execute other instructions while the load miss is being resolved. Clearly, the efficiency gain from these speculative execution techniques is highly dependent on whether the instructions to be speculatively executed are correctly predicted at runtime.
If a software module comprises only sequential logic, it is relatively easy to predict which instructions follow a particular instruction. On the other hand, where a software module involves constructs that give rise to branch instructions, prediction is harder: any such branch instruction may have two or more possible paths, and which of those paths will actually be executed is not known until the branch instruction is actually executed (or resolved).
To handle this dilemma, a processor may use a facility known as a branch predictor entry to predict which path following a branch instruction should be taken. Specifically, the processor may associate a branch predictor entry with a branch instruction based on a hint provided by a compiler. Such a branch predictor entry may be implemented as a counter (e.g., a saturating counter) that can take on a plurality of values. At runtime, the current value of the counter indicates to the processor whether to take a particular path (for example, by jumping to a non-sequential instruction) or not to take it (for example, by continuing to execute the sequential instruction that follows the branch instruction). Since software modules that give rise to processor instructions sometimes exhibit predictable behaviors, the past execution history of the same branch instruction may be taken into account in setting the value of the counter (or branch predictor entry).
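As an illustration only, the way a counter value may be read as a prediction can be sketched as follows. The two-bit range (0 through 3), the threshold of 2, and the name `predict_taken` are assumptions chosen for this sketch; the text above does not fix a counter width or an encoding.

```python
# Hypothetical 2-bit counter encoding (an assumption, not taken from the text):
#   0 = strongly not taken, 1 = weakly not taken,
#   2 = weakly taken,       3 = strongly taken
COUNTER_MIN = 0
COUNTER_MAX = 3
TAKEN_THRESHOLD = 2


def predict_taken(counter: int) -> bool:
    """Return True if the counter value currently favors the taken path."""
    return counter >= TAKEN_THRESHOLD


for value in range(COUNTER_MIN, COUNTER_MAX + 1):
    path = "jump to non-sequential instruction" if predict_taken(value) else "fall through"
    print(value, path)
```

With this encoding, the upper half of the counter range predicts that the branch is taken and the lower half predicts that execution continues sequentially.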
For example, when a load miss is resolved and the branch instruction is thus actually executed, branch prediction logic may determine whether the prior prediction was correct (i.e., a prediction hit) or not (i.e., a prediction miss). The branch predictor entry associated with the branch instruction may be updated accordingly to favor a path that has actually been taken more frequently in the past. In this manner, future branch predictions may be influenced by the past execution history of the branch instruction.
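A minimal sketch of such an update, assuming a two-bit saturating counter in which values of 2 and above predict the taken path, might look like the following. The function names `update` and `resolve` and the sample outcome history are hypothetical and serve only to illustrate the hit/miss determination and the saturating adjustment.

```python
def update(counter: int, taken: bool) -> int:
    """Move the counter toward the observed outcome, saturating at 0 and 3."""
    if taken:
        return min(counter + 1, 3)
    return max(counter - 1, 0)


def resolve(counter: int, actually_taken: bool) -> tuple[int, bool]:
    """Compare the earlier prediction with the real outcome, then update.

    Returns the new counter value and whether the prediction was a hit.
    """
    predicted_taken = counter >= 2  # assumed threshold for "taken"
    hit = predicted_taken == actually_taken
    return update(counter, actually_taken), hit


# Illustrative outcome history for a single branch instruction.
history = [True, True, False, True]
final = 1  # start at "weakly not taken"
hits = 0
for outcome in history:
    final, hit = resolve(final, outcome)
    hits += hit

print(final, hits)  # prints "3 2"
```

Because the counter saturates, a single atypical outcome (the third one in the sample history) weakens the prediction but does not flip it.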
However, generally speaking, the total number of branch predictor entries is limited and not all possible address values of instructions can be mapped to their own separate branch predictor entries. In fact, at runtime the branch prediction logic may often map two or more different branch instructions to the same counter (branch predictor entry).
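One way such sharing may arise can be sketched as follows. The table size of 1024 entries, the 4-byte instruction alignment, the low-order-bits indexing scheme, and the two addresses are all assumptions made for this illustration; actual branch prediction logic may derive the index differently.

```python
TABLE_SIZE = 1024          # assumed number of branch predictor entries
INSTRUCTION_ALIGNMENT = 4  # assumed instruction size in bytes


def table_index(branch_address: int) -> int:
    """Map a branch instruction address to an entry using its low-order bits."""
    return (branch_address // INSTRUCTION_ALIGNMENT) % TABLE_SIZE


# Two hypothetical branch instructions located 4 KiB apart.
branch_a = 0x4000_0010
branch_b = 0x4000_1010

print(table_index(branch_a), table_index(branch_b))  # prints "4 4"
```

Under these assumptions, any two branch instructions whose addresses differ by a multiple of 4096 bytes map to the same entry.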
When two or more different branch instructions share the same counter, a counter value that predicts the correct path for one branch instruction may predict the wrong path for another branch instruction that shares the counter. As a result, updates of the counter from different branch instructions interfere with each other. This interference may cause extra prediction misses, in addition to those that would be generated by predictable or unpredictable behaviors of the software modules involved. Moreover, the cost of these extra prediction misses can be very high, as they may be amplified by repetitive behaviors of the software modules involved. In particular, wasted speculative execution from a single one of these extra prediction misses may amount to several hundred CPU cycles or more.
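The effect of this interference may be illustrated with a small simulation. The sketch assumes two-bit saturating counters initialized to 2, a hypothetical branch "A" that is always taken, and a hypothetical branch "B" that is never taken, executed alternately 100 times each; none of these specifics come from the text above.

```python
def count_misses(trace, index_of):
    """Count prediction misses over a trace of (branch, taken) pairs.

    index_of maps a branch to the counter it uses, so passing a function
    that returns a constant forces every branch to share one counter.
    """
    counters = {}
    misses = 0
    for branch, taken in trace:
        slot = index_of(branch)
        counter = counters.get(slot, 2)  # assumed initial value
        if (counter >= 2) != taken:
            misses += 1
        counters[slot] = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Repetitive behavior: A (always taken) and B (never taken) alternate.
trace = [("A", True), ("B", False)] * 100

shared_misses = count_misses(trace, lambda branch: 0)        # one shared counter
private_misses = count_misses(trace, lambda branch: branch)  # one counter each

print(shared_misses, private_misses)  # prints "100 1"
```

In this sketch, each update by branch "A" pushes the shared counter back toward the taken path, so branch "B" is mispredicted on every execution, whereas with separate counters branch "B" is mispredicted only once.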
For these reasons, the existing techniques for sharing branch predictor entries are not as efficient as would be desired, and an improved branch prediction mechanism that reduces the adverse impact of interference between branches that share branch predictor entries is needed.