Modern processors use relatively long pipelines (e.g., 10-30 stages) to execute instructions. An indirect jump instruction changes a computing application's control flow to a location designated by its argument, which can be a register or a memory location. To keep its pipeline full, a pipelined processor needs to know the next instruction that comes after an indirect jump right after it fetches the indirect jump instruction. Unfortunately, the correct target address of an indirect jump is not known until the indirect jump is executed, which can take tens of cycles after the jump is fetched. Therefore, when the indirect jump is fetched into the pipeline, the processor needs to predict the target address of the indirect jump instruction. This prediction is not trivial because an indirect jump instruction can have multiple possible target addresses.
For example, a virtual function that is called through an indirect jump instruction can be overridden in many (tens or hundreds of) derived classes. Each of these overriding functions constitutes a possible target address for the indirect jump instruction that implements the virtual function call (and the correct target address is not known when the indirect jump is fetched).
Current practices deploy several mechanisms for predicting the target address of an indirect jump. For example, current pipelined processors use the branch target buffer (BTB) to predict the target of an indirect jump instruction. A BTB is a table that stores information about all taken branches and jumps. This table is organized as a cache and is indexed using the jump address (or some part of it). A standard BTB stores the last seen target of each indirect jump. Therefore, unless the jump only exercises a single target (a monomorphic jump), a BTB-based predictor mispredicts a jump every time the jump's actual target is different from the last seen target. Alternative implementations of the BTB have been proposed to improve the target prediction accuracy for indirect jumps by adding a counter so that the stored target is updated only after a few consecutive mispredictions. However, the accuracy of a BTB-based predictor can be limited since: (1) only the most recent target can be predicted, (2) only one entry is stored per indirect jump, without any context (history or control-flow path information leading to the jump), (3) the BTB is a set-associative cache and therefore it has compulsory, capacity and conflict misses, and (4) there could be interference between different taken branches and indirect jumps if the BTB is partially tagged to reduce its storage requirements.
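The last-seen-target behavior described above, including the counter variant, can be illustrated with a minimal sketch. The class name `LastTargetBTB`, the direct-mapped untagged layout, and the `hysteresis` parameter are assumptions made for illustration only and are not taken from any particular processor.

```python
class LastTargetBTB:
    """Toy direct-mapped, untagged BTB that keeps one target per entry (illustrative only)."""

    def __init__(self, num_entries=256, hysteresis=0):
        self.num_entries = num_entries
        # Number of mispredictions tolerated before the stored target is replaced.
        # hysteresis=0 models the standard "last seen target" behavior.
        self.hysteresis = hysteresis
        self.targets = [None] * num_entries
        self.miss_counts = [0] * num_entries

    def _index(self, pc):
        # Index with the low-order part of the jump address.
        return pc % self.num_entries

    def predict(self, pc):
        return self.targets[self._index(pc)]

    def update(self, pc, actual_target):
        i = self._index(pc)
        if self.targets[i] == actual_target:
            self.miss_counts[i] = 0
        elif self.targets[i] is None or self.miss_counts[i] >= self.hysteresis:
            # Replace the stored target once enough consecutive misses were seen.
            self.targets[i] = actual_target
            self.miss_counts[i] = 0
        else:
            self.miss_counts[i] += 1


def count_mispredictions(predictor, pc, target_sequence):
    """Replay a sequence of actual targets for one jump and count mispredictions."""
    misses = 0
    for target in target_sequence:
        if predictor.predict(pc) != target:
            misses += 1
        predictor.update(pc, target)
    return misses
```

In this sketch a monomorphic jump mispredicts only on its first execution, while a jump that alternates between two targets mispredicts on every execution when `hysteresis=0`.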
Other practices deploy one or more target caches in predicting indirect jumps performed by microprocessors. Target caches overcome some of the limitations of the BTB by using the principles of two-level branch predictors, i.e., they use branch history information to distinguish between different dynamic instances of an indirect jump. A table called the target cache is accessed with a hashing function of the jump address (the program counter (PC) of the jump) and the global branch history register (GHR), for example the XOR function of PC and GHR. Each entry in the target cache contains the last seen target for that particular combination of PC and GHR. The target cache can be tagged or tagless. Larger target caches have better prediction accuracy if they are tagged, because tag matching eliminates interference among different indirect jumps. The target cache can be accessed with different hashing functions involving the jump address (a static value) and some information about the context of the particular dynamic instance of the jump. Usually the context is defined with either branch history or path history or a combination of both.
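A minimal sketch of a tagless target cache indexed with the XOR of PC and GHR follows. The class name `TargetCache`, the table sizes, and the helper `replay` are illustrative assumptions; real designs differ in hashing, tagging, and history management.

```python
class TargetCache:
    """Toy tagless target cache indexed by PC XOR GHR (illustrative only)."""

    def __init__(self, index_bits=10, history_bits=10):
        self.size = 1 << index_bits
        self.history_mask = (1 << history_bits) - 1
        self.table = [None] * self.size
        self.ghr = 0  # global branch history register, newest outcome in bit 0

    def _index(self, pc):
        return (pc ^ self.ghr) & (self.size - 1)

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, actual_target):
        # Store the last seen target for this (PC, GHR) combination.
        self.table[self._index(pc)] = actual_target

    def record_branch(self, taken):
        # Shift the outcome of a conditional branch into the history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.history_mask


def replay(cache, pc, events):
    """events: (preceding branch outcome, actual jump target) pairs."""
    misses = 0
    for taken, target in events:
        cache.record_branch(taken)
        if cache.predict(pc) != target:
            misses += 1
        cache.update(pc, target)
    return misses
```

When the target of a jump correlates with the preceding branch outcome, each history value maps to its own entry, so in this sketch the jump mispredicts only while the entries are being filled.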
Another current solution utilizes cascaded predictors, which are hybrid predictors that dynamically classify indirect jumps as easy or hard to predict and use different tables with different hardware budgets for each class of jumps. For example, the easy-to-predict jumps can be predicted by the BTB without creating an entry in a more sophisticated table. The underlying idea of the cascaded predictors is that the hybrid predictor can achieve higher accuracy than a monolithic target cache, even with smaller total storage requirements. Multi-stage cascaded predictors further extend this idea by using several tables of increasing complexity (longer branch or path history). Each stage is basically similar to a tagless or tagged target cache, but the update rules (i.e., the rule of not creating an entry in a table unless the jump was mispredicted by all previous tables/stages) allow a more efficient use of the available total storage. An appropriately sized 3-stage cascaded predictor outperforms other configurations and obtains most of the benefit of a larger number of stages.
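The filtering update rule can be sketched with a two-stage toy model. The class name `CascadedPredictor`, the use of dictionaries in place of fixed-size tables, and the path history built from the low two bits of each target are all assumptions chosen to keep the example short.

```python
class CascadedPredictor:
    """Toy two-stage cascaded predictor (illustrative only)."""

    def __init__(self, history_bits=4):
        self.first_stage = {}   # pc -> last seen target (BTB-like)
        self.second_stage = {}  # (pc, path history) -> target (tagged, history-indexed)
        self.history = 0
        self.history_mask = (1 << history_bits) - 1

    def predict(self, pc):
        key = (pc, self.history)
        if key in self.second_stage:
            return self.second_stage[key]
        return self.first_stage.get(pc)

    def update(self, pc, actual_target):
        key = (pc, self.history)
        first_guess = self.first_stage.get(pc)
        if key in self.second_stage:
            self.second_stage[key] = actual_target
        elif first_guess is not None and first_guess != actual_target:
            # Allocate in the larger stage only after the earlier stage mispredicted.
            self.second_stage[key] = actual_target
        self.first_stage[pc] = actual_target
        # Assumed path history: shift in the low two bits of the actual target.
        self.history = ((self.history << 2) | (actual_target & 3)) & self.history_mask
```

In this sketch a monomorphic jump never occupies the second stage, which is left for jumps that the first stage cannot predict.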
Further, indirect jump prediction can be accomplished with a data compression technique, prediction by partial matching (PPM), which operatively uses a set of Markov predictors of decreasing size, indexed by the result of hashing a decreasing number of bits from previous targets. Each Markov predictor is a table in which each entry contains a single target address and bookkeeping bits. The prediction comes from the highest order table that can predict, similarly to a cascaded predictor. The PPM predictor requires significant hardware complexity in the indexing functions, Markov tables and logic to select the predicted target.
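The highest-order-first selection can be sketched as follows. This toy model keys each table directly on the tuple of previous targets instead of hashing selected bits, omits the bookkeeping bits, and assumes `max_order` is at least 1; the class name `PPMPredictor` is hypothetical.

```python
class PPMPredictor:
    """Toy PPM-style predictor over the stream of previous targets (illustrative only)."""

    def __init__(self, max_order=2):
        self.max_order = max_order  # assumed to be >= 1
        # tables[k] maps the last k targets to the target that followed them.
        self.tables = [dict() for _ in range(max_order + 1)]
        self.recent = []  # most recent targets, newest last

    def _context(self, order):
        return tuple(self.recent[-order:]) if order else ()

    def predict(self):
        # Try the highest order first and fall back to lower orders.
        for order in range(min(self.max_order, len(self.recent)), -1, -1):
            context = self._context(order)
            if context in self.tables[order]:
                return self.tables[order][context]
        return None

    def update(self, actual_target):
        for order in range(min(self.max_order, len(self.recent)) + 1):
            self.tables[order][self._context(order)] = actual_target
        self.recent = (self.recent + [actual_target])[-self.max_order:]
```

For a target stream that repeats with a short period, the higher-order tables learn the pattern after the first pass, and later predictions in this sketch come from them.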
Also, current solutions employ the indirect target tagged geometric history length (ITTAGE) predictor, which operatively uses a set of tables indexed with history lengths that increase according to a geometric progression. The predicted target comes from the table indexed with the longest history that can make a prediction, i.e., the table with the longest history that has an entry for that particular indirect jump and branch history. Complex update rules try to create an entry in a table indexed with a longer history only if the less complex tables are unable to predict correctly. Additionally, a usefulness counter and a confidence bit are used to minimize the perturbation introduced by a single occurrence of a jump target.
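The geometric series of history lengths and the longest-match selection can be sketched as below. This is a simplified model under stated assumptions: the usefulness counter and confidence bit are omitted, dictionaries stand in for tagged tables, and the names `geometric_history_lengths` and `LongestMatchPredictor` are hypothetical.

```python
def geometric_history_lengths(num_tables, min_length, max_length):
    """History lengths growing geometrically from min_length to max_length."""
    if num_tables == 1:
        return [min_length]
    ratio = (max_length / min_length) ** (1 / (num_tables - 1))
    return [int(min_length * ratio ** i + 0.5) for i in range(num_tables)]


class LongestMatchPredictor:
    """Toy longest-history-match predictor (illustrative only)."""

    def __init__(self, history_lengths):
        self.history_lengths = sorted(history_lengths)  # each length assumed >= 1
        self.base = {}  # pc -> last seen target, used when no table matches
        self.tables = [dict() for _ in self.history_lengths]
        self.history = []  # branch outcomes as 0/1, newest last

    def _key(self, pc, length):
        return (pc, tuple(self.history[-length:]))

    def _provider(self, pc):
        # Index of the longest-history table holding an entry, or -1 if none.
        for i in range(len(self.tables) - 1, -1, -1):
            if self._key(pc, self.history_lengths[i]) in self.tables[i]:
                return i
        return -1

    def predict(self, pc):
        i = self._provider(pc)
        if i >= 0:
            return self.tables[i][self._key(pc, self.history_lengths[i])]
        return self.base.get(pc)

    def update(self, pc, actual_target):
        i = self._provider(pc)
        mispredicted = self.predict(pc) != actual_target
        if i >= 0:
            self.tables[i][self._key(pc, self.history_lengths[i])] = actual_target
        else:
            self.base[pc] = actual_target
        if mispredicted and i + 1 < len(self.tables):
            # Allocate in the next longer-history table only after a misprediction.
            longer = i + 1
            self.tables[longer][self._key(pc, self.history_lengths[longer])] = actual_target

    def record_branch(self, taken):
        self.history.append(int(taken))
        self.history = self.history[-self.history_lengths[-1]:]
```

For example, `geometric_history_lengths(4, 4, 32)` yields lengths that double from one table to the next.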
The virtual program counter (VPC) predictor is a recently proposed predictor that uses the existing conditional branch prediction hardware for indirect jump target prediction. The basic idea is inspired by a compiler optimization called devirtualization, which consists of replacing an indirect call with a sequence of conditional branches testing the most likely targets of the call. The VPC predictor stores multiple targets for each jump in the BTB. The prediction is an iterative process. In each iteration, a virtual PC and a virtual branch history are used to access the conditional branch predictor. At the same time, the BTB is accessed with the virtual PC. If the prediction is “taken”, the predicted target is retrieved from the BTB and the process terminates. If the prediction is “not taken”, another iteration is performed in the next cycle. The maximum number of iterations is limited to 12. The virtual PC is a hash function of the actual PC and the iteration number. The update rules train the conditional branch predictor to predict “taken” for the correct target and introduce the new target replacing the least frequently used target. The main advantage of the VPC predictor is that it does not require expensive and specialized hardware for indirect jump prediction.
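The iterative lookup described above can be sketched as follows. This is a simplified model under several assumptions: the conditional branch predictor is reduced to 2-bit counters keyed by virtual PC and virtual history, the virtual history shifts in a "not taken" outcome on each iteration, the hash in `virtual_pc` is an arbitrary choice, and a per-slot use count stands in for least-frequently-used replacement. None of these names or constants come from the source except the iteration limit of 12.

```python
MAX_ITER = 12  # iteration limit stated in the text


def virtual_pc(pc, iteration):
    """Assumed hash of the real PC and the iteration number (iteration 0 is the real PC)."""
    if iteration == 0:
        return pc
    return (pc ^ (iteration * 0x9E3779B1)) & 0xFFFFFFFF


class VPCPredictor:
    """Toy VPC-style predictor (illustrative only)."""

    def __init__(self, history_bits=8):
        self.mask = (1 << history_bits) - 1
        self.btb = {}       # virtual PC -> target
        self.counters = {}  # (virtual PC, virtual history) -> 2-bit counter
        self.uses = {}      # virtual PC -> times this target was correct
        self.ghr = 0

    def record_branch(self, taken):
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask

    def predict(self, pc):
        vghr = self.ghr
        for iteration in range(MAX_ITER):
            vpc = virtual_pc(pc, iteration)
            if vpc not in self.btb:
                return None  # BTB miss: no prediction available
            if self.counters.get((vpc, vghr), 0) >= 2:  # predicted "taken"
                return self.btb[vpc]
            vghr = (vghr << 1) & self.mask  # "not taken": try the next virtual branch
        return None

    def _install(self, vpc, key, target):
        self.btb[vpc] = target
        self.uses[vpc] = 0
        self.counters[key] = 2  # start weakly "taken"

    def update(self, pc, actual_target):
        vghr = self.ghr
        visited = []
        for iteration in range(MAX_ITER):
            vpc = virtual_pc(pc, iteration)
            key = (vpc, vghr)
            if vpc not in self.btb:
                self._install(vpc, key, actual_target)
                return
            if self.btb[vpc] == actual_target:
                # Train "taken" for the virtual branch holding the correct target.
                self.counters[key] = min(3, self.counters.get(key, 0) + 1)
                self.uses[vpc] += 1
                return
            # Train "not taken" for virtual branches holding a wrong target.
            self.counters[key] = max(0, self.counters.get(key, 0) - 1)
            visited.append((vpc, key))
            vghr = (vghr << 1) & self.mask
        # All slots hold other targets: replace the least frequently used one.
        victim_vpc, victim_key = min(visited, key=lambda slot: self.uses[slot[0]])
        self._install(victim_vpc, victim_key, actual_target)
```

In this sketch the first target of a jump is found in one iteration, and each additional target costs one more iteration (and, per the text, one more cycle) before a prediction is produced.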
Previous approaches only consider a single target for each given program context (i.e., jump address, branch history, path history, or a combination of these three properties), without any mechanism to discern among multiple targets that might have been used under the same context. Unfortunately, in object-oriented programs where indirect jumps have many target addresses, different target addresses can be taken by an indirect jump even for a given program context. Therefore, solely using program context information is not enough to distinguish between the multiple targets used within the same program context.
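The limitation can be made concrete with a small sketch of any predictor that stores one target per context. The function name and the representation of a context as an opaque hashable value are assumptions for illustration.

```python
def mispredictions_with_one_target_per_context(events):
    """events: (context, actual target) pairs; the table keeps one target per context."""
    table = {}
    misses = 0
    for context, target in events:
        if table.get(context) != target:
            misses += 1
        table[context] = target  # only the last seen target survives
    return misses
```

If the same context, for example the same jump address and branch history, is followed alternately by two different targets, every execution in this sketch is mispredicted, whereas giving each target its own context leaves only the initial misses.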
From the foregoing it is appreciated that there exists a need for systems and methods to ameliorate the shortcomings of existing practices.