In recent years, demand for higher data throughput from computer processors has grown steadily as cutting-edge computer applications become more and more complex. This complexity places ever-increasing demands on microprocessor systems. The microprocessors in these systems have therefore been designed with hardware functionality intended to speed the execution of instructions.
One example of such functionality is a pipelined architecture. In a pipelined architecture, instruction execution overlaps, so even though each instruction might take five clock cycles to execute, five instructions can be in various stages of execution simultaneously. In effect, one instruction completes every clock cycle once the pipeline is full.
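The throughput effect described above can be illustrated with a small calculation. The following is a minimal sketch, assuming one clock cycle per stage and no stalls; the function name `total_cycles` and its parameters are hypothetical and are introduced only for illustration.

```python
def total_cycles(num_instructions: int, num_stages: int, pipelined: bool) -> int:
    """Cycles needed to finish a run of instructions.

    Assumes one clock cycle per stage and no stalls (an idealization).
    """
    if num_instructions == 0:
        return 0
    if pipelined:
        # The first instruction needs num_stages cycles to pass through the
        # pipeline; each later instruction completes one cycle after the
        # instruction ahead of it.
        return num_stages + (num_instructions - 1)
    # Without overlap, each instruction occupies the processor for all stages.
    return num_stages * num_instructions


unpipelined = total_cycles(100, 5, pipelined=False)  # 500 cycles
pipelined = total_cycles(100, 5, pipelined=True)     # 104 cycles
```

Under these assumptions, 100 five-stage instructions take 104 cycles rather than 500, which approaches one completed instruction per clock cycle as the run grows longer.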
Additionally, many modern processors have superscalar architectures. In these superscalar architectures, one or more stages of the instruction pipeline may be duplicated. For example, a microprocessor may have multiple instruction decoders, each with its own pipeline, allowing for multiple instruction streams, which means that more than one instruction can complete during each clock cycle.
Techniques of these types, however, may be quite difficult to implement. In particular, pipeline hazards may arise. Pipeline hazards are situations that prevent the next instruction in an instruction stream from executing during its designated clock cycle. In this case, the instruction is said to be stalled. When an instruction is stalled, typically all instructions following the stalled instruction are also stalled. While instructions preceding the stalled instruction can continue executing, no new instructions may be fetched during the stall.
Pipeline hazards consist of three main types: structural hazards, data hazards and control hazards. Structural hazards occur when a certain processor resource, such as a portion of memory or a functional unit, is requested by more than one instruction in the pipeline. A data hazard is a result of data dependencies between instructions. For example, a data hazard may arise when two instructions are in the pipeline and one of the instructions needs a result produced by the other instruction. The execution of the dependent instruction must then be stalled until the instruction producing the result completes. Control hazards may arise as the result of the occurrence of a branch instruction. Instructions following the branch instruction must usually be stalled until it is determined which branch is to be taken.
With respect to control hazards, many software techniques, which may for example be implemented through use of a compiler, have been developed to reduce these hazards. These software solutions have proved less than ideal, as differing pipelining and superscalar choices across multiple architecture implementations may make them difficult to apply.
To that end, hardware or hardware/software solutions have been developed in order to deal with these control hazards. In particular, branch prediction may be utilized to predict whether a branch instruction in an instruction pipeline will be taken and to fetch the instructions associated with the prediction. As branch instructions flow through the pipeline and ultimately execute, the actual outcomes of the branches are determined. At that point, if the predictions were found to be correct, the branch instructions are simply completed like all other instructions. In the event that a prediction is found to be incorrect, the instruction fetch logic causes the mispredicted instructions to be discarded and starts refetching instructions along the corrected path.
In other words, if the prediction is correct, there is no need to insert bubbles in the pipeline and the instruction pipeline remains full. If, however, the branch prediction is wrong, a heavy penalty must be paid, as the instruction pipeline must be flushed and the instructions associated with the other branch fetched.
As a result of the potential gains in speed which result from accurate branch prediction (and the commensurate penalty imposed by poor branch prediction), hardware branch prediction strategies have been extensively studied. The best-known technique, referred to here as bimodal branch prediction, makes a prediction based on the direction the branch went the last few times it was executed. More recent work has shown that significantly more accurate predictions can be made by utilizing more branch history. One method considers the history of each branch independently and takes advantage of repetitive patterns. Since the histories are independent, this technique will be referred to as local branch prediction. Another technique uses the combined history of all recent branches in making a prediction. This technique will be referred to as global branch prediction. Each of these different branch prediction strategies has distinct advantages. The bimodal technique works well when each branch is strongly biased in a particular direction. The local technique works well for branches with simple repetitive patterns. The global technique works particularly well when the direction taken by sequentially executed branches is highly correlated.
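The three strategies may be easier to compare in a small software model. The following is a minimal sketch, not a description of any particular hardware design: it assumes 2-bit saturating counters, tables indexed by the low bits of a branch address or of a history register, and hypothetical class names (`BimodalPredictor`, `LocalPredictor`, `GlobalPredictor`) chosen only for illustration.

```python
def bump(counter: int, taken: bool) -> int:
    """Move a 2-bit saturating counter toward 3 (taken) or 0 (not taken)."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


class BimodalPredictor:
    """One counter per branch address; captures a branch's bias."""

    def __init__(self, table_bits: int = 10):
        self.mask = (1 << table_bits) - 1
        self.counters = [1] * (1 << table_bits)  # start weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.counters[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        self.counters[i] = bump(self.counters[i], taken)


class LocalPredictor:
    """Per-branch history selects a counter; captures repeating patterns."""

    def __init__(self, table_bits: int = 10, history_bits: int = 10):
        self.pc_mask = (1 << table_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << table_bits)   # one history per branch
        self.counters = [1] * (1 << history_bits)  # indexed by that history

    def predict(self, pc: int) -> bool:
        return self.counters[self.histories[pc & self.pc_mask]] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.pc_mask
        h = self.histories[i]
        self.counters[h] = bump(self.counters[h], taken)
        self.histories[i] = ((h << 1) | int(taken)) & self.hist_mask


class GlobalPredictor:
    """A single history shared by all branches selects a counter."""

    def __init__(self, history_bits: int = 10):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)

    def predict(self, pc: int) -> bool:
        return self.counters[self.history] >= 2  # pc is unused in this sketch

    def update(self, pc: int, taken: bool) -> None:
        self.counters[self.history] = bump(self.counters[self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, pc: int, outcomes: list) -> float:
    """Fraction of outcomes predicted correctly for a single branch."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A branch that strictly alternates taken / not taken.
pattern = [True, False] * 100
bimodal_acc = accuracy(BimodalPredictor(), 0x40, pattern)
local_acc = accuracy(LocalPredictor(history_bits=4), 0x40, pattern)
global_acc = accuracy(GlobalPredictor(history_bits=4), 0x40, pattern)
```

In this model the alternating branch defeats the counter-only predictor, while the history-based predictors learn the pattern after a short warm-up. With only one branch in the stream, the global history is identical to that branch's local history, so the two history-based predictors behave the same; they would differ only when several branches interleave.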
Some techniques allow different types of branch predictors to be combined. One such technique uses multiple branch predictors and selects the one which is performing best for each branch. This approach has been shown to provide more accurate predictions than any one predictor alone. However, this combined approach may require a branch history table (BHT) to be kept for local branch prediction while also requiring that a global branch history be generated for global branch prediction. This requirement may in turn necessitate a relatively large amount of logic.
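One way to picture the selection step is a per-branch chooser counter that drifts toward whichever component predictor has been right more often. The following is a minimal sketch under several assumptions that are not taken from the text: it combines only a bimodal table and a global-history table, uses 2-bit saturating counters throughout, and uses a hypothetical class name `CombinedPredictor`.

```python
def bump(counter: int, up: bool) -> int:
    """Move a 2-bit saturating counter toward 3 (up) or 0 (down)."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class CombinedPredictor:
    """Bimodal and global components with a per-branch chooser (illustrative)."""

    def __init__(self, table_bits: int = 8):
        size = 1 << table_bits
        self.mask = size - 1
        self.bimodal = [1] * size       # indexed by branch address
        self.global_table = [1] * size  # indexed by global history
        self.chooser = [1] * size       # >= 2 means "trust the global component"
        self.history = 0                # shared global branch history

    def _components(self, pc: int):
        i = pc & self.mask
        p_bimodal = self.bimodal[i] >= 2
        p_global = self.global_table[self.history] >= 2
        return i, p_bimodal, p_global

    def predict(self, pc: int) -> bool:
        i, p_bimodal, p_global = self._components(pc)
        return p_global if self.chooser[i] >= 2 else p_bimodal

    def update(self, pc: int, taken: bool) -> None:
        i, p_bimodal, p_global = self._components(pc)
        # Train the chooser only when the components disagree, so it moves
        # toward whichever one was correct for this branch.
        if p_bimodal != p_global:
            self.chooser[i] = bump(self.chooser[i], p_global == taken)
        # Train both components and record the outcome in the global history.
        self.bimodal[i] = bump(self.bimodal[i], taken)
        self.global_table[self.history] = bump(self.global_table[self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# An alternating branch: the chooser should come to favor the global component.
predictor = CombinedPredictor(table_bits=4)
outcomes = [True, False] * 100
hits = 0
for taken in outcomes:
    hits += predictor.predict(0x40) == taken
    predictor.update(0x40, taken)
combined_acc = hits / len(outcomes)
```

Even this reduced model keeps three tables and a history register, which gives a sense of why a combined scheme that also maintains a per-branch history table may require a relatively large amount of logic.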
In pipelined architectures, multiple instructions are fetched simultaneously; thus, in one fetch there may be multiple branch instructions, each of which may, in turn, need to be recorded in a global branch history. In order to generate this global branch history for these multiple instructions, a relatively large number of logic gates may need to be utilized. Not only does utilizing a large number of logic gates consume extra area on a semiconductor device, but additionally, as the logic becomes more complex, it becomes difficult to meet the timing requirements for this logic to be paired with certain pipelined architectures.
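The bookkeeping involved can be sketched in software as follows. This is an illustrative model only: the function name `update_global_history`, the representation of a fetch group as (is_branch, taken) pairs, and the assumption that every slot in the group lies on the predicted path are all hypothetical and are not drawn from any particular implementation.

```python
def update_global_history(history: int, fetch_group: list, history_bits: int = 8) -> int:
    """Shift one outcome bit into the history for each branch in a fetch group.

    fetch_group is a list of (is_branch, taken) pairs in program order.
    Assumes every slot is on the predicted path (a simplification).
    """
    mask = (1 << history_bits) - 1
    for is_branch, taken in fetch_group:
        if is_branch:
            # Newest outcome enters at the low bit; the oldest falls off the top.
            history = ((history << 1) | int(taken)) & mask
    return history


# Four instructions fetched together; slots 1 and 3 are branches.
group = [(False, False), (True, True), (False, False), (True, False)]
new_history = update_global_history(0b00000101, group)  # 0b00010110
```

The loop runs sequentially here, but hardware would have to produce the same result within a single fetch cycle. Because the number of branches in a group is not known in advance, the shift amount varies from fetch to fetch, which is one plausible source of the additional gates and timing pressure described above.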
Thus, a need exists for efficient systems and methods for the generation of branch history which may be utilized in a pipelined microprocessor architecture and reduce the overhead of branch history generation.