The present invention relates generally to instruction pipelining in processing systems, and in particular to apparatus and method for performing branch predictions using dual global branch history tables and for updating such global branch history tables.
Modern high frequency microprocessors are typically deeply pipelined devices. For efficient instruction execution in the pipelined machines, instructions are fetched and executed speculatively. In other words, a prediction is made about the direction and target of the branch instructions being fetched many cycles before the branch actually gets executed. When the branch is executed, the direction and the target of the branch is determined. If it is determined that the branch has been mispredicted (either its target or its direction), then the branch instruction is completed and all successive instructions are flushed out of the instruction pipeline and new instructions are fetched either from the sequential path of the branch (if the branch is resolved as not taken) or from the target path of the branch (if the branch is resolved as taken).
Many branch prediction techniques have been developed. For example, in Bimodal Branch Prediction, selected bits from a branch address are used as a pointer to a global branch history table. The entries to this table maintain the most recent behavior of a branch based on the number of times a corresponding branch instruction is actually taken versus the number of times that branch instruction is actually not taken. This past behavior of the branch is used to make a prediction of its behavior (taken or not-taken) when the branch instruction is fetched. If the prediction is xe2x80x9ctakenxe2x80x9d, the instructions following the branch are fetched from the target of the branch and forwarded into the instruction pipeline for their execution. Otherwise, the instructions following the branch are fetched from the next sequential address.
Often the outcome of a branch instruction (i.e., xe2x80x9ctakenxe2x80x9d or xe2x80x9cnot-takenxe2x80x9d) correlates with the path of execution (through earlier instructions) to reach the branch instruction. So xe2x80x9crememberingxe2x80x9d the path to reach the branch instruction can be used to make a prediction for such branch instructions. Therefore, in these schemes, a global history register is used which stores history bits (i.e., taken or not-taken information) for a given number of previously executed branch instructions. The history bits tag if the corresponding branch was taken or not taken. In the Gshare prediction scheme, an XOR operation is performed between the global history shift register and selected bits from the address of the branch instruction to obtain the entry number in the global history table. This entry in the global history table represents the saturating counter used to make the prediction for the branch instruction. There are several variations of the scheme on how the global history shift register is combined with the address of the branch instruction to obtain the entry number in the global history table.
Each of the existing schemes is subject to disadvantages, either in prediction accuracy, amount of hardware (e.g. the number of arrays required for the tables), or speed. Thus, the need has arisen for more efficient methods circuits and methods for performing branch prediction.
According to the principles of the present invention, a branch prediction method and apparatus are disclosed. Prediction values are retrieved from local and global branch history tables. A branch operation is selectively predicted using the value from the local branch history table when the value from the local branch history table falls within first predetermined limits. A branch prediction operation is selectively performed using the value from the global branch history table when the value from the global branch history table falls within a second predetermined limit.
In a preferred embodiment, the local branch history table is a bimodal table and the global branch history table is a global branch history table. If the value from the bimodal table shows a strong prediction direction, either branch taken or not taken, then that strong prediction is used to either fetch from the target address of the branch instruction or from the next sequential address from the branch instruction, respectively. If at branch resolution the prediction is correct, then the bimodal table alone is updated and the global prediction table is left unperturbed. Otherwise, if the bimodal table does not show a strong prediction, then the value from the global branch history table is referenced. If the global branch history table indicates a strong direction, either branch taken or branch not taken, respectively, then the global prediction is used to determine whether to fetch instructions following the branch instruction (when the prediction is xe2x80x9ctakenxe2x80x9d) or from the next sequential address of the branch instruction (when the prediction is xe2x80x9cnot takenxe2x80x9d). If neither the bimodal table nor the global branch history table show a strong direction, the stronger of the two predictions from the bimodal and global branch history tables is taken. In this case, both the bimodal and global branch history tables are updated.
The principles of the present invention have substantial advantages over the prior art. Among other things, only two tables are required, instead of the three or four tables required to produce similar prediction accuracy in the prior art. Moreover, these principles can be implemented using a minimal amount of relatively uncomplicated logic circuitry.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.