The present invention relates in general to branch prediction in speculative execution of instructions in a software program.
In many computer architectures, instructions are executed speculatively to improve processing speed. Instructions are fetched several cycles before they are executed. When a conditional branch instruction is encountered, a prediction is made about the direction of the branch, that is, whether the branch will be resolved taken or not-taken, in order to keep the pipeline full. Based on the prediction, instructions are fetched and executed from the predicted path after the branch instruction. If the prediction is correct, nothing needs to be done to change the instruction fetching. However, if the prediction is incorrect, the instructions fetched after the branch instruction must be discarded from the machine and new instructions must be fetched, either from the target path of the branch (if the branch is resolved as taken) or from the sequential path of the branch (if the branch is resolved as not-taken).
Branch prediction algorithms have been implemented to aid in determining which path of a branch will be taken during a series of passes through a branch instruction.
A branch prediction algorithm may combine two prediction schemes, known as global prediction and local prediction. In the global prediction algorithm, the address of the branch instruction being predicted is correlated (by XORing) with the “path of execution” followed to reach the branch instruction in order to determine the entry in the global branch history table that should be used for predicting the direction of the conditional branch. The “path of execution” is defined (in this example) by an N-bit string of logic zeroes and logic ones representing the last N actual fetch groups (on a mis-prediction or any other redirection of the instruction fetching, the path is corrected). A sequential fetch group is represented by a logic zero and a non-sequential fetch group is represented by a logic one. This string of N bits is sometimes referred to as a path history vector.
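The XOR correlation described above can be illustrated with a minimal sketch. The function name global_bht_index, the choice of the low-order N address bits, and the example values are assumptions made for illustration only; they are not taken from the description.

```python
def global_bht_index(branch_address: int, path_history: int, n_bits: int) -> int:
    """Pick a global branch history table entry by XOR hashing.

    Assumption: only the low n_bits of the branch address take part
    in the hash, and the table holds 2**n_bits entries.
    """
    mask = (1 << n_bits) - 1
    return (branch_address ^ path_history) & mask


# Hypothetical example with N = 4 (a 16-entry table).
index = global_bht_index(0b10110110, 0b0101, 4)
print(index)  # low address bits 0110 XOR history 0101 = 0011, prints 3
```

Two passes through the same branch that arrive along different paths produce different indices, which is what lets the table separate their outcomes.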
The length of the path history vector (N) is related to the number of entries (M) in a Branch History Table (BHT) by the relation N ≤ lg(M), where “lg” stands for the logarithm to the base two. Since the number of BHT entries is limited, the length of the path history is also limited. The limited length of the path history vector may cause many branches to be unpredictable. The amount of history needed for predicting a particular branch often depends on the program. Studies have shown that scientific workloads often require a longer history for accurate branch prediction. This is especially true for nested loops where the inner loop is unrolled to some extent.
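As a small worked illustration of the relation N ≤ lg(M), the following sketch computes the longest path history that a table of M entries can index. The helper name max_history_bits and the table sizes are hypothetical.

```python
def max_history_bits(bht_entries: int) -> int:
    """Return the largest N satisfying N <= lg(M) for a table of M entries."""
    if bht_entries < 1:
        raise ValueError("the table must have at least one entry")
    # For a positive integer M, floor(lg(M)) equals M.bit_length() - 1.
    return bht_entries.bit_length() - 1


print(max_history_bits(16384))  # 16384 = 2**14, prints 14
print(max_history_bits(1000))   # 2**9 = 512 <= 1000 < 1024, prints 9
```

Under these assumed sizes, a loop whose body spans more than 14 fetch groups would push the earlier history out of the vector before the loop branch is reached again.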
In a program instruction flow, where there are a large number of fetch groups in a loop, the path history vector may not be long enough to capture the history and make highly accurate predictions. There is, therefore, a need for a method to compress the path history vector and improve the prediction in speculative instruction execution.
A path history vector is a shift register of length N that maintains a sequence of binary bits representing the actual instruction fetch behavior for the last N actual instruction fetches. The path history vector identifies a speculative path of execution with all the corrections to the speculation known to the processor at that time. The path of execution is identified by this N-bit vector, one bit per fetch group (a fetch group is a group of instructions fetched in a cycle), for each of the previous N fetch groups. Each bit in the path history vector indicates whether the next group of instructions fetched is from a sequential cache sector (0) or not (1). A path history vector captures this information for the actual path of execution through these sectors. That is, if there is a redirection of instruction fetching (for any reason, such as an interrupt, branch mis-prediction, delayed cache miss detection, translation look-aside buffer (TLB) miss detection, etc.), some of the groups of fetched instructions are discarded and the path history vector is corrected immediately. The path history vector is hashed (by bitwise exclusive ORing (XOR)) with the address of the branch instruction to address an entry in the global history table (which contains a total of 2^N entries) to produce a branch direction prediction. The accuracy of prediction depends on how much path history is necessary to determine a most likely action on conditional branches. If certain programs have branch behavior that requires a large path history vector, then the corresponding branch history table may be larger than necessary because not all of the possible table entries may be used, and the unused entries are not of interest. A novel algorithm compresses zeroes in the path history vector to enable such branches to be predicted in a smaller branch history table.
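The shift register behavior described above can be sketched as follows. This is an illustrative model only: the class name PathHistory, the convention that the newest bit enters at the least significant position, and the use of a saved copy to correct the vector after a redirection are all assumptions, and the zero compression itself is not modeled here.

```python
class PathHistory:
    """N-bit shift register with one bit per fetch group (hypothetical model)."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        self.mask = (1 << n_bits) - 1
        self.bits = 0

    def record(self, sequential: bool) -> None:
        """Shift in 0 for a sequential fetch group, 1 for a non-sequential one."""
        new_bit = 0 if sequential else 1
        # The oldest bit falls off the top once N groups have been recorded.
        self.bits = ((self.bits << 1) | new_bit) & self.mask

    def restore(self, saved_bits: int) -> None:
        """Correct the vector after a fetch redirection (assumed mechanism)."""
        self.bits = saved_bits & self.mask


history = PathHistory(4)
for sequential in (True, False, False, True):
    history.record(sequential)
print(format(history.bits, "04b"))  # prints 0110

saved = history.bits
history.record(False)               # speculative fetch group, later discarded
history.restore(saved)              # redirection: undo the speculative update
print(format(history.bits, "04b"))  # prints 0110
```

Keeping a saved copy is only one possible way to model the immediate correction; the description does not state how the correction is carried out.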
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.