1. Field of the Invention
This invention relates to computer hardware and, more particularly, to branch prediction techniques.
2. Description of the Related Art
In many software applications, it may be necessary to iterate the same software steps a number of times to accomplish a particular program task. For example, a large array of variables may need to be initialized to a reset value before use, or a particular mathematical operation may need to be performed on each element of such an array. Rather than explicitly code each initialization or mathematical operation for each data element of such an array, a programmer may choose to employ a loop construct in a high-level programming language such as C or C++ to perform the operation over the whole data array iteratively, thereby potentially yielding more compact and efficient code.
For example, a programmer may code the operation to be performed on each data element in an abstract way, using an index variable to reference a particular data element. The programmer may then embed the abstract operation in an iterative loop, such as a C/C++ for-loop, to be executed a specific number of times. This number may also be referred to as the iteration count. The iteration count may, for example, be equal to the number of data elements to be processed. The iterative loop may also specify a counter variable to represent the current iteration number. Finally, the programmer may define a mapping from the current iteration number to the index variable used to reference a particular data element. For example, in a one-dimensional data array whose elements are indexed by integers, the current iteration number may map directly to the array index.
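The loop structure described above may be sketched in C as follows. This is a minimal illustration only; the function name `reset_array` and its parameters are hypothetical and do not appear in the original text.

```c
#include <stddef.h>

/* Hypothetical helper: set every element of `data` to `reset_value`.
 * `iteration_count` is the number of data elements to process. */
void reset_array(int *data, size_t iteration_count, int reset_value)
{
    /* `i` is the counter variable holding the current iteration number.
     * In this one-dimensional case it maps directly to the array index. */
    for (size_t i = 0; i < iteration_count; i++) {
        data[i] = reset_value; /* the abstract per-element operation */
    }
}
```

Calling `reset_array(values, 4, 0)` on a four-element array would, under these assumptions, run the loop body four times and leave every element equal to zero.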
While executing, a loop construct may test the current iteration number against the iteration count to determine whether the end of the loop has been reached. For example, if the current iteration number is less than the iteration count, the loop may continue executing, while if the current iteration number is equal to the iteration count, the loop may terminate. The continued execution of the loop may thus be conditional, depending on the status of the current iteration number relative to the iteration count.
A loop construct coded in a high-level programming language may be translated into instructions of an instruction set architecture (ISA) that may then be executed by a microprocessor or system implementing that ISA. In some such translations, conditional branch instructions defined in the ISA may be used to implement the conditional behavior of loop execution. For example, various instructions may be used to test the value of the current iteration number, and a conditional branch instruction based on the results of the test may be used to branch to the beginning of the loop code sequence for another iteration, if necessary, or to execute code from another location if the loop has terminated.
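One way to picture such a translation, without committing to any particular ISA, is to rewrite a loop in C using explicit tests and `goto` statements, each of which stands in for a compare followed by a conditional branch. The function name `sum_array_lowered`, the labels, and the bottom-tested loop shape are assumptions made for illustration; an actual compiler may arrange the instructions differently.

```c
#include <stddef.h>

/* Hypothetical example: sum the elements of `data`, written so that each
 * `if (...) goto` corresponds to a test plus a conditional branch. */
long sum_array_lowered(const int *data, size_t iteration_count)
{
    long sum = 0;
    size_t i = 0; /* current iteration number */

    /* Guard: skip the loop entirely when there is nothing to process. */
    if (i >= iteration_count) goto loop_exit;

loop_top:
    sum += data[i]; /* loop body */
    i++;

    /* Loop conditional branch: taken to start another iteration,
     * not taken (falls through) once the iteration count is reached. */
    if (i < iteration_count) goto loop_top;

loop_exit:
    return sum;
}
```

In this sketch the branch at the bottom of the loop is taken on every iteration except the last, which is the behavior a branch predictor would need to anticipate.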
A given microprocessor implementation may attempt to fetch instructions well in advance of their eventual execution, in order to allow for performance-improving features such as early decoding of instructions and instruction rescheduling or optimization based on run-time data availability. However, a conditional branch instruction may present more than one potential fetch path, depending upon whether the branch is ultimately taken or not taken. Further, the outcome of a conditional branch may not be known until the branch actually executes. In order not to stall instruction fetching until a conditional branch's outcome is known, a microprocessor may implement a branch prediction scheme to predict the outcome of a given conditional branch and then speculatively fetch and execute instructions along the predicted path.
Branch prediction schemes may improve microprocessor performance to the extent that predictions are correct, but an incorrect prediction may require that any instructions speculatively executed along the mispredicted path be discarded and that the correct instruction path be fetched and executed. Thus, branch prediction accuracy may substantially impact overall microprocessor performance. Conditional branches implementing loops may represent a substantial fraction of the total number of conditional branches in a given application program, but existing branch prediction schemes may not accurately predict the behavior of such loop conditional branches, thus potentially limiting overall branch prediction accuracy and microprocessor performance.
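The limitation noted above may be illustrated with a small simulation. The text does not name a specific existing scheme; the sketch below assumes a two-bit saturating counter, a commonly described predictor, purely as an example. The function name `count_loop_mispredictions`, the initial counter state, and the bottom-tested loop model (the branch is taken on every iteration except the last) are likewise assumptions.

```c
#include <stdbool.h>

/* Hypothetical simulation: count how often a two-bit saturating counter
 * mispredicts a loop-closing branch. The loop is entered `runs` times and
 * the branch executes `trip_count` times per run: taken on every
 * iteration except the last, where it falls through to exit the loop. */
unsigned count_loop_mispredictions(unsigned trip_count, unsigned runs)
{
    unsigned counter = 3; /* states 0..3; assumed to start "strongly taken" */
    unsigned mispredictions = 0;

    for (unsigned r = 0; r < runs; r++) {
        for (unsigned i = 0; i < trip_count; i++) {
            bool taken = (i + 1 < trip_count);     /* actual outcome */
            bool predicted_taken = (counter >= 2); /* states 2 and 3 predict taken */

            if (predicted_taken != taken) {
                mispredictions++;
            }

            /* Update the counter toward the actual outcome, saturating. */
            if (taken) {
                if (counter < 3) counter++;
            } else {
                if (counter > 0) counter--;
            }
        }
    }
    return mispredictions;
}
```

Under these assumptions, a loop with four iterations that is entered ten times yields ten mispredictions out of forty branch executions: the counter predicts "taken" at every loop exit, because the single not-taken outcome per run is never enough to change its prediction.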