As semiconductor technology continues to inch closer to practical limitations in terms of increases in clock speed, architects are increasingly focusing on parallelism in processor architectures to obtain performance improvements. At the chip level, multiple processing cores are often disposed on the same chip, functioning in much the same manner as separate processor chips, or to some extent, as completely separate computers. In addition, even within cores, parallelism is employed through the use of multiple execution units that are specialized to handle certain types of operations. Pipelining is also employed in many instances so that certain operations that may take multiple clock cycles to perform are broken up into stages, enabling other operations to be started prior to completion of earlier operations. Multithreading is also employed to enable multiple instruction streams to be processed in parallel, enabling more overall work to be performed in any given clock cycle.
Another area where advances have been made in processor design is that of branch prediction, which attempts to predict, in advance of execution of a conditional branch instruction, whether or not that branch instruction will branch to a different code path or continue along the same code path based upon the result of some comparison performed in association with the branch instruction. Branch prediction may be used, for example, to prefetch instructions from a cache or lower level memory to reduce the latency of loading and executing those instructions when the branch instruction is finally resolved. In addition, in highly pipelined architectures, branch prediction may be used to initiate execution of instructions from a predicted branch before a branch instruction is resolved, such that the results of those instructions can be committed as soon as possible after the branch instruction is resolved.
When a branch is correctly predicted, substantial performance gains may be achieved given that very little latency may exist between executing the branch instruction and the instructions that have been predicted for execution after the branch instruction. On the other hand, when a branch is mispredicted, often the pipeline of an execution unit has to be flushed and the state of the processor essentially rewound so that the instructions from the correct path can be executed.
As a result, substantial efforts have been made in the art to improve the accuracy of branch predictions and therefore minimize the frequency of branch mispredicts by branch prediction logic. Many branch prediction logic implementations, for example, rely on historical information, and are based upon the assumption that if a branch was taken the last time a branch instruction was executed, a likelihood exists that the branch will be taken the next time that branch instruction is executed. In many implementations, for example, a branch history table is used to store entries associated with particular branch instructions so that when those branch instructions are encountered, a prediction may be made based upon data stored in the entries associated with such branch instructions.
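By way of illustration only, the history-based approach described above may be sketched in Python as follows. The sketch assumes a direct-mapped table holding a single last-outcome value per entry, indexed by the low-order bits of a branch address, with a default prediction of not taken when no history exists. The class name, the helper function, the table size, and the indexing scheme are all hypothetical and are not drawn from any particular processor design.

```python
class BranchHistoryTable:
    """Direct-mapped table of last-outcome values, indexed by branch address."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # None means no history has been recorded for this slot yet.
        self.entries = [None] * num_entries

    def _index(self, branch_address):
        # Assumed indexing scheme: low-order bits of the branch address.
        return branch_address % self.num_entries

    def predict(self, branch_address):
        """Return True to predict taken, False to predict not taken."""
        last = self.entries[self._index(branch_address)]
        # With no history, fall back to an assumed default of not taken.
        return bool(last)

    def update(self, branch_address, taken):
        """Record the resolved outcome of the branch."""
        self.entries[self._index(branch_address)] = taken


def count_mispredicts(table, branch_address, outcomes):
    """Replay resolved outcomes for one branch and count wrong predictions."""
    mispredicts = 0
    for taken in outcomes:
        if table.predict(branch_address) != taken:
            mispredicts += 1
        table.update(branch_address, taken)
    return mispredicts
```

Under these assumptions, a loop-closing branch that is taken nine times and then falls through is mispredicted only on its first and last executions, whereas a branch that strictly alternates between taken and not taken is mispredicted on every execution, which illustrates why the last-outcome assumption suits some code far better than other code.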
The implementation of branch prediction logic in a processor, however, presents a number of challenges. For example, improving the accuracy of branch prediction logic often requires the use of more complex logic, which can slow down branch prediction and add to the amount of logic circuitry required to implement the logic. With history-based branch prediction logic, accuracy is often directly proportional to the amount of historical information stored by the logic; however, increasing the storage capacity of a branch history table requires additional logic circuitry. In many applications, there is a desire to minimize the amount of logic circuitry in a processor chip devoted to branch prediction logic, e.g., to reduce power consumption and/or cost, or to free up additional space to implement other functionality.
In addition, it has been found that branch prediction algorithms often do not work well for certain types of program code. Some program code, such as, for example, a binary tree search, exhibits practically random branch characteristics, and a branch decision made during one execution of a branch instruction may provide no insight into what decision will be made the next time the instruction is executed. In addition, in multithreaded environments where multiple threads are concurrently executed in a processing core, the limited size of a branch prediction table that is shared by multiple threads can result in historical information being frequently discarded as new branch instructions are encountered, such that the historical information for a particular branch instruction may no longer be in the branch prediction table by the time that branch instruction is later executed.
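The loss of historical information in a shared table may be illustrated with the following minimal Python sketch. It assumes a small direct-mapped table in which each entry is tagged with the branch address that last wrote it, so that a lookup by a different branch mapping to the same entry finds no usable history. The names TaggedBranchTable and history_hits, the four-entry table size, and the branch addresses are hypothetical values chosen only to force aliasing between two threads.

```python
class TaggedBranchTable:
    """Direct-mapped table whose entries remember which branch wrote them."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        # Each slot holds (branch_address, taken), or None when empty.
        self.entries = [None] * num_entries

    def lookup(self, address):
        """Return the stored outcome, or None if no history is resident."""
        slot = self.entries[address % self.num_entries]
        if slot is not None and slot[0] == address:
            return slot[1]
        return None

    def update(self, address, taken):
        # Overwrites whatever branch previously occupied this slot.
        self.entries[address % self.num_entries] = (address, taken)


def history_hits(table, trace):
    """Count executions for which that branch's history was still resident."""
    hits = 0
    for address, taken in trace:
        if table.lookup(address) is not None:
            hits += 1
        table.update(address, taken)
    return hits


# Two hypothetical threads whose branch addresses map to the same four slots.
thread_a = [(address, True) for address in (0, 1, 2, 3)]
thread_b = [(address, False) for address in (4, 5, 6, 7)]

# Thread A running alone for three passes over its branches.
alone = history_hits(TaggedBranchTable(4), thread_a * 3)
# Threads A and B alternating passes over a shared table.
shared = history_hits(TaggedBranchTable(4), (thread_a + thread_b) * 3)
```

In this contrived trace, thread A alone finds resident history on 8 of its 12 branch executions, while with the two threads alternating no execution by either thread finds resident history, because each pass by one thread overwrites every entry written by the other.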
In fact, it has been found that in some instances branch prediction can actually decrease performance when the percentage of mispredicts rises to a level where the penalties of the mispredicts exceed the latencies that would have otherwise occurred if the processing core waited to resolve branch instructions before attempting to execute the instructions in the proper code path.
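The break-even point described above may be approximated with a simple cost model, sketched below in Python. The model assumes that a correctly predicted branch incurs no added latency, that each mispredict costs a fixed flush penalty, and that waiting to resolve a branch without prediction costs a fixed stall. The function names and the cycle counts used in the example are hypothetical and are not measurements of any actual processing core.

```python
def predicted_cost(mispredict_rate, flush_penalty_cycles):
    """Average cycles lost per branch when prediction is enabled."""
    return mispredict_rate * flush_penalty_cycles


def prediction_helps(mispredict_rate, flush_penalty_cycles, resolve_stall_cycles):
    """True when predicting loses fewer cycles per branch than stalling."""
    return predicted_cost(mispredict_rate, flush_penalty_cycles) < resolve_stall_cycles


def break_even_mispredict_rate(flush_penalty_cycles, resolve_stall_cycles):
    """Mispredict rate above which prediction costs more than stalling."""
    # Capped at 1.0 because a rate cannot exceed 100 percent.
    return min(1.0, resolve_stall_cycles / flush_penalty_cycles)
```

For example, with an assumed flush penalty of 20 cycles and an assumed resolve stall of 8 cycles, this model places the break-even mispredict rate at 40 percent; a branch mispredicted 10 percent of the time benefits from prediction, while a branch mispredicted 50 percent of the time, as may occur with practically random branch behavior, does not.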
Some conventional processor designs have provided an ability to selectively disable branch prediction logic. In addition, some conventional processor designs have provided an ability to save and restore the state of branch prediction logic. History-based branch prediction logic, in particular, tends to improve in accuracy over time as more historical information is collected; however, if multiple independent threads are accessing branch prediction logic with a limited amount of storage, the collection of historical information for one thread may cause historical information for other threads to be discarded. By saving and restoring the state of branch prediction logic, however, the branch prediction logic often can be “primed” for different code sections so that historical information collected for those code sections in the past is more likely to be resident in the branch prediction logic the next time those code sections are executed.
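The disable and save/restore capabilities described above may be modeled, purely as a sketch, with the following Python code. It assumes a four-entry last-outcome table, an enable flag that suppresses both prediction and history updates when cleared, and save and restore operations that copy the entire table. The class name, method names, table size, and branch addresses are hypothetical, and actual designs may expose such capabilities quite differently.

```python
class BranchPredictor:
    """Last-outcome predictor with an enable flag and save/restore of state."""

    def __init__(self, num_entries=4):
        self.enabled = True
        self.table = [False] * num_entries

    def predict(self, address):
        """Return the predicted outcome, or None when prediction is disabled."""
        if not self.enabled:
            return None  # the core would wait for the branch to resolve
        return self.table[address % len(self.table)]

    def update(self, address, taken):
        # Assumption: history is not collected while prediction is disabled.
        if self.enabled:
            self.table[address % len(self.table)] = taken

    def save_state(self):
        """Return a copy of the history so that later updates do not alter it."""
        return list(self.table)

    def restore_state(self, saved):
        """Reload previously saved history, priming the predictor."""
        self.table = list(saved)


predictor = BranchPredictor()
for address in (0, 1, 2, 3):
    predictor.update(address, True)    # code section A: branches taken
saved_a = predictor.save_state()

for address in (4, 5, 6, 7):
    predictor.update(address, False)   # code section B overwrites A's history
lost = predictor.predict(0)            # A's history is no longer resident

predictor.restore_state(saved_a)
primed = predictor.predict(0)          # A's history is resident again

predictor.enabled = False
disabled = predictor.predict(0)        # no prediction is made
```

In this sketch, the prediction for the branch at address 0 reverts to not taken after code section B runs, returns to taken once the saved state for code section A is restored, and is withheld entirely once prediction is disabled.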
While the ability to selectively disable branch prediction logic and save/restore the state of branch prediction logic can address some of the shortcomings of conventional branch prediction, conventional designs nonetheless are characterized as lacking flexibility to address different situations, particularly in more complex and high performance data processing systems where numerous different types of applications, having vastly different operating characteristics, may be executed on such systems.
For example, many high performance data processing systems utilize virtualization to enable multiple operating systems to be hosted on a common hardware platform under the management of supervisory-level software often referred to as a hypervisor. Each operating system, which runs as a guest of the hypervisor, may in turn host one or more user applications running in separate processes in the operating system environment. A multitude of different applications, running different algorithms with characteristics that are not well suited to generalization from a branch prediction standpoint, may coexist in such a system, making it difficult to provide a branch prediction strategy that works optimally for all scenarios.
Therefore, a significant need continues to exist in the art for a flexible and efficient manner of controlling branch prediction logic in a processing core.