A conventional high performance superscalar processor typically includes an instruction cache for storing instructions, an instruction buffer for temporarily storing instructions fetched from the instruction cache for execution, a number of execution units for executing sequential instructions, a Branch Processing Unit (BPU) for executing branch instructions, a dispatch unit for dispatching sequential instructions from the instruction buffer to particular execution units, and a completion buffer for temporarily storing instructions that have finished execution, but have not been completed.
As is well known in the art, sequential instructions fetched from the instruction cache are stored within the instruction buffer pending dispatch to the execution units. In contrast, branch instructions fetched from the instruction cache are typically forwarded directly to the branch processing unit for execution. In some cases, the condition register (CR) value upon which a conditional branch depends can be ascertained prior to executing the branch instruction, that is, the branch can be resolved prior to execution. If a branch is resolved as taken prior to execution, instructions at the target address of the branch instruction are fetched and executed by the processor. In addition, any sequential instructions following the branch that have been pre-fetched are discarded. However, the outcome of a branch instruction often cannot be determined prior to executing the branch instruction. When a branch instruction remains unresolved at execution, the branch processing unit utilizes a prediction mechanism, such as a branch history table, to predict which execution path should be taken. In conventional processors, the dispatch of sequential instructions following a branch predicted as taken is halted and instructions from the speculative target instruction stream are fetched during the next processor cycle. If the branch that was predicted as taken is resolved as mispredicted, a mispredict penalty is incurred by the processor due to the time required to restore the sequential execution stream following the branch instruction. Similarly, for mispredicted branches that were predicted not-taken, the instructions that were fetched after the branch instruction are discarded and a mispredict penalty is incurred by the processor due to the time required to restore the target execution stream following the branch.
A high performance processor (CPU) achieves high instruction throughput by fetching and dispatching instructions under the assumption that branches are correctly predicted, and by allowing instructions to execute without waiting for the completion of previous instructions. This is commonly known as speculative execution, i.e., executing instructions that may or may not have to be executed. The CPU guesses which path the branch is going to take. This guess may be a very intelligent guess (as in a branch history table) or a very simple guess (as in always guessing the not-taken path). Once the guess is made, the CPU starts executing that path. Typically, the processor executes instructions speculatively when it has resources that would otherwise be idle, so that the operation may be done at minimum or no cost. Therefore, in order to enhance performance, some processors speculatively predict the path taken by an unresolved branch instruction. Utilizing the result of the prediction, the fetcher then fetches instructions from the speculative execution path prior to the resolution of the branch, thereby avoiding a stall in the execution pipeline if the branch is resolved as correctly predicted. Thus, if the guess is correct, there are no holes in the instruction fetching or delays in the pipeline and execution continues at full speed. If, however, subsequent events indicate that the branch was wrongly predicted, the processor has to abandon any result that the speculatively executed instructions produced and begin executing the path that should have been taken. The processor “flushes” or throws away the results of these wrongly executed instructions, backs itself up to get a new address, and executes the correct instructions.
Prior art handling of this speculative execution of instructions includes U.S. Pat. No. 5,454,117, which discloses a branch prediction hardware mechanism. The mechanism performs speculative execution based on the branch history information in a table. Similarly, U.S. Pat. No. 5,611,063 discloses a method for tracking allocation of resources within a processor utilizing a resource counter which has two bits set in two possible states corresponding to whether or not the instruction is speculative or when dispatched to an execution unit respectively. Also, Digital Equipment Corporation's Alpha AXP Architecture includes hint bits utilized during its jump instructions. However, as the name implies, these bits are hints only and are often ignored by the jump mechanism.
Most operations can be performed speculatively as long as the processor appears to follow a simple sequential method, such as that of a scalar processor. For some applications, however, speculative operations can be a severe detriment to the performance of the processor. For example, in the case of executing a load instruction after a branch instruction (known as a speculative load because the load instruction is executed speculatively without knowing exactly which path of the branch will be taken), if the predicted execution path is incorrect, a high delay penalty is incurred when the pending speculative load in the instruction stream requests the required data from the system bus. In many applications, the rate of mispredicted branches is high enough that the cost of speculatively accessing the system bus is prohibitive. Furthermore, essential data stored in a data cache may be displaced by irrelevant data obtained from the system bus because of the wrongful execution of a speculative load instruction caused by misprediction.
A need, therefore, exists for improvements in branch prediction. Presently, most prediction mechanisms operate as hardware prediction. These predicted paths, when mispredicted, tend to corrupt the hardware memory with the results of the speculatively executed instructions. However, certain classes of branches should not be predicted by hardware when the software can tell with a particular degree of certainty which path to take. Consequently, a system and method for software controlled branch prediction mechanism is desired.
It would therefore be desirable to provide a method and system for combining software and hardware branch prediction in a high performance processor. It is further desirable to provide a method and system which allows a developer or compiler of a software code (or program) which has a pre-determined and/or desired path during branch prediction to control the actual path predicted by manipulating the hardware prediction mechanism with a software input.
For many applications, the compiler can often determine how a conditional branch should be predicted by the hardware at run-time. For some applications, the software branch prediction can be highly accurate. Software branch prediction can be very useful for microprocessors that do not have a hardware branch prediction mechanism. It is also useful for improving the hardware branch prediction accuracy for some applications, by combining the software branch prediction with the hardware branch prediction mechanism through mechanisms such as an agree/disagree prediction algorithm, which works as follows.
Ordinarily the Branch History Table (BHT) stores information about the branch's outcome. For example, in a 2-bit per entry BHT implementation, each entry indicates whether the associated branch should be predicted taken (“1x”) or not-taken (“0x”), where “x” denotes either bit value. When a branch is executed, if it is found to be taken, the entry is incremented (if it is already “11”, then there is no change). If it is found to be not-taken, the entry is decremented (if it is already “00”, then there is no change).
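The 2-bit update rule described above can be illustrated with a minimal sketch. The table size, the address-to-entry indexing, and the initial counter value are assumptions chosen for illustration only; they are not specified in the text, and the class and method names are hypothetical.

```python
class TwoBitBHT:
    """Illustrative branch history table of 2-bit saturating counters."""

    def __init__(self, num_entries=16, initial=0b01):
        # Assumption: every entry starts at "01" (weakly not-taken).
        self.entries = [initial] * num_entries

    def _index(self, branch_addr):
        # Assumption: word-aligned branch address modulo the table size.
        return (branch_addr >> 2) % len(self.entries)

    def predict_taken(self, branch_addr):
        # "1x" (high bit set) predicts taken; "0x" predicts not-taken.
        return self.entries[self._index(branch_addr)] >= 0b10

    def update(self, branch_addr, taken):
        i = self._index(branch_addr)
        if taken:
            # Increment, saturating at "11".
            self.entries[i] = min(self.entries[i] + 1, 0b11)
        else:
            # Decrement, saturating at "00".
            self.entries[i] = max(self.entries[i] - 1, 0b00)


bht = TwoBitBHT()
addr = 0x1000
history = []
for outcome in (True, True, True, False, True):
    history.append(bht.predict_taken(addr))  # prediction made at fetch
    bht.update(addr, outcome)                # update made at execution
```

In this run a single not-taken outcome moves the entry from “11” to “10”, so the branch is still predicted taken on its next fetch.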
For agree/disagree prediction, instead of storing the taken/not-taken information in the BHT, the information stored is whether the branch outcome at execution was in agreement with the software branch prediction or not. If the software predicted taken and the branch is actually found to be taken when it is executed, then the branch “agrees” with the software prediction. Similarly, if the software prediction is not-taken and the branch is actually found to be not-taken during execution, then the branch is likewise considered to have “agreed” with the software prediction. Otherwise, the branch “disagrees” with the software prediction. When a branch is executed, its associated entry in the BHT is updated based on whether the branch “agrees” or “disagrees” with the software prediction. If the branch agrees, then the entry is incremented (no change, if it is already “11”). If the branch disagrees, then the entry is decremented (no change, if it is already “00”). When a branch is fetched, if its associated entry in the BHT is “1x”, then the branch is predicted to agree with the software prediction, that is, the hardware predicts whatever the software says. On the other hand, when a branch is fetched, if its associated entry in the BHT contains “0x”, then the prediction made is the opposite of what the software predicted.
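The agree/disagree scheme described above can be sketched as follows. This is an illustrative model only: the table size, the indexing function, the initial counter value of “10”, and the way the software hint is passed in (`sw_hint_taken`) are all assumptions, and the names are hypothetical.

```python
class AgreeBHT:
    """Illustrative BHT whose 2-bit counters record agreement with a software hint."""

    def __init__(self, num_entries=16, initial=0b10):
        # Assumption: entries start at "10" (weakly agree with software).
        self.entries = [initial] * num_entries

    def _index(self, branch_addr):
        # Assumption: word-aligned branch address modulo the table size.
        return (branch_addr >> 2) % len(self.entries)

    def predict_taken(self, branch_addr, sw_hint_taken):
        agree = self.entries[self._index(branch_addr)] >= 0b10
        # "1x": follow the software hint; "0x": predict the opposite.
        return sw_hint_taken if agree else not sw_hint_taken

    def update(self, branch_addr, sw_hint_taken, taken):
        i = self._index(branch_addr)
        if taken == sw_hint_taken:
            # Outcome agreed with the hint: increment, saturating at "11".
            self.entries[i] = min(self.entries[i] + 1, 0b11)
        else:
            # Outcome disagreed with the hint: decrement, saturating at "00".
            self.entries[i] = max(self.entries[i] - 1, 0b00)


bht = AgreeBHT()
addr = 0x2000
hint = False  # software predicts not-taken, but the branch is always taken
predictions = []
for actual in (True, True, True):
    predictions.append(bht.predict_taken(addr, hint))
    bht.update(addr, hint, actual)
```

Here the hint is wrong, so after one disagreement the entry drops to “01” and the hardware begins predicting the opposite of the hint, which matches the actual behavior of the branch.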
The primary advantage of agree/disagree prediction is that, for many applications, it decreases the harmful effects of aliasing in the BHT. That is, if two branches are mapped to the same entry in the BHT, it is highly likely that both will predict “agreed” if the software prediction accuracy is good (even though one branch's prediction may be “taken” and the other's may be “not-taken”).
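The aliasing effect can be illustrated with a small, idealized comparison. The sketch assumes two hypothetical branches that share a single 2-bit entry and execute alternately, one always taken and one always not-taken, with software hints that are always correct; real workloads and hint accuracy will differ.

```python
def count_mispredicts(scheme, trace, initial=0b10):
    """Count mispredictions for one shared 2-bit entry.

    scheme: "taken" for a conventional taken/not-taken counter,
            "agree" for an agree/disagree counter.
    trace:  sequence of (software_hint_taken, actually_taken) pairs.
    """
    counter = initial
    mispredicts = 0
    for hint, taken in trace:
        high = counter >= 0b10
        if scheme == "agree":
            predicted = hint if high else not hint
            strengthen = (taken == hint)   # outcome agreed with the hint
        else:
            predicted = high
            strengthen = taken             # outcome was taken
        if predicted != taken:
            mispredicts += 1
        # 2-bit saturating update.
        counter = min(counter + 1, 0b11) if strengthen else max(counter - 1, 0b00)
    return mispredicts


# Branch A: hinted taken, always taken. Branch B: hinted not-taken, always not-taken.
trace = [(True, True), (False, False)] * 4
conventional = count_mispredicts("taken", trace)
agree = count_mispredicts("agree", trace)
```

Under these assumptions the conventional counter mispredicts every execution of the not-taken branch, while the agree/disagree counter saturates at “11” and mispredicts neither branch.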
In many architectures, the branch instructions do not have any unused or reserved bits that can be used by the software to provide branch prediction hints. Such hints can communicate to the hardware how the software thinks the branch should be predicted. For these architectures (which include PowerPC), this invention describes a way of providing software branch prediction hints to the hardware.