1. Technical Field
The present invention relates generally to processors and computing systems, and more particularly, to a processor employing speculative execution of instruction streams.
2. Description of the Related Art
Present-day high-speed processors include the capability of simultaneous execution of instructions, speculative execution and loading of instructions and simultaneous operation of various resources within a processor. Simultaneous Multi-Threaded (SMT) processors support simultaneous execution of multiple instruction streams (hardware threads) within a processor core, providing more efficient use of resources within the core. Speculative execution involves predicting streams of program instructions in instruction memory for which execution is likely to be needed, loading data values associated with the predicted streams from data memory and speculatively executing the predicted streams in advance of the actual demand.
The purpose of speculative execution is to maximize the performance of a thread by executing instructions before the execution path (and potentially the data values) for those instructions is completely determined, at times when the thread would otherwise be idle waiting for an event to occur. If the speculation turns out to be correct, the result is improved performance of the thread. If the speculation turns out to be incorrect, the instruction stream typically must be flushed and the results discarded. Thus, speculative execution trades potentially improved response time and throughput against the possibility of energy wasted on executing incorrectly predicted instruction streams. If speculative processing is not yielding much improvement in response time or throughput for a particular thread (i.e., the speculation is not reaching a high level of prediction accuracy), processing energy efficiency is reduced, and system efficiency may be degraded severely relative to non-speculative execution of the threads that are not speculating well. In battery-operated systems, a high level of poor speculation clearly wastes potential processing power over the long term. However, in today's power-limited multi-processor and/or multi-threaded systems, a low quality of speculation can also degrade system performance by consuming power that could be more productively used by another hardware thread within a single multi-threaded core or by another core executing another thread.
Certain types of program instruction sequences, and programs in general, lend themselves to branch prediction, while others do not. For example, fixed “for” loops of reasonably high iteration counts predictably execute the main body of the loop many times and execute initialization and termination paths only once each. When a speculative processor encounters such a loop, the branch prediction unit can yield high efficiency by predicting that the branch/jump instruction(s) that enter the main body of the loop will be taken each time, although at least on initialization and termination, the prediction will be incorrect. Other program structures, such as “if” decision statements, do not always yield good speculative performance. The worst case is encountered at a branch probability of 50%, that is, when each path of a program branch is equally likely. Generally, absent dedicated control informed by knowledge of the type of program code being executed, speculative operation, when it is used at all, is typically left enabled at all times.
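By way of illustration only, the contrast between a fixed loop and an evenly split branch can be sketched numerically. The following Python sketch assumes a simple static "predict taken" policy and models a loop as a single loop-closing branch that is taken on every pass except the exit; the function names (always_taken_accuracy, loop_branch_outcomes, best_static_accuracy) are hypothetical and are not drawn from any particular processor or from the referenced art.

```python
from typing import Iterable, List


def always_taken_accuracy(outcomes: Iterable[bool]) -> float:
    """Fraction of branches a static 'predict taken' policy gets right.

    Each element of `outcomes` is True if the branch was actually taken.
    """
    observed = list(outcomes)
    if not observed:
        raise ValueError("at least one branch outcome is required")
    # A 'predict taken' policy is correct exactly when the branch is taken.
    return sum(observed) / len(observed)


def loop_branch_outcomes(iterations: int) -> List[bool]:
    """Outcomes of a loop-closing branch (simplifying assumption):
    taken on every pass except the final one, which exits the loop."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return [True] * (iterations - 1) + [False]


def best_static_accuracy(p_taken: float) -> float:
    """Best accuracy any fixed-direction prediction can reach for a branch
    that is taken with probability `p_taken`: always guess the likelier side."""
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must lie in [0, 1]")
    return max(p_taken, 1.0 - p_taken)


if __name__ == "__main__":
    # A 100-iteration loop is mispredicted only once, at loop exit.
    print(always_taken_accuracy(loop_branch_outcomes(100)))
    # An evenly split branch is the worst case for a fixed-direction guess.
    print(best_static_accuracy(0.5))
```

Under these assumptions, the loop-closing branch is mispredicted only at loop exit, whereas no fixed-direction guess can do better than one-half on an evenly split branch. Dynamic predictors may behave differently; the sketch only illustrates why the 50% case is the least favorable.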
One technique for reducing the amount of processing power and/or resources wasted on inefficient speculation is disclosed in U.S. Pat. No. 6,792,524 assigned to International Business Machines Corporation, the specification of which is incorporated herein by reference. The accuracy of ongoing speculation is evaluated and the speculation is disabled either for a particular branch or an entire thread if the accuracy is low. However, the technique disclosed in the above-referenced patent still uses resources such as instruction queue space and instruction fetch cycles for the speculative paths not taken. Alternatively, the scheme completely removes the speculative streams from the processing model, potentially losing out on advance processing that could otherwise be performed for a thread.
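For illustration only, the general idea of evaluating ongoing speculation accuracy and disabling speculation when the accuracy is low might be sketched as follows. This is not the mechanism of the above-referenced patent; the class name SpeculationMonitor, the sliding window of recent outcomes, the default window size and accuracy threshold, and the assumption that predictions continue to be scored while speculation is disabled are all hypothetical choices made for the example.

```python
from collections import deque


class SpeculationMonitor:
    """Hypothetical per-thread monitor of recent prediction accuracy."""

    def __init__(self, window: int = 64, min_accuracy: float = 0.75) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # Sliding window of the most recent prediction results (assumed policy).
        self.history = deque(maxlen=window)
        self.min_accuracy = min_accuracy
        self.speculation_enabled = True

    def accuracy(self) -> float:
        """Fraction of correct predictions in the current window."""
        if not self.history:
            return 1.0  # no evidence yet, so do not penalize the thread
        return sum(self.history) / len(self.history)

    def record(self, prediction_correct: bool) -> None:
        """Record one resolved prediction and update the enable flag."""
        self.history.append(prediction_correct)
        # Decide only once the window is full, to avoid reacting to noise.
        if len(self.history) == self.history.maxlen:
            self.speculation_enabled = self.accuracy() >= self.min_accuracy


if __name__ == "__main__":
    monitor = SpeculationMonitor(window=4, min_accuracy=0.75)
    for correct in (True, True, True, True, False, False):
        monitor.record(correct)
    # The last four outcomes are half wrong, so speculation is switched off.
    print(monitor.speculation_enabled)
```

An all-or-nothing flag of this kind reflects the second drawback noted above: once speculation is switched off for a thread, any advance processing that speculation could have provided for that thread is lost.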
It is therefore desirable to provide a speculative processing scheme and control mechanism that can reduce the amount of resources and energy wasted on poor speculation, while retaining the advantages of speculation for a thread where speculation is proceeding well.