Superscalar processors achieve higher performance by executing multiple instructions concurrently in multiple pipelines. However, dependencies between instructions can limit how many instructions are issued or processed at any given time. To recover some of this lost performance, some processors support speculative execution. The objective of speculative processing is to keep the processor's pipelines fully utilized, thereby avoiding instruction stalls and delays within the processor.
One type of speculation is data speculation, which predicts the values of data items, for example by observing patterns in the data and basing the prediction on those patterns. Another type is control flow speculation, which predicts the direction in which program control will proceed. For example, branch prediction may be used to predict whether a particular branch will be taken during processing. Generally, in any speculation scheme, if the speculation is incorrect, the instructions that were speculatively processed and/or executed must be re-executed with updated or non-speculative information.
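As a concrete illustration of branch prediction and of the re-execution that follows a misprediction, the Python sketch below models one common textbook scheme, a 2-bit saturating counter for a single branch. The function name and the default initial state are assumptions chosen for illustration, and the sketch only counts mispredictions rather than modeling the pipeline squash itself.

```python
from typing import List, Sequence, Tuple


def simulate_two_bit_predictor(
    outcomes: Sequence[bool], initial_state: int = 2
) -> Tuple[List[bool], int]:
    """Predict one branch with a 2-bit saturating counter (illustrative only).

    Counter states 0 and 1 predict not-taken; states 2 and 3 predict taken.
    The default initial state 2 ("weakly taken") is an arbitrary assumption.
    Returns the list of predictions and the number of mispredictions.
    """
    state = initial_state
    predictions: List[bool] = []
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        predictions.append(predicted_taken)
        if predicted_taken != taken:
            # A real pipeline would discard and re-execute the instructions
            # fetched down the wrong path here; this model only counts it.
            mispredictions += 1
        # Move the counter one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return predictions, mispredictions


# A loop branch taken nine times and then not taken, executed twice.
loop_outcomes = ([True] * 9 + [False]) * 2
preds, misses = simulate_two_bit_predictor(loop_outcomes)
```

For this loop pattern the predictor mispredicts only at each loop exit (2 of 20 outcomes): the second counter bit provides hysteresis, so a single not-taken outcome does not flip the next prediction.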
Since speculation allows execution to proceed without waiting for dependency checking to complete, significant performance gains may be achieved if the performance gained from correct speculations exceeds the performance lost due to incorrect speculations (and subsequent re-processing). Accordingly, it is desirable to be able to perform speculative processing in a processor and to provide an efficient recovery mechanism for mispredictions.
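The trade-off described above can be made concrete with a simple expected-value estimate. Assume (these symbols are illustrative and not part of the original text) that a fraction p of speculations are correct, that each correct speculation saves G cycles, and that each misprediction costs P cycles of recovery and re-processing. Speculation then pays off on average when:

```latex
p\,G - (1 - p)\,P > 0
\quad\Longleftrightarrow\quad
p > \frac{P}{G + P},
\qquad\text{e.g. } G = 10,\ P = 20 \;\Rightarrow\; p > \tfrac{20}{30} \approx 0.67
```

The more expensive the recovery (the larger P), the higher the prediction accuracy required to break even, which is why an efficient recovery mechanism matters as much as the predictor itself.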
SSE (Streaming Single-Instruction-Multiple-Data Extensions) and x87 are extensions of the x86 instruction set. Most SSE and x87 instructions depend upon the x87 control word or the value of the SSE Multimedia Extensions Control and Status Register (MXCSR). Some instructions change the control word during processing and are commonly referred to as control word changing (CWC) instructions. Conventionally, instructions that follow and depend upon a CWC instruction must wait until the CWC instruction completes so that the new (changed) control word is known. Stalling dependent instructions while they wait for a control word change reduces performance and increases latency, and should therefore be avoided. However, in typical program hierarchies, CWC instructions often reside in subroutines that a running main program calls at various times and from various places. Predicting a control word change is therefore problematic, because the new control word depends upon both the instruction that calls the subroutine containing the CWC instruction and the CWC instruction itself.
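Why the calling context matters can be illustrated with a small, hypothetical Python model. It assumes a subroutine whose single CWC instruction (at the made-up address `CWC_PC`) loads a control word supplied by its caller, and two made-up call sites that alternately request round-to-nearest (0x037F, the x87 control word after FINIT) and round-toward-zero (0x0F7F, the same word with rounding-control bits 11:10 set). A simple last-value predictor is assumed; it is an illustrative sketch, not a description of any particular processor's predictor.

```python
from typing import Callable, Dict, Hashable, List, Tuple

X87_CW_DEFAULT = 0x037F                      # x87 control word after FINIT (round to nearest)
X87_CW_TRUNCATE = X87_CW_DEFAULT | 0x0C00    # RC bits 11:10 = 11 -> round toward zero

CWC_PC = 0x4000     # hypothetical address of the CWC instruction in the subroutine
CALL_SITE_A = 0x1000  # hypothetical call site requesting round toward zero
CALL_SITE_B = 0x2000  # hypothetical call site requesting round to nearest


def count_mispredictions(
    trace: List[Tuple[int, int]],
    key_fn: Callable[[int, int], Hashable],
) -> int:
    """Run a last-value control word predictor over a trace.

    Each trace entry is (call_site_pc, new_control_word) for one execution of
    the CWC instruction. key_fn maps (call_site_pc, cwc_pc) to a table index.
    The predictor guesses the control word last seen under that index; a
    missing entry (cold start) also counts as a misprediction.
    """
    table: Dict[Hashable, int] = {}
    misses = 0
    for call_site, new_cw in trace:
        key = key_fn(call_site, CWC_PC)
        if table.get(key) != new_cw:
            misses += 1  # dependent instructions would have to be re-executed
        table[key] = new_cw  # remember the actual value for next time
    return misses


# The main program alternates between the two call sites four times.
trace = [(CALL_SITE_A, X87_CW_TRUNCATE), (CALL_SITE_B, X87_CW_DEFAULT)] * 4

by_instruction_only = count_mispredictions(trace, lambda site, pc: pc)
by_call_site_and_instruction = count_mispredictions(trace, lambda site, pc: (site, pc))
```

With this trace, the predictor indexed only by the CWC instruction's address mispredicts all 8 executions, because the value it remembers always belongs to the other caller, while the predictor indexed by (call site, CWC address) mispredicts only the 2 cold-start executions.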