Microprocessors are the brains behind computers and many other types of machines. The demand for faster processors continually outstrips present technology, pressuring all aspects of processor architecture to become faster. New generations of processors now operate at frequencies that make almost any time delay a significant design constraint. As technology evolves, engineers strive to improve processor performance by applying various techniques to the architecture. One characteristic of a processor architecture is whether it executes instructions sequentially or out of order. An out-of-order architecture executes instructions in an order different from that in which the code was originally presented to the processor. With an out-of-order processor, execution units within the processor that might otherwise be idle can be utilized more efficiently. The sequential nature of software code creates data dependencies: a data dependency exists where a later instruction manipulates an operand X and the data at operand X is the result of an earlier instruction. Thus the later instruction has a data dependency on the operand of the earlier instruction.
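The data dependency described above can be sketched in a few lines. This is a minimal illustration (not from the source) of detecting read-after-write dependencies in a simple three-address instruction stream; the register names and instruction format are assumptions made for the example.

```python
# Sketch of read-after-write (RAW) dependency detection. Each instruction
# is modeled as (destination_register, (source_registers...)). A later
# instruction that reads a register written by an earlier instruction
# has a data dependency on that earlier instruction.

def find_raw_dependencies(instructions):
    """Return (later_index, earlier_index) pairs where the later
    instruction reads a register the earlier instruction writes."""
    deps = []
    for i, (dest_i, _) in enumerate(instructions):
        for j in range(i + 1, len(instructions)):
            _, srcs_j = instructions[j]
            if dest_i in srcs_j:
                deps.append((j, i))
    return deps

# r1 = r2 + r3 ; r4 = r1 * r5 : the second instruction reads r1,
# which the first instruction produces, so it must wait for it.
program = [("r1", ("r2", "r3")),
           ("r4", ("r1", "r5"))]
```

An out-of-order processor must respect exactly these pairs when reordering: independent instructions may be moved ahead, dependent ones may not.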
Another characteristic of a processor architecture is whether instruction processing is pipelined. In pipelined processing, the processor fetches instructions from memory and sends them into one end of the pipeline. The pipeline comprises several stages, each of which performs some function necessary to process the instruction before it proceeds to the next stage. Each stage moves the instruction closer to completion. A pipeline enables the processor to process more than one instruction at a time, thus increasing the instruction processing rate. Dependent instructions can cause a delay in the pipeline because processors typically do not schedule a dependent instruction until the instruction on which it depends has produced the correct result. But some pipelines process instructions speculatively. In speculative execution, instructions are fetched and executed before pertinent data dependencies are known to be resolved. During speculative execution, the processor predicts how dependencies will be resolved and executes instructions based on those predictions. For example, the processor may predict that all load instructions hit in the cache, and schedule all dependent instructions based on the cache-hit latency. This form of speculative execution is called data dependency speculation. The processor then verifies that the execution and predictions were correct before retiring the instruction and committing the results. Speculative execution can also involve predicting which instructions are needed depending on whether a branch is taken. This form of speculation is called control speculation. For example, if two instructions are to be executed alternatively depending on the value of some quantity, then the pipeline has to predict what that value will be and which instruction will be executed. The processor then predicts the next instruction to be executed and fetches the predicted instruction before the previous instruction has actually executed.
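The cache-hit prediction described above can be illustrated with a small sketch. This is an assumed model, not the source's design: the scheduler issues a uop that depends on a load at the load's assumed hit latency, and if the load actually missed, the dependent uop consumed a stale result and needs to be replayed. The latency values are illustrative.

```python
# Hedged sketch of data dependency speculation: the scheduler assumes
# every load hits in the cache and issues dependent uops accordingly.

CACHE_HIT_LATENCY = 3  # cycles the scheduler assumes a load will take

def schedule_speculatively(load_issue_cycle, actual_load_latency):
    """Return (dependent_issue_cycle, needs_replay).

    The dependent uop is issued as if the load hits in the cache.
    If the load actually took longer (a miss), the dependent uop
    executed with an invalid input and must be replayed.
    """
    dependent_issue_cycle = load_issue_cycle + CACHE_HIT_LATENCY
    needs_replay = actual_load_latency > CACHE_HIT_LATENCY
    return dependent_issue_cycle, needs_replay
```

When the prediction holds, the dependent uop is scheduled as early as possible; when it fails, the verification step described next must catch the error before the result is committed.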
The verification step can be a challenge. At the end of the pipeline, the results are temporarily stored until all the dependencies have been resolved. The processor then checks for data dependence violations, mispredictions, or exceptions. If there are no execution problems, the instructions are retired and the results are committed to the architectural state. But if problems exist, the processor has to perform a correction routine. Existing techniques for handling exceptions in pipelined processing can substantially reduce instruction throughput.
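The commit-or-correct decision at the end of the pipeline can be summarized as a sketch. This is an assumed simplification, not the source's logic; the checks named mirror the three problem classes listed above.

```python
# Minimal sketch of the retirement decision: a buffered result is
# committed to the architectural state only if no execution problem
# (dependence violation, misprediction, or exception) occurred.

def retire_or_correct(result, dependence_violation, mispredicted, exception):
    """Return ("retire", result) to commit the buffered result, or
    ("correct", None) to discard it and run a correction routine."""
    if dependence_violation or mispredicted or exception:
        return ("correct", None)   # result is discarded, not committed
    return ("retire", result)      # result becomes architectural state
```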
Some processors use a replay mechanism that reintroduces instructions for execution if they have not executed correctly. FIG. 1 is a block diagram of a prior art processor 100 having a replay architecture. The processor 100 includes a scheduler 102 coupled to a multiplexor 104 to provide instructions received from an instruction cache to an execution unit 106 for execution. The execution unit 106 may perform data speculation in executing the various instructions received from the multiplexor 104. Processor 100 includes checker units 108, 110 to send a copy of an executed instruction back to the execution unit 106 for re-execution (replay) if the data dependence or other execution speculation is erroneous. Such replay loops have had a fixed latency. Processor 100 also includes a replay queue 120 for queuing instructions for replay. Micro-operations (also referred to as micro-ops or uops) are taken out of the replay queue after a dynamically decided time delay, for example when needed data is returned from the memory system. When an instruction has executed correctly, the instruction is retired at the retirement unit 112 and the results are applied to the architectural state.
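The execute-check-replay-retire cycle of FIG. 1 can be sketched as a simple loop. This is an illustrative model under assumptions, not the patented design: the replay queue is a FIFO, each uop is re-executed until its check passes, and the `executed_ok` callback stands in for the checker units.

```python
from collections import deque

# Sketch of a replay loop: an executed uop is checked, re-executed while
# its speculation was wrong, and retired once it executes correctly.

def run_with_replay(uops, executed_ok):
    """uops: iterable of uop names.
    executed_ok(uop, attempt) -> bool, standing in for the checker.
    Returns a dict mapping each uop to how many times it executed
    before being retired."""
    queue = deque(uops)              # replay queue feeding the execution unit
    attempts = {u: 0 for u in uops}
    while queue:
        uop = queue.popleft()
        attempts[uop] += 1           # execution unit runs the uop
        if executed_ok(uop, attempts[uop]):
            pass                     # checker passed: retire, commit result
        else:
            queue.append(uop)        # checker failed: send back for replay
    return attempts

# Example: the load misses the cache on its first attempt and replays once.
counts = run_with_replay(["load", "add"],
                         lambda u, n: not (u == "load" and n == 1))
```

The key property, also noted in the text above, is that the delay before a uop leaves the replay queue need not be fixed; it can be decided dynamically, for example when the needed data returns from the memory system.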
Before the present invention, the uops were introduced to the execution unit through the scheduler. The scheduler predicted the dependencies and, using the execution latencies of the uops and the fact that a replaying uop stops the scheduler from issuing a new uop in that time slot, issued the uops to execution. The uops would keep the relative spacing decided by the scheduler even when they replayed. Since the spacing was decided based on predictions and resource availability (e.g., a replay loop time slot), the schedule might not be optimal.