A “superscalar” processor has an architecture that allows more than one instruction to be executed in a single clock cycle. This may be accomplished using what is referred to as “pipeline processing.” Pipeline processing may refer to overlapping operations by moving data or instructions into a conceptual pipe in which all the stages of the pipe operate simultaneously. Accordingly, each instruction is processed as a sequence of stages, each of which is executable in parallel with stages of other instructions. Typically, an instruction is processed in five stages, namely fetch, decode, dispatch, execute and complete.
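The overlap described above can be illustrated with a minimal sketch. It assumes an idealized pipeline with no stalls in which every stage takes exactly one cycle; the names `schedule`, `total_cycles` and the `width` parameter (instructions entering the pipe per cycle) are hypothetical and are used only for illustration.

```python
# Idealized model: each of the five stages takes one cycle and nothing stalls.
STAGES = ["fetch", "decode", "dispatch", "execute", "complete"]


def schedule(num_instructions, width=1):
    """Return {cycle: [(instruction_index, stage_name), ...]}.

    `width` is the (assumed) number of instructions that may enter the
    pipe in the same cycle; width > 1 models a superscalar front end.
    """
    timeline = {}
    for i in range(num_instructions):
        start = i // width  # cycle in which instruction i is fetched
        for offset, stage in enumerate(STAGES):
            timeline.setdefault(start + offset, []).append((i, stage))
    return timeline


def total_cycles(num_instructions, width=1):
    """Number of cycles until the last instruction leaves the complete stage."""
    timeline = schedule(num_instructions, width)
    return max(timeline) + 1 if timeline else 0
```

Under these assumptions, four instructions occupy 8 cycles when one instruction enters the pipe per cycle and 6 cycles when two enter per cycle, compared with 20 cycles if each instruction had to finish all five stages before the next one started.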
In the fetch stage, one or more instructions are fetched from an instruction cache. In the decode stage, the fetched instructions are decoded. In the dispatch stage, each decoded instruction is dispatched or issued to the appropriate execution unit (e.g., fixed point unit, load/store unit, floating point unit, branch unit) to be executed. For example, if the decoded instruction is a floating point instruction, then the decoded instruction is dispatched to the floating point unit to be executed. In the execute stage, the execution units execute the dispatched instructions. Upon executing a received instruction, the execution unit may transmit an indication to a unit, referred to herein as the “completion unit,” indicating that the received instruction has been executed. The completion unit may keep track of when these instructions have been “completed.” An instruction may be said to be “completed” when it has been executed and is at a stage where any exception will not cause the reissuance of that instruction. This occurs during the complete stage. Once an instruction is completed, its results are written into an “architected state,” which may refer to a register that stores the written result and indicates that the instruction is completed.
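The dispatch step described above may be sketched as a simple routing table. The instruction class labels (`"fixed"`, `"load_store"`, `"float"`, `"branch"`), the mnemonics in the test data, and the function name `dispatch` are hypothetical; only the four execution units come from the description.

```python
from collections import defaultdict

# Assumed mapping from a decoded instruction's class to its execution unit.
UNIT_FOR_CLASS = {
    "fixed": "fixed point unit",
    "load_store": "load/store unit",
    "float": "floating point unit",
    "branch": "branch unit",
}


def dispatch(decoded_instructions):
    """Route (name, instruction_class) pairs to per-unit queues.

    Program order is preserved within each unit's queue.
    """
    queues = defaultdict(list)
    for name, instruction_class in decoded_instructions:
        queues[UNIT_FOR_CLASS[instruction_class]].append(name)
    return dict(queues)
```

For instance, a floating point instruction in the decoded stream would land in the queue of the floating point unit, while a load would land in the queue of the load/store unit.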
In-order processors may refer to processors that fetch, execute and forward results to functional units in order. That is, instructions are issued and executed in order during the stages described above. An execution pipeline may refer to the stages that an execution unit (e.g., floating point unit, fixed point unit, load/store unit, branch unit) goes through to execute an instruction. In an architecture that employs such in-order processing, each execution pipeline may typically have the same “completion point” and hence the pipelines complete in order. Each execution pipeline may have a different number of stages. For example, a floating point pipeline (pipeline of the floating point unit) may be longer than a load/store pipeline (pipeline of the load/store unit).
A completion point may refer to the point at which it is determined whether an exception occurred in that execution pipeline. An exception may refer to a condition that causes the program or microprocessor to branch to a different instruction stream, or to re-execute an instruction stream that has an error in it. When an exception, e.g., a branch mispredict, occurs, the instructions that are “younger” (issued and executed after the instruction causing the exception) are flushed and re-fetched.
Each execution unit in an in-order processor may have the same completion point since each instruction must be executed in order. When an exception occurs, all of the younger instructions that have not completed are flushed. These younger instructions may reside in multiple execution pipelines instead of residing in a single execution pipeline. In order to ensure that the instructions are executed in order, each execution pipeline may therefore be required to have the same completion point. Furthermore, a common completion point simplifies updating the architected state by allowing the architected state to be updated in a single cycle instead of in different cycles.
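The flush behavior described above, in which younger instructions may be spread across several execution pipelines, can be sketched as follows. The sketch assumes that each in-flight instruction carries a program-order sequence number and that the instruction causing the exception is itself retained; both are illustrative assumptions, as are the name `flush_younger` and the unit labels used in the test data.

```python
def flush_younger(pipelines, faulting_seq):
    """Flush every in-flight instruction younger than `faulting_seq`.

    pipelines: {unit_name: [sequence numbers currently in flight]}
    Returns (surviving, refetch), where `surviving` has the same shape as
    `pipelines` and `refetch` lists the flushed sequence numbers in
    program order so that they can be fetched again.
    """
    surviving = {}
    refetch = []
    for unit, in_flight in pipelines.items():
        # Older instructions (and the faulting one, by assumption) stay.
        surviving[unit] = [seq for seq in in_flight if seq <= faulting_seq]
        # Younger instructions are flushed regardless of which pipeline holds them.
        refetch.extend(seq for seq in in_flight if seq > faulting_seq)
    return surviving, sorted(refetch)
```

Because the younger instructions are collected from every pipeline, a single exception may remove work from several execution units at once.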
Because a single completion point is required for all the execution pipelines, the completion point may be lower (conceptually) in the pipeline than necessary. As stated above, each execution pipeline may have a different number of stages. However, since the completion point may have to be the same across all the different execution pipelines for an in-order processor, the completion point of each execution pipeline is at the same level as the completion point of the execution unit with the longest pipeline. As a result, the execute stage lasts longer than necessary for those pipelines with fewer stages than the longest execution pipeline.
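The cost of a common completion point can be made concrete with a small calculation. The pipeline depths used in the test data are hypothetical numbers chosen only for illustration, as are the function names; the sketch assumes the common completion point sits at the depth of the longest pipeline, as described above.

```python
def common_completion_point(depths):
    """Depth (in stages) at which every pipeline completes when all
    pipelines must share the completion point of the longest one."""
    return max(depths.values())


def padding_stages(depths):
    """Extra stages each pipeline waits beyond its own natural depth."""
    point = common_completion_point(depths)
    return {unit: point - depth for unit, depth in depths.items()}
```

With assumed depths of 6 stages for the floating point pipeline, 3 for the load/store pipeline, 2 for the fixed point pipeline and 1 for the branch pipeline, the shared completion point sits at stage 6, so the shorter pipelines wait 3, 4 and 5 additional stages respectively before their instructions can complete.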
Therefore, there is a need in the art to eliminate the requirement that the completion point be the same across the different execution pipelines in an in-order processor, in order to improve processor performance.