This invention relates generally to limited out-of-order execution in an in-order processor, and more particularly to allowing instructions in a shorter execution pipeline to complete execution before older instructions complete execution in a longer execution pipeline in an in-order processor.
In traditional in-order microprocessors (io-μPs), instructions are fetched, dispatched, executed, and retired in sequential order. Some μPs, including io-μPs, employ instruction pipelining to increase throughput. The individual units that execute instructions in the micro-architecture of a μP (e.g., a fixed-point execution unit (FXU), a branch resolution unit (BRU), a floating-point unit (FPU), etc.) can have pipelines of different lengths, or may not be pipelined at all. Pipelining increases throughput when a sequence of instructions keeps the pipeline full, so that operands are ready for each instruction when it reaches the execution stage. However, if an FPU-pipelinable instruction is in flight, a subsequent FXU instruction (for example, a branch), whose shorter pipeline would otherwise let it finish first, must stall at dispatch for as long as necessary to ensure in-order completion/retirement. This in turn disrupts the FPU's pipelined execution, because subsequent FPU-pipelinable instructions behind the FXU instruction are now also stalled prior to dispatch. Io-μPs can therefore suffer performance degradation when an instruction stream contains both floating-point and fixed-point instructions, since floating-point instructions take much longer to complete than fixed-point instructions, due in part to the greater number of pipeline stages for floating-point instructions. A typical example is a loop of floating-point instructions whose branch instruction is executed in either a BRU or an FXU. In this case, the io-μP's pipelined FPU must stop and wait for the BRU or FXU to resolve the branch before resuming pipelined operation.
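The dispatch stall described above can be illustrated with a small cycle-count model. This is a minimal sketch, not the mechanism of any particular processor: the latencies (6 cycles for the FPU, 1 cycle for the FXU), the limit of one dispatch per cycle, the absence of operand dependencies, and the name `schedule` are all hypothetical assumptions. With `enforce_in_order_completion=True`, a younger instruction may not dispatch until it would complete strictly after every older instruction; with `False`, the model shows an idealized bound that simply ignores the ordering constraint, along with any hardware an actual design would need to keep results correct.

```python
# Hypothetical latencies in cycles; the text gives no specific values.
LATENCY = {"FPU": 6, "FXU": 1}


def schedule(stream, enforce_in_order_completion=True):
    """Return (dispatch_cycle, completion_cycle) for each instruction.

    Assumes in-order dispatch of at most one instruction per cycle, fully
    pipelined units, and no operand dependencies between instructions.
    """
    timing = []
    next_dispatch = 0   # earliest cycle the next instruction may dispatch
    last_complete = -1  # latest completion cycle among older instructions
    for unit in stream:
        latency = LATENCY[unit]
        dispatch = next_dispatch
        if enforce_in_order_completion:
            # Stall at dispatch until this instruction would complete strictly
            # after every older instruction (in-order completion/retirement).
            dispatch = max(dispatch, last_complete - latency + 1)
        complete = dispatch + latency
        timing.append((dispatch, complete))
        next_dispatch = dispatch + 1
        last_complete = max(last_complete, complete)
    return timing


if __name__ == "__main__":
    # Floating-point work with a fixed-point branch in the middle.
    stream = ["FPU", "FPU", "FXU", "FPU", "FPU"]
    print(schedule(stream))         # [(0, 6), (1, 7), (7, 8), (8, 14), (9, 15)]
    print(schedule(stream, False))  # [(0, 6), (1, 7), (2, 3), (3, 9), (4, 10)]
```

For this stream, the in-order model dispatches the FXU instruction at cycle 7 instead of cycle 2, and both trailing FPU instructions slip by the same five cycles, so the last result arrives at cycle 15 rather than cycle 10.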
In out-of-order microprocessors (ooo-μPs), instructions can be fetched, dispatched, executed, and retired in an order different from the sequence in which they are stored. Ooo-μPs queue instructions until their operands are available for execution, queue the results, and reorder the results when the instructions retire. Ooo-μPs often use instruction identifiers or register renaming to support out-of-order execution, both of which require complex circuitry to manage. Register renaming may also require many additional physical registers, so that multiple versions of a register can exist at the same time and false operand dependencies are avoided. The additional complexity of ooo-μPs over io-μPs may increase instruction execution throughput, but it also leads to higher manufacturing costs and a greater number of failure modes. Moreover, the order in which ooo-μPs dispatch, execute, and retire instructions can be difficult to predict, which further complicates system analysis and debugging.
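Register renaming, mentioned above, can be sketched as a table mapping architectural registers to physical registers, plus a free list of physical registers. The class below is a hypothetical, minimal illustration; the names `RenameTable` and `rename` and the register counts are assumptions. It omits reclaiming physical registers at retirement and the recovery state a real ooo-μP would also need, which is part of the circuit complexity referred to above.

```python
class RenameTable:
    """Hypothetical minimal register-rename map (architectural -> physical)."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        # Architectural register i initially lives in physical register i.
        self.map = list(range(num_arch))
        # Remaining physical registers are free to hold new versions.
        self.free = list(range(num_arch, num_phys))

    def rename(self, dest: int, srcs: list[int]) -> tuple[int, list[int]]:
        # Sources read the newest version of each architectural register.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            # A real design would stall dispatch here until a register frees up.
            raise RuntimeError("no free physical register")
        # The destination gets a fresh physical register, so older readers of
        # the previous version are unaffected (no WAR/WAW false dependency).
        new_phys = self.free.pop(0)
        self.map[dest] = new_phys
        return new_phys, phys_srcs


if __name__ == "__main__":
    rt = RenameTable(num_arch=4, num_phys=8)
    print(rt.rename(1, [2, 3]))  # i1: r1 = r2 + r3 -> (4, [2, 3])
    print(rt.rename(2, [1, 1]))  # i2: r2 = r1 * r1 -> (5, [4, 4])
    print(rt.rename(1, [3, 3]))  # i3: r1 = r3 + r3 -> (6, [3, 3])
```

In this example, the third instruction writes r1 while the second instruction still needs the older value of r1. Renaming gives the new write its own physical register (6), so both versions of r1 coexist and no false dependency forces the third instruction to wait.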
It would be desirable to perform limited out-of-order execution in an io-μP. Capitalizing on the considerable depth of an FPU pipeline by allowing certain fixed-point instructions to complete execution before older floating-point instructions would increase io-μP throughput without the high level of complexity involved in an ooo-μP. Moreover, this approach could be applied to other instructions whose execution pipelines differ in length. Accordingly, there is a need in the art for an approach to perform overlapping execution of instructions through non-uniform execution pipelines in an io-μP.