The present invention relates to computing devices and techniques, and more specifically, but not exclusively, relates to processor architecture for multipass processing of instructions downstream of an instruction that has stalled during normal execution.
As microprocessor designs become increasingly power- and complexity-conscious, future microarchitectures often seek to decrease their reliance on expensive dynamic scheduling structures. While compilers have generally proven adept at planning useful static instruction-level parallelism, relying solely on the compiler's instruction execution arrangement performs poorly when cache misses occur, because variable latency is usually not well tolerated.
Out-of-order execution is a common strategy that allows the processor to determine how to efficiently order instruction execution. Under this model, the cost of long-latency operations can be hidden by the concurrent execution of other instructions. Furthermore, because this selection is dynamic, the ordering of instruction execution can adapt to run-time conditions. With this ability to adapt, out-of-order execution is often used in high-performance microprocessors and frequently improves performance in situations with data cache misses. However, out-of-order execution mechanisms often replicate, at great expense, much work that can be done effectively at compile time. While aggressive register renaming, a component of out-of-order techniques, eliminates the output- and anti-dependences that restrict the motion of instructions, this approach may duplicate much of the effort of compile-time register allocation. Dynamic scheduling typically relies on complex scheduling queues and large instruction windows to find ready instructions and, in choosing the order of instruction execution, repeats the work of the compile-time scheduler. These mechanisms often incur significant power consumption and add instruction pipeline complexity.
In contrast, a static, in-order execution strategy usually does not incur this expense. Such an approach executes instructions according to the specified compiler plan of execution. While compilers can be successful at planning useful static instruction-level parallelism (ILP) for in-order microarchitectures, the efficient accommodation of unanticipable latencies, like those of memory load instructions, remains a vexing problem. Accordingly, there is further need for contributions to this area of technology.
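By way of illustration only, the contrast between the two execution strategies discussed above can be sketched with a toy cycle-count model. The sketch below is not the mechanism of the present invention. It assumes a single-issue pipeline, registers that are each written once (as if renaming had already been applied), an unbounded instruction window for the out-of-order case, and invented latencies; the names `Instr`, `in_order_cycles`, `out_of_order_cycles`, and `PROGRAM` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written (assumed written only once)
    srcs: Tuple[str, ...]   # registers read
    latency: int            # cycles until the result is available (assumed)


def in_order_cycles(program):
    """Issue strictly in program order; a stalled consumer blocks all later instructions."""
    ready = {}    # register -> cycle at which its value becomes available
    cycle = 0     # earliest cycle at which the next instruction may issue
    finish = 0
    for ins in program:
        # Wait until every source operand is available (live-in values are ready at 0).
        issue = max([cycle] + [ready.get(s, 0) for s in ins.srcs])
        done = issue + ins.latency
        ready[ins.dest] = done
        finish = max(finish, done)
        cycle = issue + 1
    return finish


def out_of_order_cycles(program):
    """Each cycle, issue the oldest pending instruction whose operands are ready."""
    produced = {ins.dest for ins in program}   # registers written inside the program
    ready = {}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        for ins in pending:
            operands_ready = all(
                s not in produced or (s in ready and ready[s] <= cycle)
                for s in ins.srcs
            )
            if operands_ready:
                done = cycle + ins.latency
                ready[ins.dest] = done
                finish = max(finish, done)
                pending.remove(ins)
                break   # single issue: at most one instruction per cycle
        cycle += 1
    return finish


# A load that misses in the cache (assumed 10-cycle latency), its consumer,
# and three instructions that do not depend on the load.
PROGRAM = [
    Instr("r1", ("a",), 10),        # load, assumed cache miss
    Instr("r2", ("r1",), 1),        # consumer of the load
    Instr("r3", ("x", "y"), 1),     # independent work
    Instr("r4", ("r3",), 1),
    Instr("r5", ("r4", "r3"), 1),
]

print(in_order_cycles(PROGRAM), out_of_order_cycles(PROGRAM))
```

Under these assumptions, the in-order model leaves the three independent instructions waiting behind the stalled consumer of the load, whereas the out-of-order model executes them while the miss is outstanding, so that less of the miss latency is exposed.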