Field of the Invention
This invention relates to microprocessors, and more particularly, to efficiently reducing the latency and power consumption of data move operations.
Description of the Relevant Art
Microprocessors typically include overlapping pipeline stages and execute instructions out of order. Additionally, microprocessors may support simultaneous multi-threading to increase throughput. These techniques take advantage of instruction level parallelism (ILP) in source code. Ideally, during each clock cycle, a microprocessor performs useful work on a maximum of N instructions per thread in each stage of the pipeline, where N is an integer greater than one. However, control dependencies and data dependencies reduce the throughput of the microprocessor to below N instructions per cycle.
Conditional control flow instructions determine which path to take in an instruction stream. The control dependencies they create serialize instructions at conditional forks and joins along the control flow graph of the source code. Speculative execution allows instructions to execute in parallel despite these control dependencies.
A data dependency occurs when an operand of an instruction depends on the result of an older instruction in program order. Data dependencies may appear either between operands of subsequent instructions in a straight-line code segment or between operands of instructions belonging to subsequent loop iterations. In straight-line code, read after write (RAW), write after read (WAR), or write after write (WAW) dependencies may be encountered. Because WAR and WAW dependencies arise only from the reuse of register names, register renaming allows instructions to execute in parallel despite them. However, the true dependency, or RAW dependency, remains. Therefore, when an architectural register is repeatedly used as a destination register and subsequently as a source register, execution of the associated source code segments is serialized.
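As a minimal sketch of the distinction above, the following Python models each instruction as a single destination register plus a tuple of source registers (a simplification assumed here; `Instr`, `classify`, and `rename` are hypothetical names, not taken from any real design). Renaming every destination to a fresh physical register removes the WAR and WAW dependencies between the example instructions, while the RAW dependency survives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str               # destination register
    srcs: tuple[str, ...]   # source registers


def classify(older: Instr, younger: Instr) -> set[str]:
    """Dependency kinds of `younger` on `older` (`older` comes first in program order)."""
    kinds = set()
    if older.dest in younger.srcs:
        kinds.add("RAW")  # younger reads the value older writes (true dependency)
    if younger.dest in older.srcs:
        kinds.add("WAR")  # younger overwrites a register older still reads
    if younger.dest == older.dest:
        kinds.add("WAW")  # both write the same register
    return kinds


def rename(program: list[Instr]) -> list[Instr]:
    """Give every destination a fresh physical register (p0, p1, ...).

    Sources are looked up through the current mapping before the destination
    is remapped; registers not yet written keep their architectural names.
    """
    mapping: dict[str, str] = {}
    renamed = []
    for i, ins in enumerate(program):
        srcs = tuple(mapping.get(r, r) for r in ins.srcs)
        phys = f"p{i}"
        mapping[ins.dest] = phys
        renamed.append(Instr(phys, srcs))
    return renamed


program = [
    Instr("r1", ("r2", "r3")),  # i0: r1 <- r2 + r3
    Instr("r4", ("r1",)),       # i1: r4 <- r1 + 1  (reads i0's result)
    Instr("r1", ("r5",)),       # i2: r1 <- r5      (reuses r1 as a destination)
]
renamed = rename(program)

for a, b in [(0, 1), (1, 2), (0, 2)]:
    print(f"i{a}->i{b}", sorted(classify(program[a], program[b])),
          "after renaming:", sorted(classify(renamed[a], renamed[b])))
```

Before renaming, i1 has a RAW dependency on i0, i2 has a WAR dependency on i1, and i2 has a WAW dependency on i0. After renaming, only the RAW edge from i0 to i1 remains.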
One common example of a RAW dependency involving an architectural register occurs at the beginning of subroutines, when a base pointer is assigned the value stored in a stack pointer. A related second example occurs at the end of subroutines, when the stack pointer is assigned the value stored in the base pointer to deallocate local variables. These assignments are performed with move operations. Subroutines reduce the cost of developing large, reliable programs, and they are often collected into libraries and used for sharing software. Therefore, these move operations, together with their RAW dependencies, occur frequently during program execution.
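To make the cost of these moves concrete, the following sketch assumes an x86-64-style prologue and epilogue (`rbp` as the base pointer, `rsp` as the stack pointer), unit latency for every instruction, unlimited issue width, and renaming that has already removed WAR/WAW hazards. Memory dependencies and the full semantics of `push`, `pop`, and `ret` are deliberately simplified. `Op`, `SUBROUTINE`, and `finish_cycles` are hypothetical names. The `move_latency=0` run is a hypothetical comparison only, showing how much of the longest RAW chain the two moves account for; it does not describe any particular mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    text: str                 # assembly-like text, for readability only
    dests: tuple[str, ...]    # registers written
    srcs: tuple[str, ...]     # registers read
    reg_move: bool = False    # True for register-to-register moves


# Hypothetical, simplified x86-64-style subroutine; memory is not modeled.
SUBROUTINE = [
    Op("push rbp",         ("rsp",),        ("rsp", "rbp")),
    Op("mov rbp, rsp",     ("rbp",),        ("rsp",), reg_move=True),  # prologue move
    Op("sub rsp, 32",      ("rsp",),        ("rsp",)),                 # allocate locals
    Op("mov rax, [rbp-8]", ("rax",),        ("rbp",)),                 # load a local
    Op("mov rsp, rbp",     ("rsp",),        ("rbp",), reg_move=True),  # epilogue move
    Op("pop rbp",          ("rbp", "rsp"),  ("rsp",)),
    Op("ret",              ("rsp",),        ("rsp",)),
]


def finish_cycles(program, move_latency=1, other_latency=1):
    """Cycle in which each op's result becomes available.

    Each op waits only for the newest producer of each source register (RAW),
    since renaming is assumed to have removed WAR/WAW hazards.
    """
    ready = {}    # register -> cycle its newest value is available
    finish = []
    for op in program:
        start = max((ready.get(r, 0) for r in op.srcs), default=0)
        done = start + (move_latency if op.reg_move else other_latency)
        for r in op.dests:
            ready[r] = done
        finish.append(done)
    return finish


baseline = finish_cycles(SUBROUTINE)
hypothetical = finish_cycles(SUBROUTINE, move_latency=0)
print("longest RAW chain, 1-cycle moves:", max(baseline))      # 5
print("longest RAW chain, 0-cycle moves:", max(hypothetical))  # 3
```

Under these assumptions, the longest chain is push rbp -> mov rbp, rsp -> mov rsp, rbp -> pop rbp -> ret, so both moves sit on the critical path of this short sequence.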
In view of the above, efficient methods and mechanisms for reducing the latency of data move operations are desired.