Field of the Invention
This invention relates to microprocessors, and more particularly, to efficiently reducing the latency and power consumption of load operations.
Description of the Relevant Art
Microprocessors typically include overlapping pipeline stages and out-of-order execution of instructions. Additionally, microprocessors may support simultaneous multi-threading to increase throughput. These techniques take advantage of instruction-level parallelism (ILP) in source code. During each clock cycle, a microprocessor ideally performs useful execution of up to N instructions per thread in each stage of a pipeline, wherein N is an integer greater than one. However, control dependencies and data dependencies reduce the achieved throughput of the microprocessor to below N instructions per cycle.
Speculative execution of instructions is used to perform parallel execution of instructions despite control dependencies in the source code. A data dependency occurs when an operand of an instruction depends on a result of an older instruction in program order. Data dependencies may appear either between operands of subsequent instructions in a straight-line code segment or between operands of instructions belonging to subsequent loop iterations. In straight-line code, read-after-write (RAW), write-after-read (WAR), or write-after-write (WAW) dependencies may be encountered. Register renaming is used to allow parallel execution of instructions despite the WAR and WAW dependencies. However, the true dependency, or RAW dependency, remains intact. Therefore, an architectural register that is repeatedly used as a destination register and subsequently as a source register causes serialization of instruction execution for the associated source code segments.
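The three dependency types and the effect of register renaming can be illustrated with a minimal sketch. The `Instr` type, the `hazards` and `rename` helpers, the register names, and the four-instruction program are all hypothetical and chosen only for illustration; real renaming hardware uses a finite physical register file and a free list, which this sketch does not model.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction: one destination register, several sources."""
    dest: str
    srcs: Tuple[str, ...]


def hazards(older: Instr, younger: Instr) -> set:
    """Classify register dependencies of `younger` on `older` (program order)."""
    found = set()
    if older.dest in younger.srcs:
        found.add("RAW")  # younger reads what older writes (true dependency)
    if younger.dest in older.srcs:
        found.add("WAR")  # younger overwrites a register older still reads
    if older.dest == younger.dest:
        found.add("WAW")  # both write the same register
    return found


def rename(program):
    """Give every destination a fresh physical register (assumed unlimited)."""
    mapping = {}  # architectural register -> most recent physical register
    renamed = []
    for index, ins in enumerate(program):
        # Sources read the latest mapping; live-in registers keep their names.
        srcs = tuple(mapping.get(reg, reg) for reg in ins.srcs)
        dest = f"p{index}"
        mapping[ins.dest] = dest
        renamed.append(Instr(dest, srcs))
    return renamed


program = [
    Instr("r1", ("r2", "r3")),  # i0: r1 = r2 op r3
    Instr("r4", ("r1", "r5")),  # i1: r4 = r1 op r5  (RAW on r1 with i0)
    Instr("r2", ("r6", "r7")),  # i2: r2 = r6 op r7  (WAR on r2 with i0)
    Instr("r1", ("r8", "r9")),  # i3: r1 = r8 op r9  (WAW on r1 with i0)
]
renamed = rename(program)

before = [hazards(program[0], program[k]) for k in (1, 2, 3)]
after = [hazards(renamed[0], renamed[k]) for k in (1, 2, 3)]
```

In this sketch, `before` holds one RAW, one WAR, and one WAW dependency against the first instruction, while `after` retains only the RAW dependency, which is the serialization that renaming alone cannot remove.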
One example of a common RAW dependency with an architectural register is a load instruction, or read operation, that attempts to read a memory location modified by an older (in program order) store instruction that has not yet committed its result to that memory location. This type of RAW dependency may occur frequently during program execution. Reading the memory location may incur appreciable latency and reduce processor throughput.
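The store-to-load case described above can be sketched with a toy model. Everything here is an assumption made for illustration: the `ToyMemorySystem` class, its method names, and the address `0x40` are hypothetical, and the choice to satisfy the load from the pending store (commonly called store-to-load forwarding) is one well-known way to honor the dependency, not a description of the mechanism of this invention.

```python
class ToyMemorySystem:
    """Toy model: committed memory plus older stores that have not committed."""

    def __init__(self):
        self.memory = {}          # committed state: address -> value
        self.pending_stores = []  # (address, value) pairs in program order

    def store(self, address, value):
        # The store executes but its result is not yet visible in memory.
        self.pending_stores.append((address, value))

    def commit_oldest_store(self):
        address, value = self.pending_stores.pop(0)
        self.memory[address] = value

    def depends_on_pending_store(self, address):
        # True when a load to `address` has a RAW dependency on an older store.
        return any(addr == address for addr, _ in self.pending_stores)

    def load(self, address):
        # Honor the RAW dependency: take the youngest older store to the
        # same address if one is pending, otherwise read committed memory.
        for addr, value in reversed(self.pending_stores):
            if addr == address:
                return value
        return self.memory.get(address, 0)


system = ToyMemorySystem()
system.memory[0x40] = 1                  # old committed value
system.store(0x40, 7)                    # older store, not yet committed

stale_value = system.memory[0x40]        # what a naive memory read would see
has_dependency = system.depends_on_pending_store(0x40)
loaded_value = system.load(0x40)         # value the load must observe

system.commit_oldest_store()
committed_value = system.memory[0x40]
```

While the store is pending, committed memory still holds the old value, so a load that ignored the dependency would read stale data. A load that instead waits for the store to commit pays the latency that the passage above describes.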
In view of the above, efficient methods and mechanisms for reducing the latency of load operations are desired.