1. Field of the Invention
This invention relates to computing systems, and more particularly, to efficient scheduling of speculative load instructions.
2. Description of the Relevant Art
The pipeline depth of modern microprocessors continues to increase in order to support higher clock frequencies and increased microarchitectural complexity. Despite improved device speed, the higher clock frequencies of next-generation processors limit the number of logic levels that fit within a single clock cycle. The deep pipelining trend has made it advantageous to predict the events that may happen in the pipe stages ahead. One example of this technique is latency speculation between an instruction and a younger (in program order) dependent instruction. These younger dependent instructions may be picked for out-of-order (o-o-o) issue and execution prior to a broadcast of the results of a corresponding older (in program order) instruction. Additionally, the deep pipelining trend increases the latency to receive and use load (read) operation result data.
One example of the above instruction dependency and latency speculation is a load-to-load dependency. A younger (in program order) load instruction may be dependent on an older (in program order) load instruction. The older load instruction that produces the result data may be referred to as the producing load instruction. The younger instruction dependent on the result data of the producing load instruction may be referred to as the consuming load instruction. When the target register of an older producing load (read) instruction is also an address register (source operand) of a younger consuming load instruction, the occurrence may be referred to as pointer chasing. Linked list traversals typically include frequent pointer chasing.
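As an illustration of pointer chasing, the following is a minimal C sketch of a linked list traversal. The node layout and the names `struct node` and `sum_list` are hypothetical and chosen only for this example. The load of `p->next` in one iteration acts as the producing load, and the loads in the next iteration, which use that pointer as their address, act as consuming loads.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *next;  /* pointer fetched by the producing load */
    int value;
};

/* Sum the values stored in a singly linked list. */
int sum_list(const struct node *head)
{
    int total = 0;
    /* Each iteration loads p->next, and that result becomes the
     * address register for the loads of the following iteration,
     * forming a load-to-load dependency chain. */
    for (const struct node *p = head; p != NULL; p = p->next) {
        total += p->value;
    }
    return total;
}
```

Because every iteration must wait for the previous `p->next` load to return, the loads in such a loop cannot overlap, however many execution resources are available.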
For load (read) instructions, the requested data may be retrieved from a cache line within a data cache. Alternatively, the requested data may be retrieved from a store queue, such as when control logic determines that a load-store dependency exists. Data forwarding of load results to dependent instructions may occur by sending the retrieved data to a reservation station and/or a register file. Afterward, the data may be sent to one or more execution units corresponding to the younger dependent instructions. The data forwarding incurs an appreciable delay. The traversal of one or more linked lists within a software application accumulates this delay and may reduce performance. The latency for receiving and using load instruction result data may vary depending on instruction order within the computer program. However, a pipeline may not take advantage of the shorter latency cases, even though these cases occur frequently. The traversal of a linked list is one case that may allow an opportunity to decrease the latency to use load instruction result data.
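The way this delay accumulates over a chain of dependent loads can be sketched with a simple model in C. The function name `chain_latency` and any cycle counts passed to it are assumptions made for illustration; they do not describe a particular processor. The model assumes each load in the chain waits for the full load-to-use latency of the load before it.

```c
/* Cycles spent on a chain of dependent loads, assuming each load
 * cannot begin until the previous load has delivered its result.
 * Both parameters are hypothetical inputs supplied by the caller. */
unsigned long chain_latency(unsigned long num_loads,
                            unsigned long load_to_use_cycles)
{
    return num_loads * load_to_use_cycles;
}
```

Under this model, removing a single cycle from the load-to-use latency saves one cycle per dependent load, so the benefit grows linearly with the length of the traversal.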
In view of the above, methods and mechanisms for efficient scheduling of speculative load instructions are desired.