Computer processors may support execution of instructions in a pipelined architecture in order to increase throughput. The processing of each instruction may be broken into a sequence of steps, such as instruction fetch, decode, execute, memory access, and write back. Each of these steps can be executed in one or more pipeline stages. Pipelining the instructions in this manner allows the processor to exploit instruction level parallelism. This increases overall processing speed; however, the overall latency of each instruction remains the same. For example, in the case of memory access instructions such as load instructions, long latencies may be involved for retrieving requested data from one or more levels of caches or main memory, which in some cases may be hundreds of clock cycles. Such long latencies for load instructions may introduce long stalls in the instruction pipeline if the instructions are executed in program order (i.e., "in-order" execution).
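The throughput benefit described above can be sketched numerically. The following is an illustrative model only (the five-stage split and the one-instruction-per-cycle issue rate are assumptions, not details from any particular processor): in an ideal pipeline with no stalls, each instruction still takes all five stages to complete, but a new instruction can retire every cycle.

```python
# Illustrative sketch of ideal pipeline timing, assuming a 5-stage
# pipeline (fetch, decode, execute, memory access, write back), one
# instruction issued per cycle, and no stalls. Hypothetical model only.

STAGES = 5

def completion_cycle(i, stages=STAGES):
    """Cycle in which the i-th instruction (0-based) writes back."""
    return stages + i

def pipelined_cycles(n, stages=STAGES):
    """Total cycles to retire n instructions with pipelining."""
    return stages + n - 1

def unpipelined_cycles(n, stages=STAGES):
    """Total cycles if each instruction finishes all stages before
    the next instruction starts."""
    return stages * n
```

For example, 100 instructions retire in 104 cycles when pipelined versus 500 cycles unpipelined, yet the per-instruction latency in both cases remains 5 cycles, consistent with the observation above.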
Accordingly, some processors may employ out-of-order execution, where instructions may execute and commit (e.g., exit the instruction pipeline after results of the instruction are written back to a register file) out of program order. For example, if a low latency arithmetic instruction enters the pipeline after a load instruction which would incur a long latency to commit, then in-order processing would require the low latency arithmetic instruction to stall, waiting for processing of the long latency load instruction to be completed. In this example, in-order processing does not efficiently utilize the processor's resources. Instead, "out-of-order" processing may be implemented, where the low latency arithmetic instruction may be advanced, or taken out of order, and processed before processing of the long latency load instruction is completed. Out-of-order processing may be utilized for any number of instructions, which are reordered or processed out of program order in order to improve efficiency of the instruction pipeline. However, out-of-order execution may introduce complexities, for example, in cases where dependencies exist between instructions that are reordered. Such dependencies may be data dependencies or control dependencies.
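The benefit of reordering can be sketched with a toy scheduling model. This is a hypothetical simplification (the latencies, register names, and the serial in-order model are all assumptions for illustration): in the in-order case an instruction waits for the previous one to finish, while in the out-of-order case it starts as soon as its source operands are ready and an issue slot (one per cycle) is free.

```python
# Hypothetical sketch contrasting in-order and out-of-order timing.
# 'program' is a list of (name, dest, srcs, latency) tuples in program
# order; all names and latencies are illustrative assumptions.

def finish_times(program, in_order):
    """Return {name: cycle in which the instruction finishes}."""
    ready = {}        # register -> cycle its value becomes available
    finish = {}
    prev_finish = 0   # finish cycle of the previous instruction
    issue = 0         # next free issue slot (one issue per cycle)
    for name, dest, srcs, latency in program:
        if in_order:
            start = prev_finish  # stall behind the older instruction
        else:
            # start once operands are ready and an issue slot is free
            start = max([issue] + [ready.get(r, 0) for r in srcs])
        end = start + latency
        ready[dest] = end
        finish[name] = end
        prev_finish = end
        issue = start + 1
    return finish

# A long-latency load followed by an independent low-latency add:
program = [
    ("load", "r1", [],           200),  # e.g., miss to main memory
    ("add",  "r2", ["r3", "r4"],   1),  # independent of the load
]
```

Under this model the independent add finishes at cycle 201 in order but at cycle 2 out of order; an add that instead read `r1` would still wait for the load even out of order, illustrating the dependency complexities noted above.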
For example, a programmatically younger instruction (e.g., the low latency arithmetic instruction) may have one or more operands in common with an older instruction (e.g., the long latency load instruction). If the younger instruction were to read or write one or more common operands before the older instruction has updated the operands, then a data hazard is created. Depending on the manner in which the data hazards are created, different forms of data hazards such as read-after-write (RAW), write-after-read (WAR), write-after-write (WAW), etc., are known in the art. Conventional approaches for detecting and preventing data hazards in out-of-order execution involve mechanisms such as scoreboarding, reorder buffers (ROBs), register alias tables (RATs), etc. These approaches rely on techniques such as Tomasulo's algorithm for register renaming and require specialized hardware for their implementation.
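The three hazard classes above reduce to comparing the source and destination register sets of two instructions. A minimal sketch, assuming each instruction is represented simply as its (destinations, sources) register sets with the older instruction given first in program order:

```python
# Illustrative classifier for data hazards between two instructions.
# Each instruction is a (dests, srcs) pair of register-name lists;
# 'older' precedes 'younger' in program order. Names are hypothetical.

def hazards(older, younger):
    o_dst, o_src = map(set, older)
    y_dst, y_src = map(set, younger)
    found = set()
    if o_dst & y_src:
        found.add("RAW")  # younger reads a register the older writes
    if o_src & y_dst:
        found.add("WAR")  # younger writes a register the older reads
    if o_dst & y_dst:
        found.add("WAW")  # both write the same register
    return found
```

For instance, a load writing `r1` followed by an add reading `r1` is a RAW hazard; note that renaming the younger instruction's destination (as in Tomasulo-style register renaming) can remove WAR and WAW hazards, but a RAW hazard reflects a true data dependency and cannot be renamed away.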