Processing systems such as central processing units (CPUs), graphics processing units (GPUs), and accelerated processing units (APUs) implement instruction pipelines to increase the number of instructions that can be executed in a particular time interval. A typical pipeline includes several pipeline stages such as an instruction fetch stage, a decode stage, an execution stage, a memory access stage, and a write-back stage. Instruction scheduling algorithms can be used to improve the overall pipeline throughput by optimizing the order or schedule of execution of instructions. For example, out-of-order instruction scheduling can be used to schedule instructions for execution in the pipeline in an order that is different from the program order of the instructions. Out-of-order instruction scheduling algorithms must account for potential data hazards related to dependencies between the different instructions. For example, a first instruction that writes a value to a register that is later read by a second instruction should generally be performed before the second instruction.
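The read-after-write dependency described above can be illustrated with a minimal software sketch. The `Instr` record, the register names, and the `raw_dependencies` helper are hypothetical names introduced only for this illustration; they model the dependency relation an out-of-order scheduler must respect, not any particular hardware implementation.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one optional destination, any number of sources."""
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def raw_dependencies(program):
    """Return (parent_index, child_index) pairs for read-after-write dependencies.

    A child depends on the most recent older instruction that writes
    one of the registers the child reads.
    """
    last_writer = {}  # register name -> index of the latest instruction writing it
    deps = []
    for child_idx, instr in enumerate(program):
        # Check sources before recording this instruction's own write.
        for src in instr.srcs:
            if src in last_writer:
                deps.append((last_writer[src], child_idx))
        if instr.dest is not None:
            last_writer[instr.dest] = child_idx
    return deps


# Illustrative program order (assumed register names).
program = [
    Instr("load", "r1", ("r0",)),       # writes r1
    Instr("add", "r2", ("r1", "r5")),   # reads r1, so it depends on the load
    Instr("mul", "r3", ("r4",)),        # no dependency on either older instruction
]

deps = raw_dependencies(program)
```

In this sketch the `add` must wait for the `load`, while the `mul` has no parent in the window and could be scheduled ahead of the `add` without creating a data hazard.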
A conventional scheduler maintains a queue of entries that can be picked for scheduling. Each entry becomes ready and eligible to be picked for execution once all of its source registers are ready, e.g., the source registers are not waiting to be written by an older instruction. In the case of a dependent (child) instruction that accesses one or more source registers that are written by an older (parent) instruction, the source registers for the child instruction are marked as ready in response to the parent instruction being picked for execution. For example, a picker may broadcast the read address of a RAM location that includes information identifying the destination registers of the entry that has been picked for execution. This information may be referred to as a tag and typically includes the physical register number associated with the destination register of the picked instruction. The tag can be read out of the RAM location and compared to information identifying the source registers of entries in the queue. A match indicates that the corresponding source register is ready, and the child instruction can be marked as ready and eligible when all of its source registers are marked as ready. However, reading out the tag from the RAM location takes time, which may impact critical path timing. Moreover, schedulers that use tags such as physical register numbers (PRNs) that identify physical register entries may need to allocate a physical register to an instruction in order to track dependencies of the instruction even if the instruction does not use the physical register. This unnecessarily consumes the physical register and reduces the number of physical registers available for other instructions.
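The conventional tag-based wakeup described above can be sketched in software as follows. This is a simplified behavioral model under stated assumptions, not a description of any actual circuit: the `Entry` and `Scheduler` classes, the oldest-first pick policy, and the specific physical register numbers are all hypothetical. The `tag_ram` list stands in for the RAM location that is read on each pick, which is the step identified above as a potential critical-path cost.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    """Hypothetical scheduler queue entry."""
    name: str
    dest_prn: int          # physical register number written by this instruction
    src_prns: List[int]    # physical register numbers read by this instruction
    src_ready: List[bool]  # one ready bit per source, same length as src_prns
    picked: bool = False

    def is_ready(self) -> bool:
        # Eligible only if not yet picked and every source is marked ready.
        return not self.picked and all(self.src_ready)


class Scheduler:
    def __init__(self, entries: List[Entry]):
        self.queue = entries
        # Model of the tag RAM: one destination tag per queue entry.
        self.tag_ram = [e.dest_prn for e in entries]

    def wakeup(self, read_addr: int) -> None:
        """Read the tag at the broadcast address and compare it to every source."""
        tag = self.tag_ram[read_addr]  # the RAM read that precedes the compare
        for entry in self.queue:
            for i, src in enumerate(entry.src_prns):
                if src == tag:
                    entry.src_ready[i] = True

    def pick(self):
        """Pick the oldest ready entry (assumed policy) and wake its children."""
        for idx, entry in enumerate(self.queue):
            if entry.is_ready():
                entry.picked = True
                self.wakeup(idx)
                return entry.name
        return None


# Parent writes PRN 10; the child reads PRN 10 (not ready) and PRN 5 (already ready).
sched = Scheduler([
    Entry("load", dest_prn=10, src_prns=[], src_ready=[]),
    Entry("add", dest_prn=11, src_prns=[10, 5], src_ready=[False, True]),
])

order = []
while (name := sched.pick()) is not None:
    order.append(name)
```

In this model the child `add` becomes eligible only after the parent `load` is picked and its tag is matched against the child's source. Every entry also needs a `dest_prn` to take part in the comparison, which mirrors the physical register allocation cost noted above.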
While the disclosed subject matter may be modified and may take alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the disclosed subject matter to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the appended claims.