Many conventional processors implement pipelining to make the processor more efficient. Pipelining is a design technique in which a long-latency operation is divided into multiple stages, where the output of one stage is the input to the next stage. Because multiple instructions can be in flight within the pipeline at the same time, each occupying a different stage, pipelining allows a system architect to hide some of the latency within a system.
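The latency-hiding effect described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls occur; the function names and the one-cycle-per-stage assumption are illustrative and are not taken from any particular processor.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction must finish all stages
    before the next instruction may start (no overlap)."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed in an idealized pipeline (assumed: one cycle per
    stage, no stalls). The first instruction takes num_stages cycles to
    drain; each later instruction completes one cycle after the previous."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    n, s = 8, 4
    print("unpipelined:", unpipelined_cycles(n, s))
    print("pipelined:  ", pipelined_cycles(n, s))
```

Under these assumptions, the per-instruction latency is unchanged (still one cycle per stage), but the total time for a stream of instructions shrinks because the stages work on different instructions concurrently.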
Some pipelines, such as pipelines implemented in texture units, may receive requests (i.e., instructions) from multiple, independent schedulers within the processor, or from a single scheduler executing multiple threads. Compilers may be configured to optimize the code of a particular thread by matching the order of instructions to the known throughput of the pipelines to which the instructions will be issued. For example, if a processor executes 32 threads simultaneously using 32 arithmetic logic units (ALUs) and 16 double precision units (DPUs), the compiler may order the instructions of the 32 threads so that instructions routed to the DPUs are issued no more often than every other instruction, which gives the 16 DPUs two cycles to process a single instruction for all 32 threads. However, in some processors, many pipelines have a variable throughput, such that the compiler cannot properly optimize the code at compile time. The throughput of a particular pipeline may be variable if the pipeline accesses some other resource (e.g., memory) that has a variable latency. Furthermore, in multi-threaded systems, the compiler may not have an accurate view at compile time of which threads will be executed and in what order. Consequently, the compiler cannot accurately track which resources will be available when a particular instruction is issued. Thus, there is a need for addressing this issue and/or other issues associated with the prior art.
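The compile-time ordering in the 32-thread example can be sketched as follows. This is a simplified model, not a description of any actual compiler or scheduler: it assumes in-order issue of at most one instruction per cycle, and that each unit type is busy for a fixed number of cycles per issued instruction. The names `issue_interval` and `count_stall_cycles` are hypothetical.

```python
import math


def issue_interval(num_threads: int, num_units: int) -> int:
    """Cycles a unit type needs to process one instruction for all threads."""
    return math.ceil(num_threads / num_units)


def count_stall_cycles(stream, busy_cycles):
    """Count cycles the issuer waits under in-order, one-issue-per-cycle
    issue (an assumed model). `stream` is a list of unit names and
    `busy_cycles` maps each unit name to the cycles it stays busy
    after accepting an instruction."""
    unit_free_at = {unit: 0 for unit in busy_cycles}
    cycle = 0
    stalls = 0
    for unit in stream:
        # Wait until the target unit can accept a new instruction.
        issue_at = max(cycle, unit_free_at[unit])
        stalls += issue_at - cycle
        unit_free_at[unit] = issue_at + busy_cycles[unit]
        cycle = issue_at + 1
    return stalls


if __name__ == "__main__":
    busy = {
        "ALU": issue_interval(32, 32),  # 1 cycle per instruction
        "DPU": issue_interval(32, 16),  # 2 cycles per instruction
    }
    back_to_back = ["DPU", "DPU", "DPU", "ALU", "ALU", "ALU"]
    interleaved = ["DPU", "ALU", "DPU", "ALU", "DPU", "ALU"]
    print("back-to-back stalls:", count_stall_cycles(back_to_back, busy))
    print("interleaved stalls: ", count_stall_cycles(interleaved, busy))
```

In this model, the interleaved ordering avoids stalls only because the busy time of each unit is known and fixed when the order is chosen. If a unit's actual busy time at run time is longer than the value assumed at compile time, the same ordering incurs stalls, which is the limitation described above for pipelines with variable throughput.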