Field of the Invention
The disclosure generally relates to multi-threaded instruction scheduling, and more specifically to methods and apparatus for ordering thread execution for memory access locality.
Description of the Related Art
Parallel processors have multiple independent cores that enable multiple threads to be executed simultaneously using different hardware resources. SIMD (single instruction, multiple data) architecture processors execute the same instruction on each of the multiple cores where each core processes different input data. MIMD (multiple instruction, multiple data) architecture processors execute different instructions on different cores with different input data supplied to each core. Parallel processors may also be multi-threaded, which enables two or more threads to execute substantially simultaneously using the resources of a single processing core (i.e., the different threads are executed on the core during different clock cycles). Instruction scheduling refers to the technique for determining which threads to execute during the next clock cycle.
Conventional graphics programs identify memory read instructions that access texture map data so that an instruction scheduler within a graphics processor will schedule execution of the memory read instructions as a group, blocking execution of any other instructions until all of the memory read instructions in the group have been executed to read texture map data. Execution of the memory read instructions as a group is advantageous because the different accesses typically read texture data that was previously read or texture data stored in a cache line that has already been loaded from the backing memory. Therefore, the memory bandwidth consumed by texture map reads may be reduced.
When the parallel processors executing the memory read instructions are multi-threaded, two different threads that are executing simultaneously may access different texture maps or different regions of one texture map. If the instruction scheduler interleaves memory read instructions for the two threads, the locality of the memory read instructions is disrupted, and graphics processing performance may suffer. To maintain locality of the texture memory read instructions, the instruction scheduler allows only texture memory read instructions that are identified as being in the same group to execute simultaneously. Scheduling and execution of all other instructions, including texture memory read instructions identified for a different group, are blocked by the instruction scheduler.
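The effect of interleaving on locality may be illustrated with a minimal sketch. The cache model below (a single cache line holding four texels, with least-recently-used replacement), the address ranges, and the names count_misses, thread_a, and thread_b are hypothetical and chosen only for illustration; they do not describe any particular graphics processor.

```python
def count_misses(addresses, line_size=4, num_lines=1):
    """Count misses in a tiny fully associative LRU cache (hypothetical sizes)."""
    cache = []  # line tags, least recently used first
    misses = 0
    for addr in addresses:
        tag = addr // line_size  # texels sharing a tag share a cache line
        if tag in cache:
            cache.remove(tag)    # hit: re-appended below as most recent
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.pop(0)     # evict the least recently used line
        cache.append(tag)
    return misses


# Assumption: each thread reads 16 consecutive texels from a different region.
thread_a = list(range(0, 16))
thread_b = list(range(1000, 1016))

# Grouped: all reads of one thread issue before any read of the other.
grouped = thread_a + thread_b
# Interleaved: the scheduler alternates between the two threads.
interleaved = [addr for pair in zip(thread_a, thread_b) for addr in pair]

grouped_misses = count_misses(grouped)
interleaved_misses = count_misses(interleaved)
print(grouped_misses, interleaved_misses)
```

Under these assumptions the grouped order misses once per cache line, whereas the interleaved order evicts the line on every access, so every read goes to the backing memory.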
One problem with blocking the execution of all other instructions while one group of texture memory read instructions is executed by a first thread is that other threads are prevented from executing other instructions. When threads are scheduled to process two or more groups of texture memory read instructions one after the other, other threads that process instructions not included in any of the groups are not processed for many clock cycles. Those other threads are effectively starved by the threads processing the groups of texture memory read instructions.
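The starvation described above may be sketched with a simplified model of a blocking scheduler. The model assumes that one instruction issues per clock cycle, that each group consists of the consecutive texture reads of one thread, and that pending texture groups are always selected before other work; the function first_issue_cycles and the thread names are hypothetical.

```python
def first_issue_cycles(threads):
    """Return {thread name: cycle of its first issued instruction}.

    threads maps a name to a list of instruction kinds, "tex" or "alu".
    """
    pending = {name: list(instrs) for name, instrs in threads.items()}
    first_issue = {}
    cycle = 0
    while any(pending.values()):
        # Assumption: a pending texture group is picked before other work.
        tex_ready = [n for n, q in pending.items() if q and q[0] == "tex"]
        if tex_ready:
            name = tex_ready[0]
            # Blocking group: every read in the group issues back to back,
            # and no other thread issues anything in the meantime.
            while pending[name] and pending[name][0] == "tex":
                first_issue.setdefault(name, cycle)
                pending[name].pop(0)
                cycle += 1
        else:
            # No texture group is pending, so another thread may issue.
            name = next(n for n, q in pending.items() if q)
            first_issue.setdefault(name, cycle)
            pending[name].pop(0)
            cycle += 1
    return first_issue


threads = {
    "tex_thread_0": ["tex"] * 8,
    "tex_thread_1": ["tex"] * 8,
    "alu_thread": ["alu"] * 4,
}
result = first_issue_cycles(threads)
print(result)
```

In this sketch the thread that performs no texture reads cannot issue its first instruction until both groups have drained, and the delay grows with each additional group scheduled ahead of it.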
Accordingly, what is needed in the art is a system and method for ordering threads for execution so that locality is maintained for texture memory read instructions without starving threads processing instructions that do not read from texture memory.