Present solutions for scheduling instructions in a multithreaded SIMD architecture generally operate on each basic block individually. As a result, the basic block that requires the most registers sets the number of waves, independently of the other basic blocks. If that basic block occurs late in the shader, the preceding basic blocks cannot use the additional registers it causes to be allocated. Conversely, a basic block that uses many registers cannot choose to be scheduled with fewer registers, at a lower level of its own performance, even when that choice would be warranted once all basic blocks are considered together.
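As an illustration of this per-block limitation, the following Python sketch assumes a hypothetical register file of 256 registers and hypothetical per-block register demands; `waves_for_shader` and `waves_per_block` are illustrative helpers, not part of any real compiler. It shows how the single most register-hungry block fixes the wave count for the whole shader, even though every other block alone would permit more waves.

```python
# Minimal sketch: how the peak basic block sets the wave count.
# TOTAL_REGISTERS and the per-block demands are hypothetical values.
TOTAL_REGISTERS = 256


def waves_for_shader(block_registers: list[int],
                     total_registers: int = TOTAL_REGISTERS) -> int:
    """Waves that fit when each wave is allocated the peak register demand
    of any basic block in the shader."""
    peak = max(block_registers)
    return total_registers // peak


def waves_per_block(block_registers: list[int],
                    total_registers: int = TOTAL_REGISTERS) -> list[int]:
    """Waves each block would permit if it alone set the allocation."""
    return [total_registers // r for r in block_registers]


# Block 2 occurs late in the shader and needs far more registers than the rest.
blocks = [24, 32, 100, 28]
print(waves_per_block(blocks))   # [10, 8, 2, 9]
print(waves_for_shader(blocks))  # 2

# If block 2 were instead scheduled with 64 registers (possibly running slower),
# the whole shader could run more waves.
print(waves_for_shader([24, 32, 64, 28]))  # 4
```

In this toy setup, trading some speed in block 2 for a smaller register footprint doubles the number of resident waves, which is exactly the kind of whole-shader choice a purely per-block scheduler cannot make.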
A shader is a program used to produce levels of color within an image by controlling pixel attributes such as position, hue, saturation, brightness, and contrast. Shaders render effects on graphics hardware.
Shaders generally rely on parallel processing across a set of registers. Shader programs executed on a multithreaded SIMD machine, such as a graphics processor, must balance maximum performance for a given shader program against maximum throughput for multiple simultaneously executing waves. An instruction scheduler is part of a shader compiler that generates code for such a machine; it chooses the sequence of instructions in order to maximize performance. For multithreaded SIMD machines limited by a total number of registers, one tradeoff the instruction scheduler faces is maximum performance for an individual shader program, which typically requires a larger number of registers, versus minimum register usage, which allows maximum throughput across multiple shader programs because more of them can execute simultaneously. On these machines, a fixed number of registers is allocated across multiple shader programs, so the total number of registers required by all executing waves cannot exceed the number of registers available on the machine.
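The register budget can be expressed as a simple check. In this sketch the register-file size, the two candidate schedules, and their cycle counts are hypothetical, and `Schedule`, `fits_in_register_file`, and `max_resident_waves` are illustrative names. A schedule that needs more registers per wave may finish a single wave sooner, but fewer of its waves can be resident at once.

```python
from dataclasses import dataclass

TOTAL_REGISTERS = 256  # hypothetical register file size


@dataclass(frozen=True)
class Schedule:
    name: str
    registers_per_wave: int  # registers one wave needs under this schedule
    cycles_per_wave: int     # hypothetical single-wave execution time


def fits_in_register_file(wave_registers: list[int],
                          total_registers: int = TOTAL_REGISTERS) -> bool:
    """True if the registers required by all executing waves fit."""
    return sum(wave_registers) <= total_registers


def max_resident_waves(s: Schedule,
                       total_registers: int = TOTAL_REGISTERS) -> int:
    """Largest number of waves of one schedule that can execute at once."""
    return total_registers // s.registers_per_wave


fast = Schedule("fast", registers_per_wave=64, cycles_per_wave=400)
lean = Schedule("lean", registers_per_wave=32, cycles_per_wave=480)

for s in (fast, lean):
    print(s.name, s.cycles_per_wave, "cycles/wave,",
          max_resident_waves(s), "waves")

print(fits_in_register_file([64] * 4))  # True: 256 registers used
print(fits_in_register_file([64] * 5))  # False: 320 > 256
```

Because the check sums over all waves, it also covers waves from different shader programs sharing the machine, for example two 64-register waves alongside four 32-register waves.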
Because this register pool is shared, shader programs whose individual waves require more registers can execute fewer waves simultaneously. Memory latency for one wave is hidden by executing other waves while it waits, so increasing the number of registers available to each wave, and thereby restricting the number of waves, can leave the machine waiting longer for memory operations to finish, which reduces performance.
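A back-of-the-envelope model makes the latency-hiding argument concrete. It assumes, as a simplification not taken from the text above, that each wave repeatedly issues a fixed number of compute cycles and then waits a fixed memory latency, and that other resident waves fill that wait perfectly; all numbers and the names `resident_waves` and `alu_utilization` are hypothetical. Under this model, raising the registers per wave lowers the wave count and leaves the SIMD unit idle for part of each memory wait.

```python
# Simplified latency-hiding model (an assumption, not a hardware model):
# each wave alternates COMPUTE_CYCLES of ALU work with MEMORY_LATENCY cycles
# of waiting, and other resident waves issue work during that wait.
TOTAL_REGISTERS = 256
COMPUTE_CYCLES = 40
MEMORY_LATENCY = 200


def resident_waves(registers_per_wave: int,
                   total_registers: int = TOTAL_REGISTERS) -> int:
    """Waves that fit in the shared register file."""
    return total_registers // registers_per_wave


def alu_utilization(waves: int,
                    compute_cycles: int = COMPUTE_CYCLES,
                    memory_latency: int = MEMORY_LATENCY) -> float:
    """Fraction of cycles with ready work; capped at 1.0 once enough waves
    exist to cover one full compute-plus-wait period."""
    if waves <= 0:
        return 0.0
    period = compute_cycles + memory_latency
    return min(1.0, waves * compute_cycles / period)


for regs in (32, 64, 128):
    waves = resident_waves(regs)
    print(f"{regs:4d} registers/wave -> {waves} waves, "
          f"utilization {alu_utilization(waves):.2f}")
```

With these assumed numbers, six waves (6 × 40 = 240 cycles) cover one full period; in this model, additional waves beyond that add nothing, so a schedule using more registers per wave loses no throughput as long as enough waves remain resident.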