Field of the Invention
The present invention generally relates to multithreaded programming and, more specifically, to a technique for grouping instructions into independent strands.
Description of the Related Art
In a multithreaded processing paradigm, a processing unit may execute multiple threads. Those threads may share a hardware resource in order to execute different portions of a multithreaded software program. For example, a first thread could execute using the hardware resource to implement a first portion of the multithreaded program while a second thread waits for access to the hardware resource. When the first thread completes execution, the second thread could then execute a second portion of the multithreaded program using the hardware resource. The hardware resource could be, for example, an execution unit, an arithmetic logic unit, a processing core, or any other such hardware resource.
Problems arise with the approach described above when the multithreaded software program involves long-latency instructions, such as load instructions or texture fetch operations. If one of the multiple threads must execute a long-latency instruction, then the other threads are forced to wait until that long-latency instruction completes before gaining access to the shared hardware resource. Returning to the example described above, if the first thread issues a load instruction, the second thread cannot access the hardware resource until after the load instruction completes.
Consequently, during the time spent waiting for the load instruction to complete, the hardware resource cannot perform any useful work. With multithreaded software programs that include numerous long-latency instructions, a large portion of time may be spent waiting for long-latency instructions to complete, and a very small portion of time may be spent executing other instructions. In short, the execution of conventional multithreaded software programs fails to efficiently utilize limited hardware resources.
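The effect described above can be illustrated with a rough numerical sketch. The model below is an assumption made only for illustration and is not taken from the description: each instruction is assumed to keep the shared hardware resource busy for one issue cycle and then to hold the resource idle for the remainder of its latency. The function name, the constant, and the latency values are all hypothetical.

```python
# Toy model (an assumption for illustration, not taken from the description):
# each instruction keeps the shared resource busy for ISSUE_CYCLES cycles and
# then holds it idle for the remainder of its latency.
ISSUE_CYCLES = 1


def utilization(latencies):
    """Return the fraction of cycles in which the shared resource does useful work."""
    total_cycles = sum(latencies)                  # resource is held for every cycle of latency
    useful_cycles = ISSUE_CYCLES * len(latencies)  # only the issue cycles count as useful work
    return useful_cycles / total_cycles


# Hypothetical mix: eight 1-cycle arithmetic instructions and two 200-cycle loads.
latencies = [1] * 8 + [200] * 2
print(f"utilization: {utilization(latencies):.1%}")
```

Under these assumed numbers, the resource is held for 408 cycles but performs useful work in only 10 of them, so roughly 2.5 percent of the time is productive. Real latencies and issue costs vary by architecture, so the figure indicates only the scale of the problem.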
As the foregoing illustrates, what is needed in the art is a more efficient technique for executing multithreaded software applications.