Field of the Invention
The present invention relates generally to single-instruction, multiple-data (SIMD) processing and, more specifically, to a technique for saving and restoring thread group operating state.
Description of the Related Art
In a conventional SIMD architecture, a parallel processing unit (PPU) may execute multiple groups of threads simultaneously, where each thread within a group executes the same instructions on a different portion of input data. A given thread typically relies on various memory resources while executing the instructions, including local memory, shared memory, registers, and so forth. The state of these memory resources is referred to as the “operating state” of the thread.
Under some circumstances, the PPU may save the operating state of a given group of threads in order to re-allocate the memory resources consumed by those threads to another group of threads. When such a situation occurs, a conventional PPU may simply copy the operating state of each thread in the group of threads to memory. Subsequently, the PPU may re-launch the group of threads by copying the operating state of each thread from memory back to the corresponding memory resources. With this approach, the PPU is capable of “pausing” a group of threads mid-execution in order to launch another group of threads that consumes the same resources as the “paused” group of threads.
However, the above approach is problematic because the time required to copy the operating state of a given thread depends on the size of that operating state. When a given group of threads includes a large number of threads, and the operating state of each thread is relatively large, copying the operating state of every thread within that thread group, possibly multiple times, may require a significant amount of computational resources. Consequently, the overall processing throughput of the PPU may decrease dramatically.
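The conventional save-and-restore approach described above can be modeled roughly as follows. This is a minimal illustrative sketch, not an actual PPU mechanism. The names `ThreadState`, `save_group`, and `restore_group` are hypothetical, and each thread's operating state (registers, local memory, and so forth) is assumed to be a single flat byte buffer. The model copies every thread's full state out to backing memory and back, and it counts the bytes moved. This shows that the cost of one save/restore cycle grows with both the number of threads and the size of each thread's state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ThreadState:
    """Hypothetical model: one thread's operating state as a flat byte buffer."""
    data: bytearray


def save_group(group: List[ThreadState]) -> Tuple[List[bytes], int]:
    """Copy every thread's full operating state to backing memory.

    Returns the saved copies and the total number of bytes copied.
    """
    saved: List[bytes] = []
    moved = 0
    for thread in group:
        saved.append(bytes(thread.data))  # full per-thread copy, regardless of content
        moved += len(thread.data)
    return saved, moved


def restore_group(group: List[ThreadState], saved: List[bytes]) -> int:
    """Copy each saved state back into its thread's resources; return bytes copied."""
    moved = 0
    for thread, state in zip(group, saved):
        thread.data[:] = state  # overwrite the thread's resources in place
        moved += len(state)
    return moved


if __name__ == "__main__":
    # Hypothetical group: 32 threads, each with 1024 bytes of operating state.
    group = [ThreadState(bytearray(1024)) for _ in range(32)]
    saved, saved_bytes = save_group(group)
    restored_bytes = restore_group(group, saved)
    print(saved_bytes + restored_bytes)  # 65536
```

Under this model, a single pause/resume cycle moves 2 × (threads × state size) bytes, so doubling either the group size or the per-thread state doubles the copy traffic, and repeated cycles multiply it further.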
Accordingly, what is needed in the art is a more efficient technique for saving and restoring the operating states associated with different groups of threads in a parallel processing system.