1. Field of the Invention
Embodiments of the present invention relate generally to parallel processing and, more specifically, to selective thread spawning within a multi-threaded processing system.
2. Description of the Related Art
Multi-processor systems that include one or more single-instruction multiple-data (SIMD) processing units typically implement a mechanism to spawn thread blocks on the SIMD processing units. Thread blocks are conventionally spawned with a one-, two-, or three-dimensional thread index per thread that uniquely identifies each executing thread. Each thread index is accessible to the associated thread through a mechanism such as a system variable or an application programming interface (API) call.
Conventional SIMD programming models use the thread index of a given thread to determine the specific function of the thread. For example, a given thread may use its thread index to determine which portion of the overall processing load the thread should process. Furthermore, a given portion of the overall processing load, indicated by the thread index, may or may not need additional processing in subsequent processing steps, as determined by the requirements of the specific algorithm being executed. In conventional processing systems, when no further processing is required for a given thread index, the associated thread is launched nonetheless and terminates only after determining that no further processing is required. Spawning threads that perform no additional computation toward a given computational goal reduces overall efficiency. Additionally, each thread that does perform useful computation must first perform overhead computation to determine whether the thread should continue executing, further reducing overall system efficiency.
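The conventional launch pattern described above can be illustrated with a minimal CPU-side sketch in Python. This is a simulation under stated assumptions, not SIMD code: the names `needs_processing` and `conventional_launch`, and the use of a set of active indices to stand in for "portions of the load that still need work", are hypothetical and do not appear in the source.

```python
# Hypothetical CPU simulation of the conventional launch pattern.
# A real SIMD system would run these "threads" in parallel on hardware.

def needs_processing(thread_index, active_indices):
    """Overhead check that every launched thread performs, useful or not."""
    return thread_index in active_indices


def conventional_launch(num_threads, active_indices):
    """Launch one simulated thread per index and count useful vs. wasted threads."""
    useful = 0
    wasted = 0
    for thread_index in range(num_threads):  # one simulated thread per index
        if not needs_processing(thread_index, active_indices):
            wasted += 1  # thread was launched only to terminate immediately
            continue
        useful += 1  # placeholder for the real per-thread computation
    return useful, wasted


# Assumed example: 8 thread indices, of which only 3 still need processing.
useful, wasted = conventional_launch(8, {1, 4, 6})
```

In this assumed example, five of the eight simulated threads are launched only to exit, and all eight pay for the check, which is the inefficiency the paragraph above describes.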
Certain common SIMD algorithms make sparse use of a two-dimensional or three-dimensional index space. These algorithms commonly perform a mapping function within each thread to map the one-dimensional thread index used to spawn the thread to a working thread index of two or three dimensions that may be used for computation. For certain algorithms, this mapping function consumes a significant portion of the computation required for each pass, thereby reducing overall system efficiency.
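As one illustration of such a per-thread mapping function, the sketch below maps a one-dimensional index onto the lower-triangular part of a two-dimensional index space. The triangular layout and the name `map_to_2d` are assumptions chosen for illustration; the source does not name a specific mapping. The code is a Python CPU-side sketch, not SIMD code.

```python
import math


def map_to_2d(linear_index):
    """Map a 1-D index to (row, col) in a lower-triangular 2-D space (col <= row).

    Hypothetical mapping: row r holds r + 1 entries, so row r starts at
    linear index r * (r + 1) // 2.
    """
    # Largest row whose starting offset does not exceed linear_index.
    row = (math.isqrt(8 * linear_index + 1) - 1) // 2
    col = linear_index - row * (row + 1) // 2
    return row, col


# Every simulated thread repeats this mapping before doing any useful work.
working_indices = [map_to_2d(k) for k in range(6)]
```

Even this simple mapping costs an integer square root, a multiplication, and two divisions per thread, which hints at how the mapping can come to dominate a pass when the per-thread computation itself is small.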
As the foregoing illustrates, what is needed in the art is a technique for more efficiently performing computation within a SIMD multi-threaded processing system.