As those skilled in the pertinent art are aware, applications may be executed in parallel to increase their performance. “Data parallel” applications carry out the same process concurrently on different data. “Task parallel” applications carry out different processes concurrently on the same data. Hybrid data/task parallel applications exist as well. “Static parallel” applications have a degree of parallelism that can be determined before they execute. In contrast, the parallelism achievable by “dynamic parallel” applications can be determined only as they are executing. Whether an application is data or task parallel, or static or dynamic parallel, it may be executed in a pipeline, as is often the case for graphics applications.
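By way of illustration only, the distinction between data parallel and task parallel execution may be sketched in C++ using host threads. The function names `square_data_parallel` and `summarize_task_parallel`, the `Summary` structure, and the choice of squaring, summing and maximum-finding as the processes are hypothetical and chosen solely for this example; they are not drawn from any particular application or processor. Because the number of workers is fixed before execution begins, the first function is also an instance of static parallelism.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Data parallel (illustrative): the SAME process (squaring) is applied
// concurrently to DIFFERENT slices of the input data.
std::vector<int> square_data_parallel(const std::vector<int>& in, unsigned workers) {
    std::vector<int> out(in.size());
    if (workers == 0) workers = 1;
    // Ceiling division so that every element belongs to exactly one slice.
    const std::size_t chunk = (in.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(in.size(), w * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin == end) continue;  // no data left for this worker
        pool.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = in[i] * in[i];
        });
    }
    for (auto& t : pool) t.join();
    return out;
}

// Hypothetical result type for the task parallel example.
struct Summary {
    long long sum;
    int max;
};

// Task parallel (illustrative): DIFFERENT processes (summing, finding the
// maximum) run concurrently over the SAME input data.
Summary summarize_task_parallel(const std::vector<int>& in) {
    Summary s{0, 0};
    // Each thread writes a distinct field, so no synchronization beyond
    // join() is needed.
    std::thread sum_task([&] { s.sum = std::accumulate(in.begin(), in.end(), 0LL); });
    std::thread max_task([&] {
        s.max = in.empty() ? 0 : *std::max_element(in.begin(), in.end());
    });
    sum_task.join();
    max_task.join();
    return s;
}
```

In this sketch, calling `square_data_parallel` on the values 1 through 5 with two workers splits the input into two slices that are squared concurrently, whereas `summarize_task_parallel` runs two different computations over the whole input at once.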
A streaming multiprocessor (SM) is a data processor architecture featuring multiple (typically many) pipelined streaming processors, shared memory and a unified instruction fetch/dispatch unit. SMs have many uses; one is as part of a graphics processing unit (GPU), in which SMs can be employed alone or together with other SMs.
GPUs employ two separate mechanisms for launching threads of applications into SMs, a process sometimes called “creating work.” The first of the two mechanisms for creating work is a compute-focused launch mechanism optimized for launching a dynamic parallel application or “grid launching” a static parallel application. The second is a graphics-focused launch mechanism for efficiently launching cooperative thread arrays (CTAs) into graphics pipelines. The graphics-focused launch mechanism is deeply intertwined with functional units inside the SM and therefore offers only limited configuration options.