Field of the Invention
The present invention generally relates to compute applications, and, more specifically, to simultaneously executing plural compute applications on a graphics processing unit (GPU) having multiple cores.
Description of the Related Art
Current methods for sharing the hardware resources of a GPU among plural compute applications require the compute applications to communicate with a central resource manager. In particular, the central resource manager receives workloads from the compute applications and transmits the workloads to the GPU for execution. In turn, the GPU controls synchronization between the compute applications and permits threads of only a single compute application to execute at a time. These threads may become unresponsive during synchronization operations, which makes it difficult for software developers to maintain overall system responsiveness and proper load balancing. Moreover, the GPU must perform a context switch each time it transitions to executing threads of a different compute application, which results in GPU idle time. This problem is exacerbated by the typically large sets of state that are associated with compute applications and that must be stored (and subsequently reloaded) whenever a context switch occurs.
Accordingly, what is needed in the art is a more effective technique for executing plural compute applications on a GPU.