The present disclosure relates to sharing computing resources and, more particularly, to graphics processing unit (GPU) and general-purpose graphics processing unit (GPGPU) resource sharing.
A parallel compute-intensive application may leverage GPUs to perform computational aspects of the application. GPUs perform these aspects more quickly and efficiently than a traditional central processing unit (CPU) due to the parallel architecture of GPUs. Existing large-scale compute infrastructures, such as servers, private data centers, and Infrastructure as a Service (IaaS) clouds, offer whole GPUs, but they do not presently enable applications running on the same server or cluster to share GPUs. This is because there is no software mechanism, even at the level of a single computer, to share a GPU concurrently between two applications. The lack of such a mechanism leads to decreased utilization, increased costs, and energy wastage, at the granularity of both a single computer and a cluster of computers (e.g., in a data center).
In some situations, existing GPUs on a server can only be shared among threads of a single process. GPUs have massive computing resources: for example, some GPUs have upwards of 4,000 processing cores. It is difficult for software developers to write programs that completely utilize the GPU because of the single instruction, multiple data (SIMD) nature of GPUs. Best-effort greedy allocation of resources may lead to resource hogging. For example, a program (often called a “kernel”) may utilize 40% of the GPU processing cores but hog 90% of the GPU memory, or a program may use only 40% of both the GPU processing cores and the GPU memory yet prevent other programs from using the remaining 60% of each resource. Moreover, no security or non-interference guarantees are provided between different programs that use the GPU, whether they run concurrently or non-concurrently (e.g., one after another).
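The resource-hogging arithmetic can be illustrated with a minimal sketch. It is a toy model rather than any actual GPU driver or scheduler policy: the `Request` type, the `greedy_admit` function, and the second kernel's demand of 40% of cores and 40% of memory are assumptions introduced only for illustration. It models first-come, first-served admission in which a request is admitted only if both its core share and its memory share still fit.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical resource demand of one GPU program (kernel)."""
    name: str
    cores_pct: int   # share of GPU processing cores requested, in percent
    memory_pct: int  # share of GPU memory requested, in percent


def greedy_admit(requests):
    """Admit requests in arrival order while both resources still fit.

    Returns the admitted names, the rejected names, and the percentage
    of cores and memory left unallocated.
    """
    free_cores, free_memory = 100, 100
    admitted, rejected = [], []
    for r in requests:
        if r.cores_pct <= free_cores and r.memory_pct <= free_memory:
            free_cores -= r.cores_pct
            free_memory -= r.memory_pct
            admitted.append(r.name)
        else:
            rejected.append(r.name)
    return admitted, rejected, free_cores, free_memory


# kernel_a matches the example in the text: 40% of cores, 90% of memory.
# kernel_b's demand is an assumed figure chosen for illustration.
requests = [Request("kernel_a", 40, 90), Request("kernel_b", 40, 40)]
admitted, rejected, free_cores, free_memory = greedy_admit(requests)
print(admitted, rejected, free_cores, free_memory)
```

Under these assumptions, the second kernel is turned away because only 10% of the memory remains, even though 60% of the processing cores sit idle.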