Various types of special-purpose processors, such as graphics processing units (GPUs) for general-purpose computing, have been developed to accelerate the processing of specific types of workloads. A GPU has a massively parallel architecture which typically comprises hundreds or thousands of cores that are configured to concurrently execute hundreds or thousands of threads at a given time. This is in contrast to a standard central processing unit (CPU) architecture, which typically comprises a few cores and associated cache memory that are optimized for sequential processing and for handling a few software threads at a given time.
The processing capabilities of GPU resources are currently being utilized in various applications to accelerate the processing of highly parallelized computational workloads in various technical fields. In particular, general-purpose computing on GPU (GPGPU) is utilized for high-throughput, accelerated processing of compute kernels for workloads (e.g., vector-based computations, matrix-based computations, etc.) that exhibit data parallelism. For example, GPUs are used to accelerate data processing in high-performance computing (HPC) and embedded computing systems for various applications such as financial modeling, scientific research, machine learning, data mining, video data transcoding, image analysis, image recognition, virus pattern matching, augmented reality, encryption/decryption, weather forecasting, big data comparisons, and other applications with computational workloads that are inherently parallel in nature.
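By way of illustration only, the data parallelism referred to above can be sketched with a simple vector computation in which every output element is computed independently of all others. The following Python sketch is a sequential stand-in, not GPU code: the names `saxpy_kernel` and `launch` are hypothetical, and the loop merely plays the role of the many threads that a GPU would execute concurrently.

```python
from typing import List


def saxpy_kernel(i: int, a: float, x: List[float], y: List[float],
                 out: List[float]) -> None:
    """Compute a single output element.

    The result at index i depends only on the inputs at index i, which is
    what makes the workload data-parallel.
    """
    out[i] = a * x[i] + y[i]


def launch(a: float, x: List[float], y: List[float]) -> List[float]:
    """Hypothetical stand-in for a kernel launch: one logical thread per element.

    The iterations run one after another here; on a GPU they could run
    concurrently because no iteration reads another iteration's output.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out = [0.0] * len(x)
    for i in range(len(x)):
        saxpy_kernel(i, a, x, y, out)
    return out


result = launch(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)
```

Because no element depends on another, the same per-element function could in principle be mapped onto as many hardware threads as there are elements.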
Due to the high throughput and low energy consumption per operation exhibited by GPUs, it is anticipated that GPU-as-a-Service (GPUaaS) will become mainstream in the near future, wherein cloud-based systems will implement GPU-powered blades for various types of processing. In current server-based implementations, individual GPU devices are typically allocated to individual users on a dedicated basis, which can result in extremely low utilization of GPU devices. For example, in such systems, an IT manager can only statically allocate GPU devices to users on an individual basis, whereby GPU devices are not shared among the users. As a consequence, even when a given user is not using his/her dedicated GPU, other users cannot utilize that GPU. Because GPU devices are costly to acquire, implementing a large number of GPU devices for a computing platform can be a major investment. In almost all fields that utilize GPU resources, ranging from research and development to production, GPU device utilization is typically very low. Therefore, to reduce the acquisition and operational costs associated with GPU resources, it would be highly desirable to implement a system that can manage GPU resources in a way that allows multiple users to share GPU resources without experiencing performance degradation.
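The utilization argument above can be made concrete with a toy calculation. The following Python sketch is illustrative only: the demand traces, user names, and function names are hypothetical and are not taken from any measured system. It compares the fraction of GPU time slots in use when each user holds a dedicated GPU against a shared pool sized to the peak concurrent demand, so that no user ever has to wait for a device.

```python
from typing import Dict, List

# Hypothetical demand traces: True means the user needs a GPU in that time slot.
demand: Dict[str, List[bool]] = {
    "user_a": [True, True, False, False],
    "user_b": [False, True, False, False],
    "user_c": [False, False, True, False],
}


def busy_slots(traces: Dict[str, List[bool]]) -> int:
    """Total number of (user, slot) pairs in which a GPU is actually in use."""
    return sum(sum(trace) for trace in traces.values())


def dedicated_utilization(traces: Dict[str, List[bool]]) -> float:
    """Utilization when every user is statically assigned one GPU."""
    total_slots = sum(len(trace) for trace in traces.values())
    return busy_slots(traces) / total_slots


def pooled_gpus_needed(traces: Dict[str, List[bool]]) -> int:
    """Peak number of users that need a GPU during the same time slot."""
    return max(sum(slot) for slot in zip(*traces.values()))


def pooled_utilization(traces: Dict[str, List[bool]]) -> float:
    """Utilization when a pool sized to peak demand is shared by all users."""
    horizon = len(next(iter(traces.values())))
    return busy_slots(traces) / (pooled_gpus_needed(traces) * horizon)


print(dedicated_utilization(demand))  # three dedicated GPUs
print(pooled_gpus_needed(demand))     # GPUs needed when shared
print(pooled_utilization(demand))     # utilization of the shared pool
```

With these assumed traces, three dedicated GPUs are in use for one third of the available slots, whereas a shared pool of two GPUs serves the same demand at one half utilization. Real workloads would also incur scheduling and data-transfer overheads that this sketch ignores.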