A networked virtualization system includes a number of nodes (e.g., hyperconverged systems that integrate compute and storage), in which each node services or “supports” a number of virtual machines and each node has local storage as well as cloud storage or networked storage. Some of the benefits of implementing virtualization include greater utilization of computing resources that would otherwise be wasted, lower operating costs, etc. For example, absent virtualization, only 10% of the CPU or memory of a physical server may be utilized by a single workload while the remaining 90% of resources are wasted. By adding a virtualization layer to the server, several virtual machines running multiple operating systems with different applications may be run simultaneously on the same server by sharing the physical resources of the server. The number of virtual machines that may be supported by a node depends on the workloads being handled by the virtual machines supported by the node and on the node's resource capacity (e.g., memory, CPU, and scheduling limitations specific to the node). End users may experience degradation in performance when the resources of a node that are utilized by the virtual machines supported by the node begin to approach capacity.
A type of resource that is often included in a node of a networked virtualization system is a GPU, or graphics processing unit (sometimes also referred to as a visual processing unit, or VPU). A GPU is a specialized electronic circuit that may be embedded on the motherboard, on the CPU die of a node, or on a graphics board/video card. By rapidly altering and manipulating memory, GPUs are designed to accelerate the generation of images in a frame buffer to be output to a display device. GPUs are generally more efficient than CPUs at image processing and manipulating computer graphics because they are capable of processing large blocks of data in parallel. Similar to the benefits afforded by the virtualization of other types of physical resources, the virtualization of physical GPU resources improves the efficiency with which they may be utilized.
GPU resources may be allocated to virtual machines based on the anticipated GPU resource requirements of each virtual machine. For example, suppose a first virtual machine is deployed to perform word processing tasks while a second virtual machine is deployed for playing video games. The second virtual machine will be allocated greater GPU resources than the first virtual machine since the tasks to be performed by the second virtual machine are more graphics-intensive than those to be performed by the first virtual machine. Thus, by way of virtualization, the resources on each physical GPU on a node may be partitioned to support a given number of virtual machines, in which the GPU resources are partitioned based on the anticipated GPU resource requirements of each virtual machine.
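The demand-proportional partitioning described above can be illustrated with a minimal sketch. All names here (`partition_gpu`, the VM labels, and the use of frame-buffer megabytes as the partitioned unit) are hypothetical, chosen only to show how a physical GPU's resources might be divided among virtual machines according to each machine's anticipated requirements.

```python
def partition_gpu(total_mb, anticipated_demand):
    """Split total_mb of a physical GPU's memory across VMs.

    anticipated_demand: dict mapping VM name -> relative demand weight
    (a higher weight reflects a more graphics-intensive workload).
    Returns a dict mapping VM name -> allocated MB; any integer-rounding
    remainder is assigned to the highest-demand VM so the GPU is fully
    partitioned.
    """
    total_weight = sum(anticipated_demand.values())
    alloc = {vm: total_mb * w // total_weight
             for vm, w in anticipated_demand.items()}
    # Give the rounding remainder to the most demanding VM.
    remainder = total_mb - sum(alloc.values())
    top = max(anticipated_demand, key=anticipated_demand.get)
    alloc[top] += remainder
    return alloc

# Example: a word-processing VM (low anticipated demand) and a
# video-game VM (high anticipated demand) sharing an 8 GB GPU.
print(partition_gpu(8192, {"vm_word": 1, "vm_game": 3}))
```

As in the word-processing versus video-game example above, the gaming VM receives the larger share because its anticipated workload is more graphics-intensive.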
Unfortunately, when a user of a virtual machine switches tasks, the GPU resources previously allocated to their virtual machine may no longer suit their needs. For example, if a user decides to take a break from performing word processing tasks to play video games, the user may require additional GPU resources to be allocated to their virtual machine to adequately support the new workload. Failure to allocate these additional resources may result in performance degradation that negatively affects the user's experience (e.g., the user may experience choppy gameplay or the game may crash while the user is playing it). If the user in the above example decides to stop playing the video game and/or decides to resume the word processing tasks, the user's experience is not affected by having more GPU resources allocated to their virtual machine than are necessary for the word processing workload. However, the GPU resources allocated to the virtual machine may be wasted absent their reassignment by a system administrator (e.g., to another virtual machine that has insufficient GPU resources to process its current workload).
Various strategies exist to address the issues of under- and over-provisioning of GPU resources. For example, to address the issue of under-provisioning of GPU resources, end users may be required to submit requests to allocate additional GPU resources, which may then be granted or denied by system administrators. However, requiring such requests to be submitted and reviewed by system administrators may be inconvenient and time-consuming for both the end users and the system administrators. As an example of a strategy to address the issue of over-provisioning of GPU resources, system administrators may monitor fluctuations in the GPU resources used by virtual machines to determine when to deallocate GPU resources from a virtual machine and reallocate them to another virtual machine. However, such determinations may not always be accurate and may still result in over- or under-provisioning of GPU resources. Moreover, requiring system administrators to monitor the GPU resources (among other types of resources) used by hundreds or even thousands of virtual machines deployed on one or more clusters of nodes may be overwhelming and ultimately not feasible.
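The monitoring strategy described above can be sketched as a simple threshold rule. This is only an illustration under assumed parameters: the function name `reallocation_decision` and the `high`/`low` thresholds are hypothetical, and a real administrator (or automated policy) would tune them per workload.

```python
def reallocation_decision(samples, high=0.90, low=0.25):
    """Decide how a VM's GPU allocation should change.

    samples: recent GPU utilization fractions (0.0-1.0) for one VM.
    Returns "grow" if the VM is running near the capacity of its
    current allocation (under-provisioned), "shrink" if the allocation
    appears over-provisioned, and "hold" otherwise.
    """
    avg = sum(samples) / len(samples)
    if avg >= high:
        return "grow"    # risk of performance degradation
    if avg <= low:
        return "shrink"  # allocated GPU resources largely wasted
    return "hold"

print(reallocation_decision([0.95, 0.97, 0.92]))  # gaming workload
print(reallocation_decision([0.10, 0.05, 0.12]))  # word processing
```

The sketch also illustrates the accuracy problem noted above: a rule based on a short window of samples can misclassify a bursty workload, which is one reason such manual or threshold-driven determinations may still leave GPU resources over- or under-provisioned.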
Therefore, there is a need for an improved approach for optimizing the allocation of GPU resources to virtual machines supported by nodes in a networked virtualization system.