In the field of computer graphics, a graphics processing unit (GPU) is a specialized circuit that can, among other things, accelerate the generation of images comprising 2D and/or 3D elements for presentation on a display device (e.g., a computer monitor) and perform general-purpose parallel computation tasks. A typical GPU performs its graphics operations on data maintained in dedicated video memory that is separate from general system memory. A graphics driver manages the task of moving data between system memory and video memory so that data in the GPU's current working set is available in video memory for use by the GPU.
When a GPU is virtualized (such as in a host system comprising one or more virtual machines (VMs)), the management of video memory becomes more complicated because each VM has its own guest graphics driver, which communicates with a virtual graphics processing unit (VGPU) rather than with a physical GPU. Each VGPU, in turn, communicates with a host graphics driver that interacts with the graphics hardware. In this virtualized scenario, the guest graphics driver for a VM generally does not write graphics data directly to the video memory of the GPU; instead, the guest graphics driver works with the VGPU to write such data to a virtual representation of video memory, which the VGPU may then propagate to physical video memory via the host graphics driver if appropriate (e.g., if space is available).
There are several ways to implement this virtual representation of video memory. According to a first approach, each VGPU can reserve a static pool of guest memory within its corresponding VM for use as a virtual video memory pool. The guest graphics driver of the VM can manage the movement of graphics data between guest application memory and the static guest memory pool as if the static pool were physical video memory. The VGPU can subsequently read the data from the static guest memory pool and pass it to the graphics hardware for storage in video memory. This approach is inefficient for two reasons: it involves multiple memory reads and writes, and each guest graphics driver must perform complicated memory management of the static guest memory pool (whose contents ultimately may not reflect the actual placement of data in video memory). Further, reserving a static memory pool in each VM for graphics reduces the total amount of guest memory available for other purposes, even when no applications are making significant use of the GPU.
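By way of illustration only, the data path of the first approach can be modeled with the short sketch below. It is a toy model under stated assumptions, not an actual driver or VGPU implementation: the class `StaticPoolPath`, its methods, and the dictionary standing in for physical video memory are all hypothetical names introduced here. The sketch shows that each upload is copied at least twice and that the pool is reserved regardless of GPU use.

```python
class StaticPoolPath:
    """Toy model of the static guest memory pool data path (hypothetical)."""

    def __init__(self, pool_size):
        # Guest memory reserved up front, whether or not the GPU is in use.
        self.pool = bytearray(pool_size)
        # Stand-in for physical video memory, keyed by a resource identifier.
        self.video_memory = {}
        # Number of memory copies performed so far.
        self.copies = 0

    def guest_driver_write(self, offset, data):
        """Guest driver copies data from application memory into the pool."""
        if offset < 0 or offset + len(data) > len(self.pool):
            raise ValueError("write falls outside the static pool")
        self.pool[offset:offset + len(data)] = data
        self.copies += 1

    def vgpu_propagate(self, resource_id, offset, length):
        """VGPU reads the pool and passes the data on toward video memory."""
        if offset < 0 or length < 0 or offset + length > len(self.pool):
            raise ValueError("read falls outside the static pool")
        self.video_memory[resource_id] = bytes(self.pool[offset:offset + length])
        self.copies += 1


path = StaticPoolPath(pool_size=16)
path.guest_driver_write(4, b"abcd")   # first copy: application memory to pool
path.vgpu_propagate(1, 4, 4)          # second copy: pool to video memory
```

In this model a single upload increments `copies` twice, and the 16-byte pool remains allocated even if no further graphics data is ever written.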
According to a second approach, the guest graphics driver can provide graphics data directly to the VGPU, without writing the data to a static guest memory pool. The VGPU can then store a local copy of the graphics data in a hypervisor-based virtual video memory pool and can interact with the host graphics driver to propagate that data to physical video memory. The problem with this approach is that it is difficult to appropriately manage the allocation of host memory for the hypervisor-based memory pool. It is possible for each guest graphics driver to implement a limit on the amount of graphics data it sends to the VGPU, but it is undesirable for the guest graphics driver to be in control of that limit. For instance, if the limit needed to be reduced (due to, e.g., host memory pressure), the hypervisor would not want to rely on the guest graphics driver to implement the reduction, for reasons of timeliness and security.
Further, if no limits are placed on the amount of graphics data that each VM can send to the VGPU layer, the VMs can potentially send a volume of data that exceeds the amount of physical host memory allocated for the hypervisor-based pool, resulting in host swapping and reduced overall system performance. While this performance degradation may be acceptable in certain computing contexts, it is problematic in virtualized environments and systems, which are generally designed to provide performance guarantees to end-users. For example, when a user wishes to power on a VM on a given host system, the hypervisor of the host system typically checks whether all of the physical resources (e.g., compute, memory, storage, etc.) needed to run the VM at a predefined level of performance are available. If not, the VM is not allowed to be powered on. It is difficult to enforce such performance guarantees in the no-limit scenario described above because the hypervisor will not know, a priori, the upper bound of host memory that will be needed at runtime for graphics operations.
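The power-on check described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than any particular hypervisor's logic: the names `HostFree`, `VMRequest`, `graphics_memory_cap_mb`, and `can_power_on` are hypothetical, the units are arbitrary, and the choice to refuse power-on when no graphics memory bound is known is an assumed conservative policy used only to make the difficulty concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostFree:
    """Unreserved physical resources on the host (hypothetical units)."""
    cpu_mhz: int
    memory_mb: int
    storage_gb: int


@dataclass(frozen=True)
class VMRequest:
    """Resources a VM needs to run at its predefined performance level."""
    cpu_mhz: int
    memory_mb: int
    storage_gb: int
    # Upper bound on host memory used for this VM's graphics data.
    # None models the no-limit scenario: the bound is unknown up front.
    graphics_memory_cap_mb: Optional[int] = None


def can_power_on(host: HostFree, vm: VMRequest) -> bool:
    """Return True only if every requirement can be verified and satisfied."""
    if vm.graphics_memory_cap_mb is None:
        # Without a known bound, the worst-case memory need cannot be
        # computed, so this sketch refuses (an assumed policy).
        return False
    worst_case_memory_mb = vm.memory_mb + vm.graphics_memory_cap_mb
    return (
        host.cpu_mhz >= vm.cpu_mhz
        and host.memory_mb >= worst_case_memory_mb
        and host.storage_gb >= vm.storage_gb
    )


host = HostFree(cpu_mhz=8000, memory_mb=16384, storage_gb=500)
bounded_vm = VMRequest(2000, 4096, 100, graphics_memory_cap_mb=512)
unbounded_vm = VMRequest(2000, 4096, 100)
```

With a known cap, the check reduces to a simple comparison against free host memory; with no cap, the worst-case memory term is undefined, which is why the guarantee cannot be verified at power-on time.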