Many computing systems use a CPU (Central Processing Unit) for general tasks and a GPU (Graphics Processing Unit) for graphics tasks. The CPU is designed to be flexible enough to perform a large variety of computing tasks, while the GPU is designed to perform graphics tasks very quickly. Graphics tasks tend to be similar to one another and highly repetitive. As a result, the hardware of the graphics processor may be built with different sections that are each optimized for performing a specific task. These sections may include a render engine, a display engine, a video codec engine, a video quality engine, etc. The GPU may be present in the system on a separate printed circuit board from the system board, on a separate chip on the same system board, as a separate semiconductor die in a package that includes a central processor die, or as a separate core in a multi-core processor.
As GPUs become more common and more powerful, more tasks are being assigned to the GPU so that its computing power is more fully used. The assignment of tasks is normally controlled by the operating system through the CPU. One group of these tasks includes video encoding, decoding, and transcoding. Without a GPU, video encoding, decoding, and transcoding can be very high-stress workloads for general-purpose processors. As a result, they are commonly performed as GPU-accelerated workloads, with the CPU performing some of the work and the GPU accelerating the task by performing the rest.
With the wide deployment of cloud infrastructures and the growth of software-defined networks (SDN), GPU-accelerated workloads are being moved into VMs (Virtual Machines) as well. However, virtualization technologies can significantly degrade the performance of GPU-accelerated video transcoding and other workloads. In particular, video transcoding workloads run significantly slower in a virtualized environment than in a native environment.
The GPU, as a device, usually has several command rings in its main memory. The command rings serve as the interface between the GPU and a software graphics driver running on a CPU. In some cases, a mediated pass-through mechanism is used. Mediated pass-through gives each VM direct access to part of the device resources, for example memory, without hypervisor intervention. In such a case, only privileged operations, such as accesses to MMIO (Memory-Mapped Input/Output), GTT (Graphics Translation Tables), etc., are mediated through a software layer. This approach reduces virtualization overhead. However, during VM switching, the command rings for the GPU are emptied before a new VM can load its commands into the ring for execution.
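The command ring and the empty-and-load behavior at a VM switch can be pictured with a minimal Python sketch. It is an illustration only, not the interface of any real graphics driver: the `CommandRing` class, its methods, and the `switch_vm` helper are hypothetical names, and the ring is modeled as a plain circular buffer in which the driver writes at the tail and the GPU reads at the head.

```python
class CommandRing:
    """Hypothetical fixed-size circular buffer shared by driver and GPU."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0    # next slot the GPU will read
        self.tail = 0    # next slot the driver will write
        self.count = 0   # number of commands currently in the ring

    def submit(self, command):
        """Driver side: append a command; return False if the ring is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = command
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def execute_next(self):
        """GPU side: consume the oldest command, or return None if empty."""
        if self.count == 0:
            return None
        command = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return command

    def is_empty(self):
        return self.count == 0


def switch_vm(ring, next_vm_commands):
    """Empty-and-load switch: drain the ring, then load the next VM's commands.

    Returns the commands executed while draining and the number of new
    commands that fit into the ring.
    """
    drained = []
    while not ring.is_empty():
        drained.append(ring.execute_next())
    loaded = 0
    for command in next_vm_commands:
        if not ring.submit(command):
            break
        loaded += 1
    return drained, loaded
```

In this model the next VM cannot place any command in the ring until the loop in `switch_vm` has drained every command of the previous VM, which is the ordering constraint described above.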
In a GPU-accelerated video transcoding workload, for example, the work is sent from the CPU to the GPU. Heavy workloads require very little CPU utilization to fully occupy the GPU. System speed decreases when the GPU command rings are emptied and the GPU must wait before working on the next task. An empty-and-load scheduling mechanism stops any GPU-accelerated task while the ring is being emptied and refilled.
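The cost of empty-and-load scheduling can be estimated with a small tick-based model. The sketch below is a simplification under stated assumptions, not a measurement: the function name `simulate_empty_and_load` and its parameters are hypothetical, VMs are served round-robin, each command takes one tick, and every time slice is preceded by a fixed number of idle ticks while the ring is emptied and refilled.

```python
from collections import deque


def simulate_empty_and_load(work_per_vm, slice_ticks, reload_ticks):
    """Return (busy_ticks, idle_ticks) for round-robin empty-and-load scheduling.

    work_per_vm  -- ticks of GPU work queued by each VM (assumed values)
    slice_ticks  -- maximum ticks a VM may run before the next switch
    reload_ticks -- idle ticks spent emptying and refilling the ring per switch
    """
    if slice_ticks <= 0:
        raise ValueError("slice_ticks must be positive")
    # Only VMs that actually have work take part in the rotation.
    pending = deque(w for w in work_per_vm if w > 0)
    busy = 0
    idle = 0
    while pending:
        remaining = pending.popleft()
        idle += reload_ticks              # GPU waits while the ring is reloaded
        run = min(slice_ticks, remaining)
        busy += run                       # GPU executes this VM's commands
        if remaining > run:
            pending.append(remaining - run)
    return busy, idle


busy, idle = simulate_empty_and_load([8, 8], slice_ticks=4, reload_ticks=1)
utilization = busy / (busy + idle)
```

With two VMs that each queue 8 ticks of work, a 4-tick slice, and 1 idle tick per switch, the model yields 16 busy ticks and 4 idle ticks, so the GPU is occupied 80% of the time. Shorter slices or longer reload times lower that figure further.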