Current parallel graphics data processing includes systems and methods developed to perform specific operations on graphics data such as, for example, linear interpolation, tessellation, rasterization, texture mapping, and depth testing. Traditionally, graphics processors used fixed-function computational units to process graphics data. More recently, however, portions of graphics processors have been made programmable, enabling such processors to support a wider variety of operations for processing vertex and fragment data.
To further increase performance, graphics processors typically implement processing techniques such as pipelining that attempt to process, in parallel, as much graphics data as possible throughout the different parts of the graphics pipeline. Parallel graphics processors with single instruction, multiple thread (SIMT) architectures are designed to maximize the amount of parallel processing in the graphics pipeline. In an SIMT architecture, groups of parallel threads attempt to execute program instructions synchronously as often as possible to increase processing efficiency. A general overview of software and hardware for SIMT architectures can be found in Shane Cook, CUDA Programming Chapter 3, pages 37-51 (2013).
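By way of illustration only, the lockstep behavior of an SIMT thread group may be sketched as the following software simulation. The sketch is not an implementation of any particular graphics processor; the names WARP_SIZE and run_warp, the group size of four, and the example per-thread program are assumptions introduced here for explanation. It shows one instruction being issued for all threads of a group at a time, with an active mask selecting which threads commit results, so that a group whose threads take different branches issues more instructions than a group whose threads agree.

```python
from typing import List, Tuple

# Hypothetical group size chosen for readability; real hardware groups
# are typically larger.
WARP_SIZE = 4


def run_warp(data: List[int]) -> Tuple[List[int], int]:
    """Simulate one thread group running `x*2 if x is even else x+1`.

    Returns the per-thread results and the number of instructions issued
    for the group as a whole.
    """
    if len(data) != WARP_SIZE:
        raise ValueError("expected exactly one value per thread in the group")

    out = list(data)
    issued = 0

    # Instruction 1: all threads evaluate the branch condition together.
    mask = [x % 2 == 0 for x in out]
    issued += 1

    # Instruction 2 ("then" path): issued only if at least one thread takes it.
    # Threads whose mask bit is clear are inactive and keep their value.
    if any(mask):
        out = [x * 2 if m else x for x, m in zip(out, mask)]
        issued += 1

    # Instruction 3 ("else" path): issued only if at least one thread takes it.
    if not all(mask):
        out = [x if m else x + 1 for x, m in zip(out, mask)]
        issued += 1

    return out, issued


if __name__ == "__main__":
    print(run_warp([2, 4, 6, 8]))  # all threads agree on the branch
    print(run_warp([1, 2, 3, 4]))  # threads diverge, both paths are issued
```

In this sketch, the group whose threads all take the same branch issues two instructions, while the divergent group issues three, which is one reason SIMT designs favor threads that execute together as often as possible.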
A graphics processor typically includes hardware resources having a number of independent engines that may be exposed as a single device via an interface (e.g., Peripheral Component Interconnect (PCI)). Moreover, in a virtualized environment, some processors may include multiple device interfaces. However, in such an environment, each virtual device interface continues to expose the same underlying hardware to all the virtual machines.
There are scenarios in which access to all engines may not be required to process graphics data. For example, access to a 3D/render engine may be needed, while access to video decoder engines is not. Nonetheless, there is no easy way to provide a virtual machine (VM) with independent access to only the 3D/render engine without obstructing access to the other engines. Thus, the intertwined nature of the engines prevents a device from being partitioned into sub-devices that enable independent engine access.
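The limitation described above may be illustrated with a minimal data model. The sketch below is an explanatory assumption only and does not correspond to any actual driver or hardware interface; the class names GraphicsDevice and VirtualInterface, the function excess_engines, and the engine labels are hypothetical. It models a device whose virtual interfaces each expose the full set of engines, so that a virtual machine needing only the 3D/render engine is nonetheless presented with the video decoder engine as well.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GraphicsDevice:
    """Hypothetical device modeled as an unpartitioned set of engines."""
    engines: FrozenSet[str]


@dataclass(frozen=True)
class VirtualInterface:
    """Hypothetical virtual device interface assigned to one VM."""
    vm_name: str
    device: GraphicsDevice

    def visible_engines(self) -> FrozenSet[str]:
        # Every virtual interface exposes the same underlying hardware;
        # there is no per-interface subset of engines.
        return self.device.engines


def excess_engines(iface: VirtualInterface, needed: FrozenSet[str]) -> FrozenSet[str]:
    """Return the engines exposed to the VM beyond those it actually needs."""
    return iface.visible_engines() - needed


if __name__ == "__main__":
    device = GraphicsDevice(frozenset({"render", "video_decode"}))
    vm0 = VirtualInterface("vm0", device)
    vm1 = VirtualInterface("vm1", device)

    # Both VMs see identical hardware regardless of their workloads.
    print(vm0.visible_engines() == vm1.visible_engines())
    # A render-only VM is still exposed to the video decoder engine.
    print(sorted(excess_engines(vm0, frozenset({"render"}))))
```

Under this model, independent engine access would require visible_engines to return a per-interface subset, which is the partitioning that the intertwined engines described above do not permit.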