Modern graphics processing units (GPUs) implement a programmable hardware pipeline, referred to herein as a “graphics pipeline” or “GPU pipeline,” for rendering real-time 3D graphics. Applications invoke high-level graphics APIs, such as Direct3D and OpenGL, to configure this pipeline and to provide shaders, which are programs for performing application-specific graphics or compute operations (e.g., per-vertex processing, per-pixel processing, etc.). Drivers implementing the graphics APIs translate the application-provided API calls and shaders into instructions that are executed by GPU hardware.
By way of example, FIG. 1 is a functional block diagram of a graphics pipeline 100 that is compliant with Direct3D version 10 and OpenGL version 3.3. As shown, graphics pipeline 100 includes an input assembler stage 102, a vertex shader stage 104, a geometry shader stage 106, a stream output stage 108, a rasterizer stage 110, a fragment shader stage 112, and an output merger stage 114. Stages 102-114 interact with memory resources (e.g., buffers) 116 maintained on the GPU. The general functions performed by each pipeline stage are summarized below:

- Input assembler stage 102 provides input data, such as triangles, lines, and points, to the rest of graphics pipeline 100.
- Vertex shader stage 104 executes application-defined vertex shaders for performing per-vertex computations (e.g., transformation, lighting, etc.); each vertex shader takes as input a single vertex and outputs a single vertex.
- Geometry shader stage 106 executes application-defined geometry shaders for performing per-primitive computations; each geometry shader takes as input a single, fully-formed primitive (e.g., three vertices for a triangle, two vertices for a line, one vertex for a point) and either discards the primitive or outputs one or more new primitives.
- Stream output stage 108 streams primitive data from geometry shader stage 106 (or vertex shader stage 104) to one or more output buffers in GPU memory 116; data that is streamed out in this manner can be accessed by the graphics application and/or recirculated back into graphics pipeline 100 as input data.
- Rasterizer stage 110 converts scene data, which comprises vector information (e.g., primitives), into a raster image comprising pixels; as part of this process, rasterizer stage 110 invokes fragment shader stage 112.
- Fragment shader stage 112 executes application-defined fragment shaders for performing per-pixel operations (e.g., determining pixel color, pixel depth, etc.); each fragment shader receives as input various types of data pertaining to a particular pixel (e.g., texture data, interpolated per-vertex data, constants, etc.) and outputs color and/or depth values for the pixel.
- Output merger stage 114 combines the output of the rasterizer/fragment shader stages with the existing contents of a given render target or framebuffer to generate a final pipeline result (e.g., a completed frame).
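The per-primitive contract of geometry shader stage 106 described above (one fully-formed primitive in, zero or more primitives out) can be illustrated with a minimal software sketch. The following Python fragment is illustrative only; the names `Vertex` and `shrink_triangle` are hypothetical and not drawn from any graphics API. It models a simple geometry shader that shrinks each input triangle toward its centroid:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Vertex:
    x: float
    y: float
    z: float

def shrink_triangle(tri: List[Vertex], factor: float = 0.5) -> List[List[Vertex]]:
    """Hypothetical geometry shader: one triangle in, one shrunken triangle out.

    Mirrors the stage-106 contract: the input is a single fully-formed
    primitive, and the output is a list of zero or more primitives.
    """
    # Compute the triangle's centroid.
    cx = sum(v.x for v in tri) / 3.0
    cy = sum(v.y for v in tri) / 3.0
    cz = sum(v.z for v in tri) / 3.0
    # Move each vertex toward the centroid by the given factor.
    out = [Vertex(cx + (v.x - cx) * factor,
                  cy + (v.y - cy) * factor,
                  cz + (v.z - cz) * factor)
           for v in tri]
    # Returning [] here would model a shader that discards the primitive.
    return [out]
```

For example, shrinking the triangle (0,0,0), (1,0,0), (0,1,0) with factor 0.5 moves each vertex halfway toward the centroid (1/3, 1/3, 0).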
Of the graphics pipeline stages shown in FIG. 1, some GPU vendors intentionally omit implementations of geometry shader stage 106 and stream output stage 108 from their GPUs (or certain classes of their GPUs) for various reasons, such as design complexity, cost, power draw, and so on. As a result, graphics APIs that are designed to interoperate with these GPUs also exclude support for these stages. For example, Apple Inc.'s Metal API does not provide any functions for defining/executing geometry shaders or configuring stream output, and assumes that these features are absent in the underlying GPU hardware.
The foregoing creates problems in various scenarios, such as when virtualizing a computer system that runs guest applications reliant on geometry shaders/stream output (e.g., Direct3D 10 or OpenGL 3.3 applications) on a host system whose graphics API/driver does not support these features (e.g., Metal). In this scenario, if a guest application of the virtual machine (VM) issues an API call for executing a geometry shader, the hypervisor of the host system cannot simply pass the geometry shader to the host graphics driver for handling, since the host graphics driver does not understand this feature (and the host GPU may not natively support it).
One way to work around this problem is for the hypervisor to execute the geometry shader in software via the host system's central processing unit (CPU). However, CPU-based geometry shader execution can significantly degrade rendering performance due to the synchronization and data transfers it requires between the CPU and GPU: post-vertex-shader primitive data must be read back into system memory, processed by the CPU, and re-uploaded to the GPU for rasterization, idling the GPU during each round trip. Thus, this approach is generally impractical for interactive/real-time graphics rendering.
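The CPU fallback path described above can be sketched as follows. This Python fragment is a conceptual sketch only, with all names (`emulate_geometry_stage`, the list-based buffers) hypothetical and not drawn from any real driver or hypervisor API; the lists merely stand in for GPU buffers, and the two "transfers" bracketing the CPU work are the steps that stall a real pipeline:

```python
from typing import Callable, List, Tuple

# A primitive is modeled as a list of (x, y, z) vertex tuples.
Primitive = List[Tuple[float, float, float]]

def emulate_geometry_stage(gpu_buffer: List[Primitive],
                           geometry_shader: Callable[[Primitive], List[Primitive]]
                           ) -> List[Primitive]:
    """Hypothetical CPU emulation of geometry shader stage 106.

    Models the three steps of the fallback: readback, CPU execution, upload.
    """
    # 1. Synchronize with the GPU and read post-vertex-shader primitives
    #    back into system memory (here: copying a Python list).
    cpu_input = list(gpu_buffer)
    # 2. Run the geometry shader on the CPU, one primitive at a time; each
    #    invocation may discard its primitive or emit one or more new ones.
    cpu_output: List[Primitive] = []
    for prim in cpu_input:
        cpu_output.extend(geometry_shader(prim))
    # 3. Upload the emitted primitives back to the GPU for rasterization
    #    (here: simply returning the list).
    return cpu_output
```

For instance, a pass-through shader `lambda p: [p]` leaves the buffer unchanged, `lambda p: []` discards every primitive, and `lambda p: [p, p]` doubles the primitive count; in all cases the GPU would sit idle while steps 1-3 execute on the CPU.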