This relates generally to shared virtual memory implementations and in particular to fine-grain partitioning between a CPU and a GPU.
The computing industry is moving toward a heterogeneous platform architecture consisting of a general purpose CPU along with programmable GPUs attached either as discrete or as integrated devices. These GPUs are connected over both coherent and non-coherent interconnects, have different instruction set architectures (ISAs), and may use their own operating systems.
Computing platforms composed of a combination of a general purpose processor (CPU) and a graphics processor (GPU) have become ubiquitous, especially in the client computing space. Today, almost all desktop and notebook platforms ship with one or more CPUs along with an integrated or a discrete GPU. For example, some platforms have a processor paired with an integrated graphics chipset, while the rest use a discrete graphics processor connected over an interface, such as PCI-Express. Some platforms ship as a combination of a CPU and a GPU. For example, some of these include a more integrated CPU-GPU platform, while others include a discrete graphics processor to complement integrated CPU offerings.
These CPU-GPU platforms may provide a significant performance boost on non-graphics workloads in image processing, medical imaging, data mining, and other domains. The massively data-parallel GPU may be used to obtain high throughput on the highly parallel portions of the code.
Existing language mechanisms for executing applications on a CPU-GPU platform tend to support only an offload model, in which a kernel (function) is offloaded to the GPU. The arguments to the function are copied to the device. If the arguments include pointer-containing data structures, then the arguments are marshaled and passed to the GPU. Similarly, the return value is copied back to the CPU.
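The offload sequence described above can be illustrated with a minimal host-only sketch. It does not use any real GPU runtime: the "device" is simulated by a separate byte buffer, and every name here (Node, marshal, copy_to_device, sum_kernel, offload_sum) is hypothetical and chosen only for illustration. The sketch assumes a singly linked list as the pointer-containing argument and a simple sum as the offloaded kernel.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Host-side linked-list node; `next` is a host reference (a "pointer")."""
    value: float
    next: Optional["Node"] = None


def marshal(head: Optional[Node]) -> bytes:
    """Flatten the list into a pointer-free buffer: a count, then the values."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    # Little-endian, unpadded: one uint32 count followed by `count` doubles.
    return struct.pack(f"<I{len(values)}d", len(values), *values)


def copy_to_device(host_buf: bytes) -> bytearray:
    """Stand-in for a host-to-device copy: the "device" gets its own copy."""
    return bytearray(host_buf)


def sum_kernel(device_buf: bytearray) -> bytes:
    """Stand-in for the offloaded kernel; it sees only the flat buffer."""
    (count,) = struct.unpack_from("<I", device_buf, 0)
    values = struct.unpack_from(f"<{count}d", device_buf, 4)
    return struct.pack("<d", sum(values, 0.0))


def offload_sum(head: Optional[Node]) -> float:
    """Marshal the argument, run the kernel, and copy the return value back."""
    device_args = copy_to_device(marshal(head))
    device_result = sum_kernel(device_args)
    # Simulated device-to-host copy of the return value.
    (result,) = struct.unpack("<d", bytes(device_result))
    return result
```

For a three-node list built as Node(1.0, Node(2.0, Node(3.5))), offload_sum returns 6.5. The point of the sketch is that the kernel never follows a host reference: the list must be flattened and copied before the call, and the result must be copied back afterward.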
These existing models (also referred to hereafter as the device models) have a number of disadvantages: 1) They prevent a natural partitioning of an application between the CPU and GPU. An application usually has some throughput-oriented parts and some scalar parts. For example, a game application will have rendering that is suited for the GPU, but will also have physics and AI that are suited for the CPU. Current models tend to force most of the computation to be offloaded to the GPU.