Many computer systems include a general-purpose processor, such as a microprocessor, along with additional devices that may be built into the system or added later. For example, graphics functionality may be provided by an add-in device, typically an add-in card that includes a graphics processing unit (GPU) and a separate memory.
Conventionally, a central processing unit (CPU)/GPU system model can be described as two independent computing complexes connected by an interconnect. The corresponding GPU programming model treats the host (i.e., CPU) and device (i.e., GPU) memory subsystems as two isolated “islands”: code that runs on the host (CPU) cannot directly access data located in graphics memory, and code that runs on the graphics device (GPU) cannot access data located in host memory. Therefore, a programmer must explicitly copy data from host to device and back. As a result, host code and device code cannot exchange general data structures that use pointers (e.g., lists, trees, etc.), since an address that is valid in one memory has no meaning in the other. Instead, the current GPU model is limited to data arrays only, so a programmer must use indices (offsets) instead of pointers, which is inefficient.
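By way of illustration only, the index-based workaround described above can be sketched as follows. The sketch is not taken from any particular GPU programming interface: the two byte buffers merely stand in for host and device memory, and the node layout (`NODE`), the sentinel (`NIL`), and the helper names (`pack_list`, `walk`) are hypothetical. Each node stores the array index of its successor rather than an address, so the structure remains valid after being copied from one buffer to the other.

```python
import struct

# Hypothetical node layout: a 32-bit value followed by the 32-bit index of the next node.
NODE = struct.Struct("<ii")
NIL = -1  # sentinel index marking the end of the list


def pack_list(values):
    """Flatten a linked list into one contiguous buffer of (value, next_index) records."""
    buf = bytearray(NODE.size * len(values))
    for i, value in enumerate(values):
        next_index = i + 1 if i + 1 < len(values) else NIL
        NODE.pack_into(buf, i * NODE.size, value, next_index)
    return buf


def walk(buf, head=0):
    """Traverse the list by following indices (offsets) rather than addresses."""
    out = []
    i = head if len(buf) else NIL
    while i != NIL:
        value, i = NODE.unpack_from(buf, i * NODE.size)
        out.append(value)
    return out


# Build the list in "host memory", then copy it explicitly to "device memory".
host_memory = pack_list([10, 20, 30])
device_memory = bytearray(host_memory)  # stands in for an explicit host-to-device copy

print(walk(device_memory))  # [10, 20, 30]
```

Because the links are indices relative to the start of the buffer, the copy can be traversed without any fix-up. Had the nodes held absolute host addresses, every link would need to be rewritten after the copy, and each traversal step here still pays for an extra index-to-offset calculation.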
Another problem with current graphics cards is that they do not support virtual paging mechanisms. Virtual paging enables the translation of so-called virtual addresses (VAs) to physical addresses (PAs) of the physical memory. Using virtual paging mechanisms, software is not limited to the actual physical memory and can instead refer to a larger virtual address space. Typically, an address translation mechanism such as a translation lookaside buffer (TLB) stores VA-to-PA translations. The lack of a virtual paging mechanism in conventional graphics units makes existing GPU programming extremely sensitive to the size of the physical memory located on the graphics card.
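For purposes of illustration, VA-to-PA translation with a TLB can be modeled in software as shown below. This is a simplified sketch under stated assumptions, not a description of any particular hardware: the 4096-byte page size, the two-entry TLB, the first-in-first-out eviction policy, and the names `Translator` and `PageFault` are all hypothetical choices made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the page table."""


class Translator:
    def __init__(self, page_table, tlb_entries=2):
        self.page_table = dict(page_table)  # virtual page number -> physical frame number
        self.tlb = {}                       # small cache of recent translations
        self.tlb_entries = tlb_entries      # assumed to be at least 1
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(va, PAGE_SIZE)
        if vpn in self.tlb:
            # TLB hit: the translation is already cached.
            self.hits += 1
            pfn = self.tlb[vpn]
        else:
            # TLB miss: consult the page table, then cache the result.
            self.misses += 1
            if vpn not in self.page_table:
                raise PageFault(f"no mapping for virtual page {vpn}")
            pfn = self.page_table[vpn]
            if len(self.tlb) >= self.tlb_entries:
                # Evict the oldest entry (dicts preserve insertion order).
                self.tlb.pop(next(iter(self.tlb)))
            self.tlb[vpn] = pfn
        return pfn * PAGE_SIZE + offset


# Virtual pages 0 and 1 map to physical frames 5 and 2, respectively.
translator = Translator({0: 5, 1: 2})
print(hex(translator.translate(0x0010)))  # 0x5010
print(hex(translator.translate(0x1004)))  # 0x2004
```

The page offset passes through unchanged; only the page number is remapped. A repeated access to the same page is served from the TLB without consulting the page table, and an access to an unmapped page raises `PageFault`, which in a real system would be the point at which a page could be brought into physical memory.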