Field of the Invention
The present invention relates generally to graphics processing units (GPUs) and more specifically to sharing data among GPUs in a multi-GPU computer system.
Description of the Related Art
Current computer systems are typically configured with the hardware capability to support multiple graphics processing units (GPUs) through a compatible bus interface, such as PCI Express. Multiple GPUs in a computer system can share and subdivide a computationally expensive workload such as rendering a 3D scene. To increase processing efficiencies and memory management performance, each GPU is typically capable of engaging in Direct Memory Access (DMA) with memory of the computer system (also referred to herein as the system memory) that has been allocated to an application running on the computer system (also referred to herein as the application's address space). For example, an application running on the computer system can allocate a memory buffer in its address space and request that a GPU performing a particular task read the input data needed to perform the task directly from, and write the output data of the task directly to, the memory buffer. Such DMA capabilities eliminate an extra copying step that the application would otherwise have been required to perform in order to write the input data to a special memory location accessible by the GPU outside the application's address space. Typically referred to as “pinned memory,” the memory buffer is specially allocated by an application to be non-pageable so that it cannot be repurposed by the operating system's virtual memory optimization techniques. Pinning is necessary because any paging of the memory buffer by the CPU would not be recognized by the GPU. Absent pinning, the GPU could read data from or write data into the memory buffer at a time when the CPU had repurposed the memory buffer due to paging, thereby corrupting data in the buffer.
Current pinned memory allocation techniques enable application developers to allocate a pinned memory buffer only to the particular process running on the particular GPU that requested its allocation (referred to herein as a “context”). However, an application developer creating multi-GPU aware applications will typically divide a workload among the multiple GPUs by distributing different subsets of the input data to the GPUs and will want to gather the output data into a single memory buffer. While each of the processes performing the workload on the GPUs may have its own pinned memory buffer, to date, the application must still copy the results in each pinned memory buffer into a single consolidated memory buffer.
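By way of illustration only, the consolidation step described above can be modeled as follows. The sketch is a host-only simulation under stated assumptions: ordinary Python bytearrays stand in for the per-context pinned memory buffers, a byte inversion stands in for the GPU task, and the names split_workload, run_task and gather are hypothetical and do not correspond to any GPU driver or library interface.

```python
from typing import List

# Hypothetical model: each "context" owns a private output buffer.
# Plain bytearrays stand in for pinned memory; no GPU is involved.


def split_workload(data: bytes, num_contexts: int) -> List[bytes]:
    """Divide the input into contiguous subsets, one per context."""
    chunk = -(-len(data) // num_contexts)  # ceiling division
    return [data[i * chunk:(i + 1) * chunk] for i in range(num_contexts)]


def run_task(subset: bytes) -> bytearray:
    """Stand-in for GPU work: results land in the context's own buffer."""
    return bytearray(b ^ 0xFF for b in subset)  # invert every byte


def gather(per_context_buffers: List[bytearray]) -> bytearray:
    """The extra step: copy each private buffer into one consolidated buffer."""
    consolidated = bytearray(sum(len(b) for b in per_context_buffers))
    offset = 0
    for buf in per_context_buffers:
        consolidated[offset:offset + len(buf)] = buf  # one copy per context
        offset += len(buf)
    return consolidated


input_data = bytes(range(8))
outputs = [run_task(s) for s in split_workload(input_data, num_contexts=2)]
result = gather(outputs)
```

In this model, gather performs one copy per context. That copy is the overhead that remains when each pinned memory buffer is accessible only to the context that allocated it.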
As the foregoing illustrates, what is needed in the art is a technique enabling a newly allocated context of a GPU to directly access pinned memory buffers of a computer system that have been allocated by other contexts from other GPUs in the computer system.