1. Field of the Disclosure
The disclosure generally relates to cross-device communications, and more specifically to techniques to reduce redundant copies of data across user and kernel space boundaries in a virtual memory address space.
2. Related Art
Central processing units (CPUs) in computing systems may manage graphics processing units (GPUs), network processors, security co-processors, and other data-heavy devices as buffered peripherals using device drivers. Unfortunately, because data transfers between CPUs and these external devices are large and latency-sensitive, and because memory is partitioned into kernel-access and user-access spaces, these schemes to manage peripherals may introduce latency and memory use inefficiencies.
For example, an exemplary computing system may include a CPU and GPU sharing a common memory address space, with each of the CPU and GPU having a page-locked buffer in kernel-access memory address space. Direct memory access (DMA) controllers may transfer data between the CPU buffer in kernel-access memory address space and the CPU, and between the GPU buffer in kernel-access memory address space and the GPU, without direct intervention of the CPU. However, transferring data, for example, from the CPU to the GPU may require creating a redundant non-page-locked buffer in user-access memory address space, copying data from the CPU buffer to the user-access buffer, and then copying data from the user-access buffer to the GPU buffer. Kernel application programming interfaces (APIs) may include functionality to copy data between kernel-access and user-access buffers.
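The two-copy path described above can be illustrated with a minimal user-space simulation in C. This is only a sketch under stated assumptions: the buffer names, BUF_SIZE, and the sim_copy_to_user and sim_copy_from_user helpers are hypothetical stand-ins introduced for illustration (loosely modeled on kernel-to-user copy routines such as copy_to_user and copy_from_user in Linux), and ordinary arrays stand in for page-locked kernel-access buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64  /* hypothetical buffer size, chosen for illustration */

/* Hypothetical stand-ins for the page-locked buffers in kernel-access
 * memory. In a real system each would be owned by a separate device driver. */
static unsigned char cpu_kernel_buf[BUF_SIZE];
static unsigned char gpu_kernel_buf[BUF_SIZE];

/* Simulated kernel API: copy from a kernel-access buffer to a user-access
 * buffer. Returns the number of bytes NOT copied (always 0 here). */
static size_t sim_copy_to_user(void *user_dst, const void *kernel_src, size_t n)
{
    memcpy(user_dst, kernel_src, n);
    return 0;
}

/* Simulated kernel API: copy from a user-access buffer to a kernel-access
 * buffer. Returns the number of bytes NOT copied (always 0 here). */
static size_t sim_copy_from_user(void *kernel_dst, const void *user_src, size_t n)
{
    memcpy(kernel_dst, user_src, n);
    return 0;
}

/* Move n bytes from the CPU buffer to the GPU buffer by way of a redundant
 * user-access buffer. Returns the number of full data copies performed. */
int transfer_via_user_buffer(size_t n)
{
    unsigned char user_buf[BUF_SIZE];  /* redundant, non-page-locked buffer */
    int copies = 0;

    assert(n <= BUF_SIZE);

    /* Copy 1: CPU kernel-access buffer -> user-access buffer. */
    if (sim_copy_to_user(user_buf, cpu_kernel_buf, n) == 0)
        copies++;

    /* Copy 2: user-access buffer -> GPU kernel-access buffer. */
    if (sim_copy_from_user(gpu_kernel_buf, user_buf, n) == 0)
        copies++;

    return copies;
}
```

In this simulation, every byte moved from the CPU buffer to the GPU buffer is copied twice and temporarily occupies a third buffer, which is the latency and memory overhead the paragraph above describes.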
Various proposed schemes to avoid creation of a redundant non-page-locked buffer during data transfer between devices have included customized hardware support of interconnected devices, or collaboration between device vendors during development of device drivers. These schemes introduce additional disadvantages, such as incompatibility with new devices and with standard hardware interfaces or common device drivers, which may drive additional cost and complexity into the development of new devices. As such, apparatus and methods to transfer data between devices that minimize redundant data copies and latency, while utilizing existing kernel APIs, provide significant advantages.