The present disclosure relates to remote direct memory access (RDMA) and, more particularly, to automatically pinning and unpinning virtual pages for RDMA operations.
Remote direct memory access (RDMA) is a network interface card (NIC) feature for copying data from the main memory of one computer to the main memory of another computer. RDMA transfers are generally requested by user-space applications given read and/or write access to memory on a remote computer. Before a transfer can be performed, however, the virtual page to be read and/or written must first be swapped into physical memory, if not already resident, and pinned.
Current state-of-the-art methods for implementing RDMA involve specialized hardware (e.g., an InfiniBand adapter) along with a specialized software stack (e.g., InfiniBand device drivers and InfiniBand Verbs). Generally, the copying of data is performed asynchronously by an RDMA adapter, with little to no involvement by the software executing on the processor. Although the user-space application may initiate the transfer, the data movement is performed by the RDMA adapter.
Since RDMA transfers are asynchronous and performed between virtual or physical address spaces, a user-space application initiating a transfer explicitly pins the memory mapped to the virtual addresses associated with the transfer, thus preventing the operating system from swapping the pages to disk. The application, with the assistance of other software, then provides the virtual-to-physical mappings to the RDMA adapter, so that the RDMA adapter can perform virtual-to-physical address translation (VAT). The application posts an RDMA request, which is specified as a virtual-to-virtual address data copy, to the RDMA adapter.
The RDMA adapter uses its VAT mechanism to directly access the physical memory, where the virtual pages are pinned, and uses the physical addresses to perform the data transfer. The RDMA adapter then asynchronously notifies the user-space application when the transfer is complete, via an interrupt or by posting an item on a completion queue that the application periodically polls.
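The registration, translation, and completion steps described above can be illustrated with a small simulation. This is a minimal sketch and not a real driver or Verbs interface: the ToyAdapter class, its method names, the 4096-byte page size, and the single-page restriction are all assumptions made for illustration. Unlike a real adapter, the toy performs the copy synchronously inside post_read; only the delivery of the result through a polled completion queue mirrors the asynchronous notification.

```python
from collections import deque

PAGE_SIZE = 4096  # assumed page size, for illustration only


class ToyAdapter:
    """Hypothetical stand-in for an RDMA adapter; not a real driver API."""

    def __init__(self, physical_memory):
        self.physical_memory = physical_memory  # bytearray modeling local RAM
        self.vat = {}                           # virtual page -> physical frame
        self.completion_queue = deque()

    def register(self, virtual_page, physical_frame):
        # The application supplies the mapping for a page it has pinned.
        self.vat[virtual_page] = physical_frame

    def post_read(self, request_id, virtual_addr, length):
        # Translate the virtual address, then read physical memory directly.
        page, offset = divmod(virtual_addr, PAGE_SIZE)
        if offset + length > PAGE_SIZE:
            raise ValueError("toy model handles single-page requests only")
        start = self.vat[page] * PAGE_SIZE + offset
        data = bytes(self.physical_memory[start:start + length])
        self.completion_queue.append((request_id, data))

    def poll(self):
        # Return one completion, or None if nothing has finished.
        return self.completion_queue.popleft() if self.completion_queue else None


ram = bytearray(4 * PAGE_SIZE)
ram[2 * PAGE_SIZE + 16:2 * PAGE_SIZE + 21] = b"hello"

adapter = ToyAdapter(ram)
adapter.register(virtual_page=7, physical_frame=2)
adapter.post_read(request_id=1, virtual_addr=7 * PAGE_SIZE + 16, length=5)
completion = adapter.poll()
print(completion)  # prints (1, b'hello')
```

The request names only a virtual address; the adapter resolves it through its own table, which is why that table must stay consistent with what the operating system has actually pinned.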
A problem with this conventional method of transferring data between sets of virtual address spaces is that it breaks the abstract virtualization of memory seen by user-space applications, requiring an application to treat virtual memory as something implemented by a physical memory device. Additionally, an application's virtual address space is generally much larger than the actual physical address space, so the amount of memory that the application can pin is much less than its virtual address space. Since an application may not be able to pin the entirety of memory to be transferred, the application must manage the pinning and unpinning of smaller chunks of memory. The application therefore pins the memory, copies a chunk of data to the pinned memory, posts the RDMA request, waits for asynchronous notification of completion of the RDMA operation, and then unpins the memory. The application then repeats these operations with additional chunks of data until the data transfer is complete.
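The pin, copy, post, wait, and unpin cycle described above can be sketched as a loop. This is an illustrative model under stated assumptions: chunked_transfer, pin_limit, and send_chunk are hypothetical names, the staging bytearray merely stands in for a pinned region, and send_chunk stands in for posting an RDMA request and waiting for its completion.

```python
def chunked_transfer(data, pin_limit, send_chunk):
    """Send `data` in pieces no larger than `pin_limit` bytes.

    `send_chunk` is a hypothetical callable standing in for "post the RDMA
    request and wait for its completion". Returns the number of
    pin/unpin cycles performed.
    """
    if pin_limit <= 0:
        raise ValueError("pin_limit must be positive")
    cycles = 0
    for offset in range(0, len(data), pin_limit):
        chunk = data[offset:offset + pin_limit]
        staging = bytearray(len(chunk))  # models pinning a staging region
        staging[:] = chunk               # copy the chunk into pinned memory
        send_chunk(bytes(staging))       # post the request, wait for completion
        del staging                      # models unpinning the region
        cycles += 1
    return cycles


received = []
cycles = chunked_transfer(b"x" * 10000, pin_limit=4096, send_chunk=received.append)
print(cycles, [len(c) for c in received])  # prints 3 [4096, 4096, 1808]
```

Every iteration repeats the full bookkeeping sequence, so the application itself carries the cost of working within the pinning limit.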
In conventional systems, the RDMA adapter does not verify with the operating system that memory involved in an RDMA transfer is currently pinned. If the application unpins memory but fails to deregister that memory, and then posts an RDMA operation for that memory, bad or corrupted data may be transferred, which also creates a security risk.
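The hazard described above can be illustrated with a toy model in which the adapter's translation table and the operating system's record of pinned frames are kept separately. All names here (ram, pinned_frames, adapter_vat, adapter_read) and the 4096-byte page size are hypothetical, chosen only to make the stale-mapping scenario concrete.

```python
PAGE_SIZE = 4096                # assumed page size, for illustration only
ram = bytearray(2 * PAGE_SIZE)  # models physical memory: two frames
pinned_frames = set()           # the operating system's view of pinned frames
adapter_vat = {}                # the adapter's view: virtual page -> frame


def adapter_read(virtual_page, length):
    # The adapter trusts its own table and never consults pinned_frames.
    frame = adapter_vat[virtual_page]
    start = frame * PAGE_SIZE
    return bytes(ram[start:start + length])


# The application pins frame 1, registers it as virtual page 3, and fills it.
pinned_frames.add(1)
adapter_vat[3] = 1
ram[PAGE_SIZE:PAGE_SIZE + 6] = b"report"
before = adapter_read(3, 6)

# The application unpins the frame but does not deregister the mapping;
# the operating system is then free to reuse the frame for another process.
pinned_frames.discard(1)
ram[PAGE_SIZE:PAGE_SIZE + 6] = b"secret"
after = adapter_read(3, 6)

print(before, after)  # prints b'report' b'secret'
```

The second read succeeds without error even though frame 1 is no longer pinned, and it returns data that belongs to another process.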
In the above process, there is no coordination between the RDMA adapter and the operating system, as RDMA transfers are coordinated by user-space applications. If an application pins memory inefficiently, such pinning reduces the amount of memory that other applications can pin. Further, regardless of efficiency, for systems with many processes performing RDMA operations, the amount of available memory for pinning is constrained. Thus, numerous issues and risks exist with conventional RDMA operation management.