RDMA facilitates the transfer of data from the memory of one computer to the memory of another computer, typically utilizing operating system kernel-bypass input/output (I/O) and zero-copy data movement capabilities to achieve high-throughput, low-latency networking. RDMA-based communication between different devices is an expanding and increasingly important component of many networking technologies currently being developed and deployed. RDMA-based communication can provide significant advantages over more conventional networking technologies, primarily because RDMA permits the resource costs associated with network-based communication processing to be offloaded from the primary central processing unit (CPU) of a device to a network interface card. This can remove a key bottleneck in communications processing. RDMA platforms can be exploited with software-implemented procedures using Application Program Interfaces (APIs) based on the User Direct Access Programming Library (uDAPL) standard.
A key requirement of any such software-based procedure using RDMA is the need to “pin” the regions of memory that are used for buffering data conveyed across a network, that is, to lock those regions into physical memory so that they cannot be paged out. Perhaps the simplest technique for satisfying this requirement is to reserve a large area of memory for communications processing up front and to pin it for the lifetime of the particular software procedure or program. However, because pinned memory must be backed by physical memory storage on the machine or device, and because this memory area may need to be quite large if the communications requirements of the program or procedure are extensive, pinning large portions of the memory in this way can adversely impact the resource usage of the device. This, in turn, can result in an increased total resource cost of ownership for a user or other resource “customer.” Moreover, if the program or procedure only has extensive communication processing requirements during certain peak workloads, then a significant amount of the memory is likely to be wasted during non-peak periods. Accordingly, it is advantageous to dynamically manage the memory used for RDMA using known dynamic memory management algorithms.
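The trade-off between up-front pinning and dynamic management can be illustrated with a small calculation. The following is a minimal sketch only: the function name, the workload figures, and the idealized assumption that dynamically managed pinned memory tracks demand exactly (ignoring registration cost and heap fragmentation) are hypothetical and are not taken from any RDMA library.

```python
def pinned_total(demand, static_reservation=None):
    """Sum the pinned memory held over a sequence of time steps.

    demand: memory needed for communication buffers at each step (e.g. in MiB).
    static_reservation: if given, that amount stays pinned at every step
        (up-front pinning); otherwise pinned memory is assumed to follow
        demand exactly (an idealized dynamic scheme).
    """
    if static_reservation is not None:
        if static_reservation < max(demand):
            raise ValueError("static reservation cannot cover peak demand")
        return static_reservation * len(demand)
    return sum(demand)


# Hypothetical workload: mostly idle, with a single peak step.
demand_mib = [8, 8, 8, 256, 8, 8, 8, 8]

static_cost = pinned_total(demand_mib, static_reservation=max(demand_mib))
dynamic_cost = pinned_total(demand_mib)
wasted = static_cost - dynamic_cost  # pinned but unused under up-front pinning
print(static_cost, dynamic_cost, wasted)
```

Under these assumed figures, most of the memory pinned up front is held idle outside the peak step, which is the waste that dynamic management is intended to avoid.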
Attempting to dynamically manage memory used for RDMA communications, however, adds a new layer of complexity to the typical dynamic memory management scheme. Specifically, a region of memory reserved from an operating system (OS) according to a dynamic memory management algorithm must be both registered and pinned before it can be used for RDMA communications. This registration is performed at the process level. Thus, in a multi-process system, if a block of memory that has been returned to the heap is subsequently re-reserved by another process, then that process also must register the block of memory before it can use the block. Releasing blocks of memory from the heap back to the OS becomes even more complex, because each process that has registered the block of memory also must perform a deregistration procedure for that block. Typically, therefore, when the dynamic memory management algorithm dictates release of a block of memory to the OS, the algorithm would need to signal, in a synchronous fashion, each process that had previously registered the block so that each process releases its registration.
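The bookkeeping described above can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the `Region` and `Heap` classes, their method names, and the `deregister` callback are hypothetical stand-ins for illustration and do not correspond to uDAPL or any other real RDMA API.

```python
class Region:
    """A block of memory obtained from the OS (hypothetical model)."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.registered_by = set()  # ids of processes holding a registration


class Heap:
    """Tracks which processes have registered each region."""

    def __init__(self):
        self.regions = {}

    def grow(self, name, size):
        # Reserve a new region from the OS.
        self.regions[name] = Region(name, size)

    def register(self, pid, name):
        # Registration is per process: each process must do this itself.
        self.regions[name].registered_by.add(pid)

    def can_use_for_rdma(self, pid, name):
        return pid in self.regions[name].registered_by

    def release_to_os(self, name, deregister):
        # Every registering process must be signalled, and must deregister,
        # before the region can be handed back to the OS.
        region = self.regions[name]
        for pid in sorted(region.registered_by):
            deregister(pid, name)  # stands in for a synchronous signal
        region.registered_by.clear()
        del self.regions[name]


heap = Heap()
heap.grow("r0", 4096)
heap.register(1, "r0")
heap.register(2, "r0")

signalled = []
heap.release_to_os("r0", lambda pid, name: signalled.append((pid, name)))
print(signalled)
```

In this model the release path must visit every process that ever registered the region, which illustrates why returning memory to the OS is the most complex step.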
A simpler alternative would be to register individual blocks of memory as each is obtained from the heap and to deregister the same blocks of memory as each is returned, rather than registering the larger blocks of memory within the heap as those blocks are allocated from the OS. This approach, however, has undesirable aspects. The registration and deregistration processes can be costly in terms of performance, and registering blocks of memory as each is allocated from the heap, rather than registering the larger blocks of memory in the heap as each is allocated from the OS, requires that many more individual registrations and deregistrations be performed. Moreover, in the specific context of uDAPL processes, because uDAPL requires the allocation of a separate memory block to store the uDAPL memory regions (MRs), this approach can greatly increase the memory overhead of the heap algorithm.
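The difference in the number of registration operations can be made concrete with a rough count. The sketch below assumes a single process, that every heap block is allocated and freed exactly once, and that OS-level regions are filled completely before a new one is obtained; the function name and the figures are hypothetical.

```python
def registration_calls(num_blocks, blocks_per_region, per_block):
    """Count register plus deregister calls for one process.

    per_block=True:  each heap block is registered when obtained from the
                     heap and deregistered when returned.
    per_block=False: each larger region is registered once when allocated
                     from the OS and deregistered once when released.
    """
    if per_block:
        return 2 * num_blocks
    regions = -(-num_blocks // blocks_per_region)  # ceiling division
    return 2 * regions


# Hypothetical workload: 10,000 heap blocks, 256 blocks per OS-level region.
per_block_calls = registration_calls(10_000, 256, per_block=True)
per_region_calls = registration_calls(10_000, 256, per_block=False)
print(per_block_calls, per_region_calls)
```

Because each registration under uDAPL also carries a separately allocated memory region descriptor, the number of such descriptors in the per-block scheme grows in the same proportion as the call count.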