In general, data communications between multiple processors of an electronic device or between electronic devices occur via a communications path, which may be wired or wireless. Such data transfers can follow a data communications protocol, such as an Ethernet protocol. To transfer data efficiently, a number of techniques have been implemented that allow for direct data transfers. One example is Remote Direct Memory Access (RDMA), a technique for transferring data directly from the memory of one computing device to the memory of another computing device with limited involvement of the operating system of either device. RDMA permits high-throughput, low-latency networking, which can be used in parallel computer clusters. In general, an electronic device that supports RDMA may include an input DMA and an output DMA to receive data from and send data to other devices.
Typically, when a transmitting device wants to send data to a destination device that has an input DMA, the transmitting device sends a request to the input DMA of the destination device. The input DMA of the destination device can then send an Acknowledgement (ACK) to the transmitting device. When the transmitting device receives the ACK, it transfers data to the input DMA of the destination device, and the input DMA transfers the data into memory with limited involvement of the operating system of the destination device.
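The request, ACK, and transfer sequence can be illustrated with a small in-process simulation. This is a minimal sketch under stated assumptions: the `InputDMA` class, the `transmit` function, the string replies `"ACK"` and `"NAK"`, and the bounds-check policy are all hypothetical. The link between the two devices is replaced by direct method calls, so no real networking or DMA hardware is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class InputDMA:
    """Hypothetical model of a destination device's input DMA."""
    memory: bytearray
    # Outstanding requests: request id -> (target address, length).
    pending: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def handle_request(self, req_id: int, address: int, length: int) -> str:
        # Acknowledge only if the target region fits in memory (assumed policy).
        if address < 0 or address + length > len(self.memory):
            return "NAK"
        self.pending[req_id] = (address, length)
        return "ACK"

    def receive_data(self, req_id: int, payload: bytes) -> None:
        # Write the payload straight into memory; no OS involvement is modeled.
        address, length = self.pending.pop(req_id)
        if len(payload) != length:
            raise ValueError("payload length does not match the request")
        self.memory[address:address + length] = payload


def transmit(dma: InputDMA, req_id: int, address: int, payload: bytes) -> bool:
    """Transmitting side: send a request, wait for the ACK, then send data."""
    if dma.handle_request(req_id, address, len(payload)) != "ACK":
        return False
    dma.receive_data(req_id, payload)
    return True


if __name__ == "__main__":
    dest = InputDMA(memory=bytearray(16))
    ok = transmit(dest, req_id=1, address=4, payload=b"data")
    print(ok, bytes(dest.memory[4:8]))  # True b'data'
```

The transmitting side sends no data until it has received an ACK. As a result, the destination can refuse a transfer, here modeled by the assumed `"NAK"` reply, before any payload crosses the link.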
The RDMA Consortium defined a suite of transport-layer protocols that enables cooperating DMA engines at each end of a communication path to move data between memory locations with minimal support from the kernel and with "zero copy" to intermediate buffers. The RDMA Consortium's specifications are now maintained by the Internet Engineering Task Force (IETF). A Remote Direct Memory Access Protocol (RDMAP) Verbs specification describes the behavior of the protocol off-load hardware and software, defines the semantics of the RDMA services, and specifies how the hardware and software appear to the host software, including both the user and kernel Application Programming Interfaces (APIs). The Verbs specification defines RDMA READ/WRITE operations, which transport data between user-space memories, and a SEND operation, which transports data into a receive queue. The Verbs specification further defines Send Queues (SQs) and Receive Queues (RQs), grouped into queue pairs to control data transport, and Completion Queues (CQs) to signal when an operation is complete. Work Requests (WRs) are converted into Work Queue Elements (WQEs), which the off-load engine processes in turn, and an asynchronous event (interrupt) is generated when the work is complete. In addition, data need not reside in contiguous memory at either the source or the destination, because Scatter Gather Lists (SGLs) can define the physical memory locations of the data segments.
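The queue mechanics can be illustrated with a toy model of a SEND operation. In this model, Work Requests become WQEs on Send and Receive Queues, finished work is reported on a CQ, and SGLs describe non-contiguous buffers. All names here (`QueuePair`, `WQE`, `Completion`, `gather`, `scatter`, `process_send_queue`) are hypothetical and do not reproduce the Verbs specification's interface. The sketch makes three further simplifications. It models each memory segment as a `(bytearray, offset, length)` tuple. It appends entries to the CQ for the caller to poll, rather than raising an asynchronous interrupt. It uses a plain function to stand in for the off-load engine.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical scatter-gather element: (buffer, offset, length).
SGE = Tuple[bytearray, int, int]


@dataclass
class WQE:
    """Work Queue Element built from a posted Work Request (illustrative)."""
    wr_id: int
    opcode: str  # "SEND" or "RECV" in this sketch
    sgl: List[SGE]


@dataclass
class Completion:
    """Entry placed on a Completion Queue when a WQE finishes."""
    wr_id: int
    opcode: str
    byte_len: int


class QueuePair:
    """A Send Queue and a Receive Queue that report to one CQ."""

    def __init__(self, cq: deque):
        self.sq: deque = deque()
        self.rq: deque = deque()
        self.cq = cq

    def post_send(self, wr_id: int, sgl: List[SGE]) -> None:
        # Convert the Work Request into a WQE on the Send Queue.
        self.sq.append(WQE(wr_id, "SEND", sgl))

    def post_recv(self, wr_id: int, sgl: List[SGE]) -> None:
        # Convert the Work Request into a WQE on the Receive Queue.
        self.rq.append(WQE(wr_id, "RECV", sgl))


def gather(sgl: List[SGE]) -> bytes:
    # Collect non-contiguous source segments into a single message.
    return b"".join(bytes(buf[off:off + n]) for buf, off, n in sgl)


def scatter(sgl: List[SGE], data: bytes) -> None:
    # Spread a message across non-contiguous receive buffers.
    if len(data) > sum(n for _, _, n in sgl):
        raise ValueError("receive buffers are too small for the message")
    pos = 0
    for buf, off, n in sgl:
        chunk = data[pos:pos + n]
        buf[off:off + len(chunk)] = chunk
        pos += len(chunk)


def process_send_queue(src: QueuePair, dst: QueuePair) -> None:
    """Stand-in for the off-load engine: process Send WQEs in order."""
    while src.sq:
        wqe = src.sq.popleft()
        if not dst.rq:
            raise RuntimeError("no receive WQE posted at the destination")
        recv = dst.rq.popleft()  # each SEND consumes one posted receive
        message = gather(wqe.sgl)
        scatter(recv.sgl, message)
        src.cq.append(Completion(wqe.wr_id, "SEND", len(message)))
        dst.cq.append(Completion(recv.wr_id, "RECV", len(message)))


if __name__ == "__main__":
    send_cq, recv_cq = deque(), deque()
    a, b = QueuePair(send_cq), QueuePair(recv_cq)
    src_buf = bytearray(b"HELLO----WORLD")  # message lives in two segments
    r1, r2 = bytearray(4), bytearray(8)
    b.post_recv(wr_id=7, sgl=[(r1, 0, 4), (r2, 0, 8)])
    a.post_send(wr_id=1, sgl=[(src_buf, 0, 5), (src_buf, 9, 5)])
    process_send_queue(a, b)
    print(recv_cq.popleft())  # Completion(wr_id=7, opcode='RECV', byte_len=10)
```

In this sketch, the destination must post a receive WQE before the matching SEND is processed, or `process_send_queue` raises an error. The segment boundaries of the source SGL and the destination SGL are independent of each other. Here, two 5-byte source segments land in a 4-byte buffer followed by an 8-byte buffer.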
Such RDMA transfers do not involve the CPUs, caches, or context switches, so data transfers can proceed in parallel with other system operations. When a processor issues an RDMA read or write request, the application data is delivered directly to the network, which reduces latency and enables fast message transfer. RDMA is therefore especially useful in massively parallel computer clusters. RDMA can also reduce the operating system overhead associated with networking. Left unchecked, such overhead can squeeze out the capacity to move data across a network, reducing performance, limiting how fast an application can get the data it needs, and restricting the size and scalability of a cluster.
Unfortunately, conventional systems, including complex simulation systems having multiple processors, struggle to generate, process, and render realistic multi-spectral and hyperspectral graphics in real time, to perform complex modeling calculations, to acquire real-time data, or to do any combination thereof. While RDMA can be used to leverage the processing capabilities of multiple processors, limited network data transfer throughput can offset those processing gains. Hence, there is a need for systems and methods that enhance data transfer throughput in multi-processor systems.