1. Field of the Invention
The invention relates to data transmissions. More specifically, the invention relates to data transmissions using Remote Direct Memory Access (RDMA).
2. Description of Related Art
RDMA is a communication paradigm whereby application data is fetched directly out of a computer's local application memory and placed directly into the application memory of a remote computer. In bypassing the operating system and avoiding intermediate data copies in host memory, RDMA significantly reduces the CPU cost of large data transfers. Complete data copy avoidance (zero-copy) is achieved if the network interface controller (NIC) is able to move networked data directly between the application (buffer) memory and NIC buffer using a DMA engine.
PCT Application No. WO 2011/135515 A1 describes a method for data transmission on a device without intermediate buffering. A request is received to transmit data from the device to a second device over a network. The data from application memory is formatted for transmitting to the second device. A length of a send queue is retrieved. The data is transmitted from the device to the second device without intermediate buffering. The length of the send queue is compared to the expected send queue length. If the length of the send queue is at least equal to and/or less than the expected send queue length, a completion element is generated.
U.S. Patent Application No. US 2011/0078410 A1 describes a method of and a system for multiple party communications in a processing system including multiple processing subsystems. Each of the processing subsystems includes a central processing unit and one or more network adapters for connecting each processing subsystem to the other processing subsystems. A multitude of nodes is established or created, and each of these nodes is associated with one of the processing subsystems. Here, pipelined communication using RDMA among three nodes may be involved, wherein the first node breaks up a large communication into multiple parts and sends these parts one after the other to the second node using RDMA, and the second node in turn absorbs and forwards each of these parts to a third node before all parts of the communication arrive from the first node.
Moreover, in a published paper, Matt Welsh and David Culler disclosed Jaguar: Enabling efficient communication and I/O in Java. They postulate that implementing efficient communication and I/O mechanisms in Java requires both fast access to low-level system resources (such as network and raw disk interfaces) and direct manipulation of memory regions external to the Java heap (such as communication and I/O buffers). Java native methods are too expensive to perform these operations and raise serious protection concerns. Jaguar is a mechanism that provides Java applications with efficient access to system resources while retaining the protection of the Java environment. This is accomplished through compile-time translation of certain Java byte codes to inlined machine code segments. In this paper, the use of Jaguar is demonstrated through a Java interface to the VIA fast communication layer, which achieves nearly identical performance to that of C, and through pre-serialized objects, a mechanism which reduces the costs of Java object serialization.
In particular, RDMA may be useful for data centers running cloud computing workloads, which place growing emphasis on low latency, high throughput and low power consumption. Many cloud computing applications are written in Java and other interpreted languages, like C#. However, high-performance RDMA support for interpreted languages is difficult and costly, because RDMA data structures need to be converted either into C data structures or into a hardware-dependent representation. In sum, serialization of RDMA function calls is expensive in interpreted languages like Java or C#.
Specifically, the RDMA function calls may be provided as a two-dimensional list of work descriptors, which has to be generated and serialized, an operation that is slow in Java. This serialization needs to be done either when passing the two-dimensional list of work descriptors to a C library or when writing the list to the network interface controller.
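By way of illustration only, the serialization cost described above may be sketched in Java as follows. The sketch assumes a simplified, hypothetical descriptor layout (a work descriptor holding an identifier, an opcode and a list of scatter/gather elements); the class names, field names and byte layout are assumptions made for this example and do not reproduce the format of any particular RDMA library or network interface controller. Every posted call walks the two-dimensional list and writes each field into a direct buffer outside the Java heap.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.List;

public class WorkDescriptorSerializer {

    /** Hypothetical scatter/gather element: one contiguous memory region. */
    static final class Sge {
        final long address;
        final int length;
        final int localKey;

        Sge(long address, int length, int localKey) {
            this.address = address;
            this.length = length;
            this.localKey = localKey;
        }
    }

    /** Hypothetical work descriptor: one operation over a list of elements. */
    static final class WorkDescriptor {
        final long id;
        final int opcode;
        final List<Sge> elements;

        WorkDescriptor(long id, int opcode, List<Sge> elements) {
            this.id = id;
            this.opcode = opcode;
            this.elements = elements;
        }
    }

    // Assumed layout: id (8 bytes), opcode (4), element count (4), then the elements.
    static final int HEADER_BYTES = 8 + 4 + 4;
    // Assumed layout per element: address (8 bytes), length (4), local key (4).
    static final int SGE_BYTES = 8 + 4 + 4;

    static int serializedSize(List<WorkDescriptor> descriptors) {
        int size = 0;
        for (WorkDescriptor wd : descriptors) {
            size += HEADER_BYTES + wd.elements.size() * SGE_BYTES;
        }
        return size;
    }

    /** Flattens the two-dimensional list into a direct (off-heap) buffer. */
    static ByteBuffer serialize(List<WorkDescriptor> descriptors) {
        ByteBuffer buf = ByteBuffer.allocateDirect(serializedSize(descriptors))
                                   .order(ByteOrder.nativeOrder());
        for (WorkDescriptor wd : descriptors) {        // outer dimension
            buf.putLong(wd.id);
            buf.putInt(wd.opcode);
            buf.putInt(wd.elements.size());
            for (Sge sge : wd.elements) {              // inner dimension
                buf.putLong(sge.address);
                buf.putInt(sge.length);
                buf.putInt(sge.localKey);
            }
        }
        buf.flip(); // make the written bytes readable from position 0
        return buf;
    }

    public static void main(String[] args) {
        List<WorkDescriptor> descriptors = Arrays.asList(
            new WorkDescriptor(1L, 0, Arrays.asList(
                new Sge(0x1000L, 4096, 7), new Sge(0x2000L, 512, 7))),
            new WorkDescriptor(2L, 0, Arrays.asList(
                new Sge(0x3000L, 128, 9))));
        ByteBuffer buf = serialize(descriptors);
        System.out.println("serialized bytes: " + buf.remaining());
    }
}
```

In this sketch the work performed per call grows with the total number of scatter/gather elements, and each field is copied individually across the boundary of the Java heap, which is the overhead referred to above.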