1. Field of the Invention
The present invention relates to remote direct memory access (RDMA), and more specifically, to RDMA read requests in RDMA networks.
2. Description of the Related Art
Sending and receiving application data over a computer network involves data copy operations on both the sending and the receiving hosts. In a multi-user operating system, data typically needs to be copied between application space and the operating system kernel. In these systems, the application generates and consumes data, while the operating system kernel is responsible for eventually shipping data to a remote application over a network or for receiving remote application data from the network. When using connection-oriented, reliable transport protocols such as TCP, which rely on data retransmissions to guarantee data delivery, the copy of the data in kernel space is critical as a temporary buffer from which the data can be re-fetched for retransmission.
Remote Direct Memory Access (RDMA) is a communication paradigm where application data is fetched directly out of a computer's local application memory and directly placed into the application memory of a remote computer. In bypassing the operating system and avoiding intermediate data copies in host memory, RDMA significantly reduces the CPU cost of large data transfers. Complete data copy avoidance (zero-copy) is furthermore achieved if the network interface card (NIC) is able to move networked data directly between the application buffer and NIC buffer using a Direct Memory Access (DMA) engine.
With RDMA, in order to allow the DMA engine of the NIC to move the data between the application buffer and the NIC buffer, the application buffer (memory region) first has to be registered with the NIC. After successful registration, the application buffer, which can now also be referred to as a “tagged buffer”, is identified through a unique steering tag (STag). Each RDMA access to a tagged buffer is described by the tuple of STag, tagged offset (TO), and length (len).
An STag identifies an application memory buffer which is already registered with the RNIC Interface (RI) for access. It is a 32-bit identifier including two sub-fields: a consumer-provided STag key and an RI-provided STag index. The STag key occupies the 8 least significant bits of the STag, and the STag index occupies the 24 most significant bits. The STag key is provided by the consumer, who can use it as desired; the STag index is provided by the RNIC. The tagged offset points to the first byte to be accessed within the memory buffer referenced by the STag.
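The bit layout described above can be illustrated with a minimal sketch. The function names make_stag and split_stag are hypothetical and chosen only for illustration; they are not part of any RDMA interface.

```python
# Illustrative sketch of the 32-bit STag layout: 24-bit index (high bits)
# followed by an 8-bit key (low bits). Names are hypothetical.
STAG_KEY_BITS = 8
STAG_KEY_MASK = (1 << STAG_KEY_BITS) - 1   # 0xFF
STAG_INDEX_MAX = (1 << 24) - 1             # 0xFFFFFF


def make_stag(index, key):
    """Combine an RI-provided index and a consumer-provided key into an STag."""
    if not 0 <= index <= STAG_INDEX_MAX:
        raise ValueError("STag index must fit in 24 bits")
    if not 0 <= key <= STAG_KEY_MASK:
        raise ValueError("STag key must fit in 8 bits")
    return (index << STAG_KEY_BITS) | key


def split_stag(stag):
    """Return (index, key) for a 32-bit STag."""
    if not 0 <= stag <= 0xFFFFFFFF:
        raise ValueError("STag must fit in 32 bits")
    return stag >> STAG_KEY_BITS, stag & STAG_KEY_MASK
```

For instance, an index of 0x123456 combined with a key of 0xAB yields the STag 0x123456AB, and splitting that STag returns the same two sub-fields.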
Two different methods to assign the TO are described. The TO can be the offset in bytes from the beginning of the buffer, or it can be the virtual address of the first byte to be accessed. The length describes the length of the set of bytes to be accessed within the buffer, starting with the first byte defined by TO. The STag together with the tagged offset (TO) and length are used for all subsequent remote data transfers over the network. All local and remote memory accesses require the use of an STag.
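As a sketch of how an access tuple (STag, TO, len) might be resolved against a registered buffer under either TO method, consider the following. The TaggedBuffer class, its fields, and the resolve function are assumptions made for illustration and do not reflect any particular RNIC implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedBuffer:
    """Hypothetical record of a registered (tagged) buffer."""
    stag: int
    length: int             # size of the registered buffer in bytes
    base_va: int = 0        # virtual address of byte 0 (VA-based TO only)
    va_based: bool = False  # False: TO is a byte offset from the buffer start


def resolve(buf, to, length):
    """Map an access (TO, len) onto byte offsets [start, end) within buf."""
    # With VA-based addressing the TO is a virtual address, so subtract the
    # buffer's base address; otherwise the TO already is the byte offset.
    start = to - buf.base_va if buf.va_based else to
    if length < 0 or start < 0 or start + length > buf.length:
        raise ValueError("access lies outside the registered buffer")
    return start, start + length
```

With a 4096-byte buffer, an offset-based TO of 128 and a VA-based TO of base_va + 128 both resolve to the same byte range when len is 256, while an access that runs past the end of the buffer is rejected.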
For example, a data transfer according to the RDMA Verbs specification works as follows:
1. The application posts RDMA work requests (WRs) defining the data transfers to work queues which are accessible by an RDMA device;
2. The RDMA device processes these WRs asynchronously and in order;
3. The RDMA device notifies the application through a completion event upon completion of the WR processing;
4. The application reaps the work completion (WC) corresponding to the WR from the completion queue.
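The four steps above can be sketched with a toy model. The ToyRdmaDevice class and its method names are hypothetical; no data is actually moved, and asynchronous processing is merely simulated by having the caller invoke process() separately from post().

```python
from collections import deque


class ToyRdmaDevice:
    """Hypothetical in-memory model of a work queue and a completion queue."""

    def __init__(self, on_completion=None):
        self.work_queue = deque()
        self.completion_queue = deque()
        self.on_completion = on_completion  # optional completion event handler

    def post(self, wr_id, opcode):
        """Step 1: the application posts a work request (WR)."""
        self.work_queue.append((wr_id, opcode))

    def process(self):
        """Steps 2 and 3: handle WRs in posting order, then signal completion."""
        while self.work_queue:
            wr_id, opcode = self.work_queue.popleft()
            wc = {"wr_id": wr_id, "opcode": opcode, "status": "success"}
            self.completion_queue.append(wc)
            if self.on_completion is not None:
                self.on_completion(wc)

    def poll_cq(self):
        """Step 4: reap one work completion (WC), or None if none is pending."""
        return self.completion_queue.popleft() if self.completion_queue else None
```

In this model, polling before process() has run returns None, and the work completions are afterwards reaped in the same order in which the WRs were posted.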
Also, the work requests are used to define the following:
1. The data transfer operation type (Send, Receive, RDMA Read, RDMA Write);
2. The source buffer for Sends, RDMA Reads and RDMA Writes; and
3. The destination buffer for Receives, RDMA Reads and RDMA Writes.
Depending on the data transfer operation, the source buffer is either local (Send, RDMA Write operations) or remote (RDMA Read operations). Accordingly, the destination buffer is local for RDMA Read and Receive operations and remote for RDMA Write operations. For the Send and RDMA Write data transfer operations, the (local) source buffer can be non-contiguous. Non-contiguous buffers are referred to by means of scatter/gather lists, which contain a number of scatter/gather elements. Each such scatter/gather element refers to a single application buffer identified by an STag, the TO and the length.
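The buffer locations per operation type, and a scatter/gather list describing a non-contiguous source buffer, can be summarized in a short sketch. The names BUFFER_LOCATION, Sge, and total_length are hypothetical and serve only to illustrate the description above.

```python
from dataclasses import dataclass

LOCAL, REMOTE = "local", "remote"

# Operation type -> (source location, destination location).
# None means the work request of that type does not define that buffer.
BUFFER_LOCATION = {
    "SEND": (LOCAL, None),
    "RECEIVE": (None, LOCAL),
    "RDMA_READ": (REMOTE, LOCAL),
    "RDMA_WRITE": (LOCAL, REMOTE),
}


@dataclass(frozen=True)
class Sge:
    """Hypothetical scatter/gather element: one tagged application buffer."""
    stag: int
    to: int
    length: int


def total_length(sgl):
    """Total number of bytes referenced by a scatter/gather list."""
    return sum(element.length for element in sgl)
```

For example, a Send whose source consists of a 512-byte buffer and a 1024-byte buffer would carry a two-element scatter/gather list referencing 1536 bytes in total.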
The following documents provide useful additional details:
RDMA Protocol Verbs Specification (http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf);
RDMA Protocol (http://www.faqs.org/rfcs/rfc5040.html);
Direct Data Placement (http://www.faqs.org/rfcs/rfc5042.html).