Remote Direct Memory Access (RDMA) is a technique for efficient movement of data over high-speed transports. RDMA enables a computer to directly place information in another computer's memory with minimal demands on memory bus bandwidth and CPU processing overhead, while preserving memory protection semantics. It facilitates data movement via direct memory access by hardware, yielding faster transfers of data over a network while reducing host CPU overhead.
Different forms of RDMA are known and used (all of which are referred to herein as RDMA), such as, but not limited to, VIA (Virtual Interface Architecture), InfiniBand, iWARP and RNIC. In simple terms, VIA specifies RDMA capabilities without specifying an underlying transport. InfiniBand specifies an underlying transport and a physical layer. RDMA over TCP/IP (transmission control protocol/Internet protocol) specifies an RDMA layer that interoperates over a standard TCP/IP transport layer. RDMA over TCP does not specify a physical layer, and works over Ethernet, wide area networks (WAN) or any other network where TCP/IP is used. An RNIC is an RDMA-enabled NIC (Network Interface Controller). The RNIC provides support for RDMA over TCP and can include a combination of TCP offload and RDMA functions in the same network adapter.
RDMA protocols allow direct access to the application buffers. Hardware interfaces with software using so-called Work Queues (WQ). Work queues are created in pairs, called a Queue Pair (QP), one for send operations (Send Queue) and one for receive operations (Receive Queue). The send work queue (SWQ) holds instructions that cause data to be transferred between one consumer's memory and another consumer's memory, and the receive work queue (RWQ) holds instructions about where to place data that is received from another consumer. The consumer submits a work request, which causes a Work Queue Element (WQE) to be placed on the appropriate work queue. A channel adapter executes WQEs in the order in which they were placed on the work queue.
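The queue pair arrangement described above can be pictured with a small software model. The following is only an illustrative sketch: the names WorkQueueElement, QueuePair and execute_send_queue are hypothetical and do not come from any RDMA specification, and the "channel adapter" is reduced to a function that copies bytes between buffers in FIFO order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueueElement:
    opcode: str        # "send" or "recv" (illustrative values)
    buffer: bytearray  # application buffer referenced by the work request


@dataclass
class QueuePair:
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)

    def post_send(self, buffer: bytearray) -> None:
        # A send work request becomes a WQE at the tail of the send queue.
        self.send_queue.append(WorkQueueElement("send", buffer))

    def post_recv(self, buffer: bytearray) -> None:
        # A receive work request tells the adapter where to place incoming data.
        self.receive_queue.append(WorkQueueElement("recv", buffer))


def execute_send_queue(src: QueuePair, dst: QueuePair) -> None:
    """Toy channel adapter: executes WQEs strictly in posting order."""
    while src.send_queue and dst.receive_queue:
        swqe = src.send_queue.popleft()
        rwqe = dst.receive_queue.popleft()
        n = min(len(swqe.buffer), len(rwqe.buffer))
        rwqe.buffer[:n] = swqe.buffer[:n]
```

In this model, the first posted send is always matched with the first posted receive, which mirrors the in-order execution of WQEs.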
The abovementioned queues are managed by a so-called verb layer. This layer is a software library that resides in the consumer memory space and provides different RDMA services, such as posting send and receive requests.
An application (wherein the term application encompasses, but is not limited to, user and kernel space; the term “consumer” is also used to denote “application”) posts its buffers for RDMA NIC processing using PostSend/PostRecv verbs. Once the buffers are posted by an application, ownership of the buffers passes to the RDMA NIC. An application is prohibited from accessing the buffers after they have been posted for RDMA processing. Application buffers remain in the possession of the RDMA NIC until the RDMA NIC completes their processing (finishes sending the data posted in those buffers, or receives the data destined for those buffers). The RDMA NIC provides a way for an application to query for completed requests, herein referred to as a PollCompletion verb.
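The ownership hand-off described above may be illustrated as follows. This is a minimal sketch under stated assumptions: PostedBuffer and ToyNic are hypothetical names, the owner field is an invented bookkeeping device, and real hardware does not enforce the prohibition this way (the application is simply required not to touch posted buffers).

```python
class PostedBuffer:
    """Hypothetical wrapper that records who currently owns a buffer."""

    def __init__(self, size: int):
        self._data = bytearray(size)
        self.owner = "application"

    def write(self, payload: bytes) -> None:
        if self.owner != "application":
            raise PermissionError("buffer is owned by the RDMA NIC")
        if len(payload) > len(self._data):
            raise ValueError("payload larger than buffer")
        self._data[:len(payload)] = payload


class ToyNic:
    """Toy stand-in for an RDMA NIC (illustrative only)."""

    def __init__(self):
        self._posted = []
        self._completed = []

    def post_send(self, buf: PostedBuffer) -> None:
        buf.owner = "nic"  # ownership passes to the NIC at post time
        self._posted.append(buf)

    def process_one(self) -> None:
        # Pretend the NIC finished sending the oldest posted buffer.
        if self._posted:
            self._completed.append(self._posted.pop(0))

    def poll_completion(self):
        # Returns a completed buffer to the application, or None.
        if not self._completed:
            return None
        buf = self._completed.pop(0)
        buf.owner = "application"
        return buf
```

The buffer becomes writable again only after a poll reports its completion, not merely after the NIC has finished with it.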
The prior art has different approaches to the problem of managing application buffers posted via Work Requests (WR) and verb resources (WQs).
Once the RDMA NIC has completed processing the posted WR, the application buffers consumed by this request can be reused by an application and WQEs can be reused by the verb layer.
An application uses the PollCompletion verb to query the next completed WR (if any), and, given the information provided by this verb, the application can manage the buffers consumed by this WR. Decisions about how to manage the buffers and when to query for completion of posted requests are left to the application.
Not every posted request requires a report of its completion. It is up to the application to select the requests that require a completion report, so-called signaled requests.
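The distinction between signaled and unsignaled requests can be sketched as below. The class ToySendQueue, its signaled parameter and the integer request identifiers are assumptions made for illustration; they are not taken from any particular verbs interface.

```python
from collections import deque


class ToySendQueue:
    """Illustrative model: only signaled requests yield a completion report."""

    def __init__(self):
        self._pending = deque()
        self._reports = deque()
        self._next_id = 0

    def post_send(self, payload: bytes, signaled: bool = False) -> int:
        wr_id = self._next_id
        self._next_id += 1
        self._pending.append((wr_id, signaled))
        return wr_id

    def process_all(self) -> None:
        # Toy adapter: completes every pending request in order,
        # but reports only those the application marked as signaled.
        while self._pending:
            wr_id, signaled = self._pending.popleft()
            if signaled:
                self._reports.append(wr_id)

    def poll_completion(self):
        return self._reports.popleft() if self._reports else None
```

Here three requests may be posted with only the last one signaled, in which case a single completion report is produced for all of the processed work.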
There are several completion-reporting mechanisms used in the prior art, two basic ones being described with reference to FIGS. 1A-2B.
Reference is now made to FIGS. 1A and 1B, which illustrate a Write-back Status Approach used in the prior art to report completion of a WR.
As shown in FIG. 1A, a PostSend verb uses a send queue element SQE 10 for a send WR, and a PostReceive verb uses a receive queue element RQE 12 for a receive WR. When a WR is completed, an indication of the WR completion is written in a status field 14 of the WQE (i.e., SQE 10 or RQE 12). A PollCompletion verb is used to query the status field 14 of the WQE to find out whether the corresponding WR is completed. An update of the status field in the WQE not only indicates completion of the consumer WR, but also indicates that this WQE can be reused by the verb layer.
It is noted that the PostSend/PostRecv and PollCompletion verbs all operate on the same WQ structure. The same status field 14 of the WQE is used for management of the application layer and the verb layer resources.
As shown in FIG. 1B, the verb layer can reuse a particular WQE only after the application layer has checked the status field of that WQE and has been informed that it is marked as completed.
The Write-back Status Approach for querying completed requests by the application assumes use of the same data structure for posting new requests, deallocation of completed WQEs, and query on completed requests. In this approach, the application manages its own and verb layer resources.
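A rough software model of the Write-back Status Approach is given below. It is a sketch only: the ring layout, the status values FREE, POSTED and COMPLETED, and the method names are assumptions chosen for illustration. The point it shows is that a single structure serves posting, completion query and WQE release.

```python
FREE, POSTED, COMPLETED = "free", "posted", "completed"


class WriteBackWorkQueue:
    """Toy work queue in which each WQE carries its own status field."""

    def __init__(self, depth: int):
        self.status = [FREE] * depth  # per-WQE status field
        self.wr_ids = [None] * depth
        self.head = 0  # next WQE to post into
        self.tail = 0  # oldest WQE not yet polled

    def post(self, wr_id) -> bool:
        if self.status[self.head] != FREE:
            return False  # WQE has not been released by a poll yet
        self.wr_ids[self.head] = wr_id
        self.status[self.head] = POSTED
        self.head = (self.head + 1) % len(self.status)
        return True

    def nic_complete_oldest(self) -> None:
        # Toy adapter: writes the completion status back into the oldest posted WQE.
        for i in range(len(self.status)):
            slot = (self.tail + i) % len(self.status)
            if self.status[slot] == POSTED:
                self.status[slot] = COMPLETED
                return

    def poll_completion(self):
        # Reads the status field; a completed WQE is reported and released together.
        if self.status[self.tail] != COMPLETED:
            return None
        wr_id = self.wr_ids[self.tail]
        self.status[self.tail] = FREE
        self.tail = (self.tail + 1) % len(self.status)
        return wr_id
```

In this model a completed WQE cannot be reposted until the application has polled it, which reflects the coupling between application-layer and verb-layer resource management.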
Reference is now made to FIGS. 2A and 2B, which illustrate a Completion Queue Approach used in the prior art to report completion of a WR. For example, protocols like InfiniBand and iWARP use a completion queue approach. This approach introduces the concept of a completion queue (CQ), wherein each entry of such a queue describes a single signaled WR that has been completed.
When the channel adapter completes a WR, a Completion Queue Element (CQE) 16 is placed on the CQ. Each CQE 16 specifies all the information necessary for a work completion, and either contains that information directly or points to other structures, for example, the associated WQE, that contain the information. In this approach, the PollCompletion verb is used to query the CQE 16 to find out whether a particular WQE is now available.
As shown in FIG. 2B, the verb layer can reuse a particular WQE only after the application layer has queried the CQE 16 and been informed that the corresponding WR is completed.
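The Completion Queue Approach can be modeled in a similar toy fashion. The sketch below rests on assumptions: CompletionQueueElement, CqWorkQueue and poll_completion are hypothetical names, every request is treated as signaled, and the CQE simply stores an index pointing back to its WQE.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CompletionQueueElement:
    wq_name: str    # work queue on which the WR was posted
    wqe_index: int  # points back to the associated WQE
    wr_id: int


class CqWorkQueue:
    """Toy work queue that reports completions through a separate CQ."""

    def __init__(self, name: str, depth: int, cq: deque):
        self.name = name
        self.in_use = [False] * depth
        self.wr_ids = [None] * depth
        self.head = 0
        self.cq = cq

    def post(self, wr_id: int) -> bool:
        if self.in_use[self.head]:
            return False  # WQE not yet released by a poll
        self.in_use[self.head] = True
        self.wr_ids[self.head] = wr_id
        self.head = (self.head + 1) % len(self.in_use)
        return True

    def nic_complete(self, index: int) -> None:
        # Toy adapter: places a CQE on the CQ when the WR completes.
        self.cq.append(CompletionQueueElement(self.name, index, self.wr_ids[index]))


def poll_completion(cq: deque, work_queues: dict):
    """Pops the next CQE and releases the WQE that it refers to."""
    if not cq:
        return None
    cqe = cq.popleft()
    work_queues[cqe.wq_name].in_use[cqe.wqe_index] = False
    return cqe
```

Note that in this model the polling function reaches into the work queue in order to release the WQE, so the completion path and the posting path touch the same work queue state.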
This method allows a much more flexible mechanism for managing application resources:
a. sharing of the same completion queue between different WQs;
b. use of different data structures to post requests and to poll for completions.
A disadvantage of this approach is that the release of WQEs is, once again, performed upon polling for completion. This forces the application protocol to post signaled WQEs from time to time in order to allow WQE deallocation, even if their completion is not important from the protocol perspective. Another disadvantage is that the WQ address space must be accessible by the PollCompletion verb, and the CQ and QP must reside in the same memory space. Another disadvantage is the need to synchronize between PollCompletion and PostSend execution. For example, since PostSend consumes WQEs and PollCompletion releases WQEs, the update of the total number of WQEs needs to be synchronized.
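The synchronization requirement mentioned above can be illustrated with a shared counter of free WQEs. This is a minimal sketch assuming a simple mutual-exclusion lock; the class WqeAccounting and its methods are hypothetical, and an actual verb layer might use atomic operations or another scheme instead.

```python
import threading


class WqeAccounting:
    """Toy counter of free WQEs shared by the post and poll paths."""

    def __init__(self, depth: int):
        self._free = depth
        self._lock = threading.Lock()

    def post_send(self) -> bool:
        # PostSend consumes one WQE if any is free.
        with self._lock:
            if self._free == 0:
                return False
            self._free -= 1
            return True

    def poll_completion(self, released: int) -> None:
        # PollCompletion gives back the WQEs released by a completion.
        with self._lock:
            self._free += released

    @property
    def free(self) -> int:
        with self._lock:
            return self._free
```

Without the lock, a decrement performed by the posting path and an increment performed by the polling path could interleave and leave the count of free WQEs inconsistent.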