1. Technical Field
The present invention relates generally to remote direct memory access (RDMA) completion and retransmit systems, and more particularly relates to an implementation of RDMA completion and retransmit that maintains ordering between the ResponseOut and RequestOut channels.
2. Related Art
RDMA (remote direct memory access) is a network interface card (NIC) feature that lets one computer directly place information into the memory of another computer. The technology reduces latency by minimizing demands on bandwidth and processing overhead. Traditional hardware and software architecture imposes a significant load on a server's CPU and memory because data must be copied between the kernel and the application. Memory bottlenecks become more severe as connection speeds exceed the processing power and memory bandwidth of servers.
RDMA gets around this by implementing a reliable transport protocol in hardware on the RNIC (RDMA network interface card) and by supporting zero-copy networking with kernel bypass. Zero-copy networking lets the RNIC transfer data directly to or from application memory, eliminating the need to copy data between application memory and the kernel.
Kernel bypass lets applications issue commands to the RNIC without having to execute a kernel call. The RDMA request is issued from user space to the local RNIC and over the network to the remote RNIC without requiring any kernel involvement. This reduces the number of context switches between kernel space and user space while handling network traffic.
RDMA consumers (applications) use message semantics to communicate. Data is posted for transmit in messages and received in messages, and completion is expected to be reported in units of messages. TCP, in turn, is a byte-stream oriented protocol that is not aware of possible message boundaries in upper layer protocol (ULP) data. Therefore, the task of translating between message semantics and byte-stream semantics falls on RDMA and its offloaded implementation.
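By way of illustration only, the mapping of messages onto a byte stream may be sketched as follows. The sketch assumes a hypothetical 4-byte length prefix per message; this is not the framing used by any actual RDMA-over-TCP protocol, and the names frame and Deframer are introduced here solely for the example. It shows that a receiver must recover message boundaries regardless of how the byte stream happens to be segmented.

```python
import struct
from typing import List


def frame(message: bytes) -> bytes:
    """Prefix a message with its length (hypothetical 4-byte big-endian header)."""
    return struct.pack("!I", len(message)) + message


class Deframer:
    """Rebuilds whole messages from arbitrarily split chunks of a byte stream."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Accept the next chunk of the stream; return any messages now complete."""
        self._buf.extend(chunk)
        complete = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from("!I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break  # the rest of this message has not arrived yet
            complete.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return complete
```

In this sketch, two messages framed back to back and delivered in three arbitrary chunks are still reported to the consumer as two whole messages.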
RDMA is a message-oriented ULP that uses TCP reliability services. RDMA adds another level of complexity to the mapping of message-oriented ULP semantics to byte-oriented TCP semantics. RDMA uses the same TCP connection to transmit messages posted by two independent sources:
    RequestOut: RDMA requests originated by the local consumer; and
    ResponseOut: RDMA responses originated by the RNIC as a result of receiving and processing an inbound RDMA Read Request sent by the remote consumer.
The RNIC interleaves RequestOut and ResponseOut messages when transmitting them through the same TCP connection. In addition to mapping the byte stream to the message stream, the RNIC needs to preserve “transmit ordering” between messages from RequestOut and ResponseOut during the completion and retransmit processes.
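The transmit-ordering requirement may be pictured with the following minimal sketch, which is offered under stated assumptions and is not the implementation described in this specification. It assumes that each transmitted message is tagged with the TCP sequence number at which it starts, that each source keeps its own list of sent messages, and that the wire order can be rebuilt by merging the two lists on that sequence number. The names Sent, Connection, transmit and retransmit_order are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from heapq import merge


@dataclass
class Sent:
    source: str     # "RequestOut" or "ResponseOut"
    start_seq: int  # TCP sequence number of the first byte of the message
    length: int     # message length in bytes


class Connection:
    """One TCP connection shared by two independent message sources."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.sent = {"RequestOut": deque(), "ResponseOut": deque()}

    def transmit(self, source: str, length: int) -> Sent:
        """Record a message from one source at the current stream position."""
        record = Sent(source, self.next_seq, length)
        self.sent[source].append(record)
        self.next_seq += length
        return record

    def retransmit_order(self, from_seq: int) -> list:
        """Messages with bytes at or after from_seq, in original wire order."""
        # Each per-source queue is already sorted by start_seq, so a merge
        # on that key reconstructs the interleaving seen on the wire.
        in_wire_order = merge(*self.sent.values(), key=lambda r: r.start_seq)
        return [r for r in in_wire_order if r.start_seq + r.length > from_seq]
```

For example, if a 100-byte request, a 50-byte response and a 30-byte request are transmitted in that order, a retransmit starting at sequence number 120 begins with the response and is followed by the second request, even though the two messages are recorded in different queues.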
Different RDMA requests (e.g., “Fence”) and completion ordering rules (e.g., an RDMA Read Request is completed only when the RNIC receives the corresponding RDMA Read Response) may suspend a transmit or completion process in RequestOut. This suspension, however, does not prevent ResponseOut from transmitting and completing RDMA Read Responses. Accordingly, in order to preserve independence between the RDMA request queue and the RDMA response queue, a system is required that efficiently preserves order between the two queues.
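The independence of the two queues may be sketched as follows. The sketch makes simplifying assumptions that are not taken from this specification: a fenced request is modeled as one that cannot be transmitted while any earlier RDMA Read is still outstanding, each RequestOut entry is a (name, fenced) pair, and all pending responses are sent before any request. The function name transmit_round is hypothetical.

```python
from collections import deque


def transmit_round(request_out: deque, response_out: deque,
                   reads_outstanding: int) -> list:
    """Return the names of the messages sent in one pass over both queues."""
    sent = []
    # ResponseOut never waits on the state of RequestOut.
    while response_out:
        sent.append(response_out.popleft())
    # RequestOut stops at the first fenced request while reads are pending.
    while request_out:
        name, fenced = request_out[0]
        if fenced and reads_outstanding > 0:
            break  # suspended until the earlier read responses arrive
        request_out.popleft()
        sent.append(name)
    return sent
```

In this sketch, a fenced request at the head of RequestOut holds back every request queued behind it, while the RDMA Read Responses in ResponseOut are still transmitted; once no reads remain outstanding, the suspended requests go out in their original order.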
One approach to resolve this issue would be to build a single request/response channel implemented, e.g., using a control structure (descriptors) and/or by maintaining a separate copy of the data. Such an approach has several disadvantages, including:
    Additional copy operations are required;
    More RNIC resources are needed to implement such an approach (the data/control is copied to the adapter memory);
    Lack of flexibility (adapter memory is a limited resource, which limits the number of RDMA messages that can be outstanding on the wire); and
    Enforcing completion ordering between RDMA Requests and RDMA Responses is more difficult.
Accordingly, a solution is required to address the above-mentioned problems.