InfiniBand™ (IB) is a scalable, switch-based, point-to-point interconnect architecture that defines both a layered hardware protocol (physical, link, network, and transport layers) and a software layer that manages initialization and communication between devices. The IB protocols and other network features are defined in the InfiniBand Architecture Specification (Release 1.3, March 2015, referred to hereinafter as the “IB specification”). The IB specification is incorporated herein by reference for purposes of the transport layer protocols and packet formats that it describes, but to the extent that any terms are defined in the IB specification in a manner that conflicts with definitions made explicitly or implicitly in the present patent application, only the definitions in the present patent application should be considered.
The IB transport layer is responsible for in-order packet delivery (including reliable packet delivery, as defined below), partitioning, channel multiplexing and transport services. Client processes, running on a host computer, interact with the transport layer on the host channel adapter (HCA, also referred to generically as a network interface controller, or NIC) of the computer by submitting work requests (WRs). Host driver software translates the WRs into work items, referred to as work queue elements (WQEs), and queues them in assigned queue pairs (QPs) for execution by the HCA. Each QP conventionally includes a send queue (SQ) and a receive queue (RQ).
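By way of illustration only, the flow from work request to queued work item described above can be modeled with the following minimal Python sketch. The class and method names (WorkRequest, WorkQueueElement, QueuePair, post_send, execute_next) are hypothetical and do not correspond to any particular driver or HCA interface; the sketch models only the queueing behavior, not the hardware.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkRequest:
    """What a client process submits (hypothetical fields)."""
    opcode: str
    payload: bytes


@dataclass
class WorkQueueElement:
    """The work item that the driver derives from a work request."""
    opcode: str
    payload: bytes
    index: int  # position assigned by the driver, for illustration only


@dataclass
class QueuePair:
    """A QP with a send queue (SQ) and a receive queue (RQ)."""
    qp_number: int
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)
    _next_index: int = 0

    def post_send(self, wr: WorkRequest) -> WorkQueueElement:
        # Driver role: translate the WR into a WQE and queue it on the SQ.
        wqe = WorkQueueElement(wr.opcode, wr.payload, self._next_index)
        self._next_index += 1
        self.send_queue.append(wqe)
        return wqe

    def execute_next(self):
        # HCA role: take the oldest WQE from the SQ, or None if it is empty.
        return self.send_queue.popleft() if self.send_queue else None
```

In this model, WQEs are executed in the order in which the corresponding WRs were posted.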
The transport layer also handles transaction data segmentation when sending and reassembly when receiving. Based on the Maximum Transfer Unit (MTU) of the path, the transport layer divides the data into packets of the proper size. A receiver reassembles the packets based on their Base Transport Header (BTH), which normally contains the destination QP number and packet sequence number (PSN). The receiver acknowledges the packets, and the sender receives these acknowledgements and updates a completion queue with the status of the operation.
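The segmentation and reassembly described above can be sketched as follows. This is a simplified model under stated assumptions: the Packet class and the segment and reassemble functions are hypothetical names, the header is reduced to the destination QP number and the PSN, and acknowledgements and completion queues are omitted. The PSN is treated as a 24-bit counter that wraps around.

```python
from dataclasses import dataclass

PSN_MODULUS = 1 << 24  # the PSN is modeled as a 24-bit field that wraps


@dataclass(frozen=True)
class Packet:
    dest_qp: int    # destination QP number (carried in the BTH)
    psn: int        # packet sequence number (carried in the BTH)
    payload: bytes


def segment(message: bytes, mtu: int, dest_qp: int, first_psn: int = 0):
    """Split a message into packets whose payloads do not exceed the MTU."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    # An empty message is still sent as a single packet with no payload.
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    return [
        Packet(dest_qp, (first_psn + n) % PSN_MODULUS, chunk)
        for n, chunk in enumerate(chunks)
    ]


def reassemble(packets, first_psn: int = 0) -> bytes:
    """Concatenate payloads, checking that the PSNs are consecutive."""
    data = bytearray()
    expected = first_psn
    for packet in packets:
        if packet.psn != expected:
            raise ValueError(f"expected PSN {expected}, got {packet.psn}")
        data += packet.payload
        expected = (expected + 1) % PSN_MODULUS
    return bytes(data)
```

For example, a 10-byte message sent over a path with a 4-byte MTU (an unrealistically small value chosen for readability) is carried in three packets with payloads of 4, 4, and 2 bytes.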
IB specifies the following transport services:
Reliable Connection (RC). RC provides a reliable transfer of data between two entities. RC transport provides remote direct memory access (RDMA) operations, atomic operations, and reliable channel semantics. As a connection-oriented transport, RC requires a dedicated queue pair (QP) for each pair of requester and responder processes.
Unreliable Connection (UC). UC facilitates an unreliable transfer of data between two entities. Unlike RC, UC messages may be lost. UC provides RDMA capability, but does not guarantee ordering or reliability. Each pair of connected processes requires a dedicated QP.
Reliable Datagram (RD). Using RD enables one or more QPs to send and receive messages using a reliable datagram channel (RDC) between each pair of reliable datagram domains (RDDs). RD provides most of the features of RC, but does not require a dedicated connection for each pair of processes.
Unreliable Datagram (UD). With UD, a QP can send and receive messages to one or more QPs, but the messages may get lost. UD is connectionless, allowing a single QP to communicate with any other peer QP. UD has a limited message size, and does not guarantee ordering or reliability.
Raw Datagram. A raw datagram is a data link layer service which provides a QP with the ability to send and receive raw datagram messages that are not interpreted.
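The distinguishing properties of the four principal transport services listed above can be summarized in a small lookup table, sketched below in Python. The TransportService class and its field names are hypothetical, and the table records only the two properties stated in the list above (reliability, and whether a dedicated QP or connection is needed for each pair of processes). Raw Datagram is omitted because the list does not characterize it in these terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportService:
    name: str
    reliable: bool                  # delivery is guaranteed
    dedicated_per_process_pair: bool  # needs a QP/connection per process pair


SERVICES = {
    "RC": TransportService("Reliable Connection", True, True),
    "UC": TransportService("Unreliable Connection", False, True),
    "RD": TransportService("Reliable Datagram", True, False),
    "UD": TransportService("Unreliable Datagram", False, False),
}


def select(reliable: bool, dedicated_per_process_pair: bool):
    """Return the abbreviations of the services matching both properties."""
    return [
        key
        for key, svc in SERVICES.items()
        if svc.reliable == reliable
        and svc.dedicated_per_process_pair == dedicated_per_process_pair
    ]
```

Of the four services in this table, only RD combines reliable delivery with freedom from a dedicated connection for each pair of processes.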
Annex A14 of the IB specification defines an additional transport service: Extended Reliable Connected (XRC). XRC enables a single receive QP to be shared by multiple shared receive queues (SRQs) across one or more processes running on a given host. As a result, each process can maintain a single send QP to each host rather than to each remote process. A receive QP is established per remote send QP and can be shared among all the processes on the host.
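The reduction in QP count that XRC offers can be illustrated with simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch assumes, purely for illustration, that every process communicates with every process on every other host and that all hosts run the same number of processes. Only send-side QPs held by a single process are counted.

```python
def rc_qps_per_process(hosts: int, procs_per_host: int) -> int:
    """RC: one dedicated QP for each remote process."""
    if hosts < 1 or procs_per_host < 1:
        raise ValueError("hosts and procs_per_host must be at least 1")
    return (hosts - 1) * procs_per_host


def xrc_send_qps_per_process(hosts: int) -> int:
    """XRC: one send QP for each remote host."""
    if hosts < 1:
        raise ValueError("hosts must be at least 1")
    return hosts - 1
```

Under these assumptions, with 16 hosts and 8 processes per host, each process holds 120 QPs when using RC, compared with 15 send QPs when using XRC.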
U.S. Pat. No. 8,213,315, whose disclosure is incorporated herein by reference, describes a dynamically-connected (DC) transport service, which is intended to reduce the number of required QPs per end-node while preserving RC semantics. The DC transport service provides a datagram-like model that allows a DC QP to reach multiple remote processes in multiple remote nodes. Each WR submitted by a client process to a DC send queue includes information identifying the targeted remote destination process. DC contexts are then dynamically tied to each other across the network to create a dynamic (i.e., temporary) RC-equivalent connection that is used to reliably deliver one or more messages. When the initiator (i.e., the HCA of the sending end-node) reaches a point in its send queue at which either there are no further WQEs to execute, or the next WQE is destined to another process (possibly in a different node), the dynamic connection is torn down. The same DC context may then be used to establish a new dynamic connection to another destination process.
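The initiator behavior described above can be sketched as a short simulation. This is an illustrative model only, not the implementation of the referenced patent: the function name run_dc_send_queue and the event labels are hypothetical, each WQE is reduced to a (destination, message) pair, and connection handshakes and acknowledgements are omitted.

```python
def run_dc_send_queue(wqes):
    """Walk a DC send queue and return the resulting list of events.

    Each WQE is a (destination, message) pair, where the destination
    identifies the targeted remote process.
    """
    events = []
    connected_to = None  # destination of the current dynamic connection
    for position, (dest, message) in enumerate(wqes):
        if connected_to is None:
            # No dynamic connection exists: establish one to this destination.
            events.append(("connect", dest))
            connected_to = dest
        events.append(("send", dest, message))

        # Look ahead: tear down if the queue is exhausted or if the
        # next WQE targets a different destination process.
        is_last = position + 1 == len(wqes)
        if is_last or wqes[position + 1][0] != dest:
            events.append(("disconnect", dest))
            connected_to = None
    return events
```

For a queue holding two messages for a process A followed by one message for a process B, this model yields a single connection to A that carries both messages, followed by a teardown and a new connection to B using the same DC context.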