Switched-fabric communication architectures are widely used in high-performance computing. Examples of such architectures include InfiniBand™ and high-speed Ethernet™. Computing devices (host processors and peripherals) connect to the switched fabric via a network interface controller (NIC), which is referred to in InfiniBand (IB) parlance as a channel adapter. Host processors (or hosts) use a host channel adapter (HCA), while peripheral devices use a target channel adapter (TCA). Some of the embodiments in the description that follows are based on features of the IB architecture and use vocabulary taken from IB specifications. Similar mechanisms exist, however, in networks and input/output (I/O) devices that operate in accordance with other protocols, such as Ethernet and Fibre Channel. IB terminology and features are used herein by way of example, for the sake of convenience and clarity, and not by way of limitation.
Client processes (referred to hereinafter as clients), such as software application processes, running on a host processor communicate with the transport layer of an IB fabric by manipulating a transport service instance, known as a “queue pair” (QP). Each QP is made up of a send work queue and a receive work queue. To send and receive messages over the network using an HCA, the client submits work items, called work queue elements (WQEs), for execution by the HCA. (More precisely, the client initiates work requests (WRs), which cause WQEs to be placed in the appropriate work queues.) After it has finished servicing a WQE, the HCA typically writes a completion report, in the form of a completion queue element (CQE), to a completion queue, to be read by the client as an indication that the work request has been executed.
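By way of illustration only, the relationship among work requests, WQEs and CQEs described above can be modeled in a few lines of Python. This is a toy software sketch, not an HCA implementation and not the API of any IB library: the names ToyHCA, QueuePair, post_work_request, service_send_queue, poll_completions and demo are hypothetical, as are the "SEND"/"RECV" opcode strings and the "SUCCESS" status.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WQE:
    """Work queue element: one work item awaiting execution by the adapter."""
    wr_id: int
    opcode: str          # "SEND" or "RECV" (hypothetical labels for this model)
    payload: bytes = b""


@dataclass
class CQE:
    """Completion queue element: reports that a work request was executed."""
    wr_id: int
    opcode: str
    status: str


@dataclass
class QueuePair:
    """A QP is made up of a send work queue and a receive work queue."""
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)


class ToyHCA:
    """Hypothetical stand-in for a channel adapter; it only moves queue entries."""

    def __init__(self):
        self.completion_queue = deque()

    def post_work_request(self, qp, wr_id, opcode, payload=b""):
        # A work request causes a WQE to be placed in the appropriate work queue.
        queue = qp.send_queue if opcode == "SEND" else qp.recv_queue
        queue.append(WQE(wr_id, opcode, payload))

    def service_send_queue(self, qp):
        # The adapter services WQEs in order and writes one CQE for each.
        # (A real adapter would transmit the payload here; this model does not.)
        while qp.send_queue:
            wqe = qp.send_queue.popleft()
            self.completion_queue.append(CQE(wqe.wr_id, wqe.opcode, "SUCCESS"))

    def poll_completions(self):
        # The client reads CQEs as an indication that its requests were executed.
        done = list(self.completion_queue)
        self.completion_queue.clear()
        return done


def demo():
    hca, qp = ToyHCA(), QueuePair()
    hca.post_work_request(qp, wr_id=1, opcode="SEND", payload=b"hello")
    hca.post_work_request(qp, wr_id=2, opcode="SEND", payload=b"world")
    hca.service_send_queue(qp)
    return hca.poll_completions()
```

In this sketch, calling demo() posts two send requests and returns one CQE per request, in posting order. The model writes a CQE for every WQE; the text above says only that the HCA typically does so.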
Remote direct memory access (RDMA) protocols enable a NIC to carry out direct memory access operations over a network from the memory of one computer to the memory of another without directly involving the operating system of either computer. For example, an RDMA write command specifies a source buffer in the local host memory and instructs the NIC to transfer the data in the buffer, via one or more packets sent over the network, to a target address in the host memory of a (remote) target node. The NIC at the target node receives the packets and writes the data to the target address. In similar fashion, an RDMA read command specifies a source buffer in a remote node and causes the NIC to request the data in the source buffer and then, upon receiving the data from the remote node, to write the data to a target address in the local host memory.
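The data flow of the RDMA write and read commands described above may be sketched, purely as an illustrative assumption, with two in-process "nodes" whose host memories are byte arrays. The names ToyNode, packetize, rdma_write and rdma_read are hypothetical, and the value MTU = 4 is an arbitrarily small packet payload size chosen so that a short buffer spans several packets. No actual network or NIC is involved.

```python
MTU = 4  # hypothetical per-packet payload size, chosen only for illustration


class ToyNode:
    """Hypothetical node whose host memory is modeled as a bytearray."""

    def __init__(self, size):
        self.memory = bytearray(size)


def packetize(data, mtu=MTU):
    """Split a buffer into (offset, payload) pairs, one pair per packet."""
    return [(off, bytes(data[off:off + mtu])) for off in range(0, len(data), mtu)]


def rdma_write(local, src_addr, length, remote, dst_addr):
    """Transfer a local source buffer to a target address in remote memory."""
    source = local.memory[src_addr:src_addr + length]
    for off, payload in packetize(source):
        # The NIC at the target node writes each packet's data to the target
        # address; no operating-system call is modeled on either side.
        start = dst_addr + off
        remote.memory[start:start + len(payload)] = payload


def rdma_read(local, dst_addr, remote, src_addr, length):
    """Request a remote source buffer and write it to local memory."""
    source = remote.memory[src_addr:src_addr + length]
    for off, payload in packetize(source):
        # The local NIC writes the returned data to the local target address.
        start = dst_addr + off
        local.memory[start:start + len(payload)] = payload
```

For example, with two 16-byte nodes a and b, placing ten bytes at address 0 of a and calling rdma_write(a, 0, 10, b, 3) copies those bytes to addresses 3 through 12 of b in three packets of 4, 4 and 2 bytes. A subsequent rdma_read(a, 11, b, 3, 5) copies the first five of them back into a at address 11.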
In IB networks, RDMA read and write operations are an integral part of the transport-layer protocol. These operations provide high-throughput, low-latency data transfers, which are carried out by the HCA under application-level control. RDMA over Converged Ethernet (RoCE) and the Internet Wide Area RDMA Protocol (iWARP) offer similar capabilities over an Ethernet network.