InfiniBand™ (IB) is a switched-fabric communications architecture that is widely used in high-performance computing. It has been standardized by the InfiniBand Trade Association. Computing devices (host processors and peripherals) connect to the IB fabric via a network interface controller (NIC), which is referred to in IB parlance as a channel adapter. Host processors (or hosts) use a host channel adapter (HCA), while peripheral devices use a target channel adapter (TCA).
Client processes (referred to hereinafter as clients) running on a host processor, such as software application processes, communicate with the transport layer of the IB fabric by manipulating a transport service instance, known as a “queue pair” (QP), made up of a send work queue and a receive work queue. To send and receive messages over the network using an HCA, the client initiates work requests (WRs), which cause work items, called work queue elements (WQEs), to be placed onto the appropriate work queues. Normally, each WR has a data buffer associated with it, to be used for holding the data that is to be sent or received in executing the WQE. The HCA executes the WQEs and thus communicates with the corresponding QP of the channel adapter at the other end of the link.
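By way of illustration only, the relationship among a QP, its two work queues, and the WQEs generated by work requests may be modeled in software as in the following minimal sketch. The class names `WorkQueueElement` and `QueuePair` and the methods `post_send` and `post_recv` are hypothetical names chosen for this sketch and are not taken from the IB specification or from any particular HCA driver; an actual HCA executes WQEs in hardware.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueueElement:
    """A work item (WQE) placed on a work queue as a result of a work request (WR)."""
    opcode: str         # "send" or "recv" in this toy model (assumed labels)
    buffer: bytearray   # data buffer associated with the WR


@dataclass
class QueuePair:
    """Toy transport service instance: one send queue and one receive queue."""
    qp_num: int
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)

    def post_send(self, data: bytes) -> WorkQueueElement:
        # The WR carries the data to be sent; its WQE goes onto the send queue.
        wqe = WorkQueueElement("send", bytearray(data))
        self.send_queue.append(wqe)
        return wqe

    def post_recv(self, length: int) -> WorkQueueElement:
        # The WR names an empty buffer; its WQE goes onto the receive queue.
        wqe = WorkQueueElement("recv", bytearray(length))
        self.recv_queue.append(wqe)
        return wqe


# A client posts one receive WR and one send WR on the same QP.
qp = QueuePair(qp_num=7)
qp.post_recv(64)
qp.post_send(b"hello")
```

In this sketch the queues simply accumulate WQEs; the step in which the channel adapter consumes and executes them is deliberately omitted.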
IB channel adapters implement various service types and transport operations, including remote direct memory access (RDMA) read and write operations, as well as send operations. Both RDMA write and send requests carry data sent by a channel adapter (known as the requester) and cause another channel adapter (the responder) to write the data to a memory address at its own end of the link. Whereas RDMA write requests specify the address in the remote responder's memory to which the data are to be written, send requests rely on the responder to determine the memory location at the request destination. This sort of send operation is sometimes referred to as a “push” operation, since the initiator of the data transfer pushes data to the remote QP.
Upon receiving a send request addressed to a certain QP, the channel adapter at the destination node places the data sent by the requester into the next available receive buffer for that QP. To specify the receive buffers to be used for such incoming send requests, a client on the host computing device generates receive WQEs and places them in the receive queues of the appropriate QPs. Each time a valid send request is received, the destination channel adapter takes the next WQE from the receive queue of the destination QP and places the received data in the memory location specified in that WQE. Thus, every valid incoming send request engenders a receive queue operation by the responder.
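The difference between the two kinds of request, as seen by the responder, may be sketched as follows. This is a simplified software model under stated assumptions: responder memory is represented as a flat byte array, each receive WQE is reduced to an (address, length) pair, and the names `Responder`, `handle_rdma_write`, and `handle_send` are hypothetical. The error handling shown (raising an exception when no receive WQE is available or when the message is too large) is likewise an assumption of the sketch rather than a description of IB behavior.

```python
from collections import deque


class Responder:
    """Toy responder: flat memory plus one receive queue per QP."""

    def __init__(self, mem_size: int):
        self.memory = bytearray(mem_size)
        self.recv_queues = {}  # qp_num -> deque of (address, length) receive WQEs

    def _store(self, address: int, data: bytes) -> None:
        # Refuse writes that would run past the end of the modeled memory.
        if address < 0 or address + len(data) > len(self.memory):
            raise ValueError("write outside responder memory")
        self.memory[address:address + len(data)] = data

    def post_recv(self, qp_num: int, address: int, length: int) -> None:
        # The client places a receive WQE on the receive queue of the given QP.
        self.recv_queues.setdefault(qp_num, deque()).append((address, length))

    def handle_rdma_write(self, address: int, data: bytes) -> None:
        # The requester specified the address; no receive WQE is consumed.
        self._store(address, data)

    def handle_send(self, qp_num: int, data: bytes) -> int:
        # The responder determines the address from the next receive WQE.
        queue = self.recv_queues.get(qp_num)
        if not queue:
            raise RuntimeError("no receive WQE posted for this QP")
        address, length = queue[0]
        if len(data) > length:
            raise ValueError("message larger than posted receive buffer")
        queue.popleft()  # every valid send consumes exactly one receive WQE
        self._store(address, data)
        return address


r = Responder(mem_size=32)
r.post_recv(qp_num=1, address=0, length=8)
r.post_recv(qp_num=1, address=8, length=8)
first = r.handle_send(1, b"abc")   # uses the first receive WQE
second = r.handle_send(1, b"de")   # uses the second receive WQE
r.handle_rdma_write(16, b"xyz")    # address chosen by the requester
```

After these calls the receive queue of QP 1 is empty, so a further send request on that QP would fail in this model until the client posts another receive WQE.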
U.S. Pat. No. 7,263,103, whose disclosure is incorporated herein by reference, describes a method for network communication in which a pool of descriptors (or WQEs) is shared among a plurality of transport service instances used in communicating over a network. Each of the descriptors in the pool includes a scatter list, indicating a buffer that is available in a local memory. When a message containing data to be pushed to the local memory is received over the network on one of the transport service instances, one of the descriptors is read from the pool. The data contained in the message are written to the buffer indicated by the scatter list included in this descriptor.
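The idea of a descriptor pool shared among several transport service instances may be illustrated by the following minimal sketch. It is not the implementation described in the cited patent. The sketch assumes that a scatter list is a list of (address, length) entries in a flat local memory, that descriptors are consumed in first-in-first-out order, and that a message larger than the indicated buffer is rejected; the names `SharedReceivePool`, `post`, and `deliver` are hypothetical.

```python
from collections import deque


class SharedReceivePool:
    """Toy pool of receive descriptors shared by all QPs that reference it."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.descriptors = deque()  # each descriptor is a scatter list

    def post(self, scatter_list) -> None:
        # scatter_list: iterable of (address, length) entries (assumed layout).
        self.descriptors.append(list(scatter_list))

    def deliver(self, qp_num: int, data: bytes):
        """Write a pushed message, arriving on any QP, via the next descriptor."""
        if not self.descriptors:
            raise RuntimeError("descriptor pool is empty")
        scatter_list = self.descriptors[0]
        if len(data) > sum(length for _, length in scatter_list):
            raise ValueError("message larger than indicated buffer")
        self.descriptors.popleft()
        offset = 0
        for address, length in scatter_list:
            chunk = data[offset:offset + length]
            if not chunk:
                break  # message fully written before the list was exhausted
            self.memory[address:address + len(chunk)] = chunk
            offset += len(chunk)
        return qp_num, scatter_list


memory = bytearray(16)
pool = SharedReceivePool(memory)
pool.post([(0, 4)])
pool.post([(8, 2), (12, 2)])
pool.deliver(qp_num=1, data=b"AAAA")  # message on QP 1 takes the first descriptor
pool.deliver(qp_num=2, data=b"BBB")   # message on QP 2 takes the second
```

Because both QPs draw from the same pool, neither needs receive WQEs of its own in this model; the pool is depleted by whichever QP happens to receive messages.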
U.S. Pat. No. 9,143,467, whose disclosure is incorporated herein by reference, describes a NIC with a circular receive buffer. First and second indices are provided to point respectively to a first buffer in a set to which the NIC is to write and a second buffer in the set from which a client process running on the host device is to read. Responsively to receiving a message, the data are written to the first buffer that is pointed to by the first index, and the first index is advanced cyclically through the set. The second index is advanced cyclically through the set when the data in the second buffer have been read by the client process. In some embodiments, the buffers are all of a uniform size, for example one byte.
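The two-index scheme may be illustrated by the following minimal sketch, which is a software analogy and not the implementation of the cited patent. The sketch assumes that each message occupies exactly one buffer of the set, that a counter of filled buffers is kept to distinguish a full set from an empty one, and that a write is refused when every buffer holds unread data; the names `CircularReceiveBuffers`, `nic_write`, and `client_read` are hypothetical.

```python
class CircularReceiveBuffers:
    """Toy set of uniform-size buffers traversed cyclically by two indices."""

    def __init__(self, count: int, size: int):
        self.size = size
        self.buffers = [bytearray(size) for _ in range(count)]
        self.lengths = [0] * count
        self.write_index = 0  # first index: buffer the NIC writes next
        self.read_index = 0   # second index: buffer the client reads next
        self.filled = 0       # buffers written but not yet read (assumed bookkeeping)

    def nic_write(self, data: bytes) -> bool:
        if len(data) > self.size:
            raise ValueError("message larger than one buffer")
        if self.filled == len(self.buffers):
            return False  # assumed policy: refuse when no buffer is free
        i = self.write_index
        self.buffers[i][:len(data)] = data
        self.lengths[i] = len(data)
        self.write_index = (i + 1) % len(self.buffers)  # advance cyclically
        self.filled += 1
        return True

    def client_read(self):
        if self.filled == 0:
            return None  # nothing to read
        i = self.read_index
        data = bytes(self.buffers[i][:self.lengths[i]])
        self.read_index = (i + 1) % len(self.buffers)  # advance cyclically
        self.filled -= 1
        return data


ring = CircularReceiveBuffers(count=2, size=4)
wrote_ab = ring.nic_write(b"ab")
wrote_cd = ring.nic_write(b"cd")
wrote_when_full = ring.nic_write(b"ef")   # both buffers still unread
read_first = ring.client_read()           # frees the first buffer
wrote_after_read = ring.nic_write(b"ef")  # reuses the freed buffer
```

The write index wraps back to the first buffer only after the client has advanced the read index past it, which is what allows the same set of buffers to be reused indefinitely.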