InfiniBand™ (IB) is a switched-fabric communications architecture that is widely used in high-performance computing. It has been standardized by the InfiniBand Trade Association. Computing devices (host processors and peripherals) connect to the IB fabric via a network interface controller (NIC), which is referred to in IB parlance as a channel adapter. Host processors (or hosts) use a host channel adapter (HCA), while peripheral devices use a target channel adapter (TCA).
Client processes (referred to hereinafter as clients) running on a host processor, such as software application processes, communicate with the transport layer of the IB fabric by manipulating a transport service instance, known as a “queue pair” (QP), made up of a send work queue and a receive work queue. To send and receive messages over the network using an HCA, the client initiates work requests (WRs), which cause work items, called work queue elements (WQEs), to be placed onto the appropriate work queues. Normally, each WR has a data buffer associated with it, to be used for holding the data that is to be sent or received in executing the WQE. The HCA executes the WQEs and thus communicates with the corresponding QP of the channel adapter at the other end of the link.
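By way of illustration only, the relationship among work requests, WQEs and the two work queues of a QP may be sketched as a toy in-memory model. The names used here (WorkQueueElement, QueuePair, post_send, post_recv) are hypothetical and are not taken from the IB specification or from any verbs library; the sketch models only the queuing step and not execution of the WQEs by the HCA.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueueElement:
    """One work item (WQE) together with the data buffer of its work request."""
    opcode: str          # "send" or "recv" in this toy model
    buffer: bytearray    # data to be sent, or space for data to be received


@dataclass
class QueuePair:
    """A transport service instance: a send work queue plus a receive work queue."""
    qp_num: int
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)

    def post_send(self, data: bytes) -> None:
        # A send work request places a WQE on the send work queue.
        self.send_queue.append(WorkQueueElement("send", bytearray(data)))

    def post_recv(self, length: int) -> None:
        # A receive work request places a WQE on the receive work queue.
        self.recv_queue.append(WorkQueueElement("recv", bytearray(length)))


# The client initiates one receive and one send work request on the same QP.
qp = QueuePair(qp_num=7)
qp.post_recv(64)
qp.post_send(b"hello")
```

In this sketch each WQE simply owns its buffer; an actual channel adapter would instead reference registered memory, which is outside the scope of the model.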
IB channel adapters implement various service types and transport operations, including remote direct memory access (RDMA) read, RDMA write, and send operations. Both RDMA write and send requests carry data sent by a channel adapter (known as the requester) and cause another channel adapter (the responder) to write the data to a memory address at its own end of the link. Whereas RDMA write requests specify the address in the remote responder's memory to which the data are to be written, send requests rely on the responder to determine the memory location at the request destination. This sort of send operation is sometimes referred to as a “push” operation, since the initiator of the data transfer pushes data to the remote QP.
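The difference in addressing between the two operations may be sketched as follows. This is a simplified model under stated assumptions: responder memory is represented by a bytearray, the function names rdma_write and send are hypothetical, and the list responder_buffers stands in for whatever mechanism the responder uses to choose a destination.

```python
def _store(memory: bytearray, addr: int, payload: bytes) -> None:
    """Copy payload into memory at addr, refusing out-of-range writes."""
    end = addr + len(payload)
    if addr < 0 or end > len(memory):
        raise ValueError("write falls outside responder memory")
    memory[addr:end] = payload


def rdma_write(responder_memory: bytearray, remote_addr: int, payload: bytes) -> int:
    """RDMA write: the requester specifies the address in responder memory."""
    _store(responder_memory, remote_addr, payload)
    return remote_addr


def send(responder_memory: bytearray, responder_buffers: list, payload: bytes) -> int:
    """Send: the request carries no address; the responder picks the location."""
    local_addr = responder_buffers.pop(0)   # chosen at the responder's end
    _store(responder_memory, local_addr, payload)
    return local_addr


memory = bytearray(16)
written_at = rdma_write(memory, 4, b"ab")    # requester chose address 4
pushed_at = send(memory, [10], b"cd")        # responder chose address 10
```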
Upon receiving a send request addressed to a certain QP, the channel adapter at the destination node places the data sent by the requester into the next available receive buffer for that QP. To specify the receive buffers to be used for such incoming send requests, a client on the host computing device generates receive WQEs and places them in the receive queues of the appropriate QPs. Each time a valid send request is received, the destination channel adapter takes the next WQE from the receive queue of the destination QP and places the received data in the memory location specified in that WQE. Thus, every valid incoming send request engenders a receive queue operation by the responder.
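The receive-side behavior described above may be sketched as a per-QP queue of receive WQEs that is consumed in order, one WQE per valid incoming send request. The class and method names (ReceiveSide, post_recv, on_send_request) are hypothetical. The handling of an empty receive queue or an undersized buffer by raising an exception is an assumption of this sketch and is not taken from the description above.

```python
from collections import deque


class ReceiveSide:
    """Toy model of a destination channel adapter and its host memory."""

    def __init__(self, memory_size: int) -> None:
        self.memory = bytearray(memory_size)
        self.recv_queues = {}   # qp_num -> deque of (addr, length) receive WQEs

    def post_recv(self, qp_num: int, addr: int, length: int) -> None:
        # The client places a receive WQE on the receive queue of the given QP.
        self.recv_queues.setdefault(qp_num, deque()).append((addr, length))

    def on_send_request(self, qp_num: int, payload: bytes) -> int:
        queue = self.recv_queues.get(qp_num)
        if not queue:
            # Assumption of this sketch: an empty receive queue is an error.
            raise RuntimeError("no receive WQE posted for this QP")
        addr, length = queue[0]
        if len(payload) > length:
            # Assumption of this sketch: oversized messages are rejected.
            raise ValueError("payload exceeds receive buffer")
        queue.popleft()                               # consume the next WQE
        self.memory[addr:addr + len(payload)] = payload
        return addr


rx = ReceiveSide(memory_size=32)
rx.post_recv(qp_num=5, addr=0, length=8)
rx.post_recv(qp_num=5, addr=16, length=8)
first = rx.on_send_request(5, b"one")     # lands in the buffer at address 0
second = rx.on_send_request(5, b"two")    # lands in the buffer at address 16
```

Because each incoming send request consumes exactly one WQE, the client in this model must keep the receive queue of every active QP stocked with buffers.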
The Internet Wide Area RDMA Protocol (iWARP) offers services and semantics for Internet Protocol (IP) networks that are similar to the IB features described above. Features of iWARP are specified by Shah et al., in “Direct Data Placement over Reliable Transports,” published as Request for Comments (RFC) 5041 of the Internet Engineering Task Force (IETF). Implementation of iWARP over the Transmission Control Protocol (TCP) is described by Culley et al., in “Marker PDU Aligned Framing for TCP Specification,” published as IETF RFC 5044.
U.S. Pat. No. 7,263,103, whose disclosure is incorporated herein by reference, describes a method for network communication in which a pool of descriptors (or WQEs) is shared among a plurality of transport service instances used in communicating over a network. Each of the descriptors in the pool includes a scatter list, indicating a buffer that is available in a local memory. When a message containing data to be pushed to the local memory is received over the network on one of the transport service instances, one of the descriptors is read from the pool. The data contained in the message are written to the buffer indicated by the scatter list included in this descriptor.
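The general idea of a descriptor pool shared among several transport service instances may be sketched as below. This is an illustrative model only and is not the implementation disclosed in the cited patent: the names SharedDescriptorPool, post and receive are hypothetical, and representing a scatter list as a list of (address, length) entries that may span more than one memory region is an assumption of the sketch.

```python
from collections import deque


class SharedDescriptorPool:
    """One pool of receive descriptors serving many transport service instances."""

    def __init__(self, memory_size: int) -> None:
        self.memory = bytearray(memory_size)
        self.descriptors = deque()   # each descriptor is a scatter list

    def post(self, scatter_list) -> None:
        # scatter_list: iterable of (addr, length) entries in local memory.
        self.descriptors.append(list(scatter_list))

    def receive(self, qp_num: int, payload: bytes):
        # qp_num only identifies the arrival QP; every QP draws on one pool.
        if not self.descriptors:
            raise RuntimeError("descriptor pool is empty")
        scatter_list = self.descriptors[0]
        if len(payload) > sum(length for _, length in scatter_list):
            raise ValueError("payload exceeds the buffer of the next descriptor")
        self.descriptors.popleft()            # read one descriptor from the pool
        offset = 0
        for addr, length in scatter_list:
            chunk = payload[offset:offset + length]
            self.memory[addr:addr + len(chunk)] = chunk
            offset += len(chunk)
        return qp_num, scatter_list


pool = SharedDescriptorPool(memory_size=32)
pool.post([(0, 4), (8, 4)])
pool.post([(16, 8)])
pool.receive(qp_num=1, payload=b"abcdef")   # message arriving on QP 1
pool.receive(qp_num=2, payload=b"xyz")      # message arriving on QP 2
```

In this model the two messages arrive on different QPs but consume descriptors from the same pool, so buffers need not be reserved separately for each QP.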
U.S. Pat. No. 6,789,143 describes a distributed computing system in which queue pairs and completion queues are implemented in hardware. A mechanism is provided for controlling the transfer of work requests from the consumer to the channel adapter hardware and work completions from the channel adapter hardware to the consumer using head and tail pointers that reference circular buffers.
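A circular buffer referenced by head and tail pointers may be sketched generically as follows. The sketch does not reproduce the mechanism of the cited patent. It assumes that the posting side advances the tail and the consuming side advances the head, and that both pointers are free-running counters reduced modulo the buffer size; the names CircularWorkQueue, post and consume are hypothetical.

```python
class CircularWorkQueue:
    """Fixed-size ring of work requests tracked by head and tail pointers."""

    def __init__(self, capacity: int) -> None:
        self.slots = [None] * capacity
        self.head = 0   # assumption: advanced by the side that consumes entries
        self.tail = 0   # assumption: advanced by the side that posts entries

    def post(self, work_request) -> bool:
        if self.tail - self.head == len(self.slots):
            return False                        # ring is full
        self.slots[self.tail % len(self.slots)] = work_request
        self.tail += 1
        return True

    def consume(self):
        if self.head == self.tail:
            return None                         # ring is empty
        work_request = self.slots[self.head % len(self.slots)]
        self.head += 1
        return work_request


ring = CircularWorkQueue(capacity=2)
ring.post("wr-a")
ring.post("wr-b")
accepted = ring.post("wr-c")    # rejected, since both slots are occupied
oldest = ring.consume()         # frees a slot and returns the oldest entry
```

Keeping the pointers as free-running counters lets full and empty states be told apart without a separate count; the same structure could carry work completions in the opposite direction.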