The computer industry is moving toward fast, packetized, serial input/output (I/O) bus architectures, in which computing hosts and peripherals are linked by a switching network, commonly referred to as a switching fabric. A number of architectures of this type have been proposed, culminating in the “InfiniBand™” (IB) architecture, which has been advanced by a consortium led by a group of industry leaders (including Intel, Sun Microsystems, Hewlett Packard, IBM, Compaq, Dell and Microsoft). The IB architecture is described in detail in the InfiniBand Architecture Specification, Release 1.0 (October, 2000), which is incorporated herein by reference. This document is available from the InfiniBand Trade Association at www.infinibandta.org.
A host processor (or host) connects to the IB network via a network interface adapter, which is referred to in IB parlance as a host channel adapter (HCA). Typically, the HCA is implemented as a single chip, with connections to the host bus and to the network. Client processes running on the host communicate with the transport layer of the IB fabric by manipulating a transport service instance, known as a “queue pair” (QP), made up of a send work queue and a receive work queue. The IB specification permits the HCA to allocate as many as 16 million (2^24) QPs, each with a distinct queue pair number (QPN). A given client may open and use multiple QPs simultaneously. To send and receive communications over the network, the client initiates work requests (WRs), which cause work items, called work queue elements (WQEs), to be placed onto the appropriate queues. The channel adapter then executes the work items, so as to communicate with the corresponding QP of the channel adapter at the other end of the link.
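The queue pair arrangement described above can be illustrated by a minimal sketch. The class and method names used here (QueuePair, WorkQueueElement, post_send, post_recv, execute_next_send) and the opcode strings are hypothetical and are chosen for illustration only; they are not taken from the IB specification or from any particular HCA programming interface.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_QPS = 2 ** 24  # size of the QPN space described in the text (about 16 million)


@dataclass
class WorkQueueElement:
    """One work item (WQE) placed on a work queue as the result of a work request."""
    opcode: str          # hypothetical label, e.g. "send" or "rdma_write"
    payload: bytes = b""


@dataclass
class QueuePair:
    """Illustrative model of a QP: a send work queue plus a receive work queue."""
    qpn: int
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)

    def __post_init__(self):
        # Assumption for this sketch: QPNs are numbered 0 .. 2^24 - 1.
        if not 0 <= self.qpn < MAX_QPS:
            raise ValueError("QPN must fit in 24 bits")

    def post_send(self, opcode, payload=b""):
        """A client work request places a WQE on the send work queue."""
        self.send_queue.append(WorkQueueElement(opcode, payload))

    def post_recv(self):
        """A client work request places a WQE on the receive work queue."""
        self.recv_queue.append(WorkQueueElement("recv"))

    def execute_next_send(self):
        """Stand-in for the channel adapter executing the oldest send WQE."""
        return self.send_queue.popleft() if self.send_queue else None
```

In this sketch, a client that calls post_send twice and then execute_next_send obtains the first WQE that was posted, reflecting the in-order servicing of a work queue.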
The QP that initiates a particular operation, i.e., injects a message into the fabric, is referred to as the requester, while the QP that receives the message is referred to as the responder. An IB operation is defined to include a request message generated by the requester and, as appropriate, its corresponding response generated by the responder. (Not all request messages have responses.) Each message consists of one or more IB packets. Typically, a given HCA will serve simultaneously both as a requester, transmitting requests and receiving responses on behalf of local clients, and as a responder, receiving requests from other channel adapters and returning responses accordingly. Request messages include, inter alia, remote direct memory access (RDMA) write and send requests and atomic read-modify-write operations, all of which cause the responder to write data to a memory address at its own end of the link, and RDMA read requests, which cause the responder to read data from a memory address and return it to the requester. Most response messages consist of a single acknowledgment packet, except for RDMA read responses, which may contain up to 2^31 bytes of data, depending on the data range specified in the request.
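By way of illustration, the difference between a request that causes the responder to write to its local memory and a request that causes it to read and return data may be sketched as follows. This is a simplified model under stated assumptions: the names Request, Responder and handle, the opcode strings, and the use of a byte array to stand in for memory at the responder's end of the link are all hypothetical, and packetization, send requests and atomic operations are omitted.

```python
from dataclasses import dataclass

MAX_READ_BYTES = 2 ** 31  # upper bound on RDMA read response data given in the text


@dataclass
class Request:
    opcode: str        # "rdma_write" or "rdma_read" (hypothetical labels)
    address: int       # memory address at the responder's end of the link
    data: bytes = b""  # data to be written (RDMA write only)
    length: int = 0    # number of bytes requested (RDMA read only)


class Responder:
    """Illustrative responder; a bytearray stands in for its local memory."""

    def __init__(self, memory_size):
        self.memory = bytearray(memory_size)

    def handle(self, req):
        if req.opcode == "rdma_write":
            end = req.address + len(req.data)
            if req.address < 0 or end > len(self.memory):
                raise ValueError("write outside responder memory")
            self.memory[req.address:end] = req.data
            return ("ack", b"")  # single acknowledgment, no data
        if req.opcode == "rdma_read":
            end = req.address + req.length
            if req.length > MAX_READ_BYTES:
                raise ValueError("read range too large")
            if req.address < 0 or req.length < 0 or end > len(self.memory):
                raise ValueError("read outside responder memory")
            # The response carries the data range specified in the request.
            return ("read_response", bytes(self.memory[req.address:end]))
        raise ValueError("unsupported opcode in this sketch")
```

In this model, a write of four bytes followed by a read of the same address range returns those four bytes in the read response, whereas the write itself elicits only an acknowledgment.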
The maximum number of RDMA read requests for a particular QP that can be outstanding at any one time is negotiated between the HCAs involved when the connection between them is established. (The maximum also covers atomic operations supported by some HCAs.) The responder may restrict the number of outstanding RDMA read requests per QP, and may even allow no RDMA read requests at all for some QPs. The need for this restriction stems from the fact that each outstanding RDMA read request consumes a certain amount of memory on the HCA chip. Because of the high cost of this HCA memory, IB devices known in the art typically allow no more than one or a few outstanding read requests per QP. Therefore, the requester must wait until its outstanding RDMA read operations have been completed before sending further RDMA read requests.
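The effect of this restriction on the requester can be illustrated by the following sketch. It assumes, for the purpose of illustration only, that the negotiated maximum is the smaller of the number of outstanding reads the requester would like to issue and the number the responder is willing to support; the actual negotiation procedure is not detailed here. The names negotiate_read_limit and ReadRequester are hypothetical.

```python
from collections import deque


def negotiate_read_limit(requester_wants, responder_allows):
    """Assumed rule for this sketch: the lower of the two values applies."""
    return min(requester_wants, responder_allows)


class ReadRequester:
    """Holds back RDMA read requests beyond the negotiated per-QP limit."""

    def __init__(self, limit):
        self.limit = limit        # negotiated maximum of outstanding reads
        self.outstanding = 0      # reads sent but not yet completed
        self.pending = deque()    # reads waiting for a free slot
        self.sent = []            # order in which reads went onto the fabric

    def submit(self, request_id):
        """Queue a read request and send it if the limit permits."""
        self.pending.append(request_id)
        self._issue()

    def on_read_response(self):
        """A read completed; one more pending read may now be sent."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding read to complete")
        self.outstanding -= 1
        self._issue()

    def _issue(self):
        while self.pending and self.outstanding < self.limit:
            self.sent.append(self.pending.popleft())
            self.outstanding += 1
```

With a limit of two, for example, a requester that submits four reads sends only the first two; the third is sent only after a read response arrives. With a limit of zero, corresponding to a QP for which the responder allows no RDMA read requests, no read is ever sent.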
To handle the dual role of requester and responder, IB HCAs known in the art typically have separate, independent transmit and receive hardware structures. An example of such an HCA is the IBM PCI-X to InfiniBand Host Channel Adapter, produced by IBM Microelectronics Division (Hopewell Junction, N.Y.). This device features a dual pipeline architecture, with independent microprocessors and DMA engines for concurrent receive and transmit data path processing. It implements a layered memory structure, in which connection-related information is stored in on-device memory and also, optionally, in off-device memory attached to the HCA (not in system memory associated with the host). This optional configuration allows support of up to 16K QPs, with up to four outstanding RDMA read requests per QP.