The computer industry is moving toward fast, packetized, serial input/output (I/O) bus architectures, in which computing hosts and peripherals are linked by a switching network, commonly referred to as a switching fabric. A number of architectures of this type have been proposed, culminating in the “InfiniBand™” (IB) architecture, which has been advanced by a consortium led by a group of industry leaders (including Intel, Sun Microsystems, Hewlett Packard, IBM, Compaq, Dell and Microsoft). The IB architecture is described in detail in the InfiniBand Architecture Specification, Release 1.0 (October, 2000), which is incorporated herein by reference. This document is available from the InfiniBand Trade Association at www.infinibandta.org.
A host processor (or host) connects to the IB network via a network interface adapter, which is referred to in IB parlance as a host channel adapter (HCA). Typically, the HCA is implemented as a single chip, with connections to the host bus and to the network. Client processes running on the host communicate with the transport layer of the IB fabric by manipulating a transport service instance, known as a “queue pair” (QP), made up of a send work queue and a receive work queue. The IB specification permits the HCA to allocate as many as 16 million (2^24) QPs, each with a distinct queue pair number (QPN). A given client may open and use multiple QPs simultaneously.
To send and receive communications over the network, the client initiates work requests (WRs), which cause work items, called work queue elements (WQEs), to be placed onto the appropriate queues. The channel adapter then executes the work items, so as to communicate with the corresponding QP of the channel adapter at the other end of the link. After it has finished servicing a WQE, the HCA writes a completion queue element (CQE) to a completion queue, to be read by the client.
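By way of illustration only, the flow from work request to WQE to CQE may be sketched as follows. The sketch is a toy model and not an implementation of any actual channel adapter. The names QueuePair, post_send and service_send_queue, as well as the dictionary fields, are hypothetical and are not taken from the IB specification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Toy model of a QP: a send work queue and a receive work queue."""
    qpn: int
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)


def post_send(qp, payload):
    """Client side: a work request places a WQE on the send work queue."""
    qp.send_queue.append({"opcode": "SEND", "payload": payload})


def service_send_queue(qp, completion_queue):
    """Adapter side: execute each WQE in order, then write a CQE."""
    while qp.send_queue:
        wqe = qp.send_queue.popleft()
        # A real adapter would transmit packets here; this sketch only
        # records that the work item has been serviced.
        completion_queue.append(
            {"qpn": qp.qpn, "opcode": wqe["opcode"], "status": "OK"}
        )


qp = QueuePair(qpn=7)
cq = deque()
post_send(qp, b"hello")
post_send(qp, b"world")
service_send_queue(qp, cq)
```

In this sketch the client subsequently reads the entries of cq to learn which of its work requests have completed.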
The QP that initiates a particular operation, i.e., injects a message into the fabric, is referred to as the requester, while the QP that receives the message is referred to as the responder. An IB operation is defined to include a request message generated by the requester and, as appropriate, its corresponding response generated by the responder. (Not all request messages have responses.) Each message consists of one or more IB packets. Typically, a given HCA will serve simultaneously both as a requester, transmitting requests and receiving responses on behalf of local clients, and as a responder, receiving requests from other channel adapters and returning responses accordingly. Request messages include, inter alia, remote direct memory access (RDMA) write and send requests, all of which cause the responder to write data to a memory address at its own end of the link, and RDMA read requests, which cause the responder to read data from a memory address and return it to the requester. Atomic read-modify-write requests can cause the responder both to write data to its own memory and to return data to the requester. Most response messages consist of a single acknowledgment packet, except for RDMA read responses, which may contain up to 2^31 bytes of data, depending on the data range specified in the request.
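The effect of each request type on the responder, as described above, may be summarized in a small lookup table. The following is a minimal sketch only. The enumeration Request and the function responder_effects are hypothetical names introduced for illustration.

```python
from enum import Enum


class Request(Enum):
    SEND = "send"
    RDMA_WRITE = "rdma_write"
    RDMA_READ = "rdma_read"
    ATOMIC = "atomic"


# For each request type:
# (responder writes to its own memory, responder returns data to requester)
RESPONDER_EFFECTS = {
    Request.SEND: (True, False),
    Request.RDMA_WRITE: (True, False),
    Request.RDMA_READ: (False, True),
    Request.ATOMIC: (True, True),  # read-modify-write
}


def responder_effects(request):
    """Return the (writes_memory, returns_data) pair for a request type."""
    return RESPONDER_EFFECTS[request]


writes_memory, returns_data = responder_effects(Request.RDMA_READ)
```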
To generate an outgoing message or service an incoming message on a given QP, the HCA uses context information pertaining to the QP. The QP context is created in a memory accessible to the HCA by the client process that sets up the QP. The client configures the QP context with fixed information such as the destination address (referred to as the LID, or local identifier), negotiated operating limits, service level and keys for access control. Typically, a variable part of the context, such as the current packet sequence number (PSN) and information regarding the WQE being serviced by the QP, is subsequently updated by the HCA as it sends and receives messages. For example, to service an incoming packet on a reliable (acknowledged) connection, the HCA reads the packet transport header, which identifies the target QP, and uses the context of that QP to verify that the packet came from the correct source and that the PSN is valid (no missed packets). For an RDMA write operation on a reliable connection, the HCA reads the WQE and retrieves necessary data from the QP context, such as the destination address, target QP and next PSN. It then accesses the host memory to retrieve the packet data, and sends the packet to the destination.
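The use of the QP context in checking an incoming packet on a reliable connection may be sketched as follows. This is a simplified illustration under stated assumptions, not the validation procedure of the IB specification: it treats any PSN other than the expected one as invalid and does not distinguish duplicate packets from missed ones. The names QPContext and accept_packet and their fields are hypothetical. The 24-bit PSN width reflects the size of the PSN field in the IB transport header.

```python
from dataclasses import dataclass

PSN_MASK = (1 << 24) - 1  # the PSN wraps around within a 24-bit field


@dataclass
class QPContext:
    # Fixed part, configured by the client when the QP is set up.
    remote_lid: int
    remote_qpn: int
    # Variable part, updated by the adapter as packets arrive.
    expected_psn: int = 0


def accept_packet(ctx, src_lid, src_qpn, psn):
    """Return True and advance the expected PSN if the packet is valid."""
    if (src_lid, src_qpn) != (ctx.remote_lid, ctx.remote_qpn):
        return False  # packet did not come from the correct source
    if psn != ctx.expected_psn:
        return False  # out of sequence (simplified: no duplicate handling)
    ctx.expected_psn = (ctx.expected_psn + 1) & PSN_MASK
    return True


# Contexts are looked up by the target QPN carried in the transport header.
contexts = {5: QPContext(remote_lid=0x12, remote_qpn=9, expected_psn=PSN_MASK)}
ctx = contexts[5]
results = [
    accept_packet(ctx, 0x12, 9, PSN_MASK),  # expected PSN, then wraps to 0
    accept_packet(ctx, 0x12, 9, 0),         # next PSN in sequence
    accept_packet(ctx, 0x12, 9, 5),         # gap in the sequence
    accept_packet(ctx, 0x33, 9, 1),         # wrong source LID
]
```

The sketch makes plain that every incoming packet requires at least one read, and usually one update, of the context of its target QP.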
The IB specification also provides for unconnected and unreliable services, which use the QP context in different ways. (For example, unreliable services typically ignore the PSN, while unconnected services do not maintain a specific target QP as part of the QP context.) In all cases, however, the QP context plays a key role in processing every packet that the HCA must send or receive.
It will be appreciated that the QP context for each pending QP contains a substantial amount of data. Because of the high cost of on-chip memory, it is not possible to provide sufficient memory on the HCA to store context information for all of the 16 million QPs allowed by the IB specification. Therefore, in practical HCA implementations it is necessary to limit the number of QPs supported according to the size of the available on-chip memory or to store QP context information off-chip. For example, the IBM PCI-X to InfiniBand Host Channel Adapter, produced by IBM Microelectronics Division (Hopewell Junction, N.Y.), implements a layered memory structure, in which connection-related information is stored in on-device memory and also, optionally, in off-device memory attached to the HCA. This optional configuration allows the IBM HCA to support up to 16K QPs. There is inevitably a price to be paid for storing context information off-chip, in terms of the time needed by the HCA to access the information when processing incoming and outgoing messages.
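The scale of the memory requirement may be appreciated from a rough calculation. The figure of 256 bytes per QP context used below is purely an assumption chosen for illustration; neither the IB specification nor the foregoing description fixes the size of the context. The function name context_memory_bytes is likewise hypothetical.

```python
MAX_QPS = 1 << 24        # 16 million QPs permitted by the IB specification
CONTEXT_BYTES = 256      # ASSUMPTION: hypothetical size of one QP context


def context_memory_bytes(num_qps, context_bytes=CONTEXT_BYTES):
    """Total memory needed to hold the contexts of num_qps queue pairs."""
    return num_qps * context_bytes


full = context_memory_bytes(MAX_QPS)         # all QPs the specification allows
limited = context_memory_bytes(16 * 1024)    # a 16K-QP configuration

print(f"{full / 2**30:.0f} GiB for all QPs")
print(f"{limited / 2**20:.0f} MiB for 16K QPs")
```

Under this assumed context size, holding contexts for all 2^24 QPs would take 4 GiB, against 4 MiB for 16K QPs, which suggests why the full complement of contexts cannot reasonably be held in on-chip memory.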