Switched-fabric communications architectures are widely used in high-performance computing. Examples of such architectures include InfiniBand and high-speed Ethernet. The InfiniBand (IB) architecture will be described here by way of illustration (and aspects of the implementation of the present invention in the IB environment will be described below in the Detailed Description), but the present invention should in no way be understood as being limited to any particular type of switched fabric.
The IB architecture has been standardized by the InfiniBand Trade Association. Computing devices (host processors and peripherals) connect to the IB fabric via a network interface controller (NIC), which is referred to in IB parlance as a channel adapter. Host processors (or hosts) use a host channel adapter (HCA), while peripheral devices use a target channel adapter (TCA).
Client processes (referred to hereinafter as clients), such as software application processes, running on a host processor communicate with the transport layer of the fabric by manipulating a transport service instance, known as a “queue pair” (QP), made up of a send work queue and a receive work queue. To send and receive messages over the network using an HCA, the client initiates work requests (WRs), which cause work items, called work queue elements (WQEs), to be placed in the appropriate work queues. Normally, each WR has a data buffer associated with it, to be used for holding the data that is to be sent or received in executing the WQE. The HCA executes the WQEs and thus communicates with a corresponding QP of the channel adapter of another host across the network. After it has finished servicing a WQE, the HCA typically writes a completion queue element (CQE) to a completion queue, to be read by the client as an indication that the work request has been executed.
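By way of illustration only, the relationship among work requests, WQEs and CQEs described above may be sketched as a toy software model. The sketch below is not an implementation of any channel adapter or of any IB programming interface; the names WorkQueueElement, QueuePair, ToyChannelAdapter, post_send, process and poll_cq, and the fields wr_id and status, are hypothetical and are assumed here solely for the purpose of the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueueElement:
    wr_id: int          # hypothetical identifier chosen by the client
    buffer: bytearray   # data buffer associated with the work request


@dataclass
class QueuePair:
    # A QP is made up of a send work queue and a receive work queue.
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)


class ToyChannelAdapter:
    """Stands in for an HCA: executes send WQEs and reports completions."""

    def __init__(self):
        self.completion_queue = deque()
        self.wire = []  # placeholder for the fabric: records transmitted payloads

    def post_send(self, qp, wr_id, data):
        # A work request from the client becomes a WQE on the send work queue.
        qp.send_queue.append(WorkQueueElement(wr_id, bytearray(data)))

    def process(self, qp):
        # Service queued WQEs in order, writing one CQE per finished WQE.
        while qp.send_queue:
            wqe = qp.send_queue.popleft()
            self.wire.append(bytes(wqe.buffer))
            self.completion_queue.append({"wr_id": wqe.wr_id, "status": "success"})

    def poll_cq(self):
        # The client reads CQEs to learn that its work requests have executed.
        return self.completion_queue.popleft() if self.completion_queue else None
```

In this model, a client would call post_send for each message, and would subsequently call poll_cq to discover which work requests have completed. A return value of None indicates that no completion is yet available.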
IB channel adapters implement various service types and transport operations, including remote direct memory access (RDMA) read and write operations and SEND operations. Both RDMA write and SEND requests carry data sent by a channel adapter (known as the requester) and cause another channel adapter (the responder) to write the data to a memory address on its own network node. Whereas RDMA write requests specify the address in the remote responder's memory to which the data are to be written, SEND requests rely on the responder to determine the memory location at the request destination.
Upon receiving a SEND request addressed to a certain QP, the channel adapter at the destination node places the data sent by the requester into the next available receive buffer for that QP. To specify the receive buffers to be used for such incoming SEND requests, a client on the host computing device generates receive WQEs and places them in the receive queues of the appropriate QPs. Each time a valid SEND request is received, the destination channel adapter takes the next WQE from the receive queue of the destination QP and places the received data in the memory location specified in that WQE. Thus, every valid incoming SEND request engenders a receive queue operation by the responder.
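The difference between the two request types at the responder may be illustrated by the following simplified sketch, in which an RDMA write carries its own destination address while a SEND consumes the next receive WQE. The class ToyResponder and its methods are hypothetical names assumed for this example, and the error handling shown (raising an exception when no receive WQE is posted or when the message exceeds the buffer) is a simplification rather than a description of actual channel adapter behavior.

```python
from collections import deque


class ToyResponder:
    """Simplified model of a responder's memory and one receive queue."""

    def __init__(self, memory_size):
        self.memory = bytearray(memory_size)
        self.receive_queue = deque()  # receive WQEs as (address, length) pairs

    def post_receive(self, address, length):
        # The client specifies a receive buffer by posting a receive WQE.
        self.receive_queue.append((address, length))

    def _write(self, address, data):
        # Refuse writes that would fall outside the modeled memory.
        if address < 0 or address + len(data) > len(self.memory):
            raise ValueError("write falls outside responder memory")
        self.memory[address:address + len(data)] = data

    def handle_rdma_write(self, remote_address, data):
        # The requester names the destination address; no receive WQE is used.
        self._write(remote_address, data)

    def handle_send(self, data):
        # The responder chooses the destination from the next receive WQE.
        if not self.receive_queue:
            raise RuntimeError("no receive WQE posted")
        address, length = self.receive_queue[0]
        if len(data) > length:
            raise ValueError("message exceeds receive buffer")
        self.receive_queue.popleft()
        self._write(address, data)
        return address
```

In this sketch, each successful call to handle_send removes exactly one entry from the receive queue, whereas handle_rdma_write leaves the receive queue untouched.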
The Internet Wide Area RDMA Protocol (iWARP) offers services and semantics for Internet Protocol (IP) networks that are similar to the IB features described above. Features of iWARP are specified by Shah et al., in “Direct Data Placement over Reliable Transports,” published as Request for Comments (RFC) 5041 of the Internet Engineering Task Force (IETF). Implementation of iWARP over the Transmission Control Protocol (TCP) is described by Culley et al., in “Marker PDU Aligned Framing for TCP Specification,” published as IETF RFC 5044. In the IP context, a TCP socket may be considered a transport service instance, roughly comparable to an IB QP.
U.S. Patent Application Publication 2003/0065856, whose disclosure is incorporated herein by reference, describes a method for communication between a network interface adapter and a host processor coupled thereto. The method includes writing information using the network interface adapter to a location in a memory accessible to the host processor. Responsively to having written the information, the network interface adapter places an event indication in an event queue accessible to the host processor. It then asserts an interrupt of the host processor that is associated with the event queue, so as to cause the host processor to read the event indication and, responsively thereto, to process the information written to the location.
In some embodiments disclosed in this publication, the network interface adapter asserts the interrupts to notify the host processor that it has written information to the host system memory, to be read and processed by the host. The information may comprise completion information, which the network interface adapter has written to one of a plurality of completion queues. The completion queues are mapped to different host event queues, wherein typically a number of completion queues may share the same event queue. In response to assertion of the interrupt by the network interface adapter, the host event handler reads the event and informs the appropriate application process that there is new information in its completion queue waiting to be read.
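The notification flow described in this publication, in which several completion queues may share one event queue, may be modeled roughly as follows. This is an illustrative sketch only and is not taken from the cited publication; the class ToyHostNotifier, its methods, and the queue identifiers used with it are hypothetical, and the interrupt is represented simply as an entry in a list of pending interrupts.

```python
from collections import defaultdict, deque


class ToyHostNotifier:
    """Models completion queues mapped onto shared host event queues."""

    def __init__(self, cq_to_eq):
        self.cq_to_eq = dict(cq_to_eq)        # completion queue id -> event queue id
        self.completion_queues = defaultdict(deque)
        self.event_queues = defaultdict(deque)
        self.pending_interrupts = deque()     # event queue ids with an asserted interrupt

    def adapter_write_completion(self, cq_id, completion):
        # Adapter side: write the information, then place an event indication
        # in the mapped event queue, then assert the associated interrupt.
        eq_id = self.cq_to_eq[cq_id]
        self.completion_queues[cq_id].append(completion)
        self.event_queues[eq_id].append(cq_id)
        self.pending_interrupts.append(eq_id)

    def host_handle_interrupt(self):
        # Host side: read the event and return the completion queue whose
        # owning process should be informed that new information is waiting.
        eq_id = self.pending_interrupts.popleft()
        return self.event_queues[eq_id].popleft()
```

For example, with a mapping in which completion queues "cq0" and "cq1" share event queue "eq0", a completion written to "cq1" results in an event on "eq0", and the host handler recovers "cq1" from that event in order to notify the appropriate process.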
U.S. Pat. No. 7,746,854, whose disclosure is incorporated herein by reference, describes a fast flexible filter processor architecture for a network device. An incoming packet is received from a port and inspected, and packet fields are extracted. The incoming packet is classified based on the extracted packet fields, and action instructions are generated. Further, the inspection and extraction include applying inspection mask windows to any portion of the incoming packet to extract programmable packet fields.
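The general idea of applying a mask window to a portion of a packet and classifying the packet on the extracted field may be sketched as shown below. This sketch does not reproduce the architecture of the cited patent; the functions extract_field and classify, the rule format (offset, mask, value, action), and the "default" action are assumptions introduced only for illustration.

```python
def extract_field(packet: bytes, offset: int, mask: bytes) -> bytes:
    """Apply a mask window of len(mask) bytes at the given packet offset."""
    window = packet[offset:offset + len(mask)]
    if len(window) < len(mask):
        raise ValueError("mask window extends past end of packet")
    # Keep only the bits selected by the mask.
    return bytes(b & m for b, m in zip(window, mask))


def classify(packet: bytes, rules) -> str:
    """Return the action of the first rule whose masked field matches."""
    for rule in rules:
        try:
            extracted = extract_field(packet, rule["offset"], rule["mask"])
        except ValueError:
            continue  # packet too short for this rule's window
        if extracted == rule["value"]:
            return rule["action"]
    return "default"  # hypothetical fallback action when no rule matches
```

As a usage example, a rule with offset 0, mask b"\xf0" and value b"\x40" selects the upper four bits of the first byte, so that a packet beginning with the byte 0x45 matches the rule while a packet beginning with 0x60 does not.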