In conventional computer systems, a host computer is linked to a network using a network interface card (NIC), which is connected to the internal bus of the host. The most common example of this type of connection is the use of Ethernet network adapter cards, which plug into the Peripheral Component Interconnect (PCI) bus of a personal computer and link the computer to a 10BASE-T or 100BASE-T local-area network (LAN). Ethernet cards of this sort are widely available and inexpensive. They provide the necessary physical layer connection between the host and the serial LAN or WAN medium, and also perform some media access control (MAC) layer functions. Network- and transport-layer protocol functions, such as Internet Protocol (IP) and Transmission Control Protocol (TCP) processing, are typically performed in software by the host.
As network speeds increase, moving up to Gigabit Ethernet (GbE) and Ten Gigabit Ethernet, for example, this sort of simple NIC is no longer adequate. Working in a GbE environment at wire speed typically requires that the NIC have a much faster and more costly physical interface and MAC handling functions. It is also desirable that the NIC take on a larger share of the higher-level protocol processing functions. NICs have recently been introduced with “protocol offloading” capability, in the form of dedicated hardware processing resources to relieve the host of network layer (IP) processing and even transport and higher-layer functions. Such hardware resources reduce the processing burden on the host and therefore eliminate a major bottleneck in exploiting the full bandwidth available on the network, but they also add substantially to the cost of the NIC. Since a typical host communicates with a LAN or WAN only intermittently, in short bursts, the high-speed processing capabilities of the NIC are unused most of the time.
The computer industry is moving toward fast, packetized, serial input/output (I/O) bus architectures, in which computing hosts and peripherals, such as NICs, are linked by a system area network (SAN), commonly referred to as a switching fabric. A number of architectures of this type have been proposed, culminating in the “InfiniBand™” (IB) architecture, which has been advanced by a consortium led by a group of industry leaders (including Intel, Sun Microsystems, Hewlett Packard, IBM, Compaq, Dell and Microsoft). The IB architecture is described in detail in the InfiniBand Architecture Specification, Release 1.0 (October, 2000), which is incorporated herein by reference. This document is available from the InfiniBand Trade Association at www.infinibandta.org.
A host processor (or host) connects to the IB fabric via a fabric interface adapter, which is referred to in IB parlance as a host channel adapter (HCA). Peripherals are connected to the fabric by a target channel adapter (TCA). Client processes running on the host communicate with the transport layer of the IB fabric by manipulating a transport service instance, known as a “queue pair” (QP), made up of a send work queue and a receive work queue. The IB specification permits the HCA to allocate as many as 16 million (2^24) QPs, each with a distinct queue pair number (QPN). A given client may open and use multiple QPs simultaneously. To send and receive communications over the fabric, the client initiates work requests (WRs), which cause work items, called work queue elements (WQEs), to be placed in the appropriate queues. The channel adapter then executes the work items, so as to communicate with the corresponding QP of the channel adapter at the other end of the link.
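The queue pair model described above can be illustrated by the following minimal Python sketch. It is a simplified model for explanation only and is not an HCA programming interface: the class names, method names and opcode strings are hypothetical, and only the 24-bit QPN range and the send/receive queue structure are taken from the description above.

```python
from collections import deque
from dataclasses import dataclass, field

# A QPN is a 24-bit value, so at most 2^24 (about 16 million) QPs exist.
MAX_QPS = 1 << 24


@dataclass
class WorkQueueElement:
    """One work item (WQE). The opcode strings are illustrative labels."""
    opcode: str
    payload: bytes = b""


@dataclass
class QueuePair:
    """Hypothetical model of a QP: a send work queue plus a receive work queue."""
    qpn: int
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)

    def __post_init__(self):
        if not 0 <= self.qpn < MAX_QPS:
            raise ValueError("QPN must fit in 24 bits")

    def post_send(self, wqe):
        # A work request from the client places a WQE on the send queue.
        self.send_queue.append(wqe)

    def post_receive(self, wqe):
        # Receive WQEs describe where incoming message data may be placed.
        self.receive_queue.append(wqe)

    def execute_next_send(self):
        # The channel adapter consumes WQEs in the order they were posted.
        return self.send_queue.popleft() if self.send_queue else None
```

In this sketch a client would call post_send() once for each work request, and the channel adapter model would drain the queue with execute_next_send().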
For any given operation, the QP that initiates the operation, i.e. injects a message into the fabric, is referred to as the requester, while the QP that receives the message is referred to as the responder. (A given QP can be both a requester and a responder in different operations.) An IB operation is defined to include a request message generated by the requester and, as appropriate, its corresponding response generated by the responder. (Not all request messages have responses.) Each QP is configured for a certain transport service type, based on how the requesting and responding QPs interact. Both the source and destination QPs must be configured for the same service type. The IB specification defines four service types: reliable connection, unreliable connection, reliable datagram and unreliable datagram. The reliable services require that the responder acknowledge all messages that it receives from the requester, in order to guarantee reliability of message delivery.
Each message consists of one or more IB packets, depending on the size of the message payload compared to the maximum transfer unit (MTU) of the message path. Typically, a given channel adapter will serve simultaneously both as a requester, transmitting requests and receiving responses on behalf of local clients, and as a responder, receiving requests from other channel adapters and returning responses accordingly. Request messages include, inter alia, remote direct memory access (RDMA) write and send requests, all of which cause the responder to write data to a memory address at its own end of the link, and RDMA read requests, which cause the responder to read data from a memory address and return it to the requester. Atomic read-modify-write requests can cause the responder both to write data to its own memory and to return data to the requester. Most response messages consist of a single acknowledgment packet, except for RDMA read responses, which may contain up to 2^31 bytes of data, depending on the data range specified in the request. RDMA write and send requests may likewise contain up to 2^31 bytes of data. RDMA read and write requests specify the memory range to be accessed by DMA in the local memory of the responder. Send requests rely on the responder to determine the memory range to which the message payload will be written.
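The relationship between message size and packet count can be illustrated by the short Python sketch below. The function name is hypothetical, and the sketch assumes that every packet except the last carries a full MTU of payload and that a zero-length message still occupies one packet.

```python
# Upper bound on message payload size, as stated above: 2^31 bytes.
MAX_MESSAGE_BYTES = 2**31


def packets_for_message(payload_bytes, path_mtu):
    """Return the number of IB packets needed to carry one message.

    Assumes each packet but the last is filled to the path MTU, and
    that an empty message is still sent as a single packet.
    """
    if path_mtu <= 0:
        raise ValueError("path MTU must be positive")
    if not 0 <= payload_bytes <= MAX_MESSAGE_BYTES:
        raise ValueError("payload must be between 0 and 2^31 bytes")
    # Ceiling division without floating point, with a floor of one packet.
    return max(1, -(-payload_bytes // path_mtu))
```

For example, under these assumptions a 4097-byte payload on a path with a 2048-byte MTU is carried in three packets, and a maximum-size message of 2^31 bytes on a path with a 4096-byte MTU is carried in 2^19 packets.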
Although IB does not explicitly define quality of service (QoS) levels, it provides mechanisms that can be used to support a range of different classes of service on the network. Each IB packet carries a Service Level (SL) attribute, indicated by a corresponding SL field in the packet header, which permits the packet to be transported at one of 16 service levels. The definition and purpose of each service level are not specified by the IB standard. Rather, they are left as a matter of fabric administration policy, to be determined between each node and the subnet to which it belongs. Thus, the assignment of service levels is a function of each node's communication manager and its negotiation with a subnet manager. As a packet traverses the fabric, its SL attribute determines which virtual lane (VL) is used to carry the packet over the next link. For this purpose, each port in the fabric has an SL-to-VL mapping table that is configured by subnet management.
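The per-port SL-to-VL lookup can be illustrated by the following Python sketch. It is deliberately simplified: the table here is indexed by SL alone, the class and method names are hypothetical, and the example mapping onto four virtual lanes is an assumed policy chosen only for illustration, not one prescribed by the IB specification.

```python
NUM_SERVICE_LEVELS = 16  # the SL field selects one of 16 service levels


class Port:
    """Hypothetical model of a fabric port holding an SL-to-VL table."""

    def __init__(self, sl_to_vl):
        # The table is assumed to be written by subnet management,
        # with exactly one VL entry per service level.
        if len(sl_to_vl) != NUM_SERVICE_LEVELS:
            raise ValueError("table needs one entry per service level")
        self.sl_to_vl = list(sl_to_vl)

    def select_vl(self, sl):
        # The packet's SL picks the virtual lane used on the next link.
        if not 0 <= sl < NUM_SERVICE_LEVELS:
            raise ValueError("SL must be in the range 0..15")
        return self.sl_to_vl[sl]


# Assumed example policy: spread the 16 SLs across four data VLs.
example_port = Port([sl % 4 for sl in range(NUM_SERVICE_LEVELS)])
```

Because each port holds its own table, the same SL may be carried on different VLs on successive links of the path.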
IB fabrics are well suited for multi-processor systems and allow input/output (I/O) units, such as a network interface device with a suitable TCA, to communicate with any or all of the processor nodes in a system. In this manner, a NIC can be used by multiple hosts over an IB fabric to access an external network, such as an Ethernet LAN or WAN. NICs known in the art, however, have only a single network port and are designed to serve a single host. Although the IB fabric and protocols provide the means for multiple hosts to communicate with a given NIC, the IB specification is not concerned with the operation of the NIC itself and does not suggest any way that the NIC could serve more than one host at a time.
The IB specification also does not say how Ethernet frames (or other types of network packets or datagrams) should be encapsulated in IB communication streams carried over the fabric between the host and the NIC. In the local Windows™ environment, for example, the Network Driver Interface Specification (NDIS) specifies how host communication protocol programs and NIC device drivers should communicate with one another for this purpose. NDIS defines primitives for sending and receiving network data, and for querying and setting configuration parameters and statistics. For connecting hosts to Ethernet NICs over dynamic buses such as the Universal Serial Bus (USB), Bluetooth™ and InfiniBand, Microsoft (Redmond, Wash.) has developed Remote NDIS (RNDIS) as an extension of NDIS. RNDIS is described in detail in the Remote NDIS Specification (Rev. 1.00, January, 2001), which is incorporated herein by reference. It defines a bus-independent message protocol between a host and a remote network interface device over abstract control and data channels.
In the IB environment, the control and data channels are provided by QPs established for this purpose between the HCA of the host and the TCA of the NIC. RNDIS messages are transported over the channels by encapsulating them in IB “send” messages on the assigned QPs. Either reliable or unreliable IB connections between the QPs may be used for this purpose. The RNDIS control and data connections enable the host to send and receive Ethernet frames as though the NIC were connected to the host by its local PCI bus.
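The encapsulation of an Ethernet frame in an RNDIS data message can be illustrated by the Python sketch below. The function names are hypothetical. The header layout (eleven little-endian 32-bit fields, message type 0x00000001, and a data offset counted from the start of the DataOffset field) reflects the commonly documented RNDIS packet message format and should be verified against the Remote NDIS Specification before being relied upon. The sketch assumes no out-of-band data and no per-packet information.

```python
import struct

RNDIS_PACKET_MSG = 0x00000001        # data message type (assumed per RNDIS)
_HEADER = struct.Struct("<11I")      # eleven 32-bit little-endian fields, 44 bytes
_OFFSET_BASE = 8                     # DataOffset is counted from byte 8 of the message


def encapsulate_frame(frame):
    """Wrap one Ethernet frame in an RNDIS packet message (no OOB data)."""
    header = _HEADER.pack(
        RNDIS_PACKET_MSG,
        _HEADER.size + len(frame),    # MessageLength: header plus frame
        _HEADER.size - _OFFSET_BASE,  # DataOffset: frame follows the header
        len(frame),                   # DataLength
        0, 0, 0,                      # OOB data offset, length, element count
        0, 0,                         # per-packet info offset, length
        0, 0,                         # VC handle, reserved
    )
    return header + frame


def extract_frame(message):
    """Recover the Ethernet frame from an RNDIS packet message."""
    msg_type, _msg_len, data_offset, data_len = struct.unpack_from("<4I", message)
    if msg_type != RNDIS_PACKET_MSG:
        raise ValueError("not an RNDIS packet message")
    start = _OFFSET_BASE + data_offset
    return message[start:start + data_len]
```

In the arrangement described above, the bytes returned by encapsulate_frame() would form the payload of an IB send message posted on the QP assigned to the RNDIS data channel, and the receiving side would apply extract_frame() to the payload of the received message.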