In a network environment, a network adapter or controller on a host computer, such as an Ethernet controller, Fibre Channel controller, etc., will receive Input/Output (I/O) requests or responses to I/O requests initiated from the host computer. Often, the host computer operating system includes a device driver to communicate with the network controller hardware to manage I/O requests to transmit over a network. The host computer may also utilize a protocol which packages data to be transmitted over the network into packets, each of which contains a destination address as well as a portion of the data to be transmitted. A transport protocol layer can also process the packets received by the network controller and access any I/O commands or data embedded in the packet.
By analogy, a packet is much like an envelope dropped in a mailbox. A packet typically includes a “payload” and a “header”. The packet's “payload” is analogous to the letter inside the envelope. The packet's “header” is much like the information written on the envelope itself. The header can include information to help network devices handle the packet appropriately.
A computer may employ TCP/IP (Transmission Control Protocol/Internet Protocol) to encode and address data for transmission, and to decode and access the payload data in TCP/IP packets received at the network controller. IP specifies the format of packets, also called datagrams, and the addressing scheme. TCP is a higher-level protocol that establishes a connection between a source and a destination and provides a reliable, full-duplex, byte-stream transport service. TCP provides applications with simple primitives for establishing and closing a connection (e.g., CONNECT and CLOSE) and for transferring data (e.g., SEND and RECEIVE). Behind the scenes, TCP transparently handles a variety of communication issues such as data retransmission and adapting to network traffic congestion.
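To make the header/payload split concrete, the following minimal sketch separates an IPv4 datagram into its header and payload and reads the destination address from the header. It relies on the standard IPv4 field positions (version and header length in the first byte, total length in bytes 2-3, destination address in bytes 16-19); the function name `split_ipv4` is illustrative only, and IP options, checksum validation and fragmentation are deliberately ignored.

```python
import ipaddress
import struct


def split_ipv4(packet: bytes):
    """Return (destination, header, payload) for a simple IPv4 datagram.

    Illustrative sketch: no checksum validation, no fragment handling.
    """
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    version = packet[0] >> 4          # high nibble: IP version
    ihl_words = packet[0] & 0x0F      # low nibble: header length in 32-bit words
    if version != 4 or ihl_words < 5:
        raise ValueError("not a valid IPv4 header")
    header_len = ihl_words * 4
    (total_len,) = struct.unpack("!H", packet[2:4])  # header + payload, in bytes
    if total_len < header_len or len(packet) < total_len:
        raise ValueError("inconsistent total length")
    header = packet[:header_len]
    destination = ipaddress.IPv4Address(header[16:20])  # like the address on the envelope
    payload = packet[header_len:total_len]               # like the letter inside
    return destination, header, payload
```

The header is parsed first because, as with an envelope, it tells the receiver how large the header is and where the payload ends.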
To provide these services, TCP operates on packets known as segments. Generally, a TCP segment travels across a network within (“encapsulated” by) a larger packet such as an Internet Protocol (IP) datagram. The payload of a segment carries a portion of a stream of data sent across a network, and a receiver can restore the original stream of data by collecting the received segments. Segments may arrive at their destination out of order, or not at all; for example, different segments may travel very different paths across a network. TCP therefore assigns a sequence number to each data byte transmitted, which enables a receiver to reassemble the bytes in the correct order. Additionally, since every byte is sequenced, each byte can be acknowledged to confirm successful transmission.
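The reordering behavior can be illustrated with a minimal receive-side sketch. It assumes, for simplicity, that sequence numbers never wrap around (real TCP uses 32-bit modular arithmetic) and that out-of-order segments are simply held in memory until the gap before them is filled; the `Reassembler` class is hypothetical. The value returned by `receive` mirrors TCP's cumulative acknowledgment: the sequence number of the next byte the receiver expects.

```python
class Reassembler:
    """Toy in-order reassembly keyed by byte sequence number (no wraparound)."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq   # next byte we expect to deliver
        self.pending = {}             # seq -> payload bytes held out of order
        self.stream = bytearray()     # bytes delivered to the application, in order

    def receive(self, seq: int, payload: bytes) -> int:
        # Keep the segment only if it carries at least one byte not yet delivered.
        if payload and seq + len(payload) > self.next_seq:
            self.pending[seq] = bytes(payload)
        # Deliver every held segment that now starts at or before next_seq.
        progressed = True
        while progressed:
            progressed = False
            for s in sorted(self.pending):
                if s > self.next_seq:
                    break             # a gap remains; wait for the missing bytes
                data = self.pending.pop(s)
                end = s + len(data)
                if end > self.next_seq:
                    # Append only the bytes beyond what was already delivered.
                    self.stream += data[self.next_seq - s:]
                    self.next_seq = end
                progressed = True
                break
        return self.next_seq          # cumulative acknowledgment value
```

For example, a receiver starting at sequence number 1000 that gets the bytes 1005-1009 first can only acknowledge 1000; once bytes 1000-1004 arrive, both segments are delivered in order and the acknowledgment jumps to 1010.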
Another protocol, Remote Direct Memory Access (RDMA) over TCP, provides, among other operations, direct placement of data at a specified memory location at the destination. An RDMA segment may be encapsulated in a TCP segment as the payload of that TCP segment. The RDMA segment has its own RDMA header and a payload that includes the RDMA message data. In addition, an RDMA segment may include other information, such as markers, cyclic redundancy check (CRC) information and pad bytes.
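The pad and CRC items listed above can be sketched as follows, assuming an MPA-style layout: a 2-byte length field, the RDMA (DDP) segment, zero pad bytes up to a 4-byte boundary, and a trailing CRC32c computed over everything before it. Markers are omitted, and the CRC is appended in network byte order purely for illustration; the exact on-wire ordering and marker placement are defined by the MPA specification. The function names are hypothetical.

```python
import struct

# Reflected form of the CRC32c (Castagnoli) polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c; slow but easy to check against the standard test vector."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def pad_length(segment_len: int) -> int:
    """Pad bytes needed so that length field + segment ends on a 4-byte boundary."""
    return (4 - (2 + segment_len) % 4) % 4


def frame_segment(segment: bytes) -> bytes:
    """Hypothetical framing: length | segment | pad | CRC32c (markers omitted)."""
    if len(segment) > 0xFFFF:
        raise ValueError("segment too long for a 16-bit length field")
    body = struct.pack("!H", len(segment)) + segment + bytes(pad_length(len(segment)))
    return body + struct.pack("!I", crc32c(body))
```

Padding to a 4-byte boundary keeps the CRC word aligned, so a receiver (or an RNIC) can locate and verify it without byte-shifting the data.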
RDMA over TCP/IP includes three wire protocols layered above TCP: the Marker Protocol Data Unit (PDU) Aligned TCP Framing Protocol (MPA), the Direct Data Placement Protocol (DDP), and the RDMA Protocol (RDMAP). In some applications, these protocols support generic memory-to-memory data transfer semantics across a TCP/IP network without intermediate buffering. Details on MPA are described in “Marker PDU Aligned Framing for TCP Specification (Version 1.0),” (October, 2002). Details on DDP are described in “Direct Data Placement over Reliable Transports (Version 1.0),” (October, 2002). Details on RDMAP are described in “An RDMA Protocol Specification (Version 1.0),” (October, 2002).
A device driver, program or operating system can utilize significant host processor resources to handle network transmission requests to the network controller. One technique to reduce the load on the host processor is the use of a TCP/IP Offload Engine (TOE) in which TCP/IP protocol related operations are carried out in the network controller hardware as opposed to the device driver or other host software, thereby saving the host processor from having to perform some or all of the TCP/IP protocol related operations. Similarly, an RDMA-enabled Network Interface Controller (RNIC) offloads RDMA and transport related operations from the host processor(s).
In some known designs, an I/O device such as a network controller or a storage controller may have the capability of directly placing data into an application buffer or other memory area. An RNIC is an example of an I/O device which can perform direct data placement.
An RDMA Verbs Interface (RI) that supports the RNIC Verbs specification (RDMA Protocol Verbs Specification 1.0, April, 2003) can provide a low-overhead interface to an RNIC for applications that require low-latency, high-speed communications. An RDMA Verb is an operation that an RNIC Interface is expected to be able to perform. A Verbs Consumer may use an RNIC Interface to set up communication with other nodes through RDMA Verbs. RDMA Verbs provide Verbs Consumers the capability to control data placement, eliminate data copy operations, and reduce communications overhead and latency by allowing one Verbs Consumer to place information directly in the memory of another Verbs Consumer, while preserving operating system and memory protection semantics.
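The memory protection semantics mentioned above can be illustrated with a toy model of tagged placement: a consumer registers a buffer and receives a steering tag, and an incoming write that names that tag and an offset is placed only if the tag is valid, remote writes are permitted, and the write fits entirely inside the registered region. The class and method names below are hypothetical and only loosely modeled on the STag and tagged-offset concepts of DDP; they are not the RDMA Verbs API.

```python
class MemoryRegion:
    """A registered buffer plus the access right a remote peer is granted."""

    def __init__(self, size: int, remote_write: bool):
        self.buf = bytearray(size)
        self.remote_write = remote_write


class PlacementTable:
    """Hypothetical tag -> region table consulted before any direct placement."""

    def __init__(self):
        self._regions = {}
        self._next_stag = 1

    def register(self, size: int, remote_write: bool = True) -> int:
        stag = self._next_stag
        self._next_stag += 1
        self._regions[stag] = MemoryRegion(size, remote_write)
        return stag

    def place(self, stag: int, offset: int, data: bytes) -> None:
        # Every check happens before any byte is written, so a bad request
        # cannot touch memory the consumer did not expose.
        region = self._regions.get(stag)
        if region is None:
            raise PermissionError("invalid steering tag")
        if not region.remote_write:
            raise PermissionError("remote write not permitted")
        if offset < 0 or offset + len(data) > len(region.buf):
            raise PermissionError("access outside registered region")
        region.buf[offset:offset + len(data)] = data  # direct placement, no staging copy

    def read_local(self, stag: int) -> bytes:
        return bytes(self._regions[stag].buf)
```

In this model the data lands in the consumer's buffer in a single step, while the tag lookup and bounds check stand in for the protection that the operating system would otherwise enforce.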