The Open Systems Interconnection (OSI) Reference Model defines seven network protocol layers (L1-L7) used to communicate over a transmission medium. The upper layers (L4-L7) represent end-to-end communications and the lower layers (L1-L3) represent local communications.
Application-aware networking systems need to process, filter and switch a range of L3 to L7 network protocol layers, for example, L7 network protocol layers such as HyperText Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), and L4 network protocol layers such as Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). In addition to processing the network protocol layers, these systems need to simultaneously secure the protocols with access-based and content-based security through L4-L7 network protocol layers, including Firewall, Virtual Private Network (VPN), Secure Sockets Layer (SSL), Intrusion Detection System (IDS), Internet Protocol Security (IPSec), Anti-Virus (AV) and Anti-Spam functionality, at wire-speed.
A socket interface such as the Berkeley Software Distribution (BSD) socket interface is a standard Application Programming Interface (API) that includes a set of functions that can be called by networking applications, such as web browsers, to provide networking services. The application calls an API function, which makes a system call to the operating system. The socket interface provides the service requested by the system call by calling low-level functions in the operating system that handle the networking protocol. The underlying protocols and network mechanisms are hidden from the application.
The BSD socket interface is typically used to interface to the TCP/IP protocol stack. TCP/IP is based on a client/server model in which client and server processes identified by network endpoints (IP address and port number) are represented as sockets. A socket is set up between a client process and a server process for a particular domain and network protocol, for example, UDP or TCP. The server process is identified by binding an IP address and port number to the socket. After the server process is bound to the socket, the socket can be used to listen for new connection requests and to service them. All requests through the socket interface are directed through a TCP/IP network stack in the operating system (kernel) and involve multiple copies of packet data. Packet data for each connection is copied between a device driver buffer in the kernel and application (user) space. That is, data is first copied to the network stack in kernel memory and must then be copied to buffers in application (user) space before it can be used by the application.
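The bind, listen and accept sequence described above can be sketched with Python's standard `socket` module, which wraps the BSD socket calls. This is a minimal illustration over the loopback interface, not a production server; the function names `recv_exact` and `run_echo_once` are hypothetical, and the payload is assumed to be small enough to fit in the kernel socket buffers.

```python
import socket


def recv_exact(sock, n):
    """Read up to n bytes, looping because recv() may return fewer."""
    data = bytearray()
    while len(data) < n:
        part = sock.recv(n - len(data))
        if not part:  # peer closed the connection
            break
        data += part
    return bytes(data)


def run_echo_once(payload):
    """Set up a TCP socket pair on loopback and echo one small payload."""
    # Server side: a socket for the IPv4 domain and the TCP protocol.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Bind an IP address and port number to the socket.
        # Port 0 asks the kernel to choose any free port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)  # listen for new connection requests
        address = server.getsockname()

        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            client.connect(address)     # the kernel queues the connection
            conn, _peer = server.accept()  # server services the request
            try:
                client.sendall(payload)
                # recv() copies data from kernel buffers into user space.
                received = recv_exact(conn, len(payload))
                conn.sendall(received)
                return recv_exact(client, len(payload))
            finally:
                conn.close()
        finally:
            client.close()
    finally:
        server.close()
```

Each `recv()` call here corresponds to a copy from kernel memory into an application buffer, which is the per-connection copy the paragraph above refers to.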
The transfer of packet data for the connection can involve transferring the data over a bus such as the PCI bus. Typically, the transfer is performed by a Direct Memory Access (DMA) engine in a PCI interface. In a host-based descriptor ring DMA engine, descriptors are fetched from host PCI shared memory and data to be transferred over the PCI bus is transferred to buffers identified by pointers stored in the descriptors.
There are different mechanisms by which the host informs the DMA engine of the availability and size of buffers. DMA engines also have different ways to notify the host of data transfer completion; one such mechanism is a coalesced interrupt. In a descriptor ring of this kind, only one program on the host side can manage buffers in the descriptor ring, and all buffers are treated the same, that is, data is stored in the next available buffer in the ring irrespective of the connection it belongs to. Generally, a driver in the kernel manages such descriptor rings. Most generic network interface cards (NICs) support this type of DMA engine.
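The behavior of a host-based descriptor ring can be modeled in a few lines. The following is a simplified software sketch, not a description of any particular NIC: the class and field names (`Descriptor`, `DescriptorRing`, `owned_by_device`) are hypothetical, and real hardware signals ownership and completion through registers and interrupts rather than Python attributes.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    buffer: bytearray             # host buffer the descriptor points to
    length: int = 0               # bytes written by the device
    owned_by_device: bool = True  # True while the device may fill it


class DescriptorRing:
    def __init__(self, num_descriptors, buffer_size):
        self.ring = [Descriptor(bytearray(buffer_size))
                     for _ in range(num_descriptors)]
        self.device_index = 0  # next descriptor the device will fill
        self.host_index = 0    # next descriptor the host will reap

    def device_receive(self, packet):
        """Device side: store a packet in the next available buffer,
        irrespective of the connection the packet belongs to."""
        desc = self.ring[self.device_index]
        if not desc.owned_by_device or len(packet) > len(desc.buffer):
            return False  # no free buffer, or packet too large
        desc.buffer[:len(packet)] = packet
        desc.length = len(packet)
        desc.owned_by_device = False  # hand the descriptor to the host
        self.device_index = (self.device_index + 1) % len(self.ring)
        return True

    def host_reap(self):
        """Host driver side: collect completed buffers in ring order
        and return their descriptors to the device."""
        completed = []
        while not self.ring[self.host_index].owned_by_device:
            desc = self.ring[self.host_index]
            completed.append(bytes(desc.buffer[:desc.length]))
            desc.length = 0
            desc.owned_by_device = True
            self.host_index = (self.host_index + 1) % len(self.ring)
        return completed
```

In this model, packets from different connections land in consecutive buffers, so the single driver that reaps the ring must still sort them by connection afterwards.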
In a memory-to-memory data mover DMA engine, the DMA engine can transfer data across the PCI bus within shared PCI memory space. The transmit side of the PCI bus provides pointers to the data buffers from which data is to be gathered, together with a list of pointers (typically provided by the receive side) identifying where the data must be written on the other side of the PCI bus.
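The gather and scatter steps of such a data mover can be illustrated as follows. This is a sketch under stated assumptions: memory on each side of the bus is modeled as a `bytearray`, pointers are modeled as `(offset, length)` pairs, and the function name `dma_move` is hypothetical.

```python
def dma_move(src, gather_list, dst, scatter_list):
    """Gather bytes from src regions and scatter them into dst regions.

    gather_list and scatter_list are lists of (offset, length) pairs.
    Returns the number of bytes moved.
    """
    # Gather: read each source region in the order given.
    data = b"".join(bytes(src[off:off + n]) for off, n in gather_list)

    # Check up front that the receive-side buffers can hold the data.
    capacity = sum(n for _off, n in scatter_list)
    if capacity < len(data):
        raise ValueError("receive-side buffers too small")

    # Scatter: fill each destination region in turn until data runs out.
    written = 0
    for off, n in scatter_list:
        if written >= len(data):
            break
        chunk = data[written:written + n]
        dst[off:off + len(chunk)] = chunk
        written += len(chunk)
    return written
```

For example, gathering `(0, 6)` and `(13, 5)` from a source holding `b"headerPAYLOADtrail"` and scattering into `(0, 4)` and `(8, 8)` of a 16-byte destination moves 11 bytes, skipping the middle region of the source.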
Typically, from thousands to as many as a million TCP connections are established concurrently in a high-end server. If a standard DMA engine is used to move the data, the applications running on the host that use these TCP connections sleep while waiting for data on their corresponding TCP connections. There can be thousands of such application program threads. When data arrives, the kernel running on the host is interrupted. The kernel finds which sleeping process the received data belongs to by walking a long list of sleeping processes, copies the data to application space in host memory, and then wakes up the process. Finding the appropriate process and copying the data are both very expensive.
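The cost of the wake-up path described above can be made concrete with a small model. This sketch is illustrative only: it assumes the list of sleeping processes is a plain list of `(pid, connection_id)` pairs, and the function name `wake_on_data` is hypothetical; a real kernel's wait-queue structures differ.

```python
def wake_on_data(sleepers, conn_id, kernel_buffer):
    """Walk the list of sleeping processes to find the one waiting on
    conn_id, then copy the received data for it.

    Returns (pid, entries_examined, user_buffer).
    """
    steps = 0
    for pid, waiting_on in sleepers:
        steps += 1
        if waiting_on == conn_id:
            # Copy from kernel memory into a new application buffer.
            user_buffer = bytearray(kernel_buffer)
            return pid, steps, user_buffer
    return None, steps, None
```

In this model the number of entries examined grows linearly with the number of sleeping processes, and every delivery also pays for a full copy of the data.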
As network transfer speeds increase above 1 gigabit per second (1 Gb/s) and more data is transferred over the network, the CPU bandwidth required to process the multiple copies of the data increases. The overhead for copying reduces the available memory bus bandwidth and adds latency before the data is available for use by the application.
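A rough estimate shows how copying scales with line rate. The calculation below is a back-of-the-envelope sketch, not a measurement: it assumes each copy reads every byte once and writes it once, ignores caching and protocol headers, and uses the hypothetical function name `copy_traffic_bytes_per_s`.

```python
def copy_traffic_bytes_per_s(line_rate_bits_per_s, num_copies):
    """Estimate memory bus traffic generated by copying received data.

    Assumes each copy costs one read and one write of every byte.
    """
    payload_bytes_per_s = line_rate_bits_per_s // 8
    return payload_bytes_per_s * 2 * num_copies


# 1 Gb/s of received data copied twice.
traffic = copy_traffic_bytes_per_s(1_000_000_000, 2)
```

Under these assumptions, 1 Gb/s of traffic copied twice generates 500 MB/s of memory bus traffic, and the figure grows in proportion to both the line rate and the number of copies.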