The present invention relates to computer networking, and more particularly to a method and apparatus for sharing transport protocol tasks between a host and an attached network adapter.
The rapid growth in computer networking has spurred the development of ever-faster network media rates. For instance, over the last ten years, Ethernet-format maximum media rates have gone from 10 megabits-per-second (Mbps), to 100 Mbps (fast Ethernet), and now to 1000 Mbps (gigabit Ethernet). Future increases are planned to allow even faster network communications.
Traditionally, networked host computers have handled communication tasks at the network and transport layers (and some tasks at the link layer) using host software, while leaving the remaining link and physical layer communication tasks to an attached network adapter (which also may be partially implemented in host-resident driver software). Thus, for virtually every packet transmitted or received by the network adapter, the host processor must expend resources in handling packetization, header manipulation, data acknowledgment, and error control. At gigabit Ethernet speeds, even sophisticated server systems will often have a maximum network transmission rate limited by the ability of the host processor to handle its network and transport layer tasks, rather than by the speed of the physical connection. Consequently, host-implemented networking tasks can reduce bandwidth utilization and consume processor capacity that could otherwise be devoted to running applications.
Some network adapter vendors have attempted to increase network performance by offloading the entire transport and lower-layer protocol stack to the network adapter. This approach greatly eases the burden on the host processor, but increases the complexity and expense of the adapter. It also limits flexibility, limits upgradability, and makes platform-specific tailoring difficult. Such an adapter may also require that the entire network stack be rewritten to allow the hardware solution to integrate with the operating system.
Several less-severe modifications to the traditional division of labor between a host processor and a network adapter have also been proposed. One of the more appealing of these proposals is a feature known as “TCP segmentation offload” (see the Microsoft Windows 2000 Device Driver Development Kit for detailed information). Transmission Control Protocol/Internet Protocol (TCP/IP) is perhaps the most popular transport/network layer protocol suite in use today (see Network Working Group, RFC 791, Internet Protocol (1981); Network Working Group, RFC 793, Transmission Control Protocol (1981)). With TCP segmentation offload, the host processor can indicate to the network adapter that a large block of data is ready for TCP transmission, rather than passing numerous smaller TCP packets (each containing part of the large block of data) to the network adapter. With offloading, the network adapter segments the block of data into the smaller packets, builds the TCP, IP, and link-layer headers for each packet, and transmits the packets.
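By way of illustration only, the segmentation step described above can be sketched in a few lines of Python. This is a minimal sketch and not the adapter's actual logic: the names `Segment` and `segment_block`, the starting sequence number, and the 1460-byte maximum segment size are assumptions introduced for the example, and header construction is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes  # slice of the host-supplied block


def segment_block(block: bytes, start_seq: int, mss: int) -> List[Segment]:
    """Split one large block into payloads of at most `mss` bytes.

    Hypothetical helper: a real adapter would also build the TCP, IP,
    and link-layer headers for every segment.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(block), mss):
        # TCP sequence numbers are 32-bit and wrap around.
        seq = (start_seq + offset) % 2**32
        segments.append(Segment(seq, block[offset:offset + mss]))
    return segments


# Example: a 4000-byte block handed down in a single request.
segments = segment_block(b"x" * 4000, start_seq=1000, mss=1460)
```

The host makes one call for the whole block, and the per-packet loop runs on the adapter side, which is the source of the savings discussed in the text.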
TCP segmentation offload benefits overall system performance due to several factors. First, sending a large block of data requires fewer calls down through the software protocol stack than does sending multiple small blocks, thus reducing CPU utilization for a given workload. Second, when the headers are built in the network adapter hardware, header-building host overhead is avoided, and header information must only be transferred across the host bus once per block rather than once per packet, reducing latency and lowering bus utilization. And third, the network adapter hardware can reduce the number of host interrupts that it generates in order to indicate data transmission, in some instances down to one per block.
I have now recognized that, despite its benefits, TCP segmentation offload has several rather large limitations. First, the size of the block offloaded cannot be larger than the receiving endpoint's TCP window size (typically equal to somewhere between two and ten maximum-sized Ethernet packets). And second, the host processor must still process roughly the same number of acknowledgment packets (ACKs) from the receiving endpoint (roughly one-half to one ACK per data packet sent) despite the segmentation offloading.
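The effect of these two limitations can be made concrete with some rough arithmetic. The following sketch assumes a hypothetical 1 MB transfer, a 1460-byte segment payload, and a receive window of eight full-sized packets; none of these figures comes from the text above beyond the stated ranges, and the function name `offload_costs` is invented for the example.

```python
import math


def offload_costs(total_bytes: int, mss: int, window_packets: int):
    """Estimate host-side work for a window-limited segmentation offload.

    Returns (offload_requests, data_packets, min_acks, max_acks), using the
    figure from the text of one-half to one ACK per data packet sent.
    """
    window_bytes = window_packets * mss
    # Each offloaded block can be no larger than the receive window.
    offload_requests = math.ceil(total_bytes / window_bytes)
    data_packets = math.ceil(total_bytes / mss)
    min_acks = math.ceil(data_packets / 2)
    max_acks = data_packets
    return offload_requests, data_packets, min_acks, max_acks


requests, packets, min_acks, max_acks = offload_costs(
    total_bytes=1_048_576, mss=1460, window_packets=8
)
print(requests, packets, min_acks, max_acks)
```

Under these assumptions the host still issues dozens of offload requests and handles hundreds of ACKs for a single megabyte, which is the overhead that the summary below addresses.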
In accordance with one aspect of the present invention, a method for operating a network adapter is disclosed. This method comprises the steps of accepting a request from a host-based transmission protocol layer to transmit a block of data to a remote endpoint, segmenting the block of data into multiple data packets, and transmitting the packets to the remote endpoint. During the execution of these steps, the network adapter (either in hardware or in its software driver) interprets acknowledgment data sent by the remote endpoint to the host-based transmission protocol layer, as it passes through the adapter. Preferably, the network adapter also controls transmission of the multiple data packets based on the remote endpoint's receive window size and other interpreted acknowledgment data. In a particularly preferred embodiment, the adapter traps acknowledgment data bound for the host-based transmission protocol layer, when the acknowledgment data pertains only to the data packets created by the adapter's segmentation.
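One possible reading of the trapping decision in the particularly preferred embodiment is sketched below. The function name `should_trap_ack` and the specific test are assumptions made for illustration: the sketch treats an acknowledgment as pertaining only to adapter-created packets when it carries no payload and its acknowledgment number falls within the offloaded block. It further assumes that all data preceding the block has already been acknowledged, and it ignores sequence-number wraparound.

```python
def should_trap_ack(ack_no: int, payload_len: int,
                    block_start: int, block_end: int) -> bool:
    """Decide whether the adapter may consume an incoming ACK itself.

    block_start: sequence number of the first byte of the offloaded block.
    block_end:   sequence number just past the last byte of the block.
    Hypothetical policy; sequence wraparound is not handled here.
    """
    # A packet that also carries data must still reach the host stack.
    is_pure_ack = payload_len == 0
    # The ACK must acknowledge at least one byte of the offloaded block
    # and nothing beyond it.
    within_block = block_start < ack_no <= block_end
    return is_pure_ack and within_block


# Offloaded block occupies sequence numbers 1000..4999.
print(should_trap_ack(2460, 0, 1000, 5000))   # pure ACK inside the block
print(should_trap_ack(2460, 64, 1000, 5000))  # ACK piggybacked on data
```

Any packet for which the function returns False would be passed up to the host-based transmission protocol layer unchanged.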
In another aspect of the invention, a network adapter is disclosed. The adapter comprises a network interface and a packet buffer memory that buffers packets for transmission over this interface. The adapter also has a context engine that establishes and services connection contexts corresponding to requests for transmission of large data blocks that must be segmented. A packet engine segments such large data blocks into multiple data packets and places these packets in the packet buffer memory. As acknowledgment packets corresponding to the data packets are received via the network interface, a receive filter associates these with the context. The context engine uses flow control (e.g., window size) information taken from the acknowledgment packets to control when the packet engine places data packets in the packet buffer. Preferably, the receive filter selectively intercepts acknowledgment packets associated with the connection context. Also preferably, the adapter includes a context memory that allows it to simultaneously serve multiple connection contexts submitted by the host.
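The interplay between the context engine, the packet engine, and the receive filter can be sketched as a small simulation. This is a simplified model under stated assumptions, not the disclosed hardware: the class name `OffloadContext`, its fields, and its methods are hypothetical, retransmission and sequence wraparound are ignored, and a final segment may be shortened to fit the window.

```python
class OffloadContext:
    """Per-connection state for one offloaded block (hypothetical model)."""

    def __init__(self, block: bytes, start_seq: int, mss: int, window: int):
        self.block = block
        self.start_seq = start_seq
        self.mss = mss
        self.window = window        # remote receive window, in bytes
        self.snd_una = start_seq    # oldest unacknowledged sequence number
        self.snd_nxt = start_seq    # next sequence number to send

    @property
    def end_seq(self) -> int:
        return self.start_seq + len(self.block)

    def sendable(self):
        """Packet-engine step: emit every segment the window currently allows."""
        out = []
        limit = min(self.snd_una + self.window, self.end_seq)
        while self.snd_nxt < limit:
            size = min(self.mss, limit - self.snd_nxt)
            offset = self.snd_nxt - self.start_seq
            out.append((self.snd_nxt, self.block[offset:offset + size]))
            self.snd_nxt += size
        return out

    def on_ack(self, ack_no: int, window: int) -> None:
        """Receive-filter step: apply flow control data from an ACK."""
        if self.snd_una < ack_no <= self.snd_nxt:
            self.snd_una = ack_no
        self.window = window

    def done(self) -> bool:
        return self.snd_una == self.end_seq


ctx = OffloadContext(b"y" * 5000, start_seq=0, mss=1000, window=2000)
first_burst = ctx.sendable()          # limited to the 2000-byte window
ctx.on_ack(ack_no=1000, window=2000)  # window slides by one segment
second_burst = ctx.sendable()
```

A context memory serving several connections at once could be modeled as a mapping from a connection identifier to one such object per connection.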
In a further aspect of the invention, a computer system is disclosed. The system has a host processor and a network adapter, both in communication with a system bus. The host processor is software-configured to run a network transport protocol. However, the host processor configuration allows the host processor to temporarily relinquish outgoing flow control for a given transport connection to the network adapter, in conjunction with a request to the network adapter to transmit a block of data. The network adapter has an operational mode that allows it to accept a block of data, segment it into smaller blocks for transmission, and provide flow control for those blocks.
An article of manufacture comprising a computer-readable medium containing a program for operating a network transport protocol is also disclosed. When executed, the program configures a processor to run a packet flow controller. It also runs a packet segmentation offloader that can offload packetization of data blocks to a network interface card. The offloader has the capability to instruct the network interface card to temporarily handle flow control for a data block that it is tasked with segmenting. The program also runs a flow control selector that selects, for a given data block, whether to handle packet flow control using the packet flow controller, or to instruct the network interface card to handle packet flow control for the block.
Finally, an article of manufacture comprising a computer-readable medium containing a driver program for a network adapter is disclosed. When executed, the driver program configures a processor to run a packet segmentation offload scheduler that accepts requests from a higher-level protocol (e.g., TCP) to segment a data block and temporarily handle flow control for that block. The scheduler accepts these requests and schedules them onto a network adapter controlled by the driver program. Preferably, the scheduler can track the number of contexts being handled by the hardware, and either queue requested contexts or reject requested contexts when the context hardware is already saturated. The driver also runs a packet segmentation offload status reporter for communicating the status of accepted requests to the higher-level protocol.
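The scheduling behavior described for the driver, including the preferred queue-or-reject handling when the context hardware is saturated, can be sketched as follows. The class name `OffloadScheduler`, its methods, and the status strings are hypothetical and chosen only for this illustration; the actual driver interface is not specified here.

```python
from collections import deque


class OffloadScheduler:
    """Tracks hardware contexts and queues or rejects requests when full."""

    def __init__(self, max_contexts: int, queue_when_full: bool = True):
        self.max_contexts = max_contexts
        self.queue_when_full = queue_when_full
        self.active = set()     # requests currently occupying a hardware context
        self.pending = deque()  # requests waiting for a free context
        self.status = {}        # request id -> latest status string

    def submit(self, request_id: str) -> str:
        """Accept a request from the higher-level protocol (e.g., TCP)."""
        if len(self.active) < self.max_contexts:
            self.active.add(request_id)
            self.status[request_id] = "scheduled"
        elif self.queue_when_full:
            self.pending.append(request_id)
            self.status[request_id] = "queued"
        else:
            self.status[request_id] = "rejected"
        return self.status[request_id]

    def complete(self, request_id: str) -> None:
        """Called when the adapter finishes a context; promotes a queued one."""
        self.active.remove(request_id)
        self.status[request_id] = "done"
        if self.pending:
            next_id = self.pending.popleft()
            self.active.add(next_id)
            self.status[next_id] = "scheduled"

    def report(self, request_id: str) -> str:
        """Status reporter used by the higher-level protocol."""
        return self.status.get(request_id, "unknown")


sched = OffloadScheduler(max_contexts=2)
for rid in ("a", "b", "c"):
    sched.submit(rid)
sched.complete("a")
```

Whether to queue or reject is exposed here as a constructor flag; a driver could equally make that choice per request.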