1. The Field of the Invention
The present invention relates generally to computer networking technology. More particularly, the present invention relates to mechanisms for optimizing the offload of network computing tasks.
2. Background and Relevant Art
The complexity and sophistication of operating systems, application software, networking technology, and the like continue to increase at dramatic rates, resulting in increased computer functionality. This increased functionality often results in increased Central Processing Unit (CPU) load (hereinafter also referred to as “CPU overhead”) due to the additional duties that the CPU must perform to implement it.
One area where the increase in CPU overhead is readily apparent is networked applications, where network speeds are increasing due to the growth in high-bandwidth media. Network speeds may even rival the CPU speed and the local memory access speeds of host computers. These networked applications further burden the host processor due to the layered architecture used by most operating systems, such as the seven-layer Open Systems Interconnection (OSI) model or the layered model used by the Windows operating system.
As is well known, such a model is used to describe the flow of data between the physical connection to the network and the end-user application. The most basic functions, such as putting data bits onto the network cable, are performed at the bottom layers, while functions attending to the details of applications are at the top layers. Essentially, the purpose of each layer is to provide services to the next higher layer, shielding the higher layer from the details of how services are actually implemented. The layers are abstracted in such a way that each layer believes it is communicating with the same layer on the other computer.
Various functions that are performed on a data packet as it proceeds between layers can be software intensive, and thus often require a substantial amount of CPU and memory resources. For instance, certain functions that are performed on the packet at various layers are extremely CPU intensive, such as packet checksum calculation and verification, encryption and decryption of data (e.g., SSL encryption and IP Security encryption), message digest calculation, TCP segmentation, TCP retransmission and acknowledgment (ACK) processing, packet filtering to guard against denial of service attacks, and User Datagram Protocol (UDP) packet fragmentation. As each of these functions is performed, the resulting demands on the CPU can greatly affect the throughput and performance of the overall computer system.
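Packet checksum calculation is a simple illustration of why these functions are CPU intensive: the host must read every byte of every packet. The following is a minimal Python sketch of the standard 16-bit one's-complement Internet checksum described in RFC 1071 and used by IP, TCP, and UDP. The function name is hypothetical, and the loop is written for clarity rather than speed.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071).

    Every byte is read, so the cost grows linearly with packet size.
    """
    if len(data) % 2:
        data = data + b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add the next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back into the low 16 bits
    return ~total & 0xFFFF  # one's complement of the folded sum
```

Appending the computed checksum to even-length data and recomputing yields zero, which is how a receiver verifies the checksum.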
As the demand on CPU resources grows, the capability and throughput of computer hardware peripherals such as network interface cards (NICs) and the like are also increasing. These peripherals are often equipped with a dedicated processor and memory that are capable of performing many of the tasks and functions that are otherwise performed by the CPU.
The computer industry recognized this capability and developed methods to offload CPU intensive tasks and functions that were previously performed by the CPU. For example, commonly assigned U.S. Pat. No. 6,141,705 to Anand et al., U.S. Pat. No. 6,370,599 to Anand et al., and U.S. patent application Ser. No. 09/726,082, “Method and Computer Program Product for Offloading Processing Tasks from Software to Hardware,” filed Nov. 29, 2000 provide solutions to query peripheral devices and offload specific processor tasks to the peripheral devices that are capable of performing the intensive tasks and functions. The specific tasks typically offloaded include TCP (Transmission Control Protocol) and/or IP (Internet Protocol) checksum computation, TCP segmentation such as Large Send Offload (LSO), and Internet Protocol Security (IPSEC) encryption and decryption.
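To illustrate the kind of work an LSO-capable peripheral device performs on the host's behalf, the following minimal Python sketch splits one large send into segments no larger than the maximum segment size (MSS) and assigns each its TCP sequence number. The names `Segment` and `segment_large_send` are hypothetical, and the sketch omits header replication, per-segment checksums, and TCP flag handling that a real device would also perform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    seq: int  # TCP sequence number of the segment's first payload byte
    payload: bytes


def segment_large_send(payload: bytes, start_seq: int, mss: int) -> List[Segment]:
    """Split one large send into segments of at most `mss` payload bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        seq = (start_seq + offset) % 2**32  # TCP sequence numbers wrap modulo 2^32
        segments.append(Segment(seq=seq, payload=payload[offset:offset + mss]))
    return segments
```

With LSO, the host stack hands down one large buffer and a single request, and the per-segment loop above runs on the peripheral device instead of the CPU.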
These offload mechanisms are limited in that they have a secondary requirement that a minimum number of changes be made to the network stack. As a result of this secondary requirement, another limitation is that the offloads have a long code path, because the entire network stack is traversed with the offloaded tasks and functions disabled in order to reach the peripheral device. A further limitation is the lack of integration with the network stack. There is no well-defined interface for the network stack to query or set parameters on the peripheral device, nor an interface for the peripheral device to inform the network stack of any notifications or changes of capabilities. For example, if the route changes while an LSO request is being processed, the fallback mechanism is for the stack to wait for timeouts and retransmit the LSO request.
Another approach that peripheral device manufacturers have tried is to offload the entire TCP connection from the core stack to a Network Interface Card (NIC). This approach bypasses the entire protocol stack by using a proprietary interface and requires the peripheral device to handle TCP messages, IP (Internet Protocol) messages, ICMP (Internet Control Message Protocol) messages, DNS (Domain Name System) messages, Dynamic Host Configuration Protocol (DHCP) messages, Routing Information Protocol (RIP) messages, etc. Additionally, this approach does not address multi-homed environments and does not cleanly integrate with the host operating system's network management utilities. If a peripheral device's state changes, the offloaded connection can easily fail. Such potential for offloaded connection failure is only one of several disadvantages in the present art.
Note that a single “connection” typically consists of state for each of the network layers, referred to here as “state objects”. Offloading of network protocol computation, however, is not limited to connection oriented protocols. Other protocols, such as IPSEC, may not be connection oriented but still contain state objects, or groups of state objects, which can be offloaded. Remote DMA (RDMA) may add additional state objects to those required for connection offload. This invention applies to all of the above; thus, “state object” refers to one or more network layers that contain state, which may or may not be connection oriented, and may or may not contain more state than that required for TCP/IP networking.
By way of explanation, and not of limitation, a “network connection” (sometimes herein referred to as a “connection”) will be understood to be a “connection” using a protocol such as a TCP connection, a Stream Control Transmission Protocol (SCTP) connection (or stream), and, when it is used to implement connection-oriented communication, connection oriented User Datagram Protocol (UDP). Further, a “network connection” (or “connection”), while typically implemented at the transport layer (i.e. Layer 4) of the seven layer OSI network protocol stack model, could also occur at other layers, including Application Layer protocols that implement connections, or Data Link Layer protocols that implement connections.
Continuing, those developing offload mechanisms have focused on iteratively transferring only one connection at a time or one logical grouping of state objects at a time, particularly in the case of connections that need a relatively high number of CPU cycles for processing. For example, the total number of CPU cycles needed to process a single offloaded connection or state object can be expressed as:

$$A + [B - C]$$

where “A” is the number of CPU cycles needed to offload a connection or link; “B” is the number of CPU cycles needed to process the connection or link; and “C” is the number of CPU cycles saved by offloading the connection or link.

If the first term (i.e., “A”) is substantially greater than the second term (i.e., “B−C”), then it is not generally cost effective in terms of CPU overhead to offload a given connection or link. By contrast, if the second term (“B−C”) is substantially greater than the first term (i.e., “A”), then a benefit can be realized by offloading the connection or state object. Accordingly, offload mechanisms have been geared primarily toward offloading a connection or state object where the “B−C” term is relatively high. This is frequently the case for “long-lived” connections or connections used to transfer large files.
Long-lived connections, and connections or state objects for transmitting large amounts of data, however, are not the only types of connections or state objects that may consume a host computer's valuable CPU resources. For example, in the case of a host on a Wide Area Network (WAN) such as the Internet, a server hosting web pages may be equally consumed with processing hundreds of thousands of “short-lived” connections, such as simple web page requests.
Generally speaking, “short-lived” connections are, as the name implies, connections whose lifetime is short, such as HyperText Transfer Protocol (HTTP) connections where there may be one or more short data transfer requests and responses between a remote computer and a server (or host), often over a wide area network. HTTP version 1.0 illustrates this case. To request an ordinary text web page using HTTP 1.0, a client initiates a two-way connection with the server (or host) hosting the web page. The client's request of the server will typically be one message, a short ASCII string sequence such as “get file, file name” (e.g., “GET http://10.1.1.1/file_name.html”), the request sometimes totaling no more than 100 bytes. After the server accepts the client's request, the client will close the first direction (client to server) of the two-way connection, and the server will respond by sending the text web page over the second direction (server to client) of the two-way connection and then close the second direction. There are many different protocols with this type of workload, including some that combine a small number of requests before closing the connection rather than processing a single request. Connections carrying such workloads are referred to as short-lived connections.
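The exchange described above can be sketched in Python as follows, assuming a local `socket.socketpair()` as a stand-in for the TCP connection between client and server. The function names and the request and response strings are hypothetical; the request uses the origin-form request line `GET /file_name.html HTTP/1.0` rather than the absolute URL shown in the text. Each side calls `shutdown(socket.SHUT_WR)` to close its own direction of the two-way connection, mirroring the sequence in the text.

```python
import socket
from typing import Tuple


def _read_to_eof(sock: socket.socket) -> bytes:
    """Read until the peer closes its sending direction."""
    chunks = []
    while chunk := sock.recv(4096):
        chunks.append(chunk)
    return b"".join(chunks)


def short_lived_exchange(request: bytes, response: bytes) -> Tuple[bytes, bytes]:
    """Simulate one HTTP/1.0-style short-lived exchange (hypothetical helper).

    Returns (bytes the server received, bytes the client received).
    """
    client, server = socket.socketpair()  # stand-in for a TCP connection
    with client, server:
        client.sendall(request)
        client.shutdown(socket.SHUT_WR)  # client closes the client-to-server direction
        received_request = _read_to_eof(server)

        server.sendall(response)
        server.shutdown(socket.SHUT_WR)  # server closes the server-to-client direction
        received_response = _read_to_eof(client)
    return received_request, received_response


if __name__ == "__main__":
    request = b"GET /file_name.html HTTP/1.0\r\n\r\n"
    response = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
    served, delivered = short_lived_exchange(request, response)
    print(len(served), len(delivered))
```

The entire request in this sketch is a few dozen bytes, which shows how little data such a connection carries relative to the work of setting it up and tearing it down.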
The HTTP web page request often comprises anywhere from 100 to 1,000 bytes. The HTTP response may likewise be quite small if, for example, the requested web page is text-based and contains little, if any, multimedia information. Accordingly, neither the request nor the response carries a significant amount of data, especially given the widespread availability of high-speed networks. Moreover, given increasingly fast processors, neither the request nor the response requires a significant amount of CPU time by present standards. Therefore, the connection needed to process the HTTP request and response may be quite short-lived. In this case, the number of CPU cycles required to offload a single short-lived connection could easily be greater than the number of CPU cycles saved by not having the CPU process the short-lived connection. Consequently, it is relatively expensive in terms of CPU cycles to iteratively offload one short-lived connection at a time.
Another constraint related to offloading network computation involves working with specifications that provide for creation of one or more virtual peripheral device(s) on top of one or more physical peripheral device(s). This is useful for several reasons, including aggregation of multiple network links to support load balancing of network traffic and failover of network traffic, as well as subdividing a single link into one or more virtual links, commonly referred to as a Virtual LAN (VLAN), which enables network management to view a physical network as multiple, logically distinct, networks.
Examples of technologies which provide link aggregation include IEEE 802.3ad (also referred to herein as “the 802.3ad standard”), as well as other vendor proprietary standards. In particular, the 802.3ad standard (as well as vendor proprietary standards) defines how two or more Ethernet links can be combined to support load balancing of network traffic or failover of network traffic across multiple Ethernet links. This capability can be used to a) increase the fault tolerance of the host to network link failures and/or b) increase the networking capacity of the server by enabling networking traffic to be load balanced across multiple Ethernet links. To accomplish this end, the 802.3ad standard provides for a “team” of peripheral devices that can be “teamed” into one virtual peripheral device. The virtual peripheral device then manages the physical peripheral devices in the team by changing which of the peripheral devices in the team are currently enabled (for failover) and/or by directing network traffic across the multiple physical devices (for load balancing). For example, if the “team” is configured for failover and the virtual peripheral device detects that a physical peripheral device has failed, then the virtual peripheral device will no longer use the failed peripheral device and will instead use another peripheral device in the team to send and receive networking traffic.
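The failover and load-balancing behavior described above can be sketched as a small Python model, assuming a virtual device that keeps one active member for failover and spreads flows over healthy members by an integer flow key for load balancing. The classes `PhysicalNic` and `VirtualTeam` and their methods are hypothetical; this does not implement the 802.3ad Link Aggregation Control Protocol or any vendor's distribution algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhysicalNic:
    """A physical peripheral device that is a member of a team (hypothetical)."""
    name: str
    healthy: bool = True


@dataclass
class VirtualTeam:
    """A virtual peripheral device layered over a team of physical NICs (hypothetical)."""
    members: List[PhysicalNic]
    active_index: int = 0

    def active(self) -> PhysicalNic:
        """Member currently carrying traffic in a failover configuration."""
        return self.members[self.active_index]

    def healthy_members(self) -> List[PhysicalNic]:
        return [nic for nic in self.members if nic.healthy]

    def report_failure(self, name: str) -> PhysicalNic:
        """Mark a member as failed and, if it was active, fail over to a healthy one."""
        for nic in self.members:
            if nic.name == name:
                nic.healthy = False
        if not self.active().healthy:
            for index, nic in enumerate(self.members):
                if nic.healthy:
                    self.active_index = index
                    break
            else:
                raise RuntimeError("no healthy member left in the team")
        return self.active()

    def member_for_flow(self, flow_key: int) -> PhysicalNic:
        """Load balancing: map each flow to one healthy member."""
        candidates = self.healthy_members()
        if not candidates:
            raise RuntimeError("no healthy member left in the team")
        return candidates[flow_key % len(candidates)]
```

Mapping each flow to a single member keeps the packets of one flow on one link, which avoids reordering; because the modulus is taken over the healthy members, the mapping changes whenever a member fails.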
Examples of Virtual LAN specifications include the IEEE 802.1Q specification for VLAN tags. The specification defines a tag that is added to the Ethernet frame's Media Access Control (MAC) header, enabling the Ethernet network to be logically subdivided into separate networks by changing the tag value. This enables system administrators to logically separate traffic for different administrative domains (e.g., separating a network for engineering development from a network for payroll) while using the same physical network for both. It also allows for isolation for security concerns, such as logically isolating block storage network traffic from traditional peer-to-peer traffic.
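As an illustration of the tag format, the following minimal Python sketch builds a 4-byte 802.1Q tag (the Tag Protocol Identifier 0x8100 followed by a Tag Control Information field holding a 3-bit priority, a 1-bit drop-eligible indicator, and the 12-bit VLAN ID) and inserts it after the destination and source MAC addresses of an untagged Ethernet frame. The function names are hypothetical, and the sketch does not recompute the frame check sequence.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # Tag Protocol Identifier that marks an 802.1Q-tagged frame
MAC_ADDRESSES_LEN = 12  # destination MAC (6 bytes) + source MAC (6 bytes)


def vlan_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID followed by PCP | DEI | VID."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def insert_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the MAC addresses of an untagged frame."""
    if len(frame) < MAC_ADDRESSES_LEN + 2:
        raise ValueError("frame too short to contain MAC addresses and EtherType")
    return frame[:MAC_ADDRESSES_LEN] + vlan_tag(vid, pcp) + frame[MAC_ADDRESSES_LEN:]


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    if len(frame) < MAC_ADDRESSES_LEN + 4:
        return None
    tpid, tci = struct.unpack_from("!HH", frame, MAC_ADDRESSES_LEN)
    return tci & 0x0FFF if tpid == TPID_8021Q else None
```

Only the 12-bit VLAN ID has to differ for frames on the same physical network to belong to logically separate networks.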
There are several limitations when combining the current state of the art in network offload with the ability to create virtual peripheral devices. If the virtual peripheral device is used for load balancing or fault tolerance, a connection or state object is offloaded through the virtual peripheral device to a physical peripheral device, and that physical peripheral device fails, then the offloaded connection or state object typically cannot be recovered (let alone failed over to another physical peripheral device). Additionally, the current state of the art often requires the virtual peripheral device to be able to team heterogeneous physical devices. If the physical peripheral devices support offloading of connections or state objects, then the physical devices in the team may have different offload capabilities. If a physical device fails and the virtual device selects a new physical device within the team, the new physical device may have a greater or lesser capacity for offload, or different offload capabilities, than the prior device. The current state of the art for offloading of network computation does not enable this level of versatility.
Accordingly, there is a need in the art to reduce the overhead associated with offloading of multiple connections from a processor to a peripheral device, and vice versa, as well as a need to extend offload capabilities to cleanly integrate with existing specifications that enable the creation of one or more virtual peripheral device(s) on top of one or more physical peripheral device(s) to increase fault tolerance, network capacity, and network management in an offloaded environment. In particular, there is a need for solutions for offloading multiple short-lived connections while simultaneously maintaining specialized processing requirements for the individual connections. As well, there is a need for solutions for offloading or uploading aggregated connections within the 802.3ad environment (particularly as applied to peripheral device failover events) and a need to enable support for Virtual LANs.