In recent years, network bandwidth has been increasing much faster than the speed of processing systems, such as computer systems and other systems that communicate over such networks. Increases in network bandwidth have resulted from new technologies and standards for both wide area networks (WANs) and local area networks (LANs). WAN technologies such as SONET (synchronous optical network) using DWDM (dense wavelength division multiplexing) have increased available bandwidth by several orders of magnitude over the span of only a few years. Similarly, LAN technologies such as gigabit Ethernet and ten gigabit Ethernet on copper and optical fiber have increased available network bandwidth by two orders of magnitude relative to standard 10- and 100-megabit Ethernet. During the same time period, the computational power of computers and other systems has been doubling about every 18 months. Because of this disparity between the processing speed of communication chips and the bandwidth of the underlying network technologies to which they connect, many devices attached to networks lack the processing power to exploit the full available bandwidth.
FIG. 1 shows an example of a local area network. The devices on the local area network can include general purpose computers, such as computers 12A, 12B and 12C, storage devices, such as network storage devices 13A and 13B, and appliances for performing specialized functions, such as data caching, load balancing or other custom processing (see specialized appliances 14A and 14B). The actual communication path, whether by copper wire, optical fiber or wireless, can be implemented in a variety of topologies, such as switches, rings, or buses such as the bus 11 shown for the local area network 10. The local area network typically also includes a link 15, which may be a gateway system to other networks, such as the Internet.
The most common implementation of a local area network in use today is TCP/IP on Ethernet (or IEEE 802.3). TCP is a reliable, connection-oriented stream protocol that runs on top of IP, which is a packet-based protocol. UDP is a datagram-oriented protocol that also runs on top of IP. Thus, processing systems, such as computer systems in a computer network, typically transmit information over the network in the form of packets. A number of different packet-based protocols have been defined to enable interconnected network computers to communicate with each other. Generally, the network protocol requires each processing system connected to the network to check, process and route control information contained in each information packet.
An application program which is executing on a computer, such as a general purpose computer which is coupled to the network, may need to send data to another device on the network. In this situation, the application program makes a call to a network protocol stack socket interface, which calls the TCP/IP driver and then the Ethernet driver, in that order. Data is encapsulated first by a TCP (transmission control protocol) header, subsequently by an IP (Internet protocol) header, and lastly by an Ethernet header as shown in FIG. 2. The application data 21 may be text, graphics, a combination of text and graphics, video/motion pictures, or other types of data. As shown in FIG. 2, the TCP header 22 is prepended to the application data 21, and then the IP header 23 is prepended to the combination of the application data 21 and the TCP header 22. Finally, the Ethernet driver adds an Ethernet header 24A and an Ethernet trailer 24B. After the Ethernet driver has completed the encapsulation process, the entire packet (containing 21, 22, 23, 24A and 24B) is transmitted over the communication medium of the network, which may be copper wire, optical fiber, wireless or another communication medium, to another device which is coupled to the network. The receiving device goes through the reverse sequence as shown in the graphic 20 of FIG. 2.
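The encapsulation and its reverse sequence can be illustrated with a minimal Python sketch. This is not the stack of any particular system: the function names `encapsulate` and `decapsulate`, the port numbers, the addresses, and the MAC values are all hypothetical, the TCP and IP checksum fields are left at zero, and padding of short Ethernet frames is not modeled.

```python
import struct
import zlib

ETH_HEADER_LEN = 14   # destination MAC, source MAC, EtherType
ETH_TRAILER_LEN = 4   # frame check sequence (CRC-32)


def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data in simplified TCP, IP, and Ethernet framing."""
    # 20-byte TCP header: source port, destination port, sequence number,
    # acknowledgment number, data offset/flags, window, checksum, urgent.
    # All field values here are illustrative; the checksum is left at zero.
    tcp_header = struct.pack(
        "!HHIIHHHH", 49152, 80, 0, 0, (5 << 12) | 0x18, 65535, 0, 0)
    segment = tcp_header + app_data

    # 20-byte IPv4 header: version/IHL, TOS, total length, identification,
    # flags/fragment offset, TTL, protocol (6 = TCP), checksum, addresses.
    ip_header = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 20 + len(segment), 0, 0, 64, 6, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
    datagram = ip_header + segment

    # Ethernet header in front, CRC-32 trailer behind (0x0800 = IPv4).
    eth_header = struct.pack(
        "!6s6sH", b"\x02\x00\x00\x00\x00\x02", b"\x02\x00\x00\x00\x00\x01",
        0x0800)
    eth_trailer = struct.pack("<I", zlib.crc32(eth_header + datagram))
    return eth_header + datagram + eth_trailer


def decapsulate(frame: bytes) -> bytes:
    """Strip the Ethernet, IP, and TCP layers in the reverse order."""
    datagram = frame[ETH_HEADER_LEN:-ETH_TRAILER_LEN]
    ip_header_len = (datagram[0] & 0x0F) * 4      # IHL is in 32-bit words
    segment = datagram[ip_header_len:]
    tcp_header_len = (segment[12] >> 4) * 4       # data offset, in words
    return segment[tcp_header_len:]
```

For example, `decapsulate(encapsulate(b"hello"))` returns the original five bytes, and the frame produced for that payload carries 58 bytes of header and trailer around it.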
The processing of data through a network protocol stack is commonly done by processing systems, such as computer systems which are coupled to the Internet. For example, computer systems at a user's home process data through such a network protocol stack, and web servers at web sites perform the same processing. FIG. 3 shows an example of a web site 31 which is coupled to the Internet 32. The web site may be considered to include three groups of processing systems 33, 34, and 35 as shown in FIG. 3. Information from the Internet 32 is received by the routers and processed by the firewall and load balancers and then distributed or transmitted to the web servers or other servers shown in block 34 or provided to the systems in block 35 through a further firewall. In this case, the computer systems must process incoming Internet packets through a network protocol stack such as that described above. Similarly, when a web server or other server or other system in blocks 33, 34 or 35 intends to transmit data through the Internet, the data must be processed through a network protocol stack such as the stack described above. The actual bandwidth achieved in the transmission of data is a function of the capacity of the communication media (e.g. the optical fiber or other transmission media) as well as the processing throughput of the network protocol stacks of the sending and receiving devices.
Web servers and other devices coupled to the network typically have the architecture shown in FIG. 4. This architecture includes a bus 53 which is coupled to a host processor or processors 55 and which is also coupled to host DRAM and memory controller 54. The host processor or processors 55 customarily perform the network protocol processing. Ethernet packets are received through the Ethernet interface and framed by an Ethernet MAC (media access controller) integrated circuit 52. The Ethernet MAC integrated circuit transfers the framed Ethernet packets to the host DRAM (dynamic random access memory), generally by performing a direct memory access (DMA) under control of the memory controller and/or by interrupting the memory controller. It will be appreciated that the computer system 51 typically also includes associated logic referred to as a “chipset” which performs control functions such as control of the bus 53 and the communication of data among the different components in the system such as peripherals (not shown). The host processor 55 is interrupted by the chipset, and the TCP/IP stack is invoked to examine the Ethernet packets for IP processing and subsequent TCP processing before passing the data to the application layer. An application which is sending data to the Ethernet interface invokes the TCP/IP stack, and the reverse sequence occurs. Thus, in the implementation shown in FIG. 4, the host processor 55, which is typically a general purpose microprocessor or collection of general purpose microprocessors, performs substantially all of the operations of the system 51 as well as the network protocol processing. As a result, the host processor 55, in addition to running the application program which is processing the application data, must also process network packets to perform such operations as fragmentation, reassembly, reordering, retransmission, and verification of packet checksums.
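As an illustration of one of the per-packet operations listed above, checksum verification in TCP/IP uses the 16-bit ones' complement sum described in RFC 1071. The following Python sketch shows the arithmetic only; the function name `internet_checksum` is hypothetical, and a real stack would typically use an optimized or hardware-assisted routine.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"               # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def verify(data_with_checksum: bytes) -> bool:
    """A block that includes its own correct checksum sums to zero."""
    return internet_checksum(data_with_checksum) == 0
```

Using the byte sequence from the numerical example in RFC 1071, `internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))` yields `0x220D`, and appending those two checksum bytes to the data makes `verify` return `True`. The host processor must touch every byte of every packet to perform this step, which is part of why protocol processing consumes host cycles.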
Computer systems with connections to higher bandwidth networks are dedicating hardware to process parts of the network protocol stack. FIG. 5 shows an example of such a computer system with acceleration hardware to offload the network protocol stack processing. The processing system 61 has an Ethernet interface port 62 which is coupled to an Ethernet MAC 63, which in turn is coupled to a network offload accelerator 64. Offload memory 65 is coupled to the network offload accelerator 64. This memory is used to store and retrieve network packets being transmitted to the Ethernet port 62 or being received from the Ethernet port 62 as part of the processing operation of the network offload accelerator 64. The network offload accelerator 64 is coupled to the host bus 67 through the host bus bridge 66. A host processor or processors 68 are also coupled to the host bus 67. Host DRAM 70 is coupled to the bus 67 through the host chipset 69, which functions as a memory controller and bus controller for the system. The network offload accelerator 64 may be implemented as a general purpose embedded processor, as a custom hardware implementation of a specific network protocol, or as a combination of the two. The advantage of the general purpose embedded processor is that if network protocols change, software can be changed to reflect the new protocol and no hardware changes are required. The advantage of a custom ASIC implementation is that it may achieve higher performance or smaller die size. Current generation embedded processors may be used to offload the network protocol stack processing in the architecture shown in FIG. 5 and can achieve wire rate throughput for 100 megabit Ethernet connections. However, they cannot sustain wire rate throughput for gigabit Ethernet.
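The gap between 100 megabit and gigabit wire rate can be made concrete with a rough per-frame budget. The Python sketch below is illustrative only: the function names are hypothetical, the 20 bytes of per-frame overhead are the standard Ethernet preamble (8 bytes) and inter-frame gap (12 bytes), and the processor clock passed to `cycles_per_frame` is an assumed figure rather than a value taken from any particular embedded processor.

```python
WIRE_OVERHEAD_BYTES = 20   # 8-byte preamble + 12-byte inter-frame gap


def frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frame rate on a link carrying back-to-back frames."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_frame(clock_hz: float, link_bps: float,
                     frame_bytes: int) -> float:
    """Processor cycles available per frame when keeping up at wire rate."""
    return clock_hz / frames_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    assumed_clock_hz = 200e6   # hypothetical embedded processor clock
    for name, bps in (("100 megabit", 100e6), ("gigabit", 1e9)):
        for size in (64, 1518):   # minimum and maximum standard frame sizes
            fps = frames_per_second(bps, size)
            budget = cycles_per_frame(assumed_clock_hz, bps, size)
            print(f"{name:12s} {size:5d}-byte frames: "
                  f"{fps:12.0f} frames/s, {budget:8.0f} cycles/frame")
```

With maximum-size 1518-byte frames, a gigabit link delivers roughly 81,000 frames per second, or one frame about every 12 microseconds; with minimum-size 64-byte frames the rate approaches 1.49 million frames per second. Because the per-frame budget shrinks by a factor of ten in moving from 100 megabit to gigabit Ethernet, a processor that just keeps pace at the lower rate falls well short at the higher one.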