1. Field of the Invention
The invention relates to the technical field of local area networks (LANs) and, more particularly, to a general-purpose Transmission Control Protocol/Internet Protocol (TCP/IP) protocol processing engine system.
2. Description of Related Art
Currently, the TCP/IP protocol is widely used for Internet data access. Well-developed Internet applications have greatly increased the number of servers, such as web servers and media servers. Such servers generally rely on an operating system to have the CPU perform packaging, unpackaging and the associated protocol processing of TCP/IP packets. However, as service requests on the Internet increase rapidly, the CPU must spend a great number of processing cycles on TCP/IP packets, which reduces server performance.
In addition, with the rapid development of network transmission and semiconductor technologies, high-speed 1 Gbps Ethernet and fiber networks have become popular. When the operating system is used to package and unpackage TCP/IP packets and to process the associated protocol stack, the CPU bears a heavy load and TCP/IP packet processing consumes most of its time. FIG. 1 is a schematic graph of CPU performance for different network maximum transmission units (MTUs). As shown in FIG. 1, for a network server with a 500 MHz CPU clock, a memory and peripheral bus bandwidth of 370 Mbps, and an MTU range of 1.5 to 32 Kbytes, the share of time the CPU spends on protocol processing drops from 90% to 10% once various offloading functions are added at an MTU of 32 Kbytes. This indicates that a network interface controller (NIC) supporting Gbps-level bandwidth and above requires not only a larger MTU but also a protocol offloading function to balance the CPU load.
FIG. 2 is a schematic graph of the ratio of CPU bandwidth growth to NIC bandwidth growth. As shown in FIG. 2, network transceiver bandwidth has generally tripled every year, while CPU bandwidth has generally quadrupled every three years. Accordingly, a NIC needs to take on a number of protocol processing functions to reduce the CPU load and to keep the server from becoming inefficient or even unusable. As shown in FIG. 2, with a 1 Gbps NIC in 2003, handling protocol processing in the conventional manner would consume the entire capacity of the most advanced Intel CPU of 2003. To allow the CPU to fully perform all services assigned to the server, protocol processing is shifted from the CPU to the NIC to balance the system load. Namely, to obtain better quality of service (QoS), the server can use a TCP/IP protocol offload engine to process TCP/IP packets so that the CPU can handle more service requests in the same amount of time.
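The two growth rates cited for FIG. 2 can be put on a common per-year basis. The worked arithmetic below uses only those two stated rates, with t measured in years:

```latex
\begin{aligned}
\text{NIC bandwidth growth: } & 3^{t} \\
\text{CPU bandwidth growth: } & 4^{t/3} \approx 1.587^{t} \\
\text{Gap: } & \frac{3^{t}}{4^{t/3}} = \left(\frac{3}{4^{1/3}}\right)^{t} \approx 1.89^{t},
\qquad t = 3:\ \frac{3^{3}}{4} = \frac{27}{4} = 6.75
\end{aligned}
```

In other words, under these rates the network traffic that each unit of CPU capacity must serve grows by a factor of about 6.75 every three years, which is why protocol work has to move onto the NIC.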
However, a typical NIC has no such protocol offload engine for packaging, unpackaging and the associated protocol stack processing of TCP/IP packets; current 10/100 Mbps Ethernet networks all use an OS-based processing mode. FIG. 3 is a schematic diagram of an OS-based protocol stack processing mode. In the session layer, the application on the left-hand side of FIG. 3 produces a payload to be sent in memory, and then uses a system call to hand the subsequent processing over to the OS. In the transport layer (where TCP resides), the OS divides the payload into segments and prefixes a transport header to each segment to form transport packets. The transport packets are passed to a procedure in the network layer (where IP resides). This procedure optionally fragments the packets based on their sizes and the MTU supported by the media access control (MAC), and adds a network header to each fragment to form network packets. The network packets are then placed in a NIC that integrates the MAC and a physical layer, which adds a MAC header to each packet to form Ethernet packets. The physical layer sends the Ethernet packets to a receiving side over Ethernet. Unpackaging is performed in reverse, from the physical layer up to the session layer, to restore the payload required by the application. Therefore, in the conventional processing mode, the OS performs the packaging and unpackaging for the transport layer (TCP) and the network layer (IP).
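The layered packaging and unpackaging described for FIG. 3 can be illustrated with a minimal Python sketch. It is not a real TCP/IP stack: the headers are fixed-size filler bytes, delivery is assumed to be lossless and in order, and the names `package`, `unpackage`, `split`, `TCP_HDR`, `IP_HDR` and `MAC_HDR` are hypothetical. It only shows the order in which each layer divides the data and prefixes its header, and how the receiver undoes those steps in reverse.

```python
# Placeholder headers (hypothetical): real TCP, IP and Ethernet headers carry
# ports, sequence numbers, addresses and checksums rather than filler bytes.
TCP_HDR = b"T" * 20
IP_HDR = b"I" * 20
MAC_HDR = b"M" * 14


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def package(payload: bytes, segment_size: int, mtu: int) -> list[bytes]:
    """Sender side: the OS builds transport and network packets, the NIC adds MAC headers."""
    frames = []
    for segment in split(payload, segment_size):
        transport_pkt = TCP_HDR + segment                # transport layer (OS)
        # Network layer (OS): fragment so that header + fragment fits within the MTU;
        # a packet that already fits yields a single fragment.
        for fragment in split(transport_pkt, mtu - len(IP_HDR)):
            network_pkt = IP_HDR + fragment
            frames.append(MAC_HDR + network_pkt)         # MAC layer (NIC)
    return frames


def unpackage(frames: list[bytes], segment_size: int) -> bytes:
    """Receiver side: undo the layers in reverse order to restore the payload."""
    # Strip MAC and network headers, then rejoin the fragments in arrival order.
    stream = b"".join(f[len(MAC_HDR) + len(IP_HDR):] for f in frames)
    # Strip the transport header from each reassembled transport packet.
    packet_len = len(TCP_HDR) + segment_size
    return b"".join(p[len(TCP_HDR):] for p in split(stream, packet_len))
```

For example, `package(payload, 1460, 576)` splits each 1480-byte transport packet into three network fragments, whereas an MTU of 1500 needs no fragmentation at all. In the conventional mode of FIG. 3, every one of these steps except the MAC header runs on the host CPU.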
Since most NICs in current local area networks can support a bandwidth of over 1 Gbps, a NIC has to provide an offload function to lessen the CPU load of TCP/IP protocol processing. FIG. 4 is a schematic diagram of how jobs are shared between a typical general-purpose TCP/IP protocol offload engine (TOE) and a host, in which the protocol processing performed by the NIC is extended up to the transport layer. The application uses an OS call to deliver the size and the base address of a payload to be sent to a driver of the TOE. The TOE sends fully packaged TCP/IP packets to the receiving side. The TOE on the receiving side unpackages the received packets and sends the resulting payload to the destination memory area of the application, thereby completing the transfer.
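The division of labor in FIG. 4 can be sketched as follows, assuming a simple descriptor that carries only the base address and size of the payload. `SendRequest`, `toe_transmit`, `toe_receive`, the 1460-byte `MSS` and the 40-byte placeholder `HEADER` are hypothetical choices for illustration, not the interface of any real TOE driver. The point is that the host passes a small descriptor, while fetching, packaging, unpackaging and placement in the destination buffer all happen on the TOE side.

```python
from dataclasses import dataclass

MSS = 1460            # assumed maximum payload bytes per packet
HEADER = b"H" * 40    # placeholder for a combined 20-byte TCP + 20-byte IP header


@dataclass
class SendRequest:
    """Hypothetical descriptor: all the host hands to the TOE driver."""
    base_address: int
    size: int


def toe_transmit(host_memory: bytes, req: SendRequest) -> list[bytes]:
    """Sending TOE: fetch the payload by address and size, emit packaged packets."""
    payload = host_memory[req.base_address:req.base_address + req.size]
    return [HEADER + payload[i:i + MSS] for i in range(0, len(payload), MSS)]


def toe_receive(packets: list[bytes], dest: bytearray, offset: int) -> int:
    """Receiving TOE: strip headers and write the payload straight into the
    application's destination buffer; returns the number of bytes delivered."""
    pos = offset
    for pkt in packets:
        data = pkt[len(HEADER):]
        if pos + len(data) > len(dest):
            raise ValueError("destination buffer too small")
        dest[pos:pos + len(data)] = data
        pos += len(data)
    return pos - offset
```

A 3000-byte payload at address 100 is described by `SendRequest(100, 3000)` and leaves the sending TOE as three packets; the receiving TOE then places all 3000 bytes directly into the application's buffer.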
As shown in FIG. 4, sharing the job between a TCP/IP protocol offload engine (TOE) and a host is ideal. From a cost-benefit standpoint, however, because Ethernet and other networks carry numerous protocols, adding a complete TOE to a typical NIC is not cost-effective. A TOE therefore selects, according to its application environment, the items to accelerate that have a higher efficiency-to-cost ratio, while other protocol processing with a low efficiency-to-cost ratio still uses the conventional protocol stack processing.
U.S. Pat. No. 6,591,302 discloses a transmission mechanism that assembles the header required for each protocol layer from a protocol stack, combines the header with an appropriately sized block of data, and sends the result to the next network layer. FIG. 5 is a schematic diagram of the transmission process disclosed in U.S. Pat. No. 6,591,302. A session sequencer sends the data to a transport sequencer via a multiplexer. The transport sequencer adds H1 headers to the divided transport segments in a packet buffer to form transport packets. The transport packets are delivered via the multiplexer to a network sequencer, which adds H2 headers to the divided network fragments in the packet buffer to form network packets. The network packets are delivered via the multiplexer to a media access control (MAC) sequencer, which adds an H3 header in front of each network packet to form Ethernet packets for output.
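The sequencer chain of FIG. 5 can be modeled as a loop in which the multiplexer routes each stage's output to the next sequencer. This is a minimal sketch with assumptions: the stage table `STAGES`, the functions `sequencer` and `transmit`, and the tiny division sizes (8 and 6 bytes) are hypothetical, chosen only so the result is easy to read; the H1/H2/H3 labels follow FIG. 5.

```python
from typing import Optional

# Hypothetical stage table modeled on FIG. 5: (sequencer, header, max data per header).
# Sizes are tiny for readability; None means the stage does not divide its input.
STAGES = [
    ("transport", b"H1", 8),
    ("network", b"H2", 6),
    ("mac", b"H3", None),
]


def sequencer(header: bytes, chunk: Optional[int], units: list[bytes]) -> list[bytes]:
    """Divide each incoming unit (if chunk is set) and prefix this layer's header."""
    out = []
    for unit in units:
        size = chunk if chunk else max(len(unit), 1)
        for i in range(0, len(unit), size):
            out.append(header + unit[i:i + size])
    return out


def transmit(data: bytes) -> list[bytes]:
    """Route the session data through each sequencer in turn, as the multiplexer does."""
    units = [data]                      # output of the session sequencer
    for _name, header, chunk in STAGES:
        units = sequencer(header, chunk, units)
    return units
```

With these sizes, `transmit(b"ABCDEFGHIJ")` yields `[b"H3H2H1ABCD", b"H3H2EFGH", b"H3H2H1IJ"]`. The second frame carries no H1 because it holds the tail of the first transport packet after network-layer fragmentation.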
FIG. 6 is a schematic diagram of the receiving process disclosed in U.S. Pat. No. 6,591,302. The protocol processing in each layer extracts the headers from the received packets and compares them with connection data stored in the protocol stack to find the corresponding connection in the preceding layer; the remaining payload data of the transmission is then combined and sent to the next upper layer of protocol processing. As shown in FIG. 6, a packet control sequencer confirms receipt of the Ethernet packets sent by the MAC sequencer of FIG. 5. A multiplexer sends the received packets to a network sequencer, which removes their H3 headers and combines the remaining payloads belonging to the same network layer for further unpacking.
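The header-matching step of FIG. 6 amounts to looking up extracted header fields in a table of stored connection data and combining the payloads that belong to each connection. The minimal sketch below assumes that the extracted header is reduced to a (source address, source port, destination address, destination port) key; that key, the `demultiplex` function and the connection identifiers are hypothetical and are not taken from U.S. Pat. No. 6,591,302. Packets that match no stored connection are set aside rather than combined.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed header fields used for matching: (src addr, src port, dst addr, dst port).
ConnKey = Tuple[str, int, str, int]


def demultiplex(packets: List[Tuple[ConnKey, bytes]],
                connections: Dict[ConnKey, str]):
    """Compare each packet's extracted header fields with stored connection data,
    combine payloads per matched connection, and set aside unmatched packets."""
    combined = defaultdict(bytearray)
    unmatched = []
    for header, payload in packets:
        conn = connections.get(header)
        if conn is None:
            unmatched.append((header, payload))
        else:
            combined[conn] += payload   # in-order arrival is assumed
    return {conn: bytes(buf) for conn, buf in combined.items()}, unmatched
```

A dictionary keeps the comparison against stored connection data to a single lookup per packet; a hardware engine would typically use a hash table or content-addressable memory for the same purpose.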
The RISC microprocessor with a protocol offload engine disclosed in U.S. Pat. No. 6,591,302 can speed up protocol offloading, but it cannot easily handle complex protocol errors and the associated error recovery, which increases firmware cost. Although errors account for only a small fraction of all packets, they significantly affect the CPU load when errors that do not require interrupting the host CPU cannot be filtered out effectively.
In U.S. Pat. No. 6,591,302, an SRAM is used for offload operations. Any packet or data that cannot be offloaded immediately is moved from the SRAM to a DRAM, which has more capacity and lower cost, to wait until all other related packets or information have arrived. All of the arrived information is then moved from the DRAM back to the SRAM for offload processing. This simplifies the offload engine design and the internal data path. However, moving data between the SRAM and the DRAM consumes internal circuit bandwidth. In addition, the SRAM may become a bandwidth bottleneck: for a network bandwidth of 1 Gbps, a full-duplex protocol offload engine needs a bandwidth of 2 Gbps, and high-speed dual-port SRAM is expensive. Therefore, it is desirable to provide an improved protocol offload engine to mitigate and/or obviate the aforementioned problems.
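The 2 Gbps figure for the SRAM of U.S. Pat. No. 6,591,302 follows from doubling a 1 Gbps line rate for full duplex. The sketch below reproduces that arithmetic and adds one assumption that is not taken from the patent: each byte parked in DRAM crosses the SRAM port twice more, once on the way out and once on the way back. The function name `buffer_bandwidth_gbps` and its `parked_fraction` parameter are hypothetical.

```python
def buffer_bandwidth_gbps(line_rate_gbps: float, full_duplex: bool = True,
                          parked_fraction: float = 0.0) -> float:
    """Estimate the bandwidth an on-chip SRAM buffer must sustain (rough model)."""
    # Full duplex: transmit and receive streams run at line rate simultaneously.
    base = line_rate_gbps * (2 if full_duplex else 1)
    # Assumption: each parked byte is moved SRAM -> DRAM and later DRAM -> SRAM,
    # adding two extra transfers through the SRAM port.
    staging = base * parked_fraction * 2
    return base + staging
```

With the default arguments, `buffer_bandwidth_gbps(1.0)` returns 2.0, matching the text. Under the stated assumption, parking a quarter of the traffic raises the requirement to 3.0 Gbps, which illustrates why SRAM-to-DRAM staging adds to the bandwidth bottleneck.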