High-speed digital communication networks over copper, optical fiber and other hybrid media are used in many digital communication systems and storage applications. As these networks continue to evolve in order to meet ever-increasing bandwidth requirements, new protocols are being developed to more efficiently transfer information throughout these networks. For example, the well-known IEEE P802.3ae draft 5 specifications describe requirements for 10 Gigabit Ethernet (GbE) applications, which may be used in communication networks and storage area networks (SANs).
Notwithstanding, processing power and memory bandwidth of devices used in applications such as 10 GbE have generally lagged behind the increased demand for networking bandwidth and faster data rates. In networks such as the Internet, which utilize the well-known transmission control protocol/internet protocol (TCP/IP), data copying operations utilize a great deal of CPU and memory resources. In addition to diminished processing capacity, there is also increased latency, which may significantly affect time-critical applications such as voice applications. A major consequence is that network computing nodes become bottlenecks, which may significantly diminish system performance and throughput.
Some TCP/IP networks employ a TCP offload engine (TOE) to facilitate more efficient packet processing. A TOE is an intelligent network adapter that may be configured to offload most or all of the TCP/IP protocol processing from a host processor or CPU to the network adapter. One of the primary challenges associated with building a TCP offload device involves the handling of applications that do not post or allocate buffers before the data is received. Current TCP offload adapters store all their received data locally on the adapter in buffers known as TCP segment buffers. The received data may remain stored in the TCP segment buffers on the adapter until the application posts or allocates a buffer for the data. When the buffer is posted, the adapter copies the data from the on-chip TCP segment buffers to the posted buffers in the host.
FIG. 1 illustrates a block diagram of a conventional TCP offload system 100 that utilizes pre-posted buffers. Referring to FIG. 1, there is shown an application 102 having a plurality of TCP application posted buffers 104, including buffers 104a, 104b, . . . , 104n. Host adapter 106 may include a network interface card (NIC) processor or chip 108 and memory 110. NIC processor 108 may include a parsing/IP-TCP checks process 112 and a TCP re-assembly process 114. The memory 110 for the host adapter 106 may include a plurality of pre-allocated TCP segment reassembly buffers 116, including 116a, 116b, . . . , 116n. Host adapter 106 may receive data 118 from a network.
In operation, application 102 may pre-post the TCP application posted buffers 104, for example 104a. A buffer post handler process or processor, which may be integrated within application 102, may typically post buffers for all the data it expects to receive. Since application 102 knows that data may be received from a specific connection, the buffer post handler process may post buffers before the data is received. Subsequent to the posting of the buffers, NIC processor 108 may receive data 118 packets from the network for a particular connection. The data packets may generally contain an application header followed by a large or small amount of data. In current systems such as system 100, process 112 may be configured to process these data packets for IP and TCP, but no data placement is generally done, other than to store the TCP data segments in the TCP segment re-assembly buffers 116, such as 116a. Process 112 may notify the TCP re-assembly process 114 that more TCP segment re-assembly buffers 116 are available for re-assembly processing.
The TCP re-assembly process 114 may consult the TCP application posted buffers 104 to determine which posted buffers are available. In this case, the TCP re-assembly process 114 may find all the posted buffers it needs to store the data. The TCP re-assembly process 114 may then access the TCP re-assembly buffer 116 and read the previously stored information if a consistent stream of TCP bytes is available. The TCP re-assembly process 114 may then write the re-assembled TCP bytes to the TCP application posted buffers 104, in this case 104a. Upon completion of the writing process, TCP re-assembly process 114 may notify application 102 that the posted buffers are now full. At this point, the application may process the complete received command, which includes both header and data.
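The receive path described above may be sketched as follows. This is a minimal illustrative model only, not an implementation of any actual adapter: the class name OffloadAdapter, its methods, and the use of byte offsets as sequence numbers are all hypothetical, and the parsing/IP-TCP checks are omitted.

```python
class OffloadAdapter:
    """Toy model of the conventional pre-posted receive path (hypothetical names)."""

    def __init__(self, initial_seq: int = 0):
        self.segment_buffers = {}    # on-adapter storage: sequence number -> payload
        self.posted = []             # host buffers posted by the application
        self.next_seq = initial_seq  # next in-order byte expected

    def post_buffer(self, size: int) -> bytearray:
        """Application pre-posts a host buffer of the given size."""
        buf = bytearray()
        self.posted.append((buf, size))
        return buf

    def receive_segment(self, seq: int, payload: bytes) -> list:
        """Store the segment on the adapter, then attempt re-assembly."""
        self.segment_buffers[seq] = payload
        return self.reassemble()

    def reassemble(self) -> list:
        """Copy in-order bytes to posted buffers; return buffers that became full."""
        completed = []
        while self.posted and self.next_seq in self.segment_buffers:
            payload = self.segment_buffers.pop(self.next_seq)
            buf, size = self.posted[0]
            room = size - len(buf)
            buf += payload[:room]  # in-place append to the posted host buffer
            self.next_seq += min(room, len(payload))
            if len(payload) > room:
                # The remainder stays on the adapter until another buffer is posted.
                self.segment_buffers[self.next_seq] = payload[room:]
            if len(buf) == size:
                completed.append(self.posted.pop(0)[0])
        return completed


adapter = OffloadAdapter()
adapter.post_buffer(8)
first = adapter.receive_segment(4, b"data")   # out of order: held on the adapter
second = adapter.receive_segment(0, b"head")  # gap filled: buffer completes
print(first, second)
```

In this sketch, every segment is first written to adapter memory and only later copied to the host, even though a posted buffer was already available when the data arrived.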
In high-speed applications, typically of the order of about 10 gigabits per second, the system 100 of FIG. 1 may encounter problems when processing a large volume of data associated with these applications. In order to remain TCP compliant, the host adapter 106 must supply one window size, which is approximately 16 Kbytes, for every connection. In a typical case where there may be approximately 1000 connections, about 16 Mbytes of memory would be required. However, high-speed applications such as 10 GbE require much larger window sizes. In a case where the window size increases to approximately 512 Kbytes for every connection and there are 1000 connections, then approximately 512 Mbytes of memory would be required. Therefore, the memory requirements may become tremendous and may be prohibitively expensive and/or too large to integrate inside a single chip on the host adapter 106.
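The memory figures above follow from multiplying the per-connection window size by the number of connections. The short calculation below reproduces them; the function name adapter_memory_mbytes is hypothetical, and the conversion assumes 1 Mbyte = 1000 Kbytes, which matches the approximate figures quoted above.

```python
def adapter_memory_mbytes(window_kbytes: float, connections: int) -> float:
    """Adapter memory needed to hold one full TCP window per connection.

    Assumes 1 Mbyte = 1000 Kbytes (an approximation chosen to match the text).
    """
    return window_kbytes * connections / 1000


for window_kbytes in (16, 512):
    total = adapter_memory_mbytes(window_kbytes, 1000)
    print(f"{window_kbytes} Kbytes x 1000 connections = {total:.0f} Mbytes")
```

Under these assumptions, a 32-fold increase in window size produces a 32-fold increase in required adapter memory for the same number of connections.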
A similar situation may occur with conventional systems which have non-pre-posted buffers. FIG. 2 illustrates a block diagram of a conventional TCP offload system 200 that may utilize non-pre-posted buffers. Referring to FIG. 2, there is shown an application 202 having a plurality of TCP application posted buffers 204, namely buffers 204a, 204b, . . . , 204n. Host adapter 206 may include a network interface card (NIC) processor or chip 208 and memory 210. NIC processor 208 may include a parsing/IP-TCP checks process 212 and a TCP re-assembly process 214. The memory 210 for the host adapter 206 may include a plurality of pre-allocated TCP segment reassembly buffers 216, namely 216a, 216b, . . . , 216n. Host adapter 206 may receive data 218 from a network.
In operation, application 202 may be configured to receive data from a significantly large plurality of connections. In this case, the pre-posting of TCP application posted buffers 204 would be a waste of resources, since only some of the plurality of connections are active over any given period of time. Application 202 may be configured to issue a “peek” operation to the connection to indicate that the application 202 should be notified when any data is received. The NIC processor 208 may then receive data 218 packets for a specified connection. The data packets generally contain an application header followed by a large or small amount of data. In current systems such as system 200, process 212 may be configured to process these data packets for IP and TCP, but no data placement is generally done, other than to store the TCP data segments in the TCP segment re-assembly buffers 216, such as 216a. Process 212 may notify the TCP re-assembly process 214 that more TCP segment re-assembly buffers 216 are available for re-assembly processing.
The TCP re-assembly process 214 may consult the TCP application posted buffers 204 to determine which posted buffers are available. In this case, the TCP re-assembly process 214 may find that there are no available TCP application posted buffers in which to store the data. Notwithstanding, TCP re-assembly process 214 may recognize that a “peek” request was made. An indication for the data may subsequently be forwarded to the application 202. Application 202 may then normally post TCP application buffers to handle the received data. This action may dispatch a message to the TCP re-assembly process 214 to indicate that new buffers are available. The TCP re-assembly process 214 may then access the TCP re-assembly buffer 216 and read the previously stored information if a consistent stream of TCP bytes is available. The TCP re-assembly process 214 may then write the re-assembled TCP bytes to the TCP application posted buffers 204, in this case 204a. Upon completion of the writing process, TCP re-assembly process 214 may notify application 202 that the posted buffers are now full. At this point, the application may process the complete received command, which includes both header and data.
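The non-pre-posted sequence described above may be sketched as follows. This is an illustrative model under simplifying assumptions: segments are assumed to arrive in order, and the class name PeekAdapter, its methods, and the callback-based indication are hypothetical rather than part of any actual adapter interface.

```python
class PeekAdapter:
    """Toy model of the conventional non-pre-posted receive path (hypothetical names)."""

    def __init__(self):
        self.segment_buffers = []  # on-adapter storage for in-order payloads
        self.on_data = None        # indication callback registered by a "peek"

    def peek(self, callback) -> None:
        """Application asks to be notified when any data is received."""
        self.on_data = callback

    def buffered_bytes(self) -> int:
        """Bytes currently held in adapter memory."""
        return sum(len(p) for p in self.segment_buffers)

    def receive_segment(self, payload: bytes) -> None:
        """No host buffer is posted yet, so the segment is stored on the adapter."""
        self.segment_buffers.append(payload)
        if self.on_data is not None:
            callback, self.on_data = self.on_data, None  # one indication per peek
            callback(self.buffered_bytes())

    def post_buffer(self, size: int) -> bytes:
        """Application posts a buffer late; stored bytes are copied to the host."""
        data = b"".join(self.segment_buffers)
        copied, rest = data[:size], data[size:]
        self.segment_buffers = [rest] if rest else []
        return copied


adapter = PeekAdapter()
indications = []
adapter.peek(indications.append)
adapter.receive_segment(b"head")  # triggers the indication
adapter.receive_segment(b"data")  # accumulates on the adapter meanwhile
held = adapter.buffered_bytes()
host_data = adapter.post_buffer(8)
print(indications, held, host_data, adapter.buffered_bytes())
```

In this sketch, adapter memory is released only once the application posts a buffer, so data continues to accumulate on the adapter between the indication and the posting.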
In high-speed applications, the system 200 of FIG. 2 may encounter the same problems as the system 100 of FIG. 1 when processing a large volume of data associated with these applications. The problems are aggravated because the application 202 postpones posting of TCP application buffers until after received data has been indicated, such that the on-chip buffer space 216 is routinely used up to the full window size for a large number of connections.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.