Conventional TCP processing is exemplified by systems and methods developed to accelerate data transfer between a client and a server. Software implementations executed on a host processor, e.g., a Central Processing Unit (CPU), are inexpensive but slow compared with costlier dedicated hardware implementations designed to offload TCP processing from the host processor.
FIG. 1 is a block diagram of an exemplary embodiment of a prior art computing system generally designated 100 including a CPU 110 and a Network Interface Card (NIC) 150. Computing System 100 may be a desktop computer, server, laptop computer, palm-sized computer, tablet computer, game console, cellular telephone, computer based simulator, or the like. A Bus 112 coupling CPU 110 to a System Controller 120 may be a front side bus (FSB). Accordingly, Computing System 100 may be a hub-based architecture, also known as an INTEL® hub architecture, where System Controller 120 is a memory controller hub and an I/O Bridge 140 is coupled to System Controller 120 via a Hub-to-hub Interface 126. System Controller 120 is coupled to System Memory 130 via a Memory Bus 132. I/O Bridge 140 includes a controller for Peripheral Component Interconnect (PCI) Bus 182 and may include controllers for a System Management Bus 142, a Universal Serial Bus 144, and the like. I/O Bridge 140 may be a single integrated circuit or single semiconductor platform. Examples of System Controller 120 known in the art include INTEL® Northbridge. Examples of I/O Bridge 140 known in the art include INTEL® Southbridge or an NVIDIA® Corporation Media and Communications Processor chip.
NIC 150 may share PCI Bus 182 with one or more PCI Devices 180. NIC 150 includes a PCI Interface 175, a Dedicated Processor 155, a Medium Access Controller (MAC) 165, Dedicated Memory 160, and an ETHERNET Interface 170 to interface to an ETHERNET Network 172. A Software Driver 119 for NIC 150 handles communication between NIC 150 and an Application Program 117 executing on CPU 110. An Application Memory Space 125, a TCP Stack Memory Space 145, and a Driver Memory Space 135 are allocated within System Memory 130.
Dedicated Processor 155 within NIC 150 is used for TCP processing in lieu of having CPU 110 execute TCP Stack 115 to perform TCP processing. NIC 150 therefore offloads TCP processing from CPU 110, freeing CPU 110 processing cycles for other applications. Likewise, Dedicated Memory 160 replaces TCP Stack Memory Space 145, freeing TCP Stack Memory Space 145 for allocation to other applications. However, NIC 150, including Dedicated Memory 160 and Dedicated Processor 155, is more costly than a software implementation for TCP processing executed on CPU 110. Furthermore, conventional embodiments of NIC 150 typically have some performance limitations. For example, when space is not available in Driver Memory Space 135 and Dedicated Memory 160 has filled, an incoming frame is not accepted by NIC 150, resulting in a reduction in available receive data bandwidth. An incoming frame may also not be accepted by NIC 150 when Dedicated Memory 160 is full and the incoming frame rate exceeds the rate at which frame data is uploaded from Dedicated Memory 160 to Driver Memory Space 135 via I/O Bridge 140.
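The frame-loss condition described above can be illustrated with a toy model. The following is a minimal sketch only, not a description of any actual NIC: the function name `simulate_receive`, its parameters, and the per-tick timing model are all hypothetical and introduced purely for illustration. Setting `uploads_per_tick` to zero loosely stands in for the case where no space is available in the driver memory space.

```python
def simulate_receive(arrivals_per_tick, uploads_per_tick, nic_capacity, ticks):
    """Toy model of a NIC with a bounded frame buffer (hypothetical, illustrative).

    Each tick, up to `uploads_per_tick` buffered frames are uploaded to host
    memory, then `arrivals_per_tick` frames arrive from the network. A frame
    that arrives while the buffer holds `nic_capacity` frames is not accepted.
    Returns (accepted, dropped).
    """
    buffered = 0
    accepted = 0
    dropped = 0
    for _ in range(ticks):
        # Upload frames to host memory, freeing space in the NIC buffer.
        buffered -= min(buffered, uploads_per_tick)
        for _ in range(arrivals_per_tick):
            if buffered < nic_capacity:
                buffered += 1
                accepted += 1
            else:
                # Buffer is full: the incoming frame is not accepted.
                dropped += 1
    return accepted, dropped


if __name__ == "__main__":
    # Arrival rate above upload rate: frames are eventually dropped.
    print(simulate_receive(3, 2, 4, 10))
    # Upload rate above arrival rate: no frames are dropped.
    print(simulate_receive(2, 3, 4, 10))
```

In this sketch, frames are lost only after the buffer has filled and only while arrivals outpace uploads, which mirrors the two conditions stated above.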
Additionally, Application Program 117 is notified when uploaded frame data is available in Driver Memory Space 135. Application Program 117 then copies the uploaded frame data from Driver Memory Space 135 to Application Memory Space 125, during which time Application Memory Space 125 and Driver Memory Space 135 may be inaccessible for other operations.
NIC 150 transmits acknowledgements (ACKs) confirming that each frame has been received. Timely transmission of ACKs minimizes unnecessary retransmissions resulting from expiration of a transmit timer maintained by the sender. Timely transmission of ACKs also assures that a receive window, indicating how much data may be sent to NIC 150, remains open. In contrast, a conventional software implementation for TCP processing executed on CPU 110 typically takes longer to generate an ACK, resulting in unnecessary retransmissions and possibly closure of the receive window.
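The two effects of ACK latency described above can be sketched as simple arithmetic. This is an illustrative simplification under stated assumptions, not an implementation of TCP: the function names and parameters are hypothetical, the transmit timer is treated as a fixed timeout, and the sender is assumed to keep at most one receive window of unacknowledged data outstanding.

```python
def ack_arrives_in_time(ack_delay_ms: float, retransmit_timeout_ms: float) -> bool:
    """Return True if the ACK reaches the sender before its transmit timer expires.

    Hypothetical helper: a False result models an unnecessary retransmission,
    since the data was received but the ACK came too late.
    """
    return ack_delay_ms < retransmit_timeout_ms


def usable_send_window(advertised_window: int, unacked_bytes: int) -> int:
    """Return how many more bytes the sender may transmit right now.

    Hypothetical helper: the sender may have at most `advertised_window`
    bytes outstanding, so data awaiting an ACK consumes the window.
    """
    return max(0, advertised_window - unacked_bytes)


if __name__ == "__main__":
    # A prompt ACK beats the timer; a late ACK does not.
    print(ack_arrives_in_time(5.0, 200.0))
    print(ack_arrives_in_time(250.0, 200.0))
    # With a full window of data still unacknowledged, the sender must stall.
    print(usable_send_window(65535, 65535))
```

Under these assumptions, a slow ACK path harms throughput in two ways: the sender retransmits data that was in fact received, and it stalls once unacknowledged data fills the window.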
Therefore, there is a need for a partial hardware implementation that optimizes TCP processing by offloading some tasks from a host processor while timely transmitting ACKs.