Networking has become an integral part of computer systems. Advances in network bandwidth, however, have not been fully utilized because of the overhead that may be associated with processing protocol stacks. This overhead may result from bottlenecks that arise when the core processing module of a host processor is used to perform slow memory-access functions, such as data movement, and from host processor stalls caused by data accesses that miss the host processor caches. A protocol stack refers to a set of procedures and programs that may be executed to handle packets sent over a network, where the packets may conform to a specified protocol. For example, TCP/IP (Transmission Control Protocol/Internet Protocol) packets may be processed using a TCP/IP stack.
U.S. patent application Ser. No. 10/815,895 describes an accelerated protocol for processing TCP/IP packets. One component of this accelerated protocol is the ability to optimize the TCP flow by offloading the data copy from the host processor to a data movement module (hereinafter “DMM”), such as a DMA (direct memory access) engine. This data copy offload is furthermore overlapped with the protocol processing. However, as protocol processing is further optimized using faster processors, the data copy may fall behind the protocol processing. As a consequence, the processor remains in the current interrupt, consuming valuable processing cycles. Furthermore, since the DMM is not polled for data copy completions until the driver completes protocol processing for the current interrupt, and since the application requesting the data will not post new buffers or repost used buffers until the data receives are completed, the data copy lag may result in significant latency.
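The timing problem described above can be illustrated with a deliberately simplified model. It assumes that the offloaded copy starts at the same moment as protocol processing, that the driver polls the DMM only once protocol processing for the interrupt has finished, and that the interrupt cannot retire until the outstanding copies complete. The names `InterruptCost`, `interrupt_duration`, and `copy_lag` and the time values are hypothetical and chosen purely for illustration. This sketch is not a description of the referenced accelerated protocol.

```python
from dataclasses import dataclass


@dataclass
class InterruptCost:
    """Hypothetical per-interrupt costs, in arbitrary time units (e.g. microseconds)."""
    proto_time: float  # host time spent on protocol processing
    copy_time: float   # DMM time needed to finish the offloaded data copy


def interrupt_duration(c: InterruptCost) -> float:
    """Assumed model: the copy runs concurrently with protocol processing,
    and the DMM is polled only after processing ends, so the interrupt
    retires when whichever of the two finishes last."""
    return max(c.proto_time, c.copy_time)


def copy_lag(c: InterruptCost) -> float:
    """Time the processor stays in the interrupt waiting on copy completions."""
    return max(0.0, c.copy_time - c.proto_time)


if __name__ == "__main__":
    # Same copy cost, but protocol processing speeds up on a faster processor.
    slower_cpu = InterruptCost(proto_time=10.0, copy_time=6.0)
    faster_cpu = InterruptCost(proto_time=4.0, copy_time=6.0)
    for name, c in (("slower CPU", slower_cpu), ("faster CPU", faster_cpu)):
        print(f"{name}: duration={interrupt_duration(c)}, lag={copy_lag(c)}")
```

Under these assumptions, the overlap fully hides the copy while protocol processing is the slower of the two operations. Once a faster processor shrinks the protocol processing time below the copy time, the difference becomes lag that the processor spends in the interrupt. That lag also delays the application's buffer reposting.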