The Transmission Control Protocol/Internet Protocol (TCP/IP) suite is the most widely used protocol suite in digital packet networks today. TCP is a connection-oriented, end-to-end, full-duplex protocol that provides reliable inter-process communication between pairs of processes in host computers. The information exchanged between TCP peers is packed into datagrams known as segments, each comprising a TCP header followed by payload data. The segments are transported over the network in IP packets. TCP is described by Postel in RFC 793 of the U.S. Defense Advanced Research Projects Agency (DARPA), entitled “Transmission Control Protocol: DARPA Internet Program Protocol Specification” (1981), which is incorporated herein by reference.
The key elements used by TCP for maintaining efficient, reliable communications are its acknowledgment and window mechanisms. These mechanisms are described in detail in RFC 793 and are further analyzed and optimized by Clark in DARPA RFC 813, entitled “Window and Acknowledgement Strategy in TCP” (1982), which is also incorporated herein by reference. As explained by Clark, when data arrives at the recipient, TCP requires that the recipient send back an acknowledgment (ACK) of the data. If the sender does not receive the ACK within a certain period of time, it retransmits the data. TCP specifies that the bytes of data are sequentially numbered, so that the recipient can acknowledge data by naming the highest-numbered byte of data it has received, which also acknowledges all previous bytes. RFC 793 contains only a general assertion that data should be acknowledged promptly, but gives no more specific indication as to how quickly an acknowledgment must be sent, or how much data should be acknowledged in each separate acknowledgment.
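By way of illustration only, the cumulative acknowledgment described above may be sketched as follows. The function name, the representation of received data as inclusive (start, end) byte ranges, and the first_byte parameter are assumptions made for this example and do not appear in RFC 793 or RFC 813. Actual TCP implementations carry the next expected sequence number in the ACK field, rather than the highest byte received; the sketch follows the simpler description given above.

```python
def highest_contiguous_byte(received, first_byte=0):
    """Return the highest byte number N such that every byte from
    first_byte through N has been received, or None if the byte at
    first_byte has not yet arrived.

    `received` is a list of inclusive (start, end) byte ranges
    (a hypothetical representation chosen for this sketch).
    """
    highest = None
    expected = first_byte          # next byte needed to extend the run
    for start, end in sorted(received):
        if start > expected:
            break                  # gap: later ranges cannot be acknowledged yet
        if end >= expected:
            highest = end          # range extends the contiguous run
            expected = end + 1
    return highest
```

For example, if bytes 0 through 199 and bytes 300 through 399 have arrived, the recipient can acknowledge only byte 199, since bytes 200 through 299 are still missing.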
The window mechanism is a flow control tool. Whenever appropriate, the recipient of data returns to the sender a number, which is (more or less) the size of the buffer that the receiver currently has available for additional data. This number of bytes, called the window, is the maximum that the sender is permitted to transmit until the receiver returns some additional window. Sometimes the receiver will have no buffer space available and will return a window value of zero. Under these circumstances, the protocol requires the sender to send a small segment to the receiver now and then, to see if more data can be accepted. Again, RFC 793 does not specify under what circumstances the window should be increased, or how the sender should respond to such revised information.
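The window exchange described above may be sketched, purely as an illustration, in the following form. The function names, the modeling of the window as free buffer space, and the one-byte probe size are assumptions of this example. The timing of zero-window probes ("now and then") is not modeled.

```python
def advertised_window(buffer_capacity, bytes_buffered):
    """Receiver side: the window is (more or less) the free buffer space."""
    return max(buffer_capacity - bytes_buffered, 0)


def next_send_size(window, bytes_pending, probe_size=1):
    """Sender side: how many bytes to put in the next transmission.

    `probe_size` is an assumed value for the small segment sent when
    the receiver has advertised a zero window.
    """
    if bytes_pending == 0:
        return 0                                # nothing to send
    if window == 0:
        return min(probe_size, bytes_pending)   # zero-window probe
    return min(window, bytes_pending)           # never exceed the window
```

With a full 4096-byte receive buffer, advertised_window(4096, 4096) yields zero, and a sender with data pending then transmits only the small probe segment.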
A number of authors have suggested strategies for enhancing the efficiency of TCP-based communications. Among these authors is Clark, who in the above-mentioned RFC 813 identifies a degeneration of communication throughput that can occur during long TCP data transfers. He calls this phenomenon the Silly Window Syndrome (SWS) and offers a number of algorithms that can be used to overcome it. For example, the sender of the data can compare the size of the window offered by the receiver to the size of its own usable window, which is the offered window minus the amount of outstanding, unacknowledged data that the sender has transmitted. The usable window in conventional TCP implementations is therefore never larger than the offered window. If the ratio of the usable window to the offered window size drops below a given fraction, the sender can conclude that SWS has occurred. Under these conditions, the sender should stop transmitting until the usable window size has increased.
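The sender-side test described above may be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names and the default threshold of 0.25 are assumptions of this example, and the treatment of a zero offered window as a reason to hold is likewise an assumption.

```python
def usable_window(offered, outstanding):
    """Offered window minus outstanding, unacknowledged data."""
    return max(offered - outstanding, 0)


def should_hold_transmission(offered, outstanding, min_fraction=0.25):
    """Return True if the sender should stop transmitting until the
    usable window grows (the SWS condition described above).

    `min_fraction` is an illustrative threshold, not a required value.
    """
    if offered == 0:
        return True   # assumption: nothing can be sent into a zero window
    return usable_window(offered, outstanding) / offered < min_fraction
```

With an offered window of 1000 bytes and 900 bytes outstanding, the usable window is 100 bytes, a ratio of 0.1, so the sender in this sketch would wait rather than transmit a small segment.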
Allman et al. describe methods for improving TCP performance under conditions of network congestion in RFC 2581 of the Internet Engineering Task Force (IETF) Network Working Group, entitled “TCP Congestion Control” (April, 1999), which is incorporated herein by reference. These methods include slow start and congestion avoidance algorithms, which are used by a TCP sender to control the amount of outstanding data being injected into the network, and fast retransmit/fast recovery algorithms, used to detect and repair segments lost in transmission.
To implement the slow start and congestion avoidance algorithms, a number of variables are added to the TCP per-connection state. The congestion window (cwnd) is a sender-side limit on the amount of data the sender can transmit into the network before receiving an acknowledgment (ACK), while the receiver's advertised window (rwnd) is a receiver-side limit on the amount of outstanding data. The minimum of cwnd and rwnd determines the amount of data that the sender can transmit at any given time. Another state variable, the slow start threshold (ssthresh), is used to determine whether the slow start or congestion avoidance algorithm is used to control data transmission, as described below.
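As a sketch only, the per-connection variables described above and the resulting transmission limit might be represented as follows. The class name, method names and the use of byte counts for all quantities are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    """Hypothetical container for the per-connection variables."""
    cwnd: int      # congestion window (sender-side limit), in bytes
    rwnd: int      # receiver's advertised window, in bytes
    ssthresh: int  # slow start threshold, in bytes

    def send_limit(self):
        """Maximum outstanding data: the minimum of cwnd and rwnd."""
        return min(self.cwnd, self.rwnd)

    def can_send(self, outstanding):
        """Bytes that may still be transmitted, given the amount of
        data already outstanding (sent but not yet acknowledged)."""
        return max(self.send_limit() - outstanding, 0)
```

For instance, with cwnd of 4000 bytes, rwnd of 8000 bytes and 1500 bytes outstanding, the sender in this sketch may transmit a further 2500 bytes.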
The slow start algorithm is used to slowly probe the network to determine the available capacity at the beginning of a transfer, or after repairing loss detected when an ACK is not received within the required timeout period. The value of cwnd is set to a small initial window (IW) value, and is then gradually incremented each time an ACK is received. When cwnd increases above ssthresh, the sender switches over to the congestion avoidance algorithm, whereby cwnd is incremented by a full segment per round-trip time (RTT) of the connection. As an approximation to this criterion, cwnd is typically incremented on each incoming, non-duplicate ACK by an amount given by SMSS*SMSS/cwnd, rounded up to the nearest byte, wherein SMSS (sender maximum segment size) is the size of the largest segment that the sender can transmit.
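The window growth described above may be illustrated by the following sketch of the update applied on each incoming, non-duplicate ACK. The function name is hypothetical. The slow start increment of one SMSS per ACK reflects the upper bound given in RFC 2581, and the choice to remain in slow start when cwnd equals ssthresh is an assumption of this example, consistent with switching only when cwnd increases above ssthresh.

```python
def cwnd_after_new_ack(cwnd, ssthresh, smss):
    """Return the updated congestion window, in bytes, after one
    non-duplicate ACK arrives."""
    if cwnd <= ssthresh:
        # Slow start: grow by one full segment per ACK
        # (assumed here; RFC 2581 permits at most SMSS per ACK).
        return cwnd + smss
    # Congestion avoidance: SMSS*SMSS/cwnd, rounded up to the nearest
    # byte, which approximates one full segment per round-trip time.
    increment = (smss * smss + cwnd - 1) // cwnd
    return cwnd + increment
```

Starting from cwnd = 1000, ssthresh = 2000 and SMSS = 1000, three successive ACKs take cwnd to 2000, then 3000, then 3334 bytes: the first two steps are slow start and the third is congestion avoidance.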
The fast retransmit/fast recovery algorithm specifies that a TCP receiver should send an immediate duplicate ACK when an out-of-order segment arrives. The purpose of this ACK is to inform the sender that a segment was received out-of-order and which sequence number was expected. Three duplicate ACKs (four identical ACKs without the arrival of any other intervening packets) are treated by the sender as an indication that a segment has been lost. Thus, after the sender receives three duplicate ACKs, it retransmits what appears to be the missing segment, without waiting for its own retransmission timer to expire. After the fast retransmit algorithm sends what appears to be the missing segment, the fast recovery algorithm governs the transmission of new data, possibly at a reduced rate, until a non-duplicate ACK arrives.
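The duplicate-ACK counting on which fast retransmit relies may be sketched as follows. The class and method names are assumptions of this example, and an ACK is treated as a duplicate simply when it repeats the previous acknowledgment number; practical implementations apply additional conditions before counting an ACK as a duplicate. The fast recovery phase that follows the retransmission is not modeled.

```python
class DupAckDetector:
    """Hypothetical sender-side counter of duplicate ACKs."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # duplicate ACKs that indicate loss
        self.last_ack = None        # most recent acknowledgment number
        self.dup_count = 0          # duplicates seen for last_ack

    def on_ack(self, ack_number):
        """Process one incoming ACK. Return True exactly when the
        missing segment should be retransmitted without waiting for
        the retransmission timer to expire."""
        if ack_number == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        # A different acknowledgment number resets the count.
        self.last_ack = ack_number
        self.dup_count = 0
        return False
```

Feeding this detector four identical ACKs in succession returns False for the first three and True for the fourth, which is the third duplicate.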
For the past twenty years, TCP/IP has been implemented as a software suite, typically as a part of computer operating systems. In software implementations, the receiver and transmitter processes carried out by each of the peers (such as sending data, receiving ACKs and updating window sizes) are serialized, due to the nature of program execution in general-purpose microprocessors. This serialization introduces a certain delay in transmission, since ACKs and window size changes must be processed before the transmitter can decide how much more data it should send. As long as network speed was the main factor limiting transmission rates, the TCP/IP processing delay was insignificant. With network speeds now increasing to the Gbps range, however, this is no longer the case, and faster TCP/IP processing is required.
In an attempt to clear the TCP/IP bottleneck, hardware-based protocol processors have been developed. Yet the speed with which these processors can transmit TCP segments is still held back by the serial nature of existing methods for synchronizing data transmission, ACK reception and window size adjustment. There is thus a need for a TCP/IP transmitter that is capable of transmitting at full wire speed without serialization delay for as long as it has data to transmit, while still maintaining the desirable reliability, congestion avoidance and retransmission/recovery features of the classic software-implemented protocol.