1. Field of the Invention
This invention relates generally to methods used in implementation of the network Transmission Control Protocol, and more specifically to congestion control methods used in the Transmission Control Protocol.
2. Description of Related Art
The network Transmission Control Protocol (TCP) is well known. TCP runs only in end systems 100, 110 on a network 120 and not in intermediate network elements such as routers or bridges. The intermediate network elements do not maintain TCP connection state. A TCP connection between end systems 100, 110, sometimes called hosts 100, 110, provides full duplex data transfer between hosts 100, 110. A TCP connection is always point-to-point, i.e., between a single TCP sender and a single TCP receiver.
When a TCP connection is established, two application processes 101, 111 can send data to each other. Data is transmitted in segments. A segment includes header fields and a data field. The data field contains application data. The amount of application data that can be placed in the segment data field by the TCP sender is limited by a maximum segment size. The TCP sender performs flow control by monitoring an advertised receive window rwnd for the TCP receiver, which indicates the space available in the TCP receive buffer.
Two header fields of interest are the sequence number field and the acknowledgement number field. As is known, in TCP, data is viewed as an unstructured, but ordered, stream of bytes. The sequence numbers used in TCP are over the stream of transmitted bytes. The sequence number that is placed in the sequence number field is the byte-stream number of the first byte in the segment.
The acknowledgement number placed in the acknowledgement number field, for example by host 110 in an acknowledgement packet being transmitted to host 100, is the sequence number of the next byte of data that host 110 is expecting from host 100. Since, according to TCP, host 110 only acknowledges bytes up to the first missing byte in the data stream, host 110 may provide repeated cumulative acknowledgements of a single segment when segments are received out of order. The repeated cumulative acknowledgements for the single segment are referred to as duplicate acknowledgements.
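The cumulative acknowledgement behavior described above can be illustrated with a minimal sketch. The function name cumulative_acks, the 100-byte segment size, and the arrival order are hypothetical values chosen for illustration, and the sketch assumes that segments never partially overlap.

```python
def cumulative_acks(arrivals, initial_seq=0):
    """Return the acknowledgement number sent after each arriving segment.

    arrivals: list of (sequence_number, length) tuples in arrival order.
    Assumption: segments are aligned and never partially overlap.
    """
    next_expected = initial_seq   # first byte not yet received in order
    buffered = {}                 # out-of-order segments: seq -> length
    acks = []
    for seq, length in arrivals:
        if seq >= next_expected:
            buffered[seq] = length
        # Advance past every contiguous segment that is now available.
        while next_expected in buffered:
            next_expected += buffered.pop(next_expected)
        acks.append(next_expected)
    return acks


# Four 100-byte segments; the segment starting at byte 100 arrives last.
acks = cumulative_acks([(0, 100), (200, 100), (300, 100), (100, 100)])
print(acks)  # [100, 100, 100, 400]
```

In this example the second and third acknowledgements repeat the value 100 and are therefore duplicate acknowledgements. The final acknowledgement jumps to 400 once the missing segment fills the gap.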
Each time host 100 sends a segment into a TCP connection, a timer is started. If the segment timer expires before host 100 receives an acknowledgement for the data in the segment from host 110, host 100 resends the segment and initiates a slow start process. Typically, the timeout value for the timer is not much larger than a round-trip time between hosts 100 and 110. The round-trip time is the time from when a segment is transmitted to when an acknowledgement for that segment is received.
A segment can fail to reach host 110 due to data corruption or more usually as a result of network congestion. Most losses on the Internet are caused by congestion as routers run out of buffers and discard incoming traffic.
TCP typically runs on top of the Internet Protocol (IP). TCP must use end-to-end congestion control because the IP layer provides no feedback to end systems concerning network congestion. Four TCP congestion control processes are described in Request for Comments (RFC) 2581. The four are slow start, congestion avoidance, fast retransmit, and fast recovery.
Typically, for slow start, a slow start threshold ssthresh is maintained and for congestion avoidance, a congestion window cwnd is maintained for each connection. Congestion window cwnd is a sender side limit on the amount of data that the TCP sender can transmit into network 120 before receiving an acknowledgement.
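As an illustration of how the two windows interact, the following minimal sketch computes how much more data a TCP sender may transmit. It assumes the RFC 2581 rule that the sender is bounded by the smaller of congestion window cwnd and receive window rwnd. The function name usable_window and the byte counts are hypothetical.

```python
def usable_window(cwnd, rwnd, bytes_in_flight):
    """Bytes the sender may still transmit before it must wait for an acknowledgement."""
    # The sender is limited by the smaller of the two windows,
    # less the data already sent but not yet acknowledged.
    return max(0, min(cwnd, rwnd) - bytes_in_flight)


print(usable_window(cwnd=4000, rwnd=16000, bytes_in_flight=3000))  # 1000
print(usable_window(cwnd=8000, rwnd=2000, bytes_in_flight=2000))   # 0
```

In the first call the congestion window is the limiting factor. In the second call the receive window is the limiting factor.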
When a TCP receiver 110 receives an out-of-order segment, TCP receiver 110 generates an immediate acknowledgement for the first missing segment, which is a duplicate acknowledgement. Since TCP sender 100 does not know whether the duplicate acknowledgement is caused by a lost segment, a reordering of the segments during transmission, or replication of an acknowledgement packet or a segment by network 120, TCP sender 100 waits to receive a small number of duplicate acknowledgements, typically three, before assuming a segment is lost. If a reordering of the segments is the problem, there are only one or two duplicate acknowledgements before the reordered segment is processed and a new acknowledgement generated.
When the third duplicate acknowledgement is received, the TCP sender immediately retransmits the oldest unacknowledged segment without waiting for the segment transmission timer to expire. This is known in the art as a fast retransmit.
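The duplicate acknowledgement threshold described above can be sketched as a simple counter. This is a minimal illustration, not an actual TCP implementation. The function name fast_retransmit_points and the acknowledgement values are hypothetical.

```python
DUP_ACK_THRESHOLD = 3  # typical value given in the description above


def fast_retransmit_points(ack_numbers):
    """Return the indexes in ack_numbers at which a fast retransmit would fire."""
    last_ack = None
    dup_count = 0
    fired = []
    for i, ack in enumerate(ack_numbers):
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                fired.append(i)   # third duplicate: retransmit without waiting for the timer
        else:
            last_ack = ack        # a new acknowledgement resets the count
            dup_count = 0
    return fired


# A lost segment: the acknowledgement for byte 200 is repeated three times.
print(fast_retransmit_points([100, 200, 200, 200, 200, 600]))  # [4]
# Reordering only: two duplicates, then a new acknowledgement.
print(fast_retransmit_points([100, 200, 200, 200, 400]))       # []
```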
Following the fast retransmit, fast recovery, but not slow start, is performed. Fast recovery governs the transmission of data by TCP sender 100 until a non-duplicate acknowledgement is received.
When one or more segments are lost in network 120, network 120 is assumed congested by TCP sender 100.
After recovery from the loss, TCP's congestion avoidance mechanism arranges to continue transmission at half the rate that was obtained previously. The objective of fast recovery is to maintain transmission (at the reduced rate) while the recovery is being performed.
In fast recovery, the size of congestion window cwnd is reduced and slow start threshold ssthresh is set equal to the reduced size congestion window. As soon as is possible with the reduced size congestion window, another packet is transmitted over the network in response to each additional duplicate acknowledgement packet received from TCP receiver 110, and the size of congestion window cwnd is increased by the size of one segment. The principle behind this is that each duplicate acknowledgement implies the receiver has received a segment although the sender does not know which one. Since this means that a segment has left the network, the sender can insert one more segment without worsening the congestion.
When a non-duplicate acknowledgement is received by TCP sender 100, congestion window cwnd is deflated by setting congestion window cwnd equal to slow start threshold ssthresh, and the number of duplicate acknowledgements is set to zero. If only a single segment is lost in a round trip time, fast retransmit and fast recovery perform satisfactorily and data transmission continues at the reduced rate in an attempt to avoid congestion on network 120.
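The window adjustments made during fast retransmit and fast recovery can be sketched as follows. The description above does not give exact values, so this sketch assumes the RFC 2581 arithmetic, in which slow start threshold ssthresh is set to half the data in flight (with a floor of two segments) and congestion window cwnd is temporarily set three segments above ssthresh on entry. The class name FastRecoverySender, the 1000-byte segment size, and the returned strings are hypothetical.

```python
SMSS = 1000  # sender maximum segment size in bytes (illustrative value)


class FastRecoverySender:
    """Tracks cwnd and ssthresh through fast retransmit and fast recovery."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False

    def on_duplicate_ack(self, flight_size):
        self.dup_acks += 1
        if not self.in_recovery and self.dup_acks == 3:
            # Third duplicate: reduce the window and enter fast recovery.
            self.ssthresh = max(flight_size // 2, 2 * SMSS)
            self.cwnd = self.ssthresh + 3 * SMSS
            self.in_recovery = True
            return "retransmit"
        if self.in_recovery:
            # Each further duplicate means a segment has left the network.
            self.cwnd += SMSS
            return "inflate"
        return "wait"

    def on_new_ack(self):
        if self.in_recovery:
            self.cwnd = self.ssthresh   # deflate the window
            self.in_recovery = False
        self.dup_acks = 0               # the duplicate count starts over


sender = FastRecoverySender(cwnd=8000, ssthresh=64000)
events = [sender.on_duplicate_ack(flight_size=8000) for _ in range(5)]
print(events)                        # ['wait', 'wait', 'retransmit', 'inflate', 'inflate']
print(sender.cwnd, sender.ssthresh)  # 9000 4000
sender.on_new_ack()
print(sender.cwnd, sender.dup_acks)  # 4000 0
```

After the non-duplicate acknowledgement, the sender continues with a window of 4000 bytes, half the 8000 bytes that were in flight when the loss was detected.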
However, fast recovery does not always result in a smooth recovery when there are multiple lost segments in a single round trip time. Since the number of duplicate acknowledgements is set to zero upon leaving fast recovery, TCP sender 100 must receive three new duplicate acknowledgements after termination of fast recovery before TCP sender 100 can determine that another drop has occurred and perform another fast retransmit. If the reduced size congestion window cwnd is such that fewer than three new segments can be transmitted by TCP sender 100, TCP receiver 110 does not generate three duplicate acknowledgements and a retransmission timeout occurs. Similarly, if the new drop occurs almost a round trip time after the first drop, there may not be sufficient time available for generation of three duplicate acknowledgements, and again a retransmission timeout occurs.
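The first failure case described above can be expressed as a simple check. This is a deliberately simplified sketch: it assumes that the number of new segments the sender can transmit is the congestion window divided by the maximum segment size, and it ignores data already in flight and timing effects. The function name recovery_after_second_drop and the byte counts are hypothetical.

```python
def recovery_after_second_drop(cwnd, mss, dup_ack_threshold=3):
    """Predict how a second drop is recovered once fast recovery has ended."""
    new_segments = cwnd // mss  # new segments the reduced window allows
    if new_segments < dup_ack_threshold:
        # Too few segments reach the receiver to produce three duplicates.
        return "retransmission timeout"
    return "fast retransmit possible"


print(recovery_after_second_drop(cwnd=2000, mss=1000))  # retransmission timeout
print(recovery_after_second_drop(cwnd=4000, mss=1000))  # fast retransmit possible
```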
As network congestion increases, and multiple drops in a single round trip time occur, the performance of network 120 is degraded because fast recovery cannot handle multiple drops in a single round trip time. Hence, a new method is needed for recovering from such a situation.