1. Field of Invention
This invention pertains generally to data communication over a computer network, and more particularly to a method and apparatus for controlling transmission control protocol congestion based on bandwidth estimation techniques.
2. Description of Related Art
The use of the transmission control protocol/internet protocol (TCP/IP) to facilitate the transmission of information between two or more computer systems via one or more networks is well known. When a given network computer wishes to exchange information with another, a bi-directional data flow develops to allow information to be transmitted from one computer and received by the other computer. Typically, the information is distributed across a sequence of packets to simplify the transmission process and facilitate fast error detection and correction. The TCP/IP protocol suite ensures that the information to be transferred is properly segmented and sent from the transmitting computer as packets, as well as properly received and assembled into the complete data file at the receiving computer.
A number of terms utilized within the application are now described. The term packet will be used herein to collectively refer to blocks of information, such as within a packet stream. Packets as broadly referred to herein are inclusive of all information units, including headers, used to transport data and/or control information between nodes of the network. A data object to be sent is divided into a sequence of packets. Typically the packets are sent sequentially based on their position in the original data object. When sequential packets are communicated one after another in sequence they are referred to as being “back-to-back” packets, since they are sent in a single burst and the sequence is not broken by the communication of other forms of packets, such as packets retransmitted in response to packet errors. If sufficient bandwidth exists, larger numbers of packets should be sent back-to-back. A segment is considered herein to comprise the data portion of any TCP/IP data packet or acknowledgment packet, and may have a size up to the maximum segment size (MSS) value in bytes. The MSS is considered to be the size of the largest segment that the sender can transmit. This value can be based on the maximum transmission unit (MTU) of the network, the path MTU discovery algorithm, receiver maximum segment size (RMSS) or other factors. The segment size is not considered to include the TCP/IP headers and options. Congestion window, cwnd, is considered to comprise a TCP state variable that limits the amount of data a TCP can send. Data having a sequence number higher than the sum of the highest acknowledged sequence number and the minimum of cwnd and the receiver's advertised window (rwnd) is not to be sent over the TCP.
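The send limit described above can be illustrated with a minimal sketch. It assumes byte-based sequence numbers and ignores sequence number wraparound; the function names and the example values are hypothetical, and rwnd denotes the receiver's advertised window.

```python
def max_sendable_seq(highest_acked: int, cwnd: int, rwnd: int) -> int:
    """Highest sequence number the sender is allowed to transmit.

    Assumption: the effective window is the smaller of the congestion
    window (cwnd) and the receiver's advertised window (rwnd).
    """
    return highest_acked + min(cwnd, rwnd)


def may_send(seq_end: int, highest_acked: int, cwnd: int, rwnd: int) -> bool:
    """True if data ending at sequence number seq_end fits in the window."""
    return seq_end <= max_sendable_seq(highest_acked, cwnd, rwnd)


# Hypothetical values: 1000 bytes acknowledged, cwnd of 4000, rwnd of 3000.
limit = max_sendable_seq(1000, 4000, 3000)
print(limit)
```

In this sketch the smaller of the two windows governs, so a sender with a large cwnd is still held back by a small receiver window, and vice versa.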
As is well known, the transmission control protocol (TCP) corresponds to the transport layer (layer 4) of the open system interconnection (OSI) reference model. The transmission control protocol generally provides stream data transfer, multiplexing, full duplex operation, segmentation and reassembly, along with efficient flow control.
The internet protocol (IP) is a network layer (layer 3) protocol that generally provides addressing information and some control information that enables packets to be routed. The IP protocol has two primary responsibilities: providing connectionless, best-effort delivery of datagrams to a network, and providing fragmentation and reassembly of datagrams to support data links with different maximum transmission units (MTU) sizes. Together, these two protocols form the core of the internet protocol suite that enables reliable delivery of data via a network.
When two computers communicate via a computer network using TCP/IP protocol, a data structure known as a transmission control block is typically utilized to facilitate data transmission, segmentation, reassembly, retransmission, acknowledgments, and the like. The transmission control block is used to track various parameters associated with the data transmit and receive process for a given data flow.
In packet communication based on TCP/IP, a host for transmitting data generally divides the data into a plurality (sequence) of segments. The host typically adds header information to the segments, such as a transmission source address or destination address, and sends the resultant packet to a network. At this time, the maximum packet length (MTU) transmittable from each host to a network is determined by the MTU supported by the protocol of the data link layer of a network connected to the host for exchanging data.
If the protocol of the transport layer is TCP, the maximum data length which can be contained in each packet is referred to as the maximum segment size (MSS). According to the IETF (Internet Engineering Task Force) standard RFC 879 “The TCP Maximum Segment Size and Related Topics”, the MSS value is determined by subtracting a default IP header length and TCP header length from the above mentioned MTU.
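As an illustration of the MSS calculation and the segmentation it implies, the following sketch assumes default header lengths of 20 bytes each for IP and TCP (no options). The function names and the 1500-byte Ethernet MTU are assumptions chosen for the example.

```python
IP_HEADER_BYTES = 20   # assumed default IPv4 header length, no options
TCP_HEADER_BYTES = 20  # assumed default TCP header length, no options


def mss_from_mtu(mtu: int) -> int:
    """MSS obtained by subtracting the default headers from the MTU."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def segment(data: bytes, mss: int) -> list[bytes]:
    """Split a data object into consecutive segments of at most mss bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


mss = mss_from_mtu(1500)            # typical Ethernet MTU (assumption)
segments = segment(b"x" * 3000, mss)
print(mss, [len(s) for s in segments])
```

With these assumed values, a 3000-byte data object yields two full-sized segments followed by one shorter segment carrying the remainder.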
When transmitting and receiving hosts are connected by the same data link, the most efficient data transmission method is to divide the transmission data into segments no larger than the MSS and transmit them as packets. Early in the development of the TCP/IP protocol, it was discovered that some control over the manner in which packets were injected into the network by the source host was needed to help with the problem of dropped packets.
Originally, the well known TCP protocol allowed a source to inject multiple packets into a network, up to a limit corresponding to a window or buffer size advertised by the receiver. In essence, the TCP source is allowed to send a number of packets equal to the congestion window size, which is generally referred to as the parameter “cwnd” in the TCP standard. The TCP source then stops and waits for acknowledgments (ACKs) before resuming transmission. When the value of cwnd is high, the TCP source manages to transmit several packets before receiving feedback from the TCP receiver. When cwnd is low, the opposite is true. The limited accuracy of bandwidth estimates has also curtailed any advantages which could arise from increasing packet train length.
Although such a windowing scheme may work for cases in which the source and the receiver are connected to the same network, problems were soon found in the case of routers having finite buffer sizes disposed between the source and receiver. The routers in this scenario would quickly run out of space to hold the incoming packets. To combat this problem a “slow start” procedure was developed in which the source limits the rate at which it injects new packets into the network according to the rate at which acknowledgments of successful receptions are returned by the receiver.
The slow start mechanism is beneficially utilized when transmissions are to commence on a network having unknown conditions. This mechanism provides for slowly probing the network to determine the available capacity, in order to avoid congesting the network with an inappropriately large burst of data. The slow start mechanism is utilized for this purpose at the beginning of a transfer, or after repairing loss detected by the retransmission timer.
Consequently, in addition to cwnd, another congestion control parameter was introduced in TCP as the so-called “Slow Start Threshold”, or ssthresh. This parameter is also used in setting the sending rate of the TCP source. In particular, ssthresh controls the rate of increase of the sending rate when feedback from the TCP receiver is positive. The parameter ssthresh has significant impact on network congestion control; however, it does not provide precise and effective congestion control for packets on the TCP receiver side.
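The interplay between the congestion window (cwnd) and ssthresh can be sketched with a simplified per-round-trip model. This is an illustration under stated assumptions, not the behavior of any particular TCP implementation: the window is counted in whole segments, every round trip is assumed to return only positive feedback, growth is capped at ssthresh when the threshold is crossed, and the function names are hypothetical.

```python
def next_cwnd(cwnd: int, ssthresh: int) -> int:
    """Window (in segments) after one round trip of positive feedback.

    Below ssthresh the window doubles each round trip (slow start);
    at or above ssthresh it grows by one segment (congestion avoidance).
    Capping the doubled value at ssthresh is a simplification.
    """
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh)
    return cwnd + 1


def trace(rounds: int, ssthresh: int, initial: int = 1) -> list[int]:
    """Window sizes over a number of loss-free round trips."""
    sizes = [initial]
    for _ in range(rounds):
        sizes.append(next_cwnd(sizes[-1], ssthresh))
    return sizes


print(trace(6, ssthresh=8))  # [1, 2, 4, 8, 9, 10, 11]
```

In this model ssthresh marks the point where the rate of increase changes from doubling per round trip to one segment per round trip.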
These current TCP congestion solutions provide a scheme of modulating the bandwidth of traffic streams transmitted across a congested network. By modulating the bandwidth of traffic streams, feedback to the congestion avoidance processes or algorithms at packet origin points is provided via acknowledgment delays from the receiving node. That is, the time at which such acknowledgments are received at the sending node is increased. This impacts the rate at which new packets are transmitted from the sending node to the receiving node in a way that reduces the overall packet loss. The current TCP specification provides two solutions for generating such back-to-back transmission estimates: (a) estimating the number of back-to-back packets based on the amount of acknowledged data contained in the ACK packets, and (b) estimating the number of back-to-back packets based on using a timestamp option (requiring about 12 extra bytes per packet) wherein the transmission time of the packets is communicated to the receiver, from which back-to-back transmissions can be generally inferred.
These approaches suffer from many drawbacks. Estimating back-to-back packets from the amount of data found in the ACK packets cannot be relied upon, since the sender may delay packet transmission (e.g., due to application requirements, the Nagle algorithm, and so forth).
The use of timestamps requires a large amount of extra overhead per packet, and leads to increased packet fragmentation. Furthermore, timebases for the virtual clock are often insufficiently precise to correctly detect back-to-back transmissions. Many conventional implementations also increase the value of the virtual clock for timestamps by 1 only once every 100 to 500 msec, which is insufficient to detect back-to-back transmission.
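The granularity problem can be made concrete with a short sketch. It assumes a virtual clock that advances by 1 every 100 msec and a receiver that infers back-to-back transmission from equal timestamp values; the function names and the send times are hypothetical.

```python
def timestamp_value(send_time_ms: int, tick_ms: int) -> int:
    """Timestamp carried by a packet when the virtual clock advances
    by 1 once every tick_ms milliseconds."""
    return send_time_ms // tick_ms


def same_tick(t1_ms: int, t2_ms: int, tick_ms: int) -> bool:
    """True if two send times map to the same timestamp value, i.e. a
    receiver relying on timestamps alone cannot tell them apart."""
    return timestamp_value(t1_ms, tick_ms) == timestamp_value(t2_ms, tick_ms)


TICK_MS = 100  # assumed virtual clock period

print(same_tick(0, 1, TICK_MS))     # sent 1 msec apart: same timestamp
print(same_tick(0, 90, TICK_MS))    # sent 90 msec apart: also same timestamp
print(same_tick(99, 100, TICK_MS))  # sent 1 msec apart: different timestamps
```

Under these assumptions, packets sent 1 msec apart and packets sent 90 msec apart can carry identical timestamps, while two packets sent 1 msec apart across a tick boundary carry different ones, so equal or unequal timestamp values do not reliably indicate whether a transmission was back-to-back.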
Therefore, a need exists for providing a robust and accurate form of bandwidth estimation for use in performing receiver-side TCP congestion control, and for controlling the length of packet trains. The present invention fulfills that need and others and overcomes the drawbacks of prior congestion control approaches.