1. Field of the Invention
The invention relates to network communications and more specifically to the management of data transfer rates under TCP/IP protocols.
2. Background of the Invention
As illustrated in prior art FIG. 1, a computer network, generally designated 100, includes a plurality of receivers 110, each connected through communications channels 115 to a network 120. Network 120 is also connected through communications channels 115 to a sender 130 and an intermediary 140. Intermediary 140 serves as a transmission node or cache system between various parts of network 120, sender 130, or receivers 110. In some cases, receiver 110 and sender 130 are a network client and network server respectively.
Communications between the elements of computer network 100 are typically managed using a layered series of software and hardware systems. Prior art FIG. 2 illustrates some of the various layers involved when computer network 100 includes the World Wide Web. A top layer includes FTP/HTTP 210 (File Transfer Protocol and Hypertext Transfer Protocol). These standard components serve as the interface between lower levels of the communication system and application programs such as browsers. FTP/HTTP 210 operates above a TCP (Transmission Control Protocol) 220 layer that facilitates the delivery and reception of data packets to and from devices in computer network 100. An IP (Internet Protocol) 230 layer, a data link 240 layer, and a physical network 250 perform well-known standard operations enabling the transfer of data.
TCP 220 includes software running on both sender 130 and receiver 110 devices. TCP 220 hides the details pertaining to the network and its characteristics from FTP/HTTP 210 and communicates with FTP/HTTP 210 through a simple TCP application programming interface (API). TCP 220 has three roles: (a) it prevents congestion at the receiver, (b) it prevents congestion in the network along the path used from source to destination, and (c) it guarantees a reliable data transfer.
At sender 130, buffers in TCP 220 receive data to be transferred from applications at FTP/HTTP 210. These buffers are referred to as the “TCP Send Buffers.” Likewise, at receiver 110, there are “TCP Receive Buffers” associated with each sender 130, from which applications on receiver 110 read data transferred over network 120. A pair of send and receive buffers exists for each TCP connection. Data are written to and read from the send and receive buffers in blocks of various sizes.
FIG. 3 illustrates a prior art receive buffer 310 associated with TCP 220 on a receiver 110. Receive buffer 310 contains data 320 and has a limited amount of available space, known as the receive window Wr 330, in which additional data can be stored. TCP 220 uses an explicit feedback mechanism to prevent congestion at receiver 110 and overflow of receive buffer 310. Receiver 110 declares the size (number of bytes) of receive window Wr 330 in each acknowledgement sent back to sender 130. Sender 130 never allows the amount of outstanding data, that is, data transmitted and not yet acknowledged, to exceed Wr 330.
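By way of illustration only, the receive window feedback described above may be sketched in Python as follows. The sketch is a simplified model and not an actual TCP implementation; the names ReceiveBuffer, deliver, read, and sendable_bytes are hypothetical and were chosen for this example.

```python
class ReceiveBuffer:
    """Simplified model of receive buffer 310 (hypothetical, for illustration)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # total buffer size in bytes
        self.stored = 0           # bytes held but not yet read (data 320)

    def window(self) -> int:
        """Free space in bytes, i.e. the receive window Wr."""
        return self.capacity - self.stored

    def deliver(self, nbytes: int) -> int:
        """Store arriving data and return the Wr declared in the acknowledgement."""
        if nbytes > self.window():
            raise OverflowError("sender exceeded the declared receive window")
        self.stored += nbytes
        return self.window()

    def read(self, nbytes: int) -> int:
        """The application reads up to nbytes, which frees space in the buffer."""
        taken = min(nbytes, self.stored)
        self.stored -= taken
        return taken


def sendable_bytes(declared_wr: int, outstanding: int) -> int:
    """Bytes the sender may still transmit without outstanding data exceeding Wr."""
    return max(0, declared_wr - outstanding)


if __name__ == "__main__":
    buf = ReceiveBuffer(capacity=4096)
    wr = buf.deliver(1000)              # acknowledgement declares Wr = 3096
    print(wr, sendable_bytes(wr, 0))    # sender may send up to 3096 more bytes
    buf.read(500)                       # application drains 500 bytes
    print(buf.window())                 # Wr grows to 3596
```

In this model the window shrinks as data arrives and grows again only when the application on the receiver reads from the buffer, which is the behavior the explicit feedback mechanism relies on.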
To prevent congestion along a communications path, TCP 220 follows a congestion control algorithm that limits the amount of outstanding data. The congestion control algorithm contends with the fact that no information is available regarding the available bandwidth of the path from sender 130 to receiver 110. The congestion control algorithm starts by allowing a small amount of outstanding data and then increases the amount of outstanding data allowed as acknowledgements are received from receiver 110. The amount of outstanding data allowed by the congestion control algorithm is referred to as the congestion window (Wc). The actual amount of data that can be sent is the send window (Ws), which is equal to the minimum of Wr 330 and Wc.
To illustrate the dynamics of the congestion window Wc, one may momentarily assume that there is an abundance of data to send and that receive window Wr 330 is very large, and therefore does not place any limitation on the size of the send window Ws. The congestion control algorithm starts with a small congestion window (Wc) equal to the maximum size of a single TCP segment. The first transmission includes only one TCP segment, carrying at most an amount of data equal to the maximum segment size (MSS). When an acknowledgement of the first segment is returned from receiver 110, the congestion window Wc is increased by another MSS to two times MSS. Since, for this illustration, Wr 330 is assumed to be very large, the send window Ws also becomes two times MSS. The maximum amount of data that can be sent in the next transmission (Ws) is now two times MSS. When two segments are sent, each is acknowledged independently by receiver 110 and as each acknowledgement is received the congestion window Wc is increased by MSS. After both acknowledgements are received from a two-segment transmission, Wc (and Ws) will equal four times MSS. By increasing the size of the congestion window by one MSS for each acknowledgement received, the number of segments in each transmission doubles as each previous transmission is fully acknowledged. This increase continues until the congestion window Wc reaches a certain value called the “threshold.” Beyond the threshold, the congestion window Wc is incremented by only one MSS for each set of transmissions. This phase of the congestion control algorithm is referred to as the congestion avoidance phase. The congestion window Wc keeps increasing, albeit at a smaller (linear) rate instead of an exponential rate, until the size of the congestion window Wc reaches a preset maximum. This maximum is a predetermined system parameter. 
Since the congestion window Wc increases over time, the likelihood of congestion and, therefore, loss of a packet on the network also increases over time. If a TCP segment is lost in the network, the lack of acknowledgement from receiver 110 within a timeout period triggers the retransmission of the lost segment and all segments following it and also resets the size of congestion window Wc (and Ws) to MSS. The process described above is then repeated. TCP 220, therefore, does not maintain a constant transmission rate, but rather transmits data using a congestion window that increases according to first an exponential rule and then a linear rule until a transmission loss occurs and Wc is reset.
FIG. 4 is a prior art illustration of the size of congestion window Wc as a function of time. During an initial phase 410, the congestion window Wc increases exponentially, from an initial size of one MSS, until it reaches a threshold (Thr 430). Wc then increases linearly during a second phase 440 until, at time 450, a segment (data packet) is lost, at which time Wc is reset to one MSS and the process is repeated. Time 450 can occur at any point in each cycle. Wc will stop increasing if a maximum size (max 480) is reached before a data segment is lost. The loss of data segments depends on the bandwidth of network 120 and the constantly varying traffic it supports. The value of max 480 is a system parameter associated with elements on network 120. After a packet is lost, a new threshold (ThrB 420) is set as a function of the largest window size reached on the previous cycle. The cycles are repeated until all data transmission is completed or a timeout occurs.
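The repeating cycle of FIG. 4 may be sketched, for illustration only, with the following Python model. The function name sawtooth and the parameter loss_rounds are hypothetical. The text above states only that the new threshold is a function of the largest window reached; this sketch assumes one half of the window at the time of the loss, with a floor of two MSS, which is common TCP practice but is an assumption here. Window sizes are in multiples of MSS.

```python
def sawtooth(rounds: int, loss_rounds: set, threshold: int,
             max_window: int, mss: int = 1) -> list:
    """List Wc at the start of each transmission, resetting on each loss."""
    wc = mss
    trace = []
    for r in range(rounds):
        trace.append(wc)
        if r in loss_rounds:
            # Assumption: new threshold is half the window reached before the loss.
            threshold = max(wc // 2, 2 * mss)
            wc = mss  # window is reset to one MSS
        elif wc < threshold:
            # Exponential phase (doubling capped at the threshold in this sketch).
            wc = min(2 * wc, threshold, max_window)
        else:
            # Linear phase, never exceeding the preset maximum.
            wc = min(wc + mss, max_window)
    return trace


if __name__ == "__main__":
    # A loss in the sixth transmission (index 5); other values are examples.
    print(sawtooth(rounds=12, loss_rounds={5}, threshold=8, max_window=16))
    # prints [1, 2, 4, 8, 9, 10, 1, 2, 4, 5, 6, 7]
```

The trace shows the window collapsing to one MSS at the loss and the second cycle switching from exponential to linear growth at the lower threshold.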
When Wr 330 is finite, the send window Ws is determined by Wr 330 whenever Wr 330 is less than Wc. Ws is always the minimum of the current value of Wc and the most recently declared value of Wr 330.
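As a minimal illustration of this relationship, the following Python sketch computes Ws from Wc and Wr. The function name send_window and the numeric values are hypothetical examples.

```python
def send_window(wc: int, wr: int) -> int:
    """Ws is the smaller of the congestion window and the declared receive window."""
    return min(wc, wr)


if __name__ == "__main__":
    # Example: Wc grows over five transmissions while the receiver declares Wr = 6
    # (all values in multiples of MSS).
    wc_values = [1, 2, 4, 8, 16]
    print([send_window(wc, 6) for wc in wc_values])
    # prints [1, 2, 4, 6, 6]
```

Once Wc exceeds the declared Wr, further growth of the congestion window has no effect on the amount of data that can be sent.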
One method of controlling the rate of data transfer is to limit the rate at which the application at sender 130 writes data to the send buffer. This is accomplished by limiting the size and frequency of data blocks written to the send buffer. Writing small blocks at a high frequency can impose a high overhead on sender 130. Since sender 130 is typically a server serving a large number of clients, the preference of sender 130 is to use large blocks and to control the rate by controlling the frequency of writes. However, writing large blocks gives TCP 220 an opportunity to send large batches. Large batches generate bursty traffic and thus are more likely than steady traffic to cause congestion in network 120.
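The trade-off between block size and write frequency may be illustrated with the following Python sketch. The names write_interval, writes_per_second, and write_schedule are hypothetical, and the sketch only computes when an application would write each block; it does not perform any network input or output.

```python
def write_interval(target_rate: float, block_size: int) -> float:
    """Seconds between writes so that block_size bytes per write averages target_rate bytes/s."""
    if target_rate <= 0 or block_size <= 0:
        raise ValueError("target_rate and block_size must be positive")
    return block_size / target_rate


def writes_per_second(target_rate: float, block_size: int) -> float:
    """Write frequency needed to sustain target_rate with blocks of block_size bytes."""
    return 1.0 / write_interval(target_rate, block_size)


def write_schedule(total_bytes: int, target_rate: float, block_size: int) -> list:
    """Return (time_in_seconds, bytes_written) pairs for writes to the send buffer."""
    interval = write_interval(target_rate, block_size)
    schedule = []
    offset = 0
    index = 0
    while offset < total_bytes:
        size = min(block_size, total_bytes - offset)  # last block may be short
        schedule.append((index * interval, size))
        offset += size
        index += 1
    return schedule


if __name__ == "__main__":
    rate = 100_000  # example target of 100,000 bytes per second
    # Small blocks: many writes per second (high overhead), small bursts.
    print(writes_per_second(rate, 1_000))
    # Large blocks: few writes per second, but each write can leave as one burst.
    print(writes_per_second(rate, 10_000))
    print(write_schedule(25_000, rate, 10_000))
```

For the same average rate, a tenfold increase in block size reduces the number of writes tenfold but also makes each potential burst ten times larger.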
The prior art systems described above have a number of disadvantages. The variable send window Ws implies that an optimal transmission rate is never maintained. The average transmission rate also varies as a function of total traffic on the network. This variability makes scheduling of large data transfers difficult and allows large transfers to have a significant impact on other data. To be practical, any solution to these problems must use the current standardized protocols.
Two prior art methods of regulating transfer rates involve intercepting the acknowledgement sent from receiver 110 to sender 130. In the first method, the acknowledgement is delayed for a period of time. In the second method, the value of Wr reported within the acknowledgement is replaced with a new value. Both of these approaches have significant disadvantages, including the necessity of intercepting packets in the network.
Further information about TCP/IP and the state of the art in data transfer methods is found in the following references:
Stevens, W. Richard, “TCP/IP Illustrated. Vol. I: The Protocols,” Addison-Wesley, 1994;
Comer, Douglas, “Internetworking with TCP/IP. Vol. I.,” Prentice Hall, 1991;
Comer, Douglas, and Stevens, David, “Internetworking with TCP/IP. Vol. II: Design, Implementation, and Internals,” Prentice Hall, 1991; and
Packer, Robert L., “Method for Explicit Data Rate Control in a Packet Communication Environment Without Data Rate Supervision”, U.S. Pat. No. 6,038,216.