This invention relates to digital data communication and techniques for improving throughput of channels subject to a large bandwidth-delay product, such as satellite-based communication links and mobile data networks.
It is known that the de facto standard for transmitting data over networks, Transmission Control Protocol (TCP), does not perform well in satellite or mobile communication networks due to the very large bandwidth-delay product (BDP) of the channel. For example, a communication satellite with a channel bandwidth capacity between 24 Mbps and 155 Mbps has a round-trip time (RTT) of 500 ms, so even the lowest bandwidth of 24 Mbps leads to a bandwidth-delay product of 1.5 MB, a value far exceeding TCP's maximum advertised window size of 64 KB. In this case TCP's flow control mechanism will limit the throughput to about 1 Mbps (one 64 KB window per 500 ms round trip), which is less than 5% of the satellite's link-layer bandwidth capacity, as hereinafter explained.
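The figures in the satellite example can be checked with a short calculation. The following is a minimal sketch in Python; the function and variable names are illustrative assumptions, and the 64 KB window is taken as the largest value of the 16-bit window field (65,535 bytes).

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: data in flight needed to fill the link."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """At most one window of data can be delivered per round trip."""
    return window_bytes * 8 / rtt_s


MAX_AWND = 2**16 - 1   # largest value of the 16-bit window field, in bytes
RTT = 0.5              # seconds, from the satellite example
LINK = 24e6            # bits per second, lowest bandwidth in the example

bdp = bdp_bytes(LINK, RTT)                           # 1,500,000 bytes (1.5 MB)
cap = window_limited_throughput_bps(MAX_AWND, RTT)   # 1,048,560 bps (about 1 Mbps)
utilization = cap / LINK                             # below 5% of the link

print(f"BDP: {bdp / 1e6:.2f} MB, cap: {cap / 1e6:.2f} Mbps, "
      f"utilization: {utilization:.1%}")
```

The cap depends only on the window size and the RTT, so a faster link at the same delay is utilized even less.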
TCP Flow Control
TCP has a built-in flow control mechanism that is designed to prevent a fast sender from overflowing a slow receiver. It works by reporting the receiver's buffer availability, i.e., the advertised window, back to the sender via a 16-bit field inside the TCP header so that the sender can avoid sending more data than the receiver's buffer can store. Computer processing power has grown tremendously, such that computers can now easily keep up with an arriving stream of data at rates of up to hundreds of Mbps. Thus an arriving packet will quickly be retrieved by the application from the receiver buffer, and in most cases this can be completed even before the next packet arrives. As a result, the reported advertised window (AWnd) simply stays at the maximum receiver buffer size. In such cases TCP's flow control mechanism is not activated at all, as it is not needed.
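The mechanism described above can be sketched as a toy model. This is an illustrative simplification, not an actual TCP implementation: the class `ReceiverBuffer` and the helper `usable_window` are hypothetical names, and sequence numbers, retransmission, and congestion control are ignored.

```python
class ReceiverBuffer:
    """Toy receiver buffer that reports its free space as the advertised window."""

    def __init__(self, capacity: int = 65535):
        self.capacity = capacity   # maximum receiver buffer size, in bytes
        self.unread = 0            # bytes received but not yet read by the application

    def advertised_window(self) -> int:
        """AWnd reported back to the sender: the free buffer space."""
        return self.capacity - self.unread

    def on_segment(self, nbytes: int) -> None:
        """Store an arriving segment; it must fit in the free space."""
        if nbytes > self.advertised_window():
            raise ValueError("segment exceeds advertised window")
        self.unread += nbytes

    def app_read(self, nbytes: int) -> int:
        """The application retrieves data, freeing buffer space."""
        taken = min(nbytes, self.unread)
        self.unread -= taken
        return taken


def usable_window(awnd: int, bytes_in_flight: int) -> int:
    """How much more the sender may transmit given the last reported AWnd."""
    return max(0, awnd - bytes_in_flight)
```

When the application reads each segment before the next one arrives, `advertised_window()` returns to the full capacity every time, which corresponds to the case where flow control never becomes active.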
However, TCP's flow control mechanism can become the performance bottleneck in networks with a large bandwidth-delay product (BDP). Consider the scenario where a sender is connected to a receiver over a large BDP link (100 Mbps, 250 ms one-way delay, i.e., a 500 ms round trip, BDP=50 Mb). Ignoring processing time, when the sender receives an acknowledgement (ACK), the reported advertised window (AWnd) size is in fact the value from 250 ms earlier (e.g., 34 KB). During this time, the receiver application could have retrieved additional data from the receiver buffer, thereby freeing up more buffer space (e.g., 64 KB).
Because the reported AWnd is stale, the sender, which may not send more than the reported AWnd, cannot make use of the extra buffer space available at the receiver. In cases where the BDP is larger than the maximum AWnd, the sender will operate in a stop-and-go manner, resulting in severe underutilization of the network channel.
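The stop-and-go behaviour can be illustrated with a simplified timing model. The sketch below assumes that the sender transmits one full window back to back and then waits for the first ACK, which arrives one RTT after transmission began, and that the AWnd stays at its maximum. The function name is hypothetical.

```python
def stop_and_go_cycle(window_bytes: int, link_bps: float, rtt_s: float):
    """Return (busy time, idle time, utilization) for one round trip."""
    send_time = window_bytes * 8 / link_bps   # time to push one window onto the link
    busy = min(send_time, rtt_s)              # sender cannot be busy longer than the cycle
    idle = rtt_s - busy                       # time spent waiting for ACKs
    return busy, idle, busy / rtt_s


# The 100 Mbps link with a 500 ms round trip and a 65,535-byte window
busy, idle, utilization = stop_and_go_cycle(65535, 100e6, 0.5)
print(f"sending for {busy * 1e3:.2f} ms, idle for {idle * 1e3:.2f} ms, "
      f"utilization {utilization:.2%}")
```

Under these assumptions the sender is active for roughly 5 ms of every 500 ms cycle. Only a window at least as large as the BDP (6.25 MB here) removes the idle period.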
The conventional solution to the above problem is to make use of TCP's Large Window Scale (LWS) extension as defined in Request for Comments (RFC) 1323. This extension allows TCP to negotiate, during connection setup, a multiplier to apply to the window size so that a window size larger than 64 KB can be used. However, this approach relies on two assumptions. First, either the operating system or the application needs to be modified to explicitly make use of TCP's LWS extension. Second, there must be a way for the application to request the use of LWS during connection setup.
While these two assumptions can be easily satisfied in the laboratory, where custom network applications and operating systems can be developed to exploit TCP's LWS extension, they will likely prevent the vast number of network applications already available on the Internet from benefiting from TCP's LWS extension.
Much research has been done to improve the performance of TCP in certain large networks. The existing research is classified in three categories: modifying both sender and receiver; modifying the sender only; and modifying the receiver only. Each of the categories is briefly characterized here by way of background.
Sender-Receiver-Based Approaches
Jacobson et al. (“RFC 1323: TCP extensions for high performance,” May 1992) proposed the Large Window Scale (LWS) extension to TCP, which is currently the most widely supported solution. It works by scaling the advertised window (AWnd) by a constant factor throughout the connection. With the maximum LWS factor of 14, the maximum AWnd can be increased up to 1 GB ($(2^{16}-1)\times 2^{14}\approx 2^{30}$ bytes). Alternatively, the application can be modified to initiate multiple TCP connections in parallel to increase throughput by aggregating multiple TCP connections, as described in Lee, D. Gunter, B. Tierney, B. Allcock, J. Bester, J. Bresnahan and S. Tuecke, “Applied Techniques for High Bandwidth Data Transfers Across Wide Area Networks,” Proceedings of International Conference on Computing in High Energy and Nuclear Physics, September 2001 and H. Sivakumar, S. Bailey and R. Grossman, “PSockets: The Case for Application-level Network Striping for Data Intensive Applications using High Speed Wide Area Networks,” Proceedings of Super Computing, November 2000. This approach effectively multiplies the AWnd and the congestion window (CWnd) by the number of TCP flows and so can mitigate the AWnd limitation. However, aggregating multiple TCP connections will also allow the application to gain an unfair share of bandwidth from competing TCP flows. Hacker et al. (T. Hacker, B. Noble and B. Athey, “Improving Throughput and Maintaining Fairness using Parallel TCP,” Proceedings of IEEE Infocom 2004, March 2004) solved this problem by deferring the CWnd increase until multiple acknowledgements (ACKs) are received, so as to compensate for the larger window size.
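To make the two approaches concrete, the sketch below computes the smallest window scale shift that covers a given BDP and, alternatively, the number of parallel unscaled connections whose combined windows would cover it. The function names are illustrative assumptions, and the parallel-connection count considers only the AWnd limit, not congestion control.

```python
import math

MAX_FIELD = 2**16 - 1    # largest value of the 16-bit window field, in bytes
MAX_SHIFT = 14           # largest scale factor mentioned in the text
MAX_SCALED_WINDOW = MAX_FIELD << MAX_SHIFT   # 1,073,725,440 bytes, just under 2**30


def min_window_shift(bdp_bytes: int) -> int:
    """Smallest shift s such that (2**16 - 1) * 2**s covers the BDP."""
    for shift in range(MAX_SHIFT + 1):
        if (MAX_FIELD << shift) >= bdp_bytes:
            return shift
    raise ValueError("BDP exceeds the largest scalable window")


def parallel_connections_needed(bdp_bytes: int) -> int:
    """Number of unscaled connections whose combined AWnd covers the BDP."""
    return math.ceil(bdp_bytes / MAX_FIELD)


for label, bdp in [("24 Mbps, 500 ms RTT", 1_500_000),
                   ("100 Mbps, 500 ms RTT", 6_250_000)]:
    print(label, "shift:", min_window_shift(bdp),
          "connections:", parallel_connections_needed(bdp))
```

For the 1.5 MB satellite example a shift of 5 suffices, whereas the same coverage without scaling would take 23 parallel connections.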
Sender-Based Approaches
Apart from the AWnd limit, the congestion window maintained by the sender may also limit the throughput of TCP in large BDP-type networks. Specifically, the growth of the CWnd is triggered by the arrival of ACKs. Thus in a long delay path it may take a longer time for the CWnd to grow to a sufficiently large value so that the link bandwidth can be fully utilized.
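A rough estimate of this ramp-up time can be sketched as follows. The model assumes idealized slow start in which the CWnd doubles once per RTT with no loss, a maximum segment size (MSS) of 1460 bytes, and an initial window of one segment. These parameters and the function name are assumptions made for illustration.

```python
def slow_start_rtts(bdp_bytes: int, mss: int = 1460, initial_segments: int = 1) -> int:
    """Round trips needed for the CWnd to reach the BDP when doubling every RTT."""
    cwnd = initial_segments * mss
    rtts = 0
    while cwnd < bdp_bytes:
        cwnd *= 2      # idealized slow start: one doubling per round trip
        rtts += 1
    return rtts


RTT = 0.5  # seconds
for label, bdp in [("24 Mbps link", 1_500_000), ("100 Mbps link", 6_250_000)]:
    n = slow_start_rtts(bdp)
    print(f"{label}: {n} round trips, about {n * RTT:.1f} s")
```

Because each doubling costs one full round trip, the same number of doublings takes far longer on a 500 ms path than on a short terrestrial one.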
To address this problem, Allman et al. (M. Allman, S. Floyd and C. Partridge, “RFC 3390: Increasing TCP's Initial Window,” October 2002) proposed in RFC 3390 to initialize the CWnd to a larger value (as opposed to one TCP segment) so that it can grow more quickly in large delay networks to ramp up TCP's throughput. Since then, much effort has been put into developing more sophisticated congestion control algorithms such as CUBIC (I. Rhee and L. Xu, “CUBIC: A new TCP-friendly high-speed TCP variant,” Proceedings. PFLDNet'05, February 2005), BIC (L. Xu, K. Harfoush and I. Rhee, “Binary Increase Congestion Control (BIC) for Fast Long-Distance Networks,” In Proceedings of IEEE INFOCOM 2004, March 2004), FAST (C. Jin, D. X. Wei and S. H. Low, “FAST TCP: Motivation, Architecture, Algorithms, Performance,” In Proceedings of IEEE INFOCOM 2004, March 2004), and H-TCP (R. Shorten and D. Leith, “H-TCP: TCP for High-Speed and Long-Distance Networks,” Second International Workshop on Protocols for Fast Long-Distance Networks, Feb. 16-17, 2004, Argonne, Ill.) to further improve TCP's throughput performance.
These solutions address the limitation of CWnd growth and are thus complementary to the present invention.
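As an illustration of the larger initial window, RFC 3390 gives an upper bound of min(4·MSS, max(2·MSS, 4380 bytes)). The sketch below evaluates that bound for a few segment sizes; the function name and the sample MSS values are chosen here for illustration.

```python
def initial_window_bytes(mss: int) -> int:
    """Upper bound on the initial CWnd per RFC 3390, in bytes."""
    return min(4 * mss, max(2 * mss, 4380))


for mss in (536, 1460, 2190, 4000):
    iw = initial_window_bytes(mss)
    print(f"MSS {mss}: initial window {iw} bytes ({iw // mss} segments)")
```

For the common Ethernet-derived MSS of 1460 bytes the bound works out to three segments instead of one.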
Receiver-Based Approaches
At the receiving end, Fisk and Feng proposed dynamic right-sizing of the AWnd, in which the receiver estimates the sender's CWnd and then dynamically adapts the receiver buffer size, i.e., the AWnd, to twice the size of the estimated CWnd (M. Fisk and W-C. Feng, “Dynamic Right-Sizing in TCP,” Proceedings of the Los Alamos Computer Science Institute Symposium, October 2001). This ensures that when the sender's CWnd doubles (e.g., after receiving an ACK) the AWnd will not become the bottleneck.
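The right-sizing rule can be sketched as follows. This is a simplified illustration rather than Fisk and Feng's actual algorithm: it assumes the receiver uses the number of bytes that arrived during the last round trip as its CWnd estimate, never shrinks the window, and caps it at a configurable buffer limit. The class name and all parameters are hypothetical, and advertising more than 65,535 bytes presupposes that window scaling is in use.

```python
class RightSizer:
    """Toy receiver-side window sizing: AWnd tracks twice the estimated CWnd."""

    def __init__(self, initial_awnd: int = 65535, max_buffer: int = 16 * 1024 * 1024):
        self.awnd = initial_awnd        # currently advertised window, in bytes
        self.max_buffer = max_buffer    # assumed upper limit on receiver buffer memory

    def on_rtt_elapsed(self, bytes_received_in_rtt: int) -> int:
        """Update the AWnd once per round trip and return the new value."""
        estimated_cwnd = bytes_received_in_rtt   # assumption: one CWnd arrives per RTT
        target = 2 * estimated_cwnd              # leave room for the CWnd to double
        # Assumption: grow only, and never beyond the buffer limit.
        self.awnd = min(self.max_buffer, max(self.awnd, target))
        return self.awnd


sizer = RightSizer()
for received in (20_000, 100_000, 400_000):
    print(received, "->", sizer.on_rtt_elapsed(received))
```

Keeping the AWnd at twice the estimate means that a sender whose CWnd doubles in the next round trip still finds enough advertised space.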
More recent operating systems such as Linux 2.4 and Microsoft Windows Vista also implemented receiver buffer size auto-tuning by estimating the BDP from the data consumption rate. (See J. Davies, “The Cable Guy: TCP Receive Window Auto-Tuning,” TechNet Magazine, January 2007.)
What is needed is a mechanism that enables TCP to fully utilize the underlying network bandwidth without modifying the network applications running on either end of the communication session and without requiring Large Window Scale (LWS) support from the operating system.