Field of the Invention
The present invention relates to the field of network communications and, more particularly, to a method and an apparatus for congestion control in network communications.
Congestion Control
Congestion situations arise in a data communications network whenever the available transmission capacity of a network node or link is lower than the data rate it needs to relay. For example, FIG. 1A shows a data transmitting node 110 sending a data flow 150 to a data receiving node 120 through a communications network 140, which includes a collection of interconnected network nodes 141, 142, 143, and possibly others. The data flow 150 is relayed in this case by network nodes 141 and 142. FIG. 1A further shows another data transmitting node 111 sending a data flow 151 to a data receiving node 121 through the same network 140. However, the data flow 151 is relayed in this case by network nodes 141 and 143. If the aggregate sending rate of data flows 150 and 151 is larger than the relaying capacity of network node 141, where they coincide, then network node 141 will become congested and both data flows 150 and 151 will experience a congestion situation. If a network node (such as 141) has some input buffering capability, the congestion situation will cause its buffering capacity to be progressively used until it may eventually fill up. Once filled up, if the congestion situation persists, part of data flows 150 and 151 will be discarded and thus lost from the point of view of the data receiving nodes 120 and 121. Congestion control mechanisms are used to avoid, mitigate and handle congestion situations. Congestion control mechanisms can be part of the functionality of any layer, typically the link layer, the network layer, the transport layer or the application layer, and can reside in the end nodes (such as 110 or 111) or in the intermediate nodes (such as 141, 142 or 143).
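By way of a non-limiting illustration, the buffer behaviour described above can be sketched with a simple discrete-time model. The function name, the time-step model and all numeric values below are assumptions chosen for illustration only; they do not appear in FIG. 1A.

```python
def simulate_node(arrival_rates, capacity, buffer_size, steps):
    """Simulate a relaying node with a finite input buffer.

    arrival_rates: data units arriving per time step, one entry per flow
    capacity:      data units the node can relay per time step
    buffer_size:   maximum data units the node can hold between steps
    Returns a list of (queue_occupancy, dropped_units), one tuple per step.
    """
    queue = 0
    history = []
    for _ in range(steps):
        offered = queue + sum(arrival_rates)   # backlog plus new arrivals
        relayed = min(offered, capacity)       # node relays what it can
        backlog = offered - relayed            # what is left must be buffered
        dropped = max(0, backlog - buffer_size)  # overflow is discarded
        queue = backlog - dropped
        history.append((queue, dropped))
    return history


# Hypothetical numbers: two flows each offer 6 units per step, while the
# shared node can relay only 10 units per step and buffer 8 units.
history = simulate_node([6, 6], capacity=10, buffer_size=8, steps=6)
for step, (queue, dropped) in enumerate(history, start=1):
    print(f"step {step}: queue={queue} dropped={dropped}")
```

In this sketch the queue grows by two units per step until the buffer is full, after which the excess two units per step are discarded, which corresponds to the loss seen by the receiving nodes.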
As one of the most widely used communication protocols nowadays, Transmission Control Protocol (TCP) uses sophisticated congestion control mechanisms. Many of the technical advances in the field of congestion control have taken place within the development of TCP. Thus, in the following, congestion control mechanisms in TCP will be described. However, many of the technical developments coming from TCP have later been adopted by other protocols, such as Stream Control Transmission Protocol (SCTP).
Standard TCP Protocol Congestion Control
Transmission Control Protocol (TCP) is widely used in data communication networks. TCP, specified in J. Postel, “IETF RFC 793: Transmission control protocol,” 1981, and incorporated herein by reference, provides reliable data transmission between two endpoints. Endpoints are commonly referred to as “hosts” in literature concerning TCP technology. The term “reliable data transmission” refers to the fact that TCP provides an automatic repeat request (ARQ) mechanism enabling an acknowledged data transmission. In particular, as illustrated in FIG. 1B, a data transmitting node 110 transmits a data segment 101 (payload of a TCP datagram) through a network 140 to a data receiving node 120, which checks that the segment has arrived correctly. The data receiving node 120 then sends back to the data transmitting node 110, through the network 140, an acknowledgement 102 positively acknowledging the correct reception of the data. In accordance with this feedback 102, or the lack thereof, the data transmitting node 110 may retransmit the data. The acknowledgements are also transmitted in TCP segments and may be cumulative, i.e. the acknowledgement of a TCP segment implies the acknowledgement of all prior consecutive TCP segments.
Data packets may get lost, which means that they do not arrive within a predetermined time window (i.e. time period) at the data receiving node. Moreover, data packets may experience some transmission errors, which may be detected at the data receiving node by standard means including error detection/correction codes such as cyclic redundancy check (CRC) or others. The delay or transmission errors may be caused by increased load within the network and/or by worsening of the channel conditions. The acknowledgement mechanism provided by TCP enables recovery from packet loss and corrupted data by means of retransmissions.
However, if the network experiences high load, repeated retransmissions by many users may further worsen the situation and the network may become congested. In order to avoid such a situation and in order to handle congestion, TCP provides some congestion control mechanisms and strategies, which may be implemented on the data transmitting node 110 and/or data receiving node 120.
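As a minimal sketch of the cumulative acknowledgement principle, the following function computes the value a receiver could acknowledge from the set of segments received so far. The function name and the (start, length) representation of a segment are assumptions for illustration; the sketch returns the next expected byte, which is one past the highest consecutive byte received.

```python
def cumulative_ack(received, initial_seq=0):
    """Return the next expected byte number.

    received: iterable of (start, length) pairs, one per correctly
              received segment, in any order (hypothetical representation)
    All bytes below the returned value have arrived without a gap.
    """
    next_expected = initial_seq
    for start, length in sorted(received):
        if start > next_expected:
            break  # gap found: later segments cannot be acknowledged yet
        next_expected = max(next_expected, start + length)
    return next_expected


# Segment starting at byte 200 is missing, so only bytes 0..199 are covered.
before = cumulative_ack([(0, 100), (100, 100), (300, 100)])
# Once the missing segment arrives, one acknowledgement covers everything.
after = cumulative_ack([(0, 100), (100, 100), (300, 100), (200, 100)])
print(before, after)
```

The second call shows why a single acknowledgement can implicitly acknowledge all prior consecutive segments once a gap has been filled by a retransmission.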
In the following, standard TCP terminology will be employed, as used in RFC 793 cited above and IETF RFC 5681 “TCP Congestion Control,” from September 2009, in particular:
Host: a network node that is an end-point of a TCP communication. The term “host” will also be employed for other protocols to which the present invention may be applied, to mean a network node that is a communication end-point from the point of view of those protocols.
Connection: a bidirectional data flow established between two hosts, uniquely identified, with its own establishment, flow control and congestion control mechanisms independent of other data flows.
Congestion Window (cwnd) denotes the maximum amount of consecutive data a TCP host can send beyond the latest acknowledged sequence number, as calculated locally by the sending host, not having taken into account the window advertised by the receiving host.
Receive Window (rwnd) is the window size advertised by the receiver to the sender in the acknowledgement messages it sends back. This window size specifies the maximum amount of consecutive data the receiver is ready to accept beyond the latest acknowledged sequence number. This is the mechanism used by TCP to implement flow control, i.e. to prevent a fast sender from overwhelming a slow receiver.
Send Window is the lowest of cwnd and rwnd. Standard TCP congestion control is based on mechanisms for the sender, and thus focuses on setting an appropriate cwnd value, assuming that cwnd determines the Send Window.
Slow Start is a TCP congestion control state. In the Slow Start state, the TCP congestion control algorithm increases the cwnd in an exponential fashion, where cwnd is increased by about one segment size every time an acknowledgement is received. A TCP sending host in a TCP Connection is said to be in Slow Start when it is using this manner of increasing the cwnd.
Congestion Avoidance is a TCP congestion control state. In the Congestion Avoidance state, the TCP congestion control algorithm increases the cwnd in a slower fashion than in Slow Start. In standard Reno/New Reno TCP, the cwnd grows by 1/cwnd bytes (the inverse of cwnd) for every byte acknowledged, i.e. cwnd will grow by the size of one segment after acknowledging segments that add up to cwnd bytes in size. Different TCP variants have different congestion avoidance algorithms. A TCP sending host in a TCP Connection is said to be in Congestion Avoidance when it is using this manner of increasing the cwnd.
Slow Start Threshold (ssthresh) defines the transition between the Slow Start and Congestion Avoidance control states: it is the Congestion Window value below which Slow Start is used and beyond which the Congestion Window is increased according to Congestion Avoidance.
Flight size is the amount of data, usually measured in bytes, transmitted by the sender and not yet acknowledged.
Buffer-bloat is a term for an excessive use of buffers in the transmission path by a connection using a larger congestion window than what would be required to overcome the intrinsic delay and bandwidth adaptation buffering in the transmission path.
The TCP terminology described above may also be used in other protocols which may use similar concepts as those used for TCP congestion control.
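The relationship between the terms defined above can be sketched as follows. This is a simplified illustration, not a complete TCP implementation: the segment size of 1460 bytes, the function names and the assumption that every acknowledgement covers exactly one full segment are all hypothetical choices made for the example.

```python
MSS = 1460  # assumed segment size in bytes (illustrative value)


def send_window(cwnd, rwnd):
    """Send Window: the lowest of cwnd and rwnd."""
    return min(cwnd, rwnd)


def on_ack(cwnd, ssthresh, acked_bytes, mss=MSS):
    """Return the new cwnd (bytes) after acked_bytes are newly acknowledged."""
    if cwnd < ssthresh:
        # Slow Start: about one segment size per acknowledgement received.
        return cwnd + min(acked_bytes, mss)
    # Congestion Avoidance: about one segment size per cwnd bytes acknowledged.
    return cwnd + max(1, mss * acked_bytes // cwnd)


def simulate_rounds(rounds, ssthresh, mss=MSS):
    """Return the cwnd (bytes) at the start of each round trip, assuming a
    full window is sent and acknowledged segment by segment in every round."""
    cwnd = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cwnd)
        for _ in range(cwnd // mss):
            cwnd = on_ack(cwnd, ssthresh, mss, mss)
    return sizes


sizes = simulate_rounds(4, ssthresh=8 * MSS)
print([size // MSS for size in sizes])  # cwnd in segments per round
```

While cwnd is below ssthresh the window doubles every round trip, which is the exponential growth of Slow Start; once ssthresh is reached, each acknowledgement adds only a fraction of a segment.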
The TCP specifications, such as IETF RFC 5681 cited above; IETF RFC 6582 “The NewReno modification to TCP's fast recovery algorithm,” from 2012; IETF RFC 2018: “TCP selective acknowledgment options” from October 1996; and IETF RFC 6675: “A Conservative Loss Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP” from 2012 (all incorporated herein by reference) include congestion control algorithms for determining the most appropriate data sending rate for the hosts and data recovery mechanisms to allow the efficient retransmission of data lost due to congestion or to other causes. As currently specified in the official Internet Engineering Task Force (IETF) RFC documents, the TCP congestion control approach is derived from the so-called “Reno” congestion control mechanism (described, for instance, in V. Jacobson, “Congestion avoidance and control,” in ACM SIGCOMM Computer Communication Review, 1988), with several additions to improve the recovery from packet transmission losses, and is based on the following principles:
Congestion control mechanisms are implemented in the TCP functionality of the communicating hosts, not relying on intermediate network-level and link-level nodes.
Cumulative acknowledgment segments are sent from the receiving endpoint when correct data segments are received, indicating the sequence number of the highest consecutive correct byte received.
The “Send Window” defined above, which, if not limited by the rwnd, is equal to cwnd, controls the amount of data transmitted by the sender.
At the beginning of the connection (and after retransmission time-outs), the sender starts with a minimal Congestion Window value and increments it with the “Slow Start” algorithm, as explained above.
If not limited by rwnd or the sender's output buffers, the Slow Start algorithm causes the sending rate to rapidly go up beyond the network capacity, resulting in packet losses.
Losses are detected by the sender when three duplicate acknowledgement segments are received. After a fast loss recovery procedure, the congestion window is set to half the maximum value attained during Slow Start and the connection is switched to Congestion Avoidance, which causes the Congestion Window to grow more slowly than in Slow Start.
In Congestion Avoidance, an Additive-Increase-Multiplicative-Decrease (AIMD) scheme is followed. The Congestion Window increases by a small fixed amount for every segment acknowledged, and is reduced by half after congestion is detected. Congestion is detected when packet loss occurs (three consecutive duplicate acknowledgements received). Even though this mechanism forces congestion to occur, it has been proved mathematically that several TCP flows sharing a bottleneck resource and following a similar AIMD scheme for Congestion Window growth will eventually share the available bandwidth in a fair manner.
Very heavy packet losses are not recoverable with the fast recovery procedure, which eventually causes the retransmission timer to be triggered in the TCP sender. This time-out causes the first sent but not yet acknowledged segment to be retransmitted, after which the Slow Start phase is initiated, with the Congestion Window size reduced to one segment. This mechanism eventually makes all senders reduce their sending rate drastically in case of heavy congestion, which avoids a complete congestive collapse.
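The two loss reactions described above (three duplicate acknowledgements versus a retransmission time-out) can be sketched as follows. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, the segment size is an illustrative value, the lower bound of two segments on ssthresh follows IETF RFC 5681, and the temporary window inflation performed during fast recovery is omitted so that only the resulting window values are shown.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes (illustrative value)


@dataclass
class RenoState:
    cwnd: int       # Congestion Window in bytes
    ssthresh: int   # Slow Start Threshold in bytes
    dup_acks: int = 0


def on_duplicate_ack(state, flight_size, mss=MSS):
    """Count a duplicate acknowledgement; on the third one, halve the window.

    Returns True when a loss is declared (fast retransmit would be triggered).
    """
    state.dup_acks += 1
    if state.dup_acks < 3:
        return False
    # Multiplicative decrease: continue in Congestion Avoidance at half rate.
    state.ssthresh = max(flight_size // 2, 2 * mss)
    state.cwnd = state.ssthresh
    state.dup_acks = 0
    return True


def on_retransmission_timeout(state, flight_size, mss=MSS):
    """Heavy loss: restart from Slow Start with a window of one segment."""
    state.ssthresh = max(flight_size // 2, 2 * mss)
    state.cwnd = mss
    state.dup_acks = 0


state = RenoState(cwnd=20 * MSS, ssthresh=64 * MSS)
results = [on_duplicate_ack(state, flight_size=20 * MSS) for _ in range(3)]
print(results, state.cwnd // MSS, state.ssthresh // MSS)
```

The time-out path is deliberately more drastic than the duplicate-acknowledgement path, which reflects the role of the time-out as the last line of defence against congestive collapse.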
This standard TCP (sometimes called Reno or NewReno) has three limitations: (1) low performance in high-speed/long-delay networks, because the small rate of growth of the Congestion Window in Congestion Avoidance takes a very long time to achieve the large sizes of Congestion Window required by those networks; (2) excessive use of network buffers (buffer-bloat), increasing the delay experienced by the communicating hosts, due to the loss-based mechanism to detect congestion, which increases the Congestion Window up to the point where the buffers in the network nodes in the path are full and therefore sent packets get dropped; and (3) competition against concurrent TCP “greedy” flows, i.e. TCP flows that raise their Congestion Window in a more aggressive way than this standard TCP when they are sharing a bandwidth bottleneck, which will take most of the available bandwidth for themselves and starve Reno TCP flows.
As mentioned above, performance in high-speed/long-delay networks is limited with standard TCP, because the linear Congestion Window growth in Congestion Avoidance is too slow in those cases, which results in significant unused capacity. Many TCP variants have proposed more aggressive Congestion Window growth schemes for those scenarios, like TCP variants called STCP, HSTCP, BIC-TCP, H-TCP, CUBIC and TCP-Hybla, retaining loss-based congestion detection. Such variants are in general successful at improving TCP performance in high-speed/long-delay networks, but they do not solve the buffer-bloat problems, because they use loss-based congestion detection. In some cases, they may also have problems sharing a bandwidth bottleneck with less aggressive TCP variants, like standard Reno TCP, which can be overwhelmed by their more aggressive increase in Congestion Window.
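The first limitation can be made concrete with a short calculation. The sketch below assumes a simplified model in which the Congestion Window grows by exactly one segment per round trip and the window needed to fill the path equals the bandwidth-delay product; the link speed, round-trip time and segment size are hypothetical values chosen for illustration.

```python
def bdp_segments(bandwidth_bps, rtt_s, mss_bytes):
    """Window, in segments, needed to keep a path of the given
    bandwidth (bits per second) and round-trip time (seconds) full."""
    return bandwidth_bps * rtt_s / 8 / mss_bytes


def recovery_time_s(bandwidth_bps, rtt_s, mss_bytes):
    """Time for linear growth (one segment per round trip) to climb from
    half of the path-filling window back up to the full window."""
    window = bdp_segments(bandwidth_bps, rtt_s, mss_bytes)
    return (window / 2) * rtt_s


# Hypothetical path: 10 Gbit/s, 100 ms round-trip time, 1460-byte segments.
window = bdp_segments(10e9, 0.1, 1460)
seconds = recovery_time_s(10e9, 0.1, 1460)
print(f"window of about {window:.0f} segments, "
      f"about {seconds / 60:.0f} minutes to recover after one halving")
```

Under these assumptions a single loss event costs more than an hour of reduced throughput, which illustrates why linear growth leaves significant capacity unused on such paths.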
Delay-Based Congestion Detection to Reduce Buffer-Bloat
There are TCP variants that detect network congestion by analyzing the connection's end-to-end delay through the measured Round-Trip Time (RTT), i.e. the time from sending a segment to receiving an acknowledgement for it. Examples of such variants are TCP Vegas, TCP Vegas-A, TCP New Vegas, and FAST-TCP. The RTT measurements are sometimes translated into sending rate estimates, or into estimates of segments queued (buffered) in the transmission path, but in reality the independent variable these variants use to make decisions is the measured RTT.
In delay-based TCP variants, an increase in RTT is taken as a signal of the onset of congestion and, in Congestion Avoidance, the decision to increase or decrease the Congestion Window is made based on those RTT measurements. These methods generally succeed at reducing or eliminating congestion losses, thereby reducing the buffer-bloat and the excessive delay. However, they suffer heavily when competing against concurrent loss-based TCP variants. The reason is that delay-based TCP flows detect congestion earlier than loss-based TCP flows and reduce their sending rate accordingly. Loss-based variants do not have that restraint and keep increasing their sending rate until they fill up all intermediate bottleneck buffers, resulting in less and less capacity for the delay-based flows. Since the overwhelming majority of TCP in the Internet today uses loss-based congestion control, this has been a major obstacle for the adoption of pure delay-based variants.
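A minimal sketch of a delay-based window adjustment, in the spirit of the Vegas-style variants mentioned above, is given below. It translates the RTT measurement into an estimate of segments queued in the path and compares that estimate with two thresholds. The function name, the threshold values alpha and beta, and the one-segment step are assumptions for illustration and are not taken from any particular variant's specification.

```python
def delay_based_adjust(cwnd, base_rtt, current_rtt, alpha=2, beta=4):
    """Return the new cwnd (in segments) for the next round trip.

    base_rtt:    lowest RTT observed, taken as the uncongested path delay
    current_rtt: most recent RTT measurement
    alpha, beta: illustrative lower and upper bounds on queued segments
    """
    expected_rate = cwnd / base_rtt      # rate if nothing were queued
    actual_rate = cwnd / current_rtt     # rate actually achieved
    queued = (expected_rate - actual_rate) * base_rtt  # segments in buffers
    if queued < alpha:
        return cwnd + 1                  # path under-used: grow
    if queued > beta:
        return max(cwnd - 1, 1)          # queue building up: back off early
    return cwnd                          # within the target band: hold


print(delay_based_adjust(20, base_rtt=0.100, current_rtt=0.100))
print(delay_based_adjust(20, base_rtt=0.100, current_rtt=0.115))
print(delay_based_adjust(20, base_rtt=0.100, current_rtt=0.200))
```

Because the window is reduced as soon as a small queue is estimated, such a flow yields capacity to any concurrent loss-based flow, which keeps growing until buffers overflow.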
Mixed Loss-Based and Delay-Based Variants
A very early proposal to use a mixed model, known as TCP-DUAL, adds to a loss-based Reno congestion control a congestion detection algorithm based on RTT measurements, which triggers a multiplicative decrease in Congestion Window. This approach can solve the buffer-bloat problems, but because of the RTT-based delay detection, like the above mentioned pure delay-based variants, it cannot compete against loss-based TCP variants.
Some other TCP variants, like Compound TCP, TCP Libra, TCP Africa, TCP Veno, YeAH-TCP and TCP Illinois, also use mixed models with loss-based and delay-based congestion control. The delay-based congestion detection is used to modulate the aggressiveness of Congestion Window growth, allowing more aggressive growth when no congestion is detected, which usually solves the performance problems associated with Reno/NewReno in high-speed/long-delay networks. However, in all these variants, when an RTT-based metric estimates that there is congestion, the Congestion Window will still keep growing, albeit more slowly, until packet losses arise, so buffer-bloat problems will persist, even when not competing with other data flows.
Another TCP variant called TCP Vegas+ is a mixed model using TCP Vegas by default, but switching to NewReno if a competing flow is detected. This should avoid buffer-bloat problems when there is no competition from other flows, but several unsolved Vegas problems remain, for example, low performance in high-speed/long-delay networks.
TCP Variants with Bandwidth or Rate Estimation
A TCP variant called Tri-S is an early TCP variant with rate-based congestion detection, using the time evolution of the transmission rate. However, without adequate filtering of the transmission rate estimates or without a statistical approach to testing the growth or stability of measured rate, the relatively large inherent variability in measured RTT spoils the congestion detection results. Moreover, a real-time rate-based congestion detection is not able to distinguish between a pure congestion situation and a situation in which the TCP flow is competing against a similarly aggressive TCP flow: in both situations the window will grow and the measured rate will stay constant. Since Tri-S reduces the Congestion Window upon congestion detection, it cannot compete against a TCP flow with loss-based congestion control, just like delay-based TCP variants.
TCP variants denoted as TCP-Westwood and TCP-Westwood+ introduced explicit bandwidth estimations into their congestion control mechanisms, based on a complex measurement of inter-acknowledgement timing or simply the rate of received acknowledgements. With appropriate filtering, those estimates at the time a congestion-induced loss happens are taken as the available bandwidth for the TCP connection. This bandwidth estimation, together with the minimum RTT measured, is then taken to determine the optimum Congestion Window. From there on, a Reno-like Congestion Avoidance mode will eventually take the Congestion Window size to congestion and packet losses (thus causing buffer-bloat), at which point a new optimum Congestion Window will be calculated based on a new bandwidth estimation.
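The bandwidth-based window calculation described above can be sketched as follows. The exponentially weighted moving average used as the filter, its gain, the function names and the numeric values are assumptions for illustration; the actual filters used by these variants are more elaborate.

```python
def bandwidth_sample(acked_bytes, interval_s):
    """Raw bandwidth sample (bytes per second) from the amount of data
    acknowledged over a measurement interval."""
    return acked_bytes / interval_s


def filtered_estimate(previous, sample, gain=0.125):
    """Simple low-pass filter (illustrative stand-in for the real filter)."""
    return previous + gain * (sample - previous)


def window_after_loss(bandwidth_estimate, rtt_min_s, mss_bytes):
    """Congestion Window (segments) matching the estimated bandwidth and the
    minimum RTT measured, bounded below by two segments."""
    return max(2, int(bandwidth_estimate * rtt_min_s / mss_bytes))


# Hypothetical connection: acknowledgements arrive for 125,000 bytes every
# 100 ms, and the minimum RTT measured so far is 0.5 s.
estimate = 0.0
for _ in range(100):
    estimate = filtered_estimate(estimate, bandwidth_sample(125_000, 0.1))
print(window_after_loss(estimate, rtt_min_s=0.5, mss_bytes=1250))
```

Using the minimum RTT rather than the current RTT means the computed window excludes data that would only sit in queues, although the subsequent Reno-like growth refills those queues.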
There are also some more recent variants of TCP-Westwood aimed at high-speed/long-delay networks (such as LogWestwood+, TCPW-A, TCP-AR and TCP Fusion) with more aggressive window growth, which adapt better to changing network bandwidth, but which still suffer from buffer-bloat and may overwhelm concurrent, less-aggressive Reno flows.
Improved Transition from Slow-Start to Congestion Avoidance
The TCP variants described so far concentrate on the behaviour during the Congestion Avoidance phase. However, the transition from Slow Start to Congestion Avoidance can be very important, especially in small downloads that spend a significant part of their existence in Slow Start. Detecting congestion in Slow Start only with losses may lead to severe buffer-bloat problems and packet losses, since congestion will be reached while the sending window is growing exponentially.
There are different approaches to detect congestion prior to packet losses in Slow Start and to then change into a less aggressive Congestion Avoidance algorithm. Some of them use measurements of inter-acknowledgement delays, which can be inaccurate because of the time-measurement precision and the sophisticated filtering required in the sender. TCP-Vegas proposes a modified Slow Start which in fact causes a premature Congestion Avoidance due to the burstiness of the Slow Start traffic. “Limited Slow start” is an experimental IETF RFC that relies on an arbitrary constant to determine the transition point. “Adaptive Start”, being a part of the TCPW-A variant, uses the estimated bandwidth to derive the Slow Start Threshold parameter, and thus depends strongly on the quality of that estimate. “Hybrid Start”, which is nowadays used by default in most versions of the widespread Linux operating system, uses two heuristic algorithms based on RTT measurements and inter-acknowledgement delays. This works well except when competing against concurrent TCP flows with congestion, because the concurrent traffic will increase the RTT from the beginning, and so the switch from Slow Start to Congestion Avoidance will occur too early, which will cause a decreased throughput.
Fairness and Competition Against More Aggressive TCP Flows Under Congestion
In much of the literature, the fairness problem is about how a more aggressive (with respect to congestion window growth) TCP variant avoids overwhelming a less aggressive one. However, it is equally important for a TCP flow to become more aggressive if it is determined that another TCP flow is competing with it in a more aggressive way. This can happen even if the other flow is from the same TCP variant, when that other flow is in Slow Start. One of the few TCP variants addressing this issue is TCPW-A, which has a mechanism to increase the Slow Start threshold parameter if it is estimated that it would result in higher bandwidth, but it is conditioned by another mechanism to detect that there is no other TCP flow competing.
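The effect of unequal aggressiveness can be illustrated with a simplified model of two flows sharing one bottleneck. The model assumes that both flows add a fixed increment to their window every round trip and that both halve their window in the same round whenever the combined window exceeds the bottleneck capacity; the function name and all numeric values are hypothetical.

```python
def simulate_shared_bottleneck(w1, w2, inc1, inc2, capacity, rounds):
    """Return the final windows of two additive-increase,
    multiplicative-decrease flows with synchronized loss events."""
    w1, w2 = float(w1), float(w2)
    for _ in range(rounds):
        w1 += inc1                 # additive increase, flow 1
        w2 += inc2                 # additive increase, flow 2
        if w1 + w2 > capacity:     # bottleneck exceeded: both see a loss
            w1 /= 2                # multiplicative decrease
            w2 /= 2
    return w1, w2


# Equal aggressiveness: very different starting windows converge.
fair1, fair2 = simulate_shared_bottleneck(1, 50, 1, 1, capacity=100, rounds=2000)
# Flow 2 is three times as aggressive: it ends up with three times the share.
slow, fast = simulate_shared_bottleneck(10, 10, 1, 3, capacity=100, rounds=2000)
print(f"equal increments:   {fair1:.2f} vs {fair2:.2f}")
print(f"unequal increments: {slow:.2f} vs {fast:.2f}")
```

In this model each halving halves the difference between equally aggressive flows, whereas flows with different increments settle at windows proportional to their increments, so a less aggressive flow keeps a permanently smaller share.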
Network-Based Approaches
The TCP congestion control approaches mentioned so far rely on functionality implemented in end hosts, primarily on the sender side. However, some approaches rely on functionality in intermediate network nodes, like routers and switches, which could alert the endpoint about impending congestion (e.g. TCP ECN), or drop packets before congestion occurs (e.g. Random Early Detection queue management algorithms). A new active queue management algorithm, CoDel (K. Nichols, V. Jacobson, “Controlling queue delay”, Communications of the ACM, vol. 55, no. 7, pp. 42-50, 2012) has recently been proposed to deal with buffer-bloat problems specifically, which also relies on part of the functionality being deployed in intermediate routers or switches. The problem with all these approaches is that they are very difficult to deploy, because there is an immense installed base of routers and switches that would have to support them along the end to end path. With host-based solutions, on the other hand, it is enough if the two hosts support the functionality, and if it is just a sender-side or a receiver-side functionality, only one of the end hosts needs to implement it to benefit from it.
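For illustration, the early-drop idea behind Random Early Detection can be sketched as a drop probability that rises with the average queue length in an intermediate node. This is a simplified sketch: the function names, the filter weight and the threshold values are assumptions, and refinements of the published algorithm (such as spacing drops according to the number of packets since the last drop) are omitted.

```python
def update_average(avg_queue, current_queue, weight=0.002):
    """Low-pass filter of the instantaneous queue length (illustrative gain)."""
    return avg_queue + weight * (current_queue - avg_queue)


def early_drop_probability(avg_queue, min_th, max_th, max_p):
    """Probability of dropping an arriving packet.

    Below min_th nothing is dropped; between the thresholds the probability
    rises linearly up to max_p; at or above max_th every packet is dropped.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical router configuration: thresholds of 10 and 30 packets,
# maximum early-drop probability of 10 percent.
for avg in (5, 20, 30):
    print(avg, early_drop_probability(avg, min_th=10, max_th=30, max_p=0.1))
```

Because the drops begin before the buffer is full, senders with loss-based congestion control reduce their rate earlier, but the logic must run in the intermediate node, which is the deployment obstacle discussed above.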
Summarizing, buffer-bloat remains an important problem in TCP communications, causing unnecessary delays and resource usage, because the most widely used TCP variants utilize loss-based congestion detection, which at the same time prevents the use of delay-based variants that could mitigate the buffer-bloat problem. Almost all TCP-variants that use bandwidth-based and rate-based algorithms in determining the Congestion Window still use loss-based congestion detection, so buffer-bloat problems remain. In the few approaches where rate-based congestion detection is used to reduce the congestion window, measurement filtering and robust statistical methods are missing, so that the variability of measurements is not correctly addressed, and congestion is not well detected. In fact, as long as loss-based TCP variants are used (and nowadays they are the most widely used), buffer-bloat will be inevitable for any TCP flow that has to compete with them. However, there are many situations in which a connection does not compete with other connections over a capacity bottleneck, where buffer-bloat elimination would be very beneficial.
Moreover, commonly-used attempts to reduce buffer-bloat in the transition from Slow-Start to Congestion Avoidance based on delay metrics may be causing low throughput in the face of competition from other flows. In summary, the lack of a good mechanism for the detection of congestion and of competition is causing buffer-bloat problems in some cases and performance problems in other cases, when TCP flows do not react appropriately against competing concurrent TCP flows.