In packet communications systems, it is common to reserve bandwidth for high priority packets, which are then transmitted in preference to lower priority packets, or to give preference to high priority packets by, for example, transmitting one lower priority packet after each five high priority packets. Such lower priority traffic therefore must be managed to take advantage of the bandwidth remaining after the higher priority traffic has been served. This remaining bandwidth can vary widely depending on the activity of the high priority traffic. It is therefore of considerable importance to manage the low priority traffic so as to optimize the use of the widely varying additional bandwidth in the network and, at the same time, avoid congestion in the network, which reduces the network throughput.
In packet communication systems, senders are unaware of the level of traffic incident on an intermediate node. It is quite possible that the sum of traffic at some intermediate node may need resources beyond its capabilities. Consequently, the intermediate node will run out of buffers and lose packets, or become a bottleneck in the flow of traffic. It is therefore of considerable importance to manage the traffic so as to optimize the use of the widely varying bandwidth in the network and, at the same time, avoid congestion in the network which reduces the network throughput.
It has become common to utilize window-based flow control mechanisms to reduce congestion in the packet communications network. Such window-based mechanisms pre-allocate receiver buffer credits to packet sources and notify the corresponding sender as to how much data can be sent. Upon detection of congestion, either at an outgoing link (if the receiver is an intermediate node) or within a node, the receiver withholds buffer credits, forcing the sending partner to slow down the launching of packets or to stop transmission altogether. This process, also known as back pressure congestion control, is repeated at each hop of the transmission, eventually reaching the source of the congestion-causing traffic and forcing those sources to slow their transmission.
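The credit exchange described above can be sketched for a single hop as follows. This is a minimal illustration only: the class names, the `congested` flag, and the buffer counts are hypothetical and are not taken from any particular protocol.

```python
class CreditReceiver:
    """Receiver that pre-allocates buffer credits and withholds them when congested."""

    def __init__(self, buffers):
        self.free_buffers = buffers
        self.congested = False  # hypothetical flag set by the node's congestion detector

    def grant(self, requested):
        """Return the number of credits granted; zero while congestion persists."""
        if self.congested:
            return 0
        granted = min(requested, self.free_buffers)
        self.free_buffers -= granted
        return granted


class CreditSender:
    """Sender that may launch only as many packets as it holds credits for."""

    def __init__(self):
        self.credits = 0

    def send(self, packets):
        """Send as many packets as the credits allow; return how many were sent."""
        sent = min(packets, self.credits)
        self.credits -= sent
        return sent


receiver = CreditReceiver(buffers=4)
sender = CreditSender()
sender.credits += receiver.grant(4)
first_burst = sender.send(10)        # limited to the 4 credits granted
receiver.congested = True            # back pressure: credits are withheld
sender.credits += receiver.grant(4)
second_burst = sender.send(6)        # the sender is forced to stop
```

Once the credits are exhausted the sender stays idle until the receiver grants more, which is the waiting behavior that becomes costly on high speed links.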
Such window-based back pressure mechanisms perform efficiently with low speed networks having relatively high bit error rates. However, as network transmission speeds increase, such as with fiber optic networks, the window-based mechanisms no longer perform adequately. Such a hop-by-hop method becomes prohibitively expensive and inefficient because a sender can send an entire window's worth of data and then be required to wait for the receipt of new buffer credits from the receiver before continuing. Furthermore, window-based flow control does not smooth the transmission of data into the network and hence causes large oscillations in loading due to the clustering of packets, further degrading network performance. Using larger windows merely worsens the degradation.
To better accommodate modern high-speed communications networks, it has been proposed to use an end-to-end congestion control mechanism which relies on the regular transmission of sample packets having time stamps included therein. One such mechanism is disclosed in "Adaptive Admission Congestion Control", by Z. Haas, ACM SIGCOMM Computer Communications Review, Vol. 21(5), pages 58-76, October 1991. In the Haas article, successive time-stamped sample packets are used to calculate changes in network delays which are averaged to represent the state of the network. The averaged network delay is then used to control the admission of packets to the network by controlling the admission rate, either by controlling the inter-packet gap directly, or by adjusting the token rate in a standard leaky bucket scheme at the admission point.
One disadvantage of the Haas congestion control mechanism is that it sends sampling packets at regular intervals regardless of the traffic load from a sender. Sending such sampling packets when the sender is idle is wasted effort and reduces the effective throughput of the system. In addition, the Haas method requires that the system await the arrival of a plurality of sampling packets before initiating congestion control, thus hindering the response time for providing effective flow control as well as congestion control.
Another disadvantage of the Haas scheme is the accumulation effect that it possesses. If the length of queues along the congestion path is built up gradually by small amounts, the overall delay can exceed the threshold allowed for the overall connection without being detected by the endpoint detection scheme. The network can, therefore, become congested without timely correction when using the Haas congestion control scheme.
Still another disadvantage of the Haas method is the fact that the inter-packet control gap is used to control the input packet rate. Sources of short packets are therefore penalized disproportionately compared to sources of long packets when the inter-packet gap control technique of Haas is used to control congestion. Finally, and most important, the Haas congestion control scheme requires relatively frequent transmission of sampling packets to provide timely control information. The overhead from such sampling packets can reach up to twenty percent of the entire throughput of the network, making the Haas congestion control scheme provide lower throughput than an uncontrolled network when the traffic load is less than eight percent. If the transmission rate of Haas' sampling packets were to be reduced to approximate the round trip delay period, on the other hand, the scheme simply would not work at all due to the small quantity of control information available at the sender. That is, the averaging step used to reduce the noise in the control signal would make the scheme so unresponsive to the congestion to be controlled that the low sampling rate would be unable to correct the congestion.
As a result, the Adaptive Rate-Based Congestion and Flow Control method was developed as described in U.S. Pat. No. 5,367,523 entitled "Adaptive Rate-Based Congestion and Flow Control in Packet Communications Networks" assigned to IBM Corporation. The Adaptive Rate-Based Congestion and Flow Control method (ARB) is an end-to-end, closed loop flow and congestion control method for packet networks in which the flow and congestion control is accomplished by calculating, for every requested sampling interval, the lower of either the rate at which the receiver is accepting data from the network (congestion control) or the rate at which the receiver is able to deliver data to the end user (flow control). Thus, the network transfer rate is used to detect congestion while the end user acceptance rate is used to control the flow of data into the network so as not to exceed the user acceptance rate. The lower of these two rates is used to control entry into the network. Such a rate control mechanism is provided for each direction of transmission on each connection, and each end of a connection can request flow and congestion control information by piggy-backing a request for control information on the current data packet being sent to a receiver. The receiver calculates the network transfer rate by dividing the number of bits received since the last request by the length of the interval since the last request. Rate information is also piggy-backed on data packets for return to the sender. Requests are timed to reflect the data transmittal rate into the network, requests being more frequent when traffic is heavy and less frequent when traffic is light, thus conserving network resources when not needed for flow or congestion control.
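The receiver-side calculation described above reduces to a division followed by a minimum. The following sketch assumes rates in bits per second; the function and parameter names are illustrative and do not come from the referenced patent.

```python
def reported_rate(bits_received, interval_seconds, user_delivery_rate):
    """Rate (bits/s) a receiver would piggy-back on its reply to a rate request.

    bits_received      -- bits received since the last request
    interval_seconds   -- length of the interval since the last request
    user_delivery_rate -- rate at which data can be delivered to the end user
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    # Network transfer rate: used for congestion control.
    network_rate = bits_received / interval_seconds
    # The lower of the two rates governs entry into the network.
    return min(network_rate, user_delivery_rate)
```

For example, 2,000,000 bits received over 0.5 seconds give a network transfer rate of 4 Mbit/sec; if the end user can accept only 3 Mbit/sec, the reported rate is 3 Mbit/sec and flow control is the limiting factor.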
More particularly, to implement the ARB flow and congestion control, three different operating modes are defined for the traffic flow control mechanism. The sender uses the traffic feedback information from the receiver to set the sender into one of the three operating modes which, for purposes of this description, will be called the `green`, `yellow`, and `red` operating modes. The sender is set to the green mode when the rate information received from the receiver indicates that the receiver rate is equal to or greater than the average data input rate at the sender less a sensitivity threshold used to compensate for the accumulated delay effect described above. The sender is set to the yellow operating mode when the received rate is less than the average data input rate less the sensitivity threshold and the sender is set to the red operating mode if a timeout occurs following the expected waiting period for a reply to the most recent request for rate information.
The green operating mode is used when the receiver is able to receive data at a higher rate than the currently permitted rate. In the green operating mode, the sending rate can therefore be increased by some incremental amount which does not exceed the maximum rate which will be tolerated by the network, where the maximum rate is the rate negotiated by the sender when the connection is set up. If the current operating mode is other than green, and the most recent response from the receiver calls for a shift to the green operating mode, the shift is delayed at least one request-response period. This requirement for confirmation of shifts to the green operating mode by at least two responses calling for such a shift prevents oscillation in operating mode due to transients in the data rates.
If the receiving rate returned in the response message is slower than the current sending rate by more than the same sensitivity threshold, the operating mode is set to yellow. In the yellow operating mode, the sending rate is reduced by some reduction factor, preferably not to exceed 10% of its current rate. In the yellow operating mode, it is assumed that the network is saturated and that it can carry no more traffic. The sender continues in the yellow operating mode, reducing the sending rate for each new response message indicating a lower receiving rate than the current sending rate (less the sensitivity threshold) until at least two responses are received dictating the need to shift to the green operating mode.
If a response timeout occurs while the sender is waiting for a response to its most recent request for rate information, the sender immediately assumes that congestion has delayed the response and enters the red operating mode. The sending rate is then cut drastically, e.g., by half or to the latest received receiving rate, whichever is lower. The sender remains in the red operating mode until two or more request responses call for a return to the green operating mode.
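The three operating modes can be summarized as a small state machine. The sketch below is one possible reading of the description, not the patented implementation: the class and attribute names are hypothetical, the comparison uses the current sending rate, the rate is increased on the response that confirms the shift to green, and a response that does not call for green while in the red mode leaves the rate unchanged. The last two points are assumptions, since the text does not specify them.

```python
GREEN, YELLOW, RED = "green", "yellow", "red"


class ArbSender:
    """Illustrative sender-side mode logic for ARB (names are hypothetical)."""

    def __init__(self, rate, max_rate, threshold, increment, reduction=0.10):
        self.rate = rate              # currently permitted sending rate
        self.max_rate = max_rate      # rate negotiated at connection setup
        self.threshold = threshold    # sensitivity threshold
        self.increment = increment    # green-mode step
        self.reduction = reduction    # yellow-mode cut, at most 10%
        self.mode = GREEN
        self.green_votes = 0          # consecutive responses calling for green
        self.last_receiver_rate = rate

    def on_response(self, receiver_rate):
        self.last_receiver_rate = receiver_rate
        if receiver_rate >= self.rate - self.threshold:
            if self.mode != GREEN:
                # A shift to green must be confirmed by a second response.
                self.green_votes += 1
                if self.green_votes < 2:
                    return
                self.mode = GREEN
            self.green_votes = 0
            self.rate = min(self.rate + self.increment, self.max_rate)
        else:
            self.green_votes = 0
            if self.mode != RED:
                self.mode = YELLOW
                self.rate *= 1 - self.reduction
            # Assumption: while red, the rate is left as cut by the timeout.

    def on_timeout(self):
        # Drastic cut: half the rate or the last reported rate, whichever is lower.
        self.mode = RED
        self.green_votes = 0
        self.rate = min(self.rate / 2, self.last_receiver_rate)
```

Starting from a rate of 100 with a threshold of 5 and an increment of 10, a response of 100 raises the rate to 110, a response of 90 moves the sender to yellow at a rate of about 99, and two further responses of 99 are needed before the sender returns to green.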
In accordance with one feature of the ARB method, the incremental increase in the sending rate during the green operating mode can be defined to be nonlinear to permit rapid increases in sending rate until saturation is approached and, thereafter, the incremental increases can be kept small to avoid wide fluctuations around a desirable operating point. Conversely, incremental decreases in the sending rate during the yellow operating mode can likewise be defined non-linearly to permit small decreases for small departures from an optimal operating point and larger decreases as the departure from the optimal operating point itself increases in magnitude.
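The text does not give the nonlinear functions themselves. Purely as an illustration, the sketch below assumes an increase proportional to the remaining headroom below the negotiated maximum rate (standing in for the saturation point) and a decrease that grows with the square of the departure from the reported receiving rate. Both functional forms and the `gain` parameter are assumptions.

```python
def green_increment(rate, max_rate, gain=0.5):
    """Large steps far from the maximum rate, small steps close to it."""
    headroom = max(max_rate - rate, 0.0)
    return gain * headroom


def yellow_decrement(rate, receiver_rate, gain=0.5):
    """Small cuts for small departures, disproportionately larger cuts for big ones."""
    if rate <= 0:
        return 0.0
    departure = max(rate - receiver_rate, 0.0)
    # Quadratic in the departure, never more than the departure itself.
    return min(gain * departure ** 2 / rate, departure)
```

With a maximum rate of 100, the increase is 40 at a rate of 20 but only 5 at a rate of 90; at a sending rate of 100, a departure of 2 yields a cut of 0.02 while a departure of 20 yields a cut of 2.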
The ARB flow and congestion control mechanism is highly adaptive to network conditions in that it is responsive to the offered traffic load and to the congestion. This results in maximizing throughput of the network while at the same time minimizing network congestion. The flow control mechanism also smooths the input traffic to discourage very large data bursts and the large queues which result from large data bursts. The ARB method also provides equal and fair access to all connections regardless of their relative average packet length. It can be modified to provide protection against congestion arising from the accumulation effect. In particular, by keeping track of the delay accumulations in a connection, the reported receiving rate can be adjusted to reduce the accumulated delays.
In the above described implementation of ARB flow and congestion control a few shortcomings have been identified. The first shortcoming is that, when a link along a connection path fails, HPR (High Performance Routing) recovers by determining an alternate route and switching to it. However, typically, when a link is lost, one or more packets in flight are lost, until the breakage is discovered. The packet losses are discovered in exchanges that take place after the path switch is completed. The packet losses have the effect of reducing the send rate of the reestablished connection exponentially (one quarter of the current rate for each packet lost). Furthermore, when a path switch is successfully completed, the rate at which data is allowed to flow is low, until the ARB flow and congestion control mechanism can determine the level of congestion on the new path. The rate reductions caused by packet loss aggravate the condition of slowness in the network, especially in a high-speed network.
An additional shortcoming is that, in some situations, network adapter hardware will discard received frames which causes problems in the ARB mechanism for determining the appropriate flow rate. In fact, on some adapters, when the arrival rate of frames exceeds some threshold, frames are summarily discarded. When frames arrive at a rate below this threshold, the adapter processes them expediently. While the ARB flow control mechanism will initially react appropriately for this condition--that is, adjacent nodes will cut the rate at which frames are sent to this adapter--its long-term behavior is suboptimal due to its inability to correctly compensate for dropped frames.
Consider the following example: on a 16 Mbit/sec token ring, some adapter might drop frames when the data rate exceeds 5 Mbits/sec. Initially, the ARB flow control mechanism will allow a modest send rate, say 1 Mbit/sec, and the send rate will be increased until a problem occurs. In our example, no problem will occur until the ARB mechanism allows a send rate of 5 Mbits/sec. At that rate, the suspect adapter will begin discarding frames until ARB slows the rate at which adjacent nodes send data to this node. Once the ARB flow control mechanism adjusts, the congestion disappears, and the ARB mechanism allows the rate to be increased since there is no indication of congestion. In fact, the ARB mechanism will increase the rate until the problem once again arises. Thus, a cycle will develop of dropped frames, rate cuts, congestion elimination, rate increases, dropped frames. In short, the flow rates will oscillate since the ARB flow control mechanism will never correctly adapt to the loss of frames.
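The cycle in this example can be reproduced with a toy simulation. The 1 Mbit/sec starting rate and the 5 Mbit/sec drop threshold come from the example above; the 1 Mbit/sec step and the halving of the rate after a drop are assumptions chosen only to make the sawtooth visible.

```python
def simulate(start=1.0, drop_threshold=5.0, step=1.0, cut=0.5, rounds=12):
    """Return a list of (send rate in Mbit/s, frames_dropped) per feedback round."""
    rate = start
    history = []
    for _ in range(rounds):
        # The adapter silently discards frames once the rate reaches its limit.
        dropped = rate >= drop_threshold
        history.append((rate, dropped))
        # ARB cuts the rate after a drop; otherwise it sees no congestion
        # and keeps increasing the rate.
        rate = rate * cut if dropped else rate + step
    return history


history = simulate()
drop_rates = [rate for rate, dropped in history if dropped]
```

The rate climbs to the drop threshold, is cut, and climbs again, so frames are discarded repeatedly and the rate never settles just below the adapter's limit.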
A third shortcoming is that the standard ARB flow control mechanism regulates the send rate using an allowed_send_rate variable. If necessary, transmissions are delayed to keep within the allowed_send_rate parameters. The ARB status messages are exchanged between the computers at the endpoints of a connection to detect network congestion. If no congestion is detected, the allowed_send_rate is incremented in small steps up to a maximum of twice the actual send rate. This way the send rate is ramped up gradually so the congestion can be detected before it becomes severe. The problem is that twice the actual send rate may be too high or too low a number depending on the type of application. Since the measurement interval for computing actual send rate is relatively large, it includes periods during which the application may not be sending. The actual send rate therefore does not represent the send rate during transmission (call this transmission rate) but an average send rate.
If the application sends one half of the time, then the actual send rate equals one half of the transmission rate and allowed_send_rate equals the transmission rate so the formula works well. If the application sends less than one half the time, actual_send_rate is less than one half of the transmission rate which can never be more than the allowed_send_rate. Therefore the actual_send_rate is less than half of the allowed_send_rate and the allowed_send_rate will never be incremented beyond its initial setting. Under these circumstances, the application will be slowed by the ARB mechanism even when there is no network congestion. If the application sends near 100% of the time, the allowed_send_rate will be twice the transmission rate. This could cause flooding of the network if the transmission rate suddenly increases, due to more CPU availability. The goal is to control send rate so as to prevent sudden increases that might cause network congestion without slowing performance when there is no network congestion.
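The three cases above follow from simple arithmetic: the ceiling on the allowed send rate is twice the average (actual) send rate, and the average send rate is the transmission rate multiplied by the fraction of time the application is sending. The sketch below works through that arithmetic; the function name and the `duty_cycle` parameter are illustrative.

```python
def send_rate_ceiling(transmission_rate, duty_cycle):
    """Ceiling on the allowed send rate: twice the average (actual) send rate.

    duty_cycle is the fraction of the measurement interval spent sending.
    """
    actual_send_rate = transmission_rate * duty_cycle
    return 2 * actual_send_rate


# Transmission rate of 10 units while sending, for three kinds of application.
ceilings = {duty: send_rate_ceiling(10.0, duty) for duty in (0.25, 0.5, 1.0)}
```

With a transmission rate of 10, an application sending one quarter of the time is capped at 5 (below what it needs while sending), one sending half of the time is capped at exactly 10, and one sending continuously is allowed up to 20, which is twice what it currently uses.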
Yet a fourth shortcoming is that, in the ARB algorithm, there is a mechanism for metering out data at a specified rate by permitting only b bytes of data out in a time interval t. Thus, if a connection uses up all the allocated bytes in less than t units of time, it will have to wait until those time units elapse before sending more data out. This interval of time is known as the burst interval, and the number of bytes allowed in a burst interval is called the burst size.
The design and simulation of the traditional ARB methodology assumed a very accurate clock and tight CPU scheduling were available. This meant that at the exact moment when a connection was allowed to send data, it would be scheduled to run. This is not true in a general-purpose multi-tasking operating system, like OS/2, where other processes may be scheduled ahead of the ARB process, or when interrupts occur that throw the timing off. As a result, it is possible that when the Rapid Transport Protocol (RTP) component (which implements ARB algorithms) runs, it may decide that the connection has already used up its burst size, and must wait for that burst interval to elapse before being allowed to send again. Due to the vagueness or inaccuracies of scheduling, it is possible that this occurs just a few milliseconds later. However, because the RTP component vies for CPU time along with other processes, the scheduler may not reassign the CPU to the RTP process for some time. Thus, that particular connection's sending status will not be updated, and even if the application using the connection has data ready to send, the data will be blocked until the burst size is reset, and the new burst interval is allocated. This can reduce the effective throughput of the system.
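The metering and its sensitivity to scheduling can be sketched as follows. This is a simplified model with hypothetical names, in which the allowance is refreshed only when the metering code actually runs; it is not the RTP implementation.

```python
class BurstMeter:
    """Permit at most burst_size bytes per burst_interval (seconds)."""

    def __init__(self, burst_size, burst_interval):
        self.burst_size = burst_size
        self.burst_interval = burst_interval
        self.interval_start = 0.0
        self.bytes_left = burst_size

    def try_send(self, nbytes, now):
        """Return True if nbytes may be sent at time `now`."""
        # The allowance is reset only when this code gets to run, so a late
        # wake-up by the scheduler delays the start of the next burst interval.
        if now - self.interval_start >= self.burst_interval:
            self.interval_start = now
            self.bytes_left = self.burst_size
        if nbytes > self.bytes_left:
            return False
        self.bytes_left -= nbytes
        return True


meter = BurstMeter(burst_size=1000, burst_interval=0.010)
sent_full_burst = meter.try_send(1000, now=0.0)   # uses up the whole burst
blocked = not meter.try_send(1, now=0.009)        # 1 ms before the interval ends
late_wakeup = meter.try_send(1, now=0.050)        # next run only 41 ms later
```

In this run the connection is refused 1 ms before its interval expires and is not scheduled again until 50 ms, so data that could have been sent at 10 ms waits an additional 40 ms.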
A fifth area of concern with the ARB flow and congestion control mechanism is that, in reliable communications protocols such as RTP, verification that data was received is accomplished when the sender transmits an acknowledgement request and the receiver transmits an acknowledgement reply. In RTP these are called status requests and status replies. The status request and status reply can accompany a data packet or they can be transmitted separately. The goal is to minimize the number of status requests and replies transmitted separately because these increase network traffic and CPU usage, thereby reducing the network capacity and application performance.
As is shown in FIG. 1, when multiple data packets are transmitted consecutively from computer "A" to computer "B", the number of status responses can be reduced if "A" sends only one status request, with the last frame of the sequence.
The problem is that the RTP implementation does not know when the sequence ends. This is determined by the application. The application passes data to RTP and it transmits it. The application may pass more data to RTP immediately or may wait for an undetermined amount of time before passing any more data. In transaction type applications, for example, application "A" will send data and then wait until application "B" sends data back. Since the RTP implementation cannot determine if there is an additional data packet in the sequence, it must request status with every packet sent, therefore creating further congestion in the network.