Network congestion arises when traffic sent or injected into a communications network (i.e., the number of injected packets or bytes per unit of time) exceeds the capacity of the network. Congestion causes the throughput of useful data traffic (i.e., traffic that reaches its destination) to be reduced because when the network is congested, packets hold onto network resources for longer times and/or network resources are consumed by packets that are later discarded. Network congestion is typically controlled by mechanisms for detecting congestion and by adjusting the amount of data traffic injected at the end nodes.
Congestion detection processes can be implemented at endpoints or at internal components of the network, such as switches or routers. As described in V. Jacobson “Congestion avoidance and control” ACM SIGCOMM 88, pp. 314-329, August 1988 (“Jacobson”), flow sources using the Transmission Control Protocol (TCP) rely on endpoint detection of dropped network packets as an implied signal of congestion. An alternative approach is to detect congestion at network switches or routers by, for example, observing whether the switch buffer occupancy exceeds a desired operating point. However, this approach requires an Explicit Congestion Notification (ECN) mechanism that notifies endpoints of the state of network congestion so that their data traffic injection properties can be adjusted accordingly.
In many ECN implementations (e.g., DEC's implementation described in K. K. Ramakrishnan, R. Jain “A Binary Feedback Scheme for Congestion Avoidance in Computer Networks” ACM Transactions on Computer Systems Vol. 8 No. 2, pp. 158-181, 1990; and Random Early Detection (RED) as described in S. Floyd, V. Jacobson “Random Early Detection Gateways for Congestion Avoidance” IEEE/ACM Transactions on Networking Vol. 1 No. 4, pp. 397-413, August 1993), switches mark ECN bits in packet headers to notify the destination nodes of congestion, thus avoiding the use of special control packets dedicated to carrying congestion information. The destination node, in turn, piggybacks the congestion marker on acknowledgment (ACK) packets, which are used in most transport protocols, such as TCP, to acknowledge the receipt of data packets by the destination node.
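The marking and piggybacking steps described above can be illustrated with a minimal sketch. It is not taken from any of the cited implementations: the names `Packet`, `Ack`, `switch_forward`, and `destination_receive`, and the simple occupancy threshold, are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    ecn: bool = False        # congestion mark carried in the packet header


@dataclass
class Ack:
    seq: int
    ecn_echo: bool = False   # congestion mark piggybacked on the ACK


def switch_forward(packet, buffer_occupancy, threshold):
    """Hypothetical switch step: mark the packet when the buffer
    occupancy exceeds the desired operating point (threshold)."""
    if buffer_occupancy > threshold:
        packet.ecn = True
    return packet


def destination_receive(packet):
    """Hypothetical destination step: acknowledge the packet and copy
    its congestion mark onto the ACK returned to the source."""
    return Ack(seq=packet.seq, ecn_echo=packet.ecn)
```

In this sketch no dedicated control packet is created; the only congestion signal that reaches the source is the `ecn_echo` field of an ACK it would have received anyway.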
Typically, when congestion is detected, a source node adjusts the injection properties by decreasing the packet injection rate and, conversely, slowly increases the packet injection rate when there is no congestion. Congestion response mechanisms generally control data traffic injection on the network in one of two ways. One way is to limit the number of packets that can be concurrently in transit in the network between a pair of communicating source and destination nodes. For example, as described in Jacobson, congestion control in TCP is achieved by using a window-based congestion control technique which dynamically adjusts the window limit. Source nodes implementing the window control technique typically use ACK packets, which are part of the network transport protocols, to determine and control the number of packets that are in transit or “flight” to the destination node via the network. By blocking packet transmission whenever the number of unacknowledged packets reaches a threshold, the source node can bound the number of packets that can be concurrently in flight in the network, effectively controlling the rate of packet injection.
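A window-limited source of the kind described above might be sketched as follows. The class name `WindowSender` and the adjustment policy (halve the window on a congestion-marked ACK, add one packet otherwise) are illustrative assumptions and are not the exact algorithm given in Jacobson.

```python
class WindowSender:
    """Hypothetical window-limited source node (illustrative only)."""

    def __init__(self, window=4, min_window=1):
        self.window = window          # max packets allowed in flight
        self.min_window = min_window  # a window cannot go below one packet
        self.in_flight = 0            # packets sent but not yet acknowledged

    def can_send(self):
        # Transmission is blocked once the unacknowledged count
        # reaches the window limit.
        return self.in_flight < self.window

    def send(self):
        if not self.can_send():
            return False
        self.in_flight += 1
        return True

    def on_ack(self, congestion_marked):
        # Each ACK frees one slot; this is what makes injection self-clocked.
        self.in_flight = max(0, self.in_flight - 1)
        if congestion_marked:
            # Assumed policy: multiplicative decrease.
            self.window = max(self.min_window, self.window // 2)
        else:
            # Assumed policy: additive increase.
            self.window += 1
```

Because `send` succeeds only while `in_flight` is below `window`, a source that stops receiving ACKs stops injecting packets once the window is full.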
An alternative to window control is the rate control technique. Rate control involves controlling the rate at which the source node injects packets into the network, or equivalently, the time interval between packets injected into the network. This is further described in ATM Forum Technical Committee “Traffic Management Specification Version 4.0”, (http://www.atmforum.com/pages/aboutatmtech/approved.html), af-tm-0056.000, April 1996.
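The equivalence between an injection rate and an inter-packet interval can be shown with a short sketch. The function names and the adjustment constants below are assumptions for illustration and are not drawn from the ATM Forum specification.

```python
def departure_times(num_packets, rate_pps, start=0.0):
    """Return send times (in seconds) for packets paced at rate_pps
    packets per second; the inter-packet interval is 1 / rate_pps."""
    interval = 1.0 / rate_pps
    return [start + i * interval for i in range(num_packets)]


def adjust_rate(rate_pps, congested, decrease=0.5, increase=1.0, min_rate=0.1):
    """Hypothetical response policy: cut the rate multiplicatively on
    congestion, otherwise raise it by a fixed increment."""
    if congested:
        return max(min_rate, rate_pps * decrease)
    return rate_pps + increase
```

For example, `departure_times(3, 2.0)` spaces three packets 0.5 seconds apart. Unlike a window, the rate can fall below one packet per round trip, so nothing in this sketch stops injection when feedback ceases to arrive.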
Window-based congestion control mechanisms offer self-clocked packet injection, the advantages of which are further discussed in Jacobson. Generally, the window limits the amount of buffering that a flow, i.e., a data stream, can consume, thus preventing the further injection of packets into the network when acknowledgments for transmitted packets stored in the buffer stop arriving. By limiting the amount of network resources (e.g., network buffers) used by multiple contending flows, a window-based mechanism can effectively control congestion. However, when the number of contending flows is large or when the size of switch buffers is small, the average network buffer utilization for each flow may have to be set at values lower than the size of one packet in order to avoid congestion. This cannot be achieved by a pure window control mechanism, since the minimum window size is the size of one data packet.
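The limitation can be made concrete with a small calculation. The helper names below are hypothetical, and the even split of buffer slots among flows is a simplifying assumption.

```python
def per_flow_buffer_share(buffer_slots, num_flows):
    """Average number of packet-sized buffer slots available to each
    flow, assuming the slots are shared evenly (an assumption)."""
    return buffer_slots / num_flows


def window_can_enforce(buffer_slots, num_flows, min_window_packets=1):
    """A pure window mechanism cannot hold a flow below its minimum
    window, so the per-flow share must be at least that large."""
    return per_flow_buffer_share(buffer_slots, num_flows) >= min_window_packets
```

With 4 buffer slots shared by 16 flows, each flow's share is 0.25 packets, which is below the one-packet minimum window, so `window_can_enforce(4, 16)` is false.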
On the other hand, a rate-based response mechanism allows flows to have an average buffer utilization of less than a single data packet and is more suitable for situations where the number of flows is relatively large when compared with the number of buffer slots on the network switches. This is a likely scenario in high-speed system area networks, for example as described in Infiniband Trade Association “Infiniband Architecture Specification Release 1.0.a” (http://infinibandta.org), in which switch buffers can only hold a few data packets per port. However, a pure rate-based mechanism does not have self-clocking properties and may therefore continue injecting data packets into the network even when the network is congested and ACK packets, or congestion notification messages, stop arriving due to long network delays.
It is desired to provide a system and process for controlling network congestion that alleviates one or more of the above difficulties, or at least provides a useful alternative to existing network congestion control systems and processes.