Most networks in use today transfer discrete data packets between a sender node and a receiver node via one or more intermediate nodes. A common problem in these data packet networks is that the sender node has little or no information on the available capacity in the network, and thus cannot immediately determine the appropriate rate at which to send data packets. The appropriate transmission rate is the maximum rate at which data packets can be sent without causing congestion in the network. Exceeding it causes some of the data packets to be dropped, and can also cause data packets on other data flows (e.g. between other pairs of nodes which share one or more intermediate nodes along their respective transmission paths) to be dropped.
To address this problem, nodes in data packet networks use either a closed-loop or an open-loop congestion control algorithm. Closed-loop algorithms rely on congestion feedback being supplied to the sender node, allowing it to determine or estimate the appropriate rate at which to send future data packets. However, this congestion feedback can become useless within a very short time, as other pairs of nodes in the network (sharing one or more intermediate nodes along their transmission paths) may start or stop data flows at any time. Accordingly, the congestion feedback can quickly become outdated, and closed-loop algorithms then fail to predict the appropriate rate at which to send data packets. This shortcoming becomes ever more serious as the capacities of links in data packet networks increase, because larger increases or decreases in available capacity and congestion can then occur.
Open-loop congestion control algorithms are commonly used at the start of a new data flow, when there is little or no congestion information from the network. One of the most common congestion control algorithms is the Transmission Control Protocol (TCP) ‘Slow-Start’ algorithm for Internet Protocol (IP) networks, which has an initial exponential growth phase followed by a congestion avoidance phase. When a new TCP Slow-Start flow begins, the sender's congestion window (a value limiting how much unacknowledged data the sender may have in flight, and thus representing its estimate of the capacity available in the network) is set to an initial value and a first set of packets is sent to the receiver node. The receiver node sends back an acknowledgement to the sender node for each data packet it receives. During the initial exponential growth phase, the sender node increases its congestion window by one packet for every acknowledgement packet received. The congestion window, and thus the transmission rate, therefore doubles every round trip time. Once the congestion window reaches the sender node's Slow-Start Threshold (‘ssthresh’), the exponential growth phase ends and the congestion avoidance phase begins, in which the congestion window is increased by only one packet per round trip in which acknowledgements are received, regardless of how many acknowledgement packets arrive. If at any point an acknowledgement packet (or its absence) indicates that a loss has occurred, which is likely to be due to congestion on the network, the sender node responds by halving the congestion window in an attempt to reduce the congestion caused by that particular data flow. However, the sender node receives this feedback (i.e. the acknowledgement packet indicating that a loss has occurred) one round trip time after its transmission rate exceeded the available capacity. By the time it receives this feedback, it is already sending data twice as fast as the available capacity. This is known as ‘overshoot’.
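The growth and halving behaviour described above can be illustrated with a per-round-trip model. The sketch below is a simplification under stated assumptions, not a TCP implementation: the window and the bottleneck capacity are counted in packets per round trip, every packet above the capacity in a round is dropped, loss feedback only takes effect after the whole overloaded round has been sent, and on loss the window is halved and `ssthresh` is set to the halved value. The function name `slow_start_trace` is hypothetical.

```python
def slow_start_trace(initial_cwnd, ssthresh, capacity, rounds):
    """Per-round-trip sketch of Slow-Start followed by congestion avoidance.

    Windows and capacity are in packets per round trip (an assumption of this
    simplified model). Returns a list of (round, cwnd, dropped) tuples.
    """
    cwnd = initial_cwnd
    trace = []
    for rtt in range(rounds):
        # The whole window is sent in this round; anything above the
        # bottleneck capacity is dropped.
        dropped = max(0, cwnd - capacity)
        trace.append((rtt, cwnd, dropped))
        if dropped:
            # Loss feedback arrives only after this round has been sent
            # (the overshoot); the sender then halves its window.
            cwnd = max(1, cwnd // 2)
            ssthresh = cwnd
        elif cwnd < ssthresh:
            # Exponential growth: one extra packet per ACK doubles cwnd.
            cwnd *= 2
        else:
            # Congestion avoidance: one extra packet per round trip.
            cwnd += 1
    return trace


for rtt, cwnd, dropped in slow_start_trace(initial_cwnd=1, ssthresh=64, capacity=16, rounds=8):
    print(f"round {rtt}: cwnd={cwnd:2d} dropped={dropped}")
```

With a capacity of 16 packets per round trip, the window fits exactly in round 4, doubles to 32 in round 5, and half of that round is dropped before the sender reacts, mirroring the overshoot described in the text.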
The exponential growth phase can cause problems for non-TCP traffic. Consider a low-rate (e.g. 64 kb/s) constant bit-rate voice flow in progress over an otherwise empty 1 Gb/s link. Further, suppose that a large TCP flow starts on the same link with an initial congestion window of ten 1500 B packets and a round trip time of 200 ms. The flow keeps doubling its congestion window every round trip until, after nearly eleven round trips, its window is 16,666 packets per round (1 Gb/s). In the next round it doubles to 2 Gb/s before it receives the first feedback showing drops, which imply that it exceeded the available capacity in the network a round trip earlier. About 50% of the packets in this next round (16,666 packets) will be dropped.
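The figures in this example follow from the bandwidth-delay product. The short calculation below reproduces them, taking the link rate as 1 Gb/s (the rate that a 16,666-packet window per 200 ms round trip corresponds to) and assuming that the overshoot round carries exactly twice the link's capacity. The constant names are illustrative only.

```python
import math

LINK_RATE_BPS = 1e9   # 1 Gb/s bottleneck link
RTT_S = 0.2           # 200 ms round trip time
PACKET_BYTES = 1500
INITIAL_WINDOW = 10   # packets

# Whole packets the link can carry in one round trip (bandwidth-delay product).
bdp_packets = int(LINK_RATE_BPS * RTT_S / (PACKET_BYTES * 8))

# Doublings needed for the window to grow from 10 packets to the BDP.
doublings = math.log2(bdp_packets / INITIAL_WINDOW)

# Time spent in exponential growth, rounding up to whole round trips.
time_to_capacity_s = math.ceil(doublings) * RTT_S

# In the overshoot round the window is twice the BDP; the excess is dropped.
overshoot_window = 2 * bdp_packets
dropped = overshoot_window - bdp_packets

print(f"BDP: {bdp_packets} packets per round trip")
print(f"doublings to reach BDP: {doublings:.2f} ({time_to_capacity_s:.1f} s)")
print(f"dropped in overshoot round: {dropped} of {overshoot_window}")
```

Roughly 10.7 doublings are needed, which is why the text speaks of "nearly eleven" round trips, or about 2.2 seconds at 200 ms each.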
In this example, the TCP Slow-Start algorithm has taken eleven round-trip times (over two seconds) to find its correct operating rate. Furthermore, when TCP drops such a large number of packets, it can take a long time to recover, sometimes leading to a blackout lasting many more seconds. The voice flow is also likely to black out for at least 200 ms, and often much longer, because at least 50% of the voice packets are dropped over this period.
There are thus two main issues with the overshoot problem. Firstly, it takes a long time for data flows to stabilise at an appropriate rate for the available network capacity and, secondly, a very large amount of damage occurs to any data flow having a transmission path sharing the now congested part of the network.
Further concepts of data packet networks will now be described.
A node typically has a receiver for receiving data packets, a transmitter for transmitting data packets, and a buffer for storing data packets. When the node receives a data packet at the receiver, it is temporarily stored in the buffer. If there are no other packets currently stored in the buffer (i.e. the new packet is not in a ‘queue’) then the packet is immediately forwarded to the transmitter. If there are other packets in the buffer such that the new packet is in a queue, then it must wait its turn before being forwarded to the transmitter. A few concepts regarding the management and exploitation of node buffers will now be described.
A node implementing a very basic buffer management technique would simply store each arriving packet in its buffer until the buffer reaches capacity. At this point, any data packet which is larger than the remaining capacity of the buffer is discarded. This is known as drop-tail. However, this results in larger packets being dropped more often than smaller packets, which may still be added to the end of the buffer queue. An improvement on this technique is a process known as Active Queue Management (AQM), in which data packets are dropped when it is detected that the queue of packets in the buffer is starting to grow above a threshold rate, but before the buffer is full. This leaves the buffer sufficient spare capacity to absorb bursts of packets, even during long-running data flows.
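The contrast between drop-tail and AQM can be sketched with a byte-limited FIFO buffer. In the sketch below, the class names are hypothetical, and the AQM variant is deliberately simplified to a deterministic check on the instantaneous queue length; practical AQM schemes (such as RED, CoDel or PIE) instead use averaged queue length or queuing delay and usually drop probabilistically.

```python
from collections import deque


class DropTailBuffer:
    """Byte-limited FIFO buffer that drops a packet only when it does not fit."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.queue = deque()

    def enqueue(self, size_bytes):
        """Return True if the packet is queued, False if it is dropped."""
        if self.used_bytes + size_bytes > self.capacity_bytes:
            return False  # tail drop: not enough room left for this packet
        self.queue.append(size_bytes)
        self.used_bytes += size_bytes
        return True

    def dequeue(self):
        """Remove and return the size of the oldest packet (FIFO order)."""
        size = self.queue.popleft()
        self.used_bytes -= size
        return size


class ThresholdAQMBuffer(DropTailBuffer):
    """Simplified AQM: drop arrivals once the queue reaches a threshold below capacity."""

    def __init__(self, capacity_bytes, threshold_bytes):
        super().__init__(capacity_bytes)
        self.threshold_bytes = threshold_bytes

    def enqueue(self, size_bytes):
        if self.used_bytes >= self.threshold_bytes:
            return False  # early drop: signal congestion before the buffer is full
        return super().enqueue(size_bytes)


tail = DropTailBuffer(capacity_bytes=4000)
print([tail.enqueue(size) for size in (1500, 1500, 1500, 64)])

aqm = ThresholdAQMBuffer(capacity_bytes=4000, threshold_bytes=2000)
print([aqm.enqueue(size) for size in (1500, 1500, 1500, 64)])
```

In the drop-tail case the third 1500 B packet is dropped while the 64 B packet behind it is still accepted, which is the size bias described above. The AQM variant starts dropping once 2000 bytes are queued, leaving headroom in the buffer for bursts.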
Some nodes may treat every data packet in their buffer in the same way, such that data packets are transmitted in the same sequence in which they were received (known as “First In First Out”). However, node buffer management techniques introduced the concept of marking data packets with different classes of service. This technique can be used by defining certain classes as higher than others, and a network node can then implement a forwarding function that prevents or mitigates the loss or delay of packets in a higher class at the expense of packets in a lower class. Examples of techniques that manage packet buffers using differing classes of service include:
(Non-strict) Prioritisation: In this technique, higher class packets are forwarded by a network node before lower class packets, even if a lower class packet arrived at the node earlier. This is often implemented by assigning a lower weight to a lower class and serving each class in proportion to its weight.
Strict Prioritisation: Similar to non-strict prioritisation, except that a lower class packet is never forwarded whilst a higher class packet is present in the buffer.
Traffic Policer: A network node may enforce a traffic profile specifying, for example, limits on the average rate and the maximum size of bursts. Packets of any data flow that does not meet the profile are marked accordingly and may be discarded if congestion reaches a certain level.
Preferential Discard: If a buffer is filled with a queue of data packets, any lower class packets are preferentially discarded before higher class packets.
Selective Packet Discard: A proportion of the buffer is reserved for higher class data packets. Lower class packets may only occupy a smaller proportion of the buffer (relative to the buffer of that node without selective packet discard), and they are discarded if that smaller proportion is full.
AQM: AQM, as mentioned above, drops packets when it is detected that the queue of packets in the buffer is starting to grow above a threshold rate. This can be modified so that the packets dropped by AQM are those of a lower class of service.
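The difference between strict and non-strict prioritisation can be seen in the order in which queued packets are forwarded. The sketch below uses weighted round robin as one possible way of serving each class in proportion to its weight; that choice, the function names and the packet labels are assumptions for illustration, and other weighted schedulers (e.g. weighted fair queuing or deficit round robin) are also used in practice.

```python
from collections import deque


def strict_priority(queues):
    """Always serve the highest non-empty class; lower classes wait until it is empty.

    `queues` is a list of deques ordered from highest to lowest class.
    Returns the order in which packets are forwarded.
    """
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
                break  # restart from the highest class after every packet
    return order


def weighted_round_robin(queues, weights):
    """Non-strict prioritisation: each round, class i may send up to weights[i] packets."""
    if len(weights) != len(queues) or any(w < 1 for w in weights):
        raise ValueError("need one positive weight per queue")
    order = []
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:
                    break
                order.append(q.popleft())
    return order


print(strict_priority([deque(["H1", "H2", "H3", "H4"]), deque(["L1", "L2"])]))
print(weighted_round_robin([deque(["H1", "H2", "H3", "H4"]), deque(["L1", "L2"])], [3, 1]))
```

Under strict prioritisation the lower class packets are only forwarded once the higher class queue is empty, so a continuously busy higher class would starve them; with weights of 3 and 1, one lower class packet is forwarded for every three higher class packets.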
The approaches of Strict Prioritisation and Preferential Discard were both proposed to ensure that lower class packets cannot harm higher class packets. However, there are still problems with these techniques. In Strict Prioritisation, some network nodes may have one or more higher priority packets in the buffer for long periods (many seconds or even minutes), particularly during peak hours. This causes lower class data packets to remain in the buffer for a long time. During this period, the sending and receiving nodes would probably time out, and the data would be retransmitted in a higher class (on the assumption that the lower class packet was discarded). When the busy period for higher priority packets ends, the queue of lower class data packets is finally transmitted. This merely wastes capacity, as the data has already been received from the retransmitted higher priority packets.
Network nodes can exploit lower class data packets to determine the available capacity in the network (known as ‘probing’). In Preferential Discard, a burst of ‘discard eligible’ probing data packets may fill up a buffer, and only then is Preferential Discard triggered. During probing, the discard eligible packets will build a queue up to the discard threshold, even though newly arriving probing traffic beyond that point is discarded. Thus, probing is not non-intrusive, because higher class traffic from established flows will experience increased delay.
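The delay imposed on higher class traffic by a probing burst can be illustrated with a small FIFO model. The sketch assumes that the probe burst arrives faster than the buffer is served (so nothing is dequeued during the burst), that each queued packet takes one service slot, and that Preferential Discard is triggered only once the queue already holds `discard_threshold` packets. The function name and parameters are hypothetical.

```python
from collections import deque


def probe_then_priority(probe_burst, discard_threshold, capacity):
    """Queue a burst of discard eligible probes, then one higher class packet.

    Returns (probes_queued, probes_discarded, slots_waited_by_high_class_packet).
    """
    if not 0 <= discard_threshold < capacity:
        raise ValueError("discard threshold must lie below the buffer capacity")
    queue = deque()
    discarded = 0
    for _ in range(probe_burst):
        if len(queue) >= discard_threshold:
            discarded += 1  # Preferential Discard only triggers at the threshold
        else:
            queue.append("probe")
    probes_queued = len(queue)
    # The threshold is below capacity, so the higher class packet is accepted.
    queue.append("high")
    # FIFO service: it must wait for every probe queued ahead of it.
    wait = queue.index("high")
    return probes_queued, discarded, wait


print(probe_then_priority(probe_burst=0, discard_threshold=30, capacity=40))
print(probe_then_priority(probe_burst=50, discard_threshold=30, capacity=40))
```

Even though 20 of the 50 probes are discarded, the 30 probes that were admitted sit ahead of the higher class packet, which now waits 30 service slots instead of none.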
It is therefore desirable to alleviate some or all of the above problems.