1. Field of the Invention
The present invention relates to digital communications systems, in particular computer networking, and specifically data flow rate control.
2. Description of the Related Art
In the field of computer networking, one area of concern is maintaining and supplying a pre-negotiated quality of service (QoS) and/or a guaranteed packet rate. Further discussion of the general quality of service problem can be found in James F. Kurose and Keith W. Ross, Computer Networking: A Top Down Approach Featuring the Internet (Addison Wesley 2000), Chapter 6.6, incorporated herein by reference in its entirety.
Many systems attempt to provide a guaranteed bit rate or packet rate for designated flows through a switching or routing system. A “flow” is here defined as a unique data connection between a certain designated source address and a designated destination address. Generally speaking, a “flow” is a defined subset of the packet or cell traffic between designated endpoints, not merely a transport connection.
Policers are a critical component in providing quality of service in data networks. Policers are used to hold a packet flow to a target rate in the presence of burst traffic. Token bucket and leaky bucket mechanisms are well-known approaches to policing packet streams. See, for example, Kurose and Ross, cited above. In addition, there are “virtual time” based approaches to policing, such as the theoretical arrival time (TAT) algorithm described in the ATM Forum Traffic Management Specification (version 4.0, af-tm-0056.000, June 1996). The ATM Forum Traffic Management Specification is incorporated herein by reference in its entirety. However, all of these approaches share the drawback seen in packet buffering, namely tail dropping. Tail dropping, as that term is understood in the art, refers to the complete drop of all packets in a transmission burst after the bursting flow exceeds its designated maximum flow rate.
The problem of tail dropping in packet buffers is described in S. Floyd and V. Jacobson, Random Early Detection Gateways for Congestion Avoidance, IEEE/ACM Transactions on Networking, vol. 1, No. 4, August 1993, pp. 397-413, and in V. Jacobson, K. Nichols, and K. Poduri, RED in a Different Light, Technical Report, April 1999. Both of these papers are incorporated herein by reference in their entireties.
Generally speaking, bandwidth management on the links between routers and switches is the key element in maintaining quality of service. As noted in Kurose and Ross, there are three aspects of a flow's packet rate among which one could choose to implement a policing scheme. These three important policing criteria, which differ from each other according to the time scale over which the packet flow is policed, are as follows:
Average Rate. The network may wish to limit the long term average rate (i.e., packets per time interval) at which a flow's packets can be sent into the network. A crucial issue here is the interval of time over which the average rate will be policed. For example, a flow whose average rate is limited to 100 packets per second is more constrained than a flow that is limited to 6,000 packets per minute, even though both have the same average rate over a long enough interval of time. The latter constraint would allow a flow to send 1000 packets in a given second-long interval of time (subject to the constraint that the rate be less than 6,000 packets in a minute), while the former constraint would disallow this sending behavior entirely.
Peak Rate. While the average rate constraint limits the amount of traffic that can be sent into the network over a relatively long period of time, a peak rate constraint limits the maximum number of packets that can be sent over a shorter period of time. Using the example above, the network may police a flow at an average rate of 6,000 packets per minute, while limiting the flow's peak rate to 1,500 packets per second.
Burst Size. The network may also wish to limit the maximum number of packets (i.e., the burst packets) that can be sent into the network in an extremely short interval of time. As this interval length approaches zero, the burst size limits the number of packets that can be instantaneously sent into the network. While it is physically impossible to instantaneously send multiple packets (after all, every link has a physical transmission rate that cannot be exceeded), the abstraction of a maximum burst size is a useful one.
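The effect of the policing interval in the average rate example above can be illustrated with a short sketch. The function name conforms and the sliding-window interpretation of "packets per time interval" are assumptions made for illustration only; they are not taken from Kurose and Ross.

```python
def conforms(arrival_times, limit, window):
    """Return True if no sliding window of `window` seconds
    contains more than `limit` packet arrivals (hypothetical helper)."""
    times = sorted(arrival_times)
    start = 0
    for end, t in enumerate(times):
        # Drop arrivals that have fallen out of the window ending at time t.
        while t - times[start] >= window:
            start += 1
        if end - start + 1 > limit:
            return False
    return True


# A flow that sends 1000 packets within a single second and is then silent.
burst = [i / 1000.0 for i in range(1000)]

# Policed at 100 packets per second, the burst is non-conforming.
per_second_ok = conforms(burst, limit=100, window=1.0)

# Policed at 6,000 packets per minute, the same burst is conforming.
per_minute_ok = conforms(burst, limit=6000, window=60.0)
```

Both limits describe the same long term average, yet only the per-minute limit admits the burst, which is the distinction the example in the text draws.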
One model that can be used to characterize different policing schemes is known as the “leaky bucket” mechanism (sometimes called the leaky bucket algorithm). A leaky bucket consists of a bucket (a logical container) that can hold up to b tokens.
In the leaky bucket mechanism, tokens are added to the bucket as follows: new tokens (which may potentially be added) are always generated at a rate of r tokens per second. If the bucket holds fewer than b tokens when a token is generated, the newly generated token is added to the bucket. Otherwise, the newly generated token is ignored and the token bucket remains full to its capacity of b tokens. The “leak” arises from the fact that tokens are removed from the bucket according to a defined rule representing the activity being policed (here, packet transmission).
The leaky bucket mechanism can be used to police a packet flow in the following manner: suppose that before a packet is transmitted into the network it must first remove a token from the token bucket. If the token bucket is empty, the packet must wait for a token. In this way, packets cannot enter the network until a token is available for them. This is analogous to requiring a ticket to enter a freeway.
Alternatively, rather than waiting for a token, a packet that arrives at an output queue looking for a token could be dropped if there are insufficient tokens to allow it to be enqueued. This is an example of a leaky bucket mechanism employed as an output queue control device.
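The drop variant of the mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name TokenBucket, the continuous (rather than one-token-at-a-time) refill, the bucket starting full, and the cost of one token per packet are all choices made for the sketch.

```python
class TokenBucket:
    """Hypothetical policer: r tokens per second, capacity b tokens."""

    def __init__(self, rate, depth):
        self.rate = rate            # r, tokens generated per second
        self.depth = depth          # b, maximum tokens the bucket can hold
        self.tokens = float(depth)  # assumption: the bucket starts full
        self.last = 0.0             # time of the most recent refill

    def _refill(self, now):
        # Tokens generated since the last refill; any excess over b is ignored.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now

    def allow(self, now):
        """Return True if the packet arriving at `now` may pass, else drop it."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1        # the packet removes one token
            return True
        return False                # no token available: the packet is dropped


bucket = TokenBucket(rate=10, depth=5)
# A burst of 20 packets arriving at the same instant.
results = [bucket.allow(0.0) for _ in range(20)]
# One second later the bucket has refilled and traffic passes again.
later = bucket.allow(1.0)
```

In this run the first five packets of the burst pass and the remaining fifteen are all dropped, which is the tail dropping behavior discussed earlier.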
The virtual time policing scheme, also well-known in the art, can also be used, as virtual time policers are generally considered an alternative to leaky bucket algorithms. In the virtual time scheme, the process first determines the “next time” that a flow is allowed to send a packet. When the next packet in that flow arrives, its time of arrival is compared to the “next time.” If the packet has arrived earlier than the “next time,” it needs to be policed or perhaps dropped. If the packet arrived later than the “next time,” it is allowed. A burst parameter is usually associated with each policer to indicate how much earlier than the “next time” a packet can arrive before it is policed.
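A virtual time policer of the kind just described might be sketched as below. The class name, the parameter names, and the rule for advancing the “next time” (the later of the arrival time and the current “next time,” plus one packet interval) are assumptions modeled on the general virtual scheduling idea; the sketch is not a reproduction of the ATM Forum text.

```python
class VirtualTimePolicer:
    """Hypothetical virtual time policer for a single flow."""

    def __init__(self, rate, burst_tolerance):
        self.increment = 1.0 / rate       # spacing between packets at the target rate
        self.tolerance = burst_tolerance  # how early a packet may arrive, in seconds
        self.next_time = 0.0              # the "next time" the flow may send

    def allow(self, arrival):
        """Return True if a packet arriving at `arrival` conforms."""
        if arrival < self.next_time - self.tolerance:
            return False                  # too early: police (here, drop) the packet
        # Conforming packet: push the "next time" forward by one interval.
        self.next_time = max(arrival, self.next_time) + self.increment
        return True


policer = VirtualTimePolicer(rate=4, burst_tolerance=0.5)
# Five packets arrive back to back at t=0, then one arrives at t=1.
decisions = [policer.allow(t) for t in [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
```

With these values the burst parameter admits three of the five simultaneous packets, and the packet arriving at t=1 is allowed because it is later than the “next time.”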
The question now becomes, “How does the network behave in response to a packet that is either dropped or held (i.e., buffered)?” Adaptive flows, such as TCP, typically respond to a lack of packet transmission, designated by the failure to receive a return acknowledgement from the receiving (destination) system, by reducing their transmit rate. In this way, an adaptive flow (often called a well-behaved flow) can slowly reduce its rate in response to unsuccessful transmissions.
In the presence of a packet transmission burst from a given flow, a leaky bucket mechanism will be able to pass at most b packets simply because the maximum size of the leaky bucket is b packets. Furthermore, because the token generation rate is r, the maximum number of packets that can enter the network in any interval of time length t is rt+b. Thus, the token generation rate r serves to limit the long term average rate at which packets can enter the network by causing the well-behaved, adaptive flows to lower their average, aggregated transmit (sending) rate to r.
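The rt+b bound can be checked numerically with a small discrete-time sketch. The function name, the integer rate, the bucket starting full, and the model of a sender that always has a packet waiting are assumptions made for illustration.

```python
def admitted_by_greedy_sender(rate, depth, seconds):
    """Count packets admitted over `seconds` from a sender that spends
    every available token (hypothetical model; `rate` must be an integer)."""
    tokens = depth                        # assumption: the bucket starts full
    sent = 0
    for _ in range(rate * seconds):       # one token is generated per tick of 1/rate s
        sent += tokens                    # the sender uses every token on hand
        tokens = 0
        tokens = min(depth, tokens + 1)   # new token, capped at the bucket size b
    return sent


r, b, t = 100, 20, 3
sent = admitted_by_greedy_sender(r, b, t)
bound = r * t + b                         # the rt+b limit from the text
```

Even for this greedy sender the admitted count stays within rt+b: the initial b tokens pass the burst, and thereafter packets pass only as fast as tokens are generated.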
One problem seen in the art and especially vexatious in situations requiring fine-grained, per-flow policing (also known as microflow policing) is that a TCP flow will ramp up to the policer rate and then experience a hard drop. In other words, in accordance with standard behavior of TCP flows, the sender will continue to increase its transmission rate until it fails to transmit a packet successfully. At this point, again according to the TCP standard, the packet drop (as indicated by the receipt of a duplicate acknowledgment message at the sender) will cause the TCP sender to re-send the first unacknowledged packet and adjust its transmit rate downwards. If there is just one packet dropped, the flow will recover and continue at the reduced rate. However, if several packets have been dropped, the TCP connection will receive further duplicate acknowledgements. At that point, the sender will resort to a retransmission timeout.
A retransmission timeout, by definition, causes the TCP sender to reset its transmission rate to the lowest supported rate on the link. The net result is that the TCP transmission rate will drop far below the policing rate on the occurrence of the first set of multiple packet drops and will remain at a sub-policing rate for a relatively long period of time. The situation is illustrated in FIG. 1A, wherein the sawtooth behavior of the transmit rates results from the retransmission timeout response to packet drops.
Some solutions for this problem, and the resulting loss in transmission efficiency, use two levels of policing, one of which only causes a mark or an upstream message that congestion is occurring. The second level, set at a slightly higher rate, causes a hard packet drop. The idea behind this approach is that the mark message will cause adaptive flows to reduce their rate by a small increment rather than starting all over at the minimum TCP rate and ramping up. In systems using this approach, a burst transmission momentarily supplying a rate in excess of the mark rate results in a slight decrease in transmitter rate, rather than a hard drop.
The disadvantage of this scheme is that it is difficult to implement in router and switch hardware. Such a dual-level or dual-rate policing scheme requires a great deal of additional memory and computational resources within the switch because the packet flow rate must be tested against two different rates, rather than one.
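A two-level scheme of this kind can be sketched with a pair of token buckets, one for the mark rate and one for the slightly higher drop rate. All names, the separate bucket depths, and the three-way result are assumptions for illustration; the sketch does not represent any particular prior art implementation.

```python
class Bucket:
    """Hypothetical token bucket state: rate r and capacity b."""

    def __init__(self, rate, depth):
        self.rate, self.depth = rate, depth
        self.tokens, self.last = float(depth), 0.0  # assumption: starts full

    def refill(self, now):
        # Add tokens generated since the last refill, capped at the depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now


class TwoLevelPolicer:
    """Tests each packet against a mark rate and a higher drop rate."""

    def __init__(self, mark_rate, mark_depth, drop_rate, drop_depth):
        self.mark = Bucket(mark_rate, mark_depth)
        self.drop = Bucket(drop_rate, drop_depth)

    def classify(self, now):
        self.mark.refill(now)
        self.drop.refill(now)
        if self.drop.tokens < 1:
            return "drop"           # above the higher rate: hard drop
        self.drop.tokens -= 1
        if self.mark.tokens < 1:
            return "mark"           # above the mark rate only: signal congestion
        self.mark.tokens -= 1
        return "pass"


policer = TwoLevelPolicer(mark_rate=10, mark_depth=3, drop_rate=12, drop_depth=5)
# Seven packets arrive at the same instant.
outcomes = [policer.classify(0.0) for _ in range(7)]
```

Here the burst yields three passed packets, two marked packets, and two dropped packets. Note that the sketch keeps two sets of counters per flow and tests each packet against both, which is the doubling of state and computation that the text identifies as the drawback.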
What is needed is a system that can provide fine-grained policing on a per-flow basis and is relatively immune to retransmission timeout and the concomitant loss of transmission efficiency. Such a system must operate without consuming too much of the scarce processor and memory resources available in modern network devices.