The present invention relates to congestion management in computer networks in general and, in particular, to flow control in response to congestion.
A switch is a network node that directs datagrams on the basis of Medium Access Control (MAC) addresses, that is, Layer 2 in the OSI model well known to those skilled in the art [see “The Basics Book of OSI and Network Management” by Motorola Codex from Addison-Wesley Publishing Company, Inc., 1993]. A switch can also be thought of as a multiport bridge, a bridge being a device that connects two LAN segments together and forwards packets on the basis of Layer 2 data. A router is a network node that directs datagrams on the basis of finding the longest prefix in a routing table of prefixes that matches the Internet Protocol (IP) destination address of a datagram, all within Layer 3 in the OSI model. A Network Interface Card (NIC) is a device that interfaces a network such as the Internet with an edge resource such as a server, cluster of servers, or server farm. A NIC might classify traffic in both directions for the purpose of fulfilling Service Level Agreements (SLAs) regarding Quality of Service (QoS). A NIC may also switch or route traffic in response to classification results and current congestion conditions. The present invention applies to a network node that can be a switch, a router, a NIC, or, more generally, a machine capable of both switching and routing functions based upon classification results and current congestion conditions.
Network processing in general entails examining packets and deciding what to do with them. This examination can be costly in terms of processing cycles, and traffic can arrive irregularly over time. Consequently network nodes (e.g., node 104 of FIG. 1) in general provide some amount of storage for packets awaiting processing (e.g., storage memory 109 of FIG. 1). During episodes of congestion, some arriving packets might be purposefully discarded to avoid uncontrolled overrunning of the storage. This is flow control.
All arriving traffic in a network processor can be stored in a Queue. Conventionally, the next step after this is to pass packets to Multifield Classification (MFC). If MFC is computationally complex for some packets, then the Queue can fill to the point that arriving packets are discarded, regardless of value. This discard action can be by virtue of Queue occupancy crossing a threshold.
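The threshold-based discard described above can be sketched as follows. This is only an illustrative sketch: the class name ThresholdQueue, the parameters capacity and threshold (both counted in packets), and the discarded counter are assumptions introduced here, not elements of any particular network processor.

```python
from collections import deque


class ThresholdQueue:
    """Hypothetical queue that discards arrivals once occupancy reaches a threshold."""

    def __init__(self, capacity, threshold):
        # capacity and threshold are in packets (assumed units for this sketch).
        if not 0 < threshold <= capacity:
            raise ValueError("require 0 < threshold <= capacity")
        self.capacity = capacity
        self.threshold = threshold
        self.packets = deque()
        self.discarded = 0

    def enqueue(self, packet):
        # The discard decision looks only at occupancy, never at the packet,
        # so valuable and worthless packets are dropped alike.
        if len(self.packets) >= self.threshold:
            self.discarded += 1
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        # Hand the oldest stored packet to the next stage (e.g., classification).
        return self.packets.popleft() if self.packets else None
```

If the stage that drains the queue is slow for some packets, dequeue is called less often than enqueue, occupancy reaches the threshold, and every later arrival is discarded regardless of its value.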
A common prior art flow control is called Random Early Detection (RED). As queue occupancy grows from 0 to full storage capacity, RED at first transmits all arriving packets into the queue; then, once occupancy exceeds a threshold Lo >= 0%, it transmits a decreasing fraction of packets into the queue; and finally, once occupancy exceeds a threshold Hi <= 100%, it discards all arriving packets. For queue occupancy Q that is between Lo and Hi, the fraction T of packets transmitted can be a linear function of the following form:
T(Q) = 1 − (1 − Tmin)*(Q − Lo)/(Hi − Lo)
Here Tmin is a minimum transmitted fraction reached as Q increases to Hi. Many variations on this theme are practiced in the prior art; for example, Q might actually be an exponentially weighted moving average of queue occupancy. As another example, setting Lo=Hi and Tmin=0 gives the special case known as taildrop.
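The linear transmit fraction and the two variations just mentioned can be sketched as follows. This is a minimal sketch under stated assumptions: occupancy and thresholds are expressed as fractions of capacity between 0 and 1, and the function names transmit_fraction, ewma, and admit, as well as the weight parameter, are hypothetical names chosen for illustration.

```python
import random


def transmit_fraction(q, lo, hi, t_min):
    """Fraction T(Q) of arriving packets transmitted into the queue.

    q, lo, hi are occupancies as fractions of capacity (assumed units).
    """
    if not (0.0 <= lo <= hi <= 1.0 and 0.0 <= t_min <= 1.0):
        raise ValueError("require 0 <= lo <= hi <= 1 and 0 <= t_min <= 1")
    if q <= lo:
        return 1.0   # below Lo: transmit everything
    if q > hi:
        return 0.0   # above Hi: discard everything
    # Between Lo and Hi: T(Q) = 1 - (1 - Tmin)*(Q - Lo)/(Hi - Lo).
    # Not reached when lo == hi (taildrop), so no division by zero occurs.
    return 1.0 - (1.0 - t_min) * (q - lo) / (hi - lo)


def ewma(previous, sample, weight):
    """Exponentially weighted moving average of queue occupancy."""
    return (1.0 - weight) * previous + weight * sample


def admit(q, lo, hi, t_min, rng=random.random):
    """Randomly transmit an arriving packet with probability T(Q)."""
    return rng() < transmit_fraction(q, lo, hi, t_min)
```

For example, with Lo = 0.25, Hi = 0.75, and Tmin = 0.5, an occupancy of 0.5 gives a transmitted fraction of 0.75. Calling transmit_fraction with lo equal to hi and t_min of 0 reproduces taildrop: everything is transmitted up to the threshold and nothing beyond it.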
The use of RED or its variants unfortunately can have undesirable consequences, including:
1. The methods ignore the rate of change of queue occupancy (whether the queue is growing or shrinking).
2. High thresholds can cause high latency or a lack of headroom for bursts.
3. Low thresholds can cause burst-shaving (low utilization).
4. There is no direct relationship between thresholds and performance.
5. Administrative input is needed as offered loads change.
6. Hand-tuning of thresholds is widely recognized as difficult.
7. Vendor documents offer little or no guidance.
A drawback with the prior art techniques is that the decision to transmit into a queue or discard an arriving packet is made in the device based upon heuristically determined thresholds or functions. In view of the above, more efficient apparatus and methods are required to make connection allocation decisions in high speed networks.