1. Field of the Invention
The present invention relates to internetworking systems and in particular to methods and apparatus for managing traffic flow in routers and switches.
2. Description of the Related Art
Internetworking encompasses all facets of communications between and among computer networks. Such communications data flow streams may include voice, video, still images, and data traffic. All have widely varying needs in terms of propagation delay (or latency) during transit through the network. Various systems and devices, both in hardware and in software, have attempted to deal with the plethora of data flow requirements present in modern internetworking systems.
One such scheme consists of attempting to regulate the traffic within the router or switch connecting multiple networks in the typical internetworking system. Such schemes attempt to provide fair allocation of data throughput capacity (bandwidth) by allocating router buffer and/or queue space according to the type of packets in each flow stream received.
A particular problem in internetworking traffic regulation arises from the variety of traffic sources or flows presented to the router/switching device. Referring to FIG. 1, illustrating a high-level schematic view of the operation of a prior art router/switch 10, a number of input flows 20 are presented to the unit. These flows each consist of multiple packets of data in a variety of sizes and presented at a variety of rates. Additionally, flows may be presented in different protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) and the related User Datagram Protocol (UDP), File Transfer Protocol (FTP), Terminal Emulation Protocol (Telnet), and Hypertext Transfer Protocol (HTTP). Other internetworking protocols are found in the literature, such as Merilee Ford, et. al., Internetworking Technologies Handbook, Cisco Press 1997 (hereinafter Ford), incorporated herein by reference in its entirety. The packets are buffered in a buffer pool 30, which is typically random access memory (RAM). Buffering is accomplished according to the directives of a controller 60 and a buffer manager 25. The flows are sent to the proper output port 70 by way of a set of output queues 40 and a port scheduler 50, discussed below. Controller 60 is conventionally implemented as one or more high speed microprocessors with associated interface circuitry. Buffer manager 25 and port scheduler 50 can be implemented as part of a switch ASIC.
Some flows are well-behaved in the event of traffic congestion: when faced with packet drops (i.e., packets discarded deliberately by a downstream device due to congestion at that device), these “good” (robust) flows reduce their flow rates and send less packets per unit of time. Other flows, however, are not well-behaved. These non-adaptive “aggressive” flows (NAFs) do not throttle back the flow of packets to the router when they experience drops. This may be because the NAFs do not recognize the congestion (sometimes due to protocol incompatibilities) or more likely because they actually are trying to capture more bandwidth. The latter situation arises particularly in flows sent by sources that consider themselves higher priority than all others (hence the term “aggressive”); such priority assumptions by one flow are often in error in the modern, highly heterogeneous networks seen today.
Several regulation schemes are known in the art. Broadly classified, these schemes fall into two types: queue-based and buffer-based.
In queue-based schemes, incoming flows are classified according to their actual priority, as determined by the receiving router, and assigned accordingly to output queues within the router. High priority flows, such as time-sensitive voice traffic, are placed in a queue that is read out more often. Low priority flows, such as file transfer protocol (FTP) or hypertext transfer protocol (HTTP) flows, are placed in queues that are read out of the router at a slower rate. Numerous schemes, discussed below, are used to control the buffering and enqueuing methods to achieve a measure of throughput balance or fairness among flows, thus managing router/switch bandwidth as efficiently as possible. As will be seen, however, all of these schemes have drawbacks in cost, capacity, and efficiency that suggest a better scheme is needed.
In the extreme, queue-based flow management assigns one queue per input flow. Queues are read out of the router according to statistically fair scheduling 10 process, such as round-robin, employing port scheduler 50. In round-robin scheduling, one packet is read out of each queue, one queue at a time, reading again from the first queue only when one packet has been read out from every other queue. This system is known as fair queuing (FQ). While FQ and its variants (e.g., weighted fair queuing, stochastic fair queuing) operate well when the number and variety of input flows is 15 small and well-behaved, they become inefficient when the number of flows grows. Clearly, a high number of flows requires a large number of queues, consuming a proportionally larger amount of resources, both in hardware and in operational complexity. More memory and more software processing overhead is required to set up and tear down the queues as flows begin and end. In the context of the modern, 20 high volume networks seen today, this extra cost and complexity is undesirably inefficient.
A less extreme variation of FQ is random early drop (RED) and variants thereon. In a RED scheme, a smaller number of queues (less than the total number of input flows present at any time) is maintained. Flows are segregated into queues by flow volume, with a number of high volume flows placed in one queue. Each queue is managed according to a probabilistic flow rule that causes packets to be dropped more often in the queues associated with the heaviest flows. Because of this relationship, heavy flows experience packet drops more often, statistically, than other flows. This scheme achieves a measure of fairness, but it assumes that heavy flows will be well-behaved, i.e., that they will reduce flow rate when they experience packet drops. This assumption has proven to be erroneous in the modern heterogeneous network. Certain NAFs do not reduce flow rate and thus continue to take an unfair amount of router bandwidth simply because they counter packet drops with retransmissions. The “good” flows get less and less throughput as they reduce flow rate in response to drops while the NAFs capture more bandwidth.
As a further drawback, the random packet drops sometimes hit fragile flows. Such flows contain time-critical traffic of the highest priority, such as voice data. Fragile flows have the lowest tolerance for drops and delay, so a random packet drop management scheme can have a highly detrimental effect on them.
An alternative to managing router/switch traffic at the queue end is to manage flows at the buffer end, referring to buffer pool 30 of FIG. 1. The basic premise of buffer-based management is that if one limits how much of a particular input flow gets into buffers 30 relative to other input flows 20, the output queues 40 will take care of themselves. Such limits on the number of packets buffered per flow can be either static or dynamic.
In the static or strict limit scheme, a set maximum number of buffers is available for each flow. Any packets received after those buffers are full are discarded. Static limits are set by the system administrator for each type of flow. However, this scheme has the obvious drawback of high overhead associated with setting up a gating mechanism for each flow and administrative oversight. Additionally, it lacks long-term flexibility to adapt to the wide variety and constantly changing mix of flow types seen in modern internetworking.
Typical prior art systems implement static buffer limitation schemes in software with limited hardware support. All experience the same or similar drawbacks noted above due to overhead (set up and tear down, as well as processing time delay and hardware resource) costs.
Dynamic buffer limiting, on the other hand, computes a buffer limit for each flow based on the size of the output queues, the size of the buffer pool, the number of buffers available at any given instant, and other router/switch state variables. Additionally, and most importantly, dynamic buffer limitation schemes compute limits based on the flow type. FIG. 2 illustrates the standard bit configuration for an Internet Protocol packet, including the fields within its header. Flow type, also known as flow classification, information can be found in, for instance, the type of service (TOS) field 210 in the internet protocol (IP) header 200 or in the source address 220 of the received packet. It can also be deduced from the type of packet received, voice being of a higher priority and thus demanding of a lower delay than other flows. While some forms of dynamic buffer limiting (DBL) are known, current schemes are unable to update their limit values fast enough to keep up with the latest generation of ultra-fast (e.g., Gigabit speed) flows. As an additional drawback, the use of TOS field 210 is not standardized among internetworking users. Thus, neither TOS nor source address is a reliable means of identifying flow type at this time.
Some of these shortcomings are addressed by the co-pending, commonly-assigned patent application entitled PER-FLOW DYNAMIC BUFFER MANAGEMENT, Ser. No. 09/238,552 (hereinafter “DBL Application”), naming Andreas V. Bechtolsheim and David R. Cheriton as inventors, the entirety of which is hereby incorporated by reference. However, large numbers of voice flows, in particular, pose additional problems not solved by the co-pending DBL Application.
A well-behaved voice flow, such as Voice-over-Internet Protocol (VoIP), makes very low demands on an internetworking switch, in terms of data rate and buffering, but needs low-latency queuing and forwarding to stay within an acceptable voice delay budget. Each voice channel needs only a small percentage of the available bandwidth; essentially no buffering would be required if all voice flows were well-behaved. However, one badly behaved flow (a voice NAF) can cause queuing, resulting in other voice flows being queued up behind it. This queuing quickly causes unacceptable delays.
The credit scheme disclosed in the DBL Application, discussed above, decrements a credit field for a given flow when packets are dropped and increments it when packets from that same flow are enqueued. That scheme is too crude, however, when flows have a nominally low data rate and are highly delay sensitive, as is the case for voice flows.
One alternative, simply giving a high priority to voice packets, runs the risk of starving other flows. Such starvation occurs when high priority packets are transmitted before all other flows, which can result in the low priority flows not being transmitted at all or with unacceptably long delays. Furthermore, such starvation can also occur when NAFs are only pretending to be voice flows, as by a deliberate mis-identification of TOS by the source. Experience has shown that malevolent users (e.g., hackers) will manipulate TOS in order to capture additional bandwidth and decrease latency in their own traffic at the expense of others.
Other existing schemes based on strict per-flow queuing are expensive to set up, manage, and tear down, as noted above. They also run at low utilization levels because voice flows are currently minority users of switch bandwidth and operate at very low data rates. Explicit bandwidth reservation schemes, such as Resource-Reservation Protocol (RSVP), described in Ford, require additional hardware for each flow to record and enforce reservations on the fast (low latency) path through the switch. Reservation systems also require long call set up times, placing further burdens and inefficiencies on the switch.
What is needed is a scheme to rapidly identify “good” flows from “bad” (i.e., the well-behaved flows vs. the non-adapting aggressive flows) on a packet-by-packet basis and to adaptively limit the throughput rate of the bad flows.