1. Field of the Invention
The present invention relates to internetworking systems and in particular to methods and apparatus for managing traffic flow in routers and switches.
2. Description of the Related Art
Internetworking encompasses all facets of communications between and among computer networks. Such communications data flow streams may include voice, video, still images, and data traffic. All have widely varying needs in terms of propagation delay (or latency) during transit through the network. Various systems and devices, both in hardware and in software, have attempted to deal with the plethora of data flow requirements present in modem internetworking systems.
One such scheme consists of attempting to regulate the traffic within the router or switch connecting multiple networks in the typical intemetworking system at either the data link or network function levels. (The functions performed at each level are defined in the open systems interconnection (OSI) reference model. This model is well known in the art. See, e.g., Merilee Ford, et al., Internetworking Technologies Handbook, Cisco Press 1997.) Such schemes attempt to provide fair allocation of data throughput capacity (bandwidth) by allocating router buffer and/or queue space according to the type of packets in each flow stream received.
A particular problem in internetworking traffic regulation arises from the variety of traffic sources or flows presented to the router/switching device. Referring to FIG. 1, illustrating a high-level schematic view of the operation of a prior art router/switch 10, a number of input flows 20 are presented to the unit. These flows each consist of multiple packets of data, in a variety of sizes and presented at a variety of rates. Additionally, flows may be presented in different protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) and the related User Datagram Protocol (UDP), File Transfer Protocol (FTP), Terminal Emulation Protocol (Telnet), and Hypertext Transfer Protocol (HTTP). Other internetworking protocols are found in the literature, such as Merilee Ford, et. al., Internetworking Technologies Handbook, Cisco Press 1997, incorporated herein by reference in its entirety. The packets are buffered in a buffer pool 30, which is typically random access memory (RAM). Buffering is accomplished according to the directives of a controller 60 and a buffer manager 25. The flows are sent to the proper output port 70 by way of a set of output queues 40 and a port scheduler 50, discussed below. Controller 60, buffer manager 25, and port scheduler 50 are conventionally implemented as one or more high speed microprocessors with associated interface circuitry. Buffer manager 25 and port scheduler 50 are also implemented as ASICs.
Some flows are well-behaved in the event of traffic congestion: when faced with packet drops (i.e., packets discarded deliberately by a downstream device due to congestion at that device), these xe2x80x9cgoodxe2x80x9d (robust) flows reduce their flow rates and send less packets per unit of time. Other flows, however, are not well-behaved. These non-adaptive xe2x80x9caggressivexe2x80x9d flows (NAFs) do not throttle back the flow of packets to the router when they experience drops. This may be because the NAFs do not recognize the congestion, sometimes due to protocol incompatibilities, or (more likely) because they actually are trying to capture more router bandwidth. The latter situation arises particularly in flows sent by sources that consider themselves higher priority than all others (hence the term xe2x80x9caggressivexe2x80x9d); such priority assumptions by one flow are often in error in the modem, highly heterogeneous networks seen today.
Several regulation schemes are known in the art. Broadly classified, these schemes fall into two types: queue-based and buffer-based.
In queue-based schemes, incoming flows are classified according to their actual priority, as determined by the receiving router, and assigned accordingly to output queues within the router. High priority flows, such as time-sensitive voice traffic, are placed in a queue that is read out more often. Low priority flows, such as file transfer protocol (FTP) or hypertext transfer protocol (HTTP) flows, are placed in queues that are read out of the router at a slower rate. Numerous schemes, discussed below, are used to control the buffering and enqueuing methods to achieve a measure of throughput balance or fairness among flows, thus managing router/switch bandwidth as efficiently as possible. As will be seen, however, all of these schemes have drawbacks in cost, capacity, and efficiency that suggest a better scheme is needed.
In the extreme, queue-based flow management assigns one queue per input flow. Queues are read out of the router according to statistically fair scheduling process, such as round-robin employing port scheduler 50. In round-robin scheduling, one packet is read out of each queue, one queue at a time, reading again from the first queue only when one packet has been read out from every other queue. This system is known as fair queuing (FQ), or weighted fair queuing (WFQ). While FQ and its variants operate well when the number and variety of input flows is small and well-behaved, they becomes inefficient when the number of flows grows. Clearly, a high number of flows requires a large number of queues, consuming a proportionally larger amount of resources, both in hardware and in operational complexity. More memory and more software processing overhead is required to set up and tear down the queues as flows begin and end. In the context of the modem, high volume networks seen today, this extra cost and complexity is undesireably inefficient.
A less extreme queue-based technique is random early drop (RED) and variants thereon. In a RED scheme, a smaller number of queues (less than the total number of input flows present at any time) is maintained. Flows are segregated into queues by flow volume, with a number of high volume flows placed in one queue. Each queue is managed according to a probabilistic flow rule that causes packets to be dropped more often in the queues associated with the heaviest flows. Because of this relationship, heavy flows experience packet drops more often, statistically, than other flows. This scheme achieves a measure of fairness, but it assumes that heavy flows will be well-behaved, i.e., that they will reduce flow rate when they experience packet drops. This assumption has proven to be erroneous in the modern heterogeneous network. Certain NAFs do not reduce flow rate and thus continue to take an unfair amount of router bandwidth simply because they counter packet drops with retransmissions. The xe2x80x9cgoodxe2x80x9d flows get less and less throughput as they reduce flow rate in response to drops while the NAFs capture more bandwidth.
As a further drawback, the random packet drops sometimes hit a fragile flow. These flows contain time-critical traffic of the highest priority, such as voice data. Fragile flows have the lowest tolerance for drops and delay, so a random packet drop management scheme can have a highly detrimental effect on them.
An alternative to managing router/switch traffic at the queue end is to manage flows at the buffer end, referring to buffer pool 30 of FIG. 1. The basic premise of buffer-based management is that if one limits how much of a particular input flow gets into buffers 30 relative to other input flows 20, the output queues 40 will take care of themselves. Such limits on the number of packets buffered per flow can be either static or dynamic.
In the static or strict limit scheme, a set maximum number of buffers is available for each flow. Any packets received after those buffers are full are discarded. Static limits are set by the system administrator for each type of flow. However, this scheme has the obvious drawback of high overhead associated with setting up a gating mechanism for each flow and administrative oversight. Additionally, it lacks long-term flexibility to adapt to the wide variety and constantly changing mix of flow types seen in modern internetworking.
Typical prior art systems implement static buffer limitation schemes in software with limited hardware support. All experience the same or similar drawbacks noted above due to overhead (set up and tear down, as well as processing time delay and hardware resource) costs. Furthermore, typical prior art systems implement buffer limitation schemes based on a limit that is imposed per output queue or per class of service required by the received packet. FIG. 2 illustrates the standard bit configuration for an Internet Protocol (IP) packet, including the fields within its header. Class of service information, sometimes referred to as flow type or flow classification, can be found in, for instance, the precedence or type of service (TOS) field 210 in the IP received packet header 200 or in the source address 220 or a combination thereof. These systems also either set their limit values from manually configured parameters or else update them at a relatively slow periodic rate compared to packet rates.
Current schemes are unable to update their limit values fast enough to keep up with changing traffic conditions in the latest generation of ultra-fast (e.g., Gigabit speed) flows. As an additional drawback, the use of TOS field 210 is not standardized among internetworking users. Thus, neither TOS nor source address is a reliable means of identifying flow type at this time.
What is needed is a scheme to rapidly identify good flows from bad (i.e., the well-behaved flows vs. the non-adapting aggressive flows) on a packet-by-packet basis. Furthermore, a flexible, low-overhead, extremely fast dynamic buffer limiting method and apparatus to fairly buffer and enqueue the wide variety of good flows and NAFs found in today""s networks is also needed.
The present invention provides a per-flow dynamic buffer management scheme for a data communications device. With per-flow dynamic buffer limiting, the header information for each packet is mapped into an entry in a flow table, with a separate flow table provided for each output queue. Each flow table entry maintains a count of buffers currently in the queue for each flow. On each packet enqueuing action, a dynamic buffer limit is computed for the flow and compared against the number of buffers already used by the flow to make a mark, drop, or enqueue decision. A packet in a flow is dropped or marked if the buffer count is above this limit. Otherwise, the packet is enqueued and the count incremented by the number of cells in the newly-enqueued packet.
The scheme operates independently of packet data rate and flow behavior, providing packet-specific means for rapidly discriminating well-behaved flows from aggressive, non-adapting (badly behaved) flows in order to manage buffer allocation accordingly. Additionally, the present invention adapts to changing flow requirements by fairly sharing buffer resources. The present invention handles robust, well-behaved flows that adapt to congestion situations signaled by packet drop, fairly sharing bandwidth among these flows. The present invention also ensures good service for fragile flows (those sending few packets and those of a time critical nature) such as Voice-over-Internet Protocol (VoIP), thereby protecting them from non-adapting aggressive flows (NAFs).