1. Field of the Invention
The present invention relates to data communication networks. More particularly, the present invention relates to an apparatus and method for enabling rate-based polling of input interface queues in networking devices.
2. The Background Art
As is known to those skilled in the art, a network is a communication system that allows users to access resources on other computers and exchange messages with other users. A network is typically a data communication system that links two or more computers and peripheral devices. It allows users to share resources on their own systems with other network users and to access information on centrally located systems or systems that are located at remote offices. It may provide connections to the Internet or the networks of other organizations. The network typically includes a cable that attaches to network interface cards (“NICs”) in each of the devices within the network. Users may interact with network-enabled software applications to make a network request (such as to get a file or print on a network printer). The application may also communicate with the network software, which may then interact with the network hardware to transmit information to other devices attached to the network.
FIG. 1 is a block diagram illustrating an exemplary network 100 connecting a user 110 and a particular web page 120. FIG. 1 is an example which may be consistent with any type of network known to those skilled in the art, including a Local Area Network (“LAN”), a Wide Area Network (“WAN”), or a combination of networks, such as the Internet.
When a user 110 connects to a particular destination, such as a requested web page 120, the connection from the user 110 to the web page 120 is typically routed through several internetworking devices such as routers 130-A–130-I. Routers are typically used to connect similar and heterogeneous network segments into internetworks. For example, two LANs may be connected via routers across a dial-up line, an integrated services digital network (“ISDN”), or a leased line. Routers may also be found throughout the internetwork known as the Internet. End users may connect to a local Internet service provider (“ISP”) (not shown).
As shown in FIG. 1, multiple routes are possible to transmit information between user 110 and web page 120. Networks are designed such that routers attempt to select the best route between computers such as the computer where user 110 is located and the computer where web page 120 is stored. For example, based on a number of factors known to those skilled in the art, the route defined by following routers 130-A, 130-B, 130-C, and 130-D may be selected. However, the use of different routing algorithms may result in the selection of the route defined by routers 130-A, 130-E, 130-F, and 130-G, or possibly even the route defined by routers 130-A, 130-B, 130-H, 130-I, 130-F, and 130-G. A detailed discussion of routing algorithms is not necessary for the purposes of the present invention, and such a discussion is not provided here so as not to overcomplicate the present disclosure.
FIG. 2 is a block diagram of a sample router 130 suitable for implementing an embodiment of the present invention. The router 130 is shown to include a master control processing unit (“CPU”) 210, low and medium speed interfaces 220, and high speed interfaces 230. The CPU 210 may be responsible for performing such router tasks as routing table computations and network management. It may include one or more microprocessor integrated circuits selected from complex instruction set computer (“CISC”) integrated circuits, reduced instruction set computer (“RISC”) integrated circuits, or other commercially available processor integrated circuits. Non-volatile RAM and/or ROM may also form a part of CPU 210. Those of ordinary skill in the art will recognize that there are many alternative ways in which such memory can be coupled to the system.
The interfaces 220 and 230 are typically provided as interface cards. Generally, they control the transmission and reception of data packets over the network, and sometimes support other peripherals used with router 130. Examples of interfaces that may be included in the low and medium speed interfaces 220 are a multiport communications interface 222, a serial communications interface 224, and a token ring interface 226. Examples of interfaces that may be included in the high speed interfaces 230 include a fiber distributed data interface (“FDDI”) 232 and a multiport Ethernet interface 234. Each of these interfaces (low/medium and high speed) may include (1) a plurality of ports appropriate for communication with the appropriate media, and (2) an independent processor, and in some instances (3) volatile RAM. The independent processors may control such communication intensive tasks as packet switching and filtering, and media control and management. By providing separate processors for the communication intensive tasks, this architecture permits the master CPU 210 to efficiently perform routing computations, network diagnostics, security functions, and other similar functions.
The low and medium speed interfaces are shown to be coupled to the master CPU 210 through a data, control, and address bus 240. High speed interfaces 230 are shown to be connected to the bus 240 through a fast data, control, and address bus 250 which is in turn connected to a bus controller 260. The bus controller functions are typically provided by an independent processor.
Although the system shown in FIG. 2 is an example of a router suitable for implementing an embodiment of the present invention, it is by no means the only router architecture on which the present invention can be implemented. For example, an architecture having a single processor that handles communications as well as routing computations would also be acceptable. Further, other types of interfaces and media known to those skilled in the art could also be used with the router.
FIG. 3 is a block diagram illustrating a model of a typical router system. As shown in FIG. 3, in the context of the present invention, a networking device such as a router 130 may be modeled as a device having a plurality of input interfaces 310a–310n, each having a corresponding input interface queue 320a–320n. Each input interface 310 receives a stream 330a–330n of data packets 340a–340z, with each data packet 340 typically arriving at a variable rate and typically having a variable length (usually measured in bytes). It should be noted that the average data packet arrival rate on each interface 310a–310n is typically variable over time, and that the short-term and long-term average data packet arrival rate typically varies across the interfaces 310a–310n as well.
As each new data packet 340 arrives on an interface 310k, it is written into a corresponding input interface queue 320k, waiting for its turn to be processed. Scheduling logic 350 determines the order in which input interfaces 310a–310n should be “polled” to find out how many data packets (or equivalently, how many bytes of data) have arrived on a given interface 310k since the last time that interface 310k was polled. Scheduling logic 350 also determines the amount of data that should be processed from a given interface 310k during each “polling round.”
In a typical router, scheduling logic 350 may operate in a “round robin” fashion in a continuous cycle of “polling rounds,” using a process which can be described as follows. Upon the arrival of a new packet 340i on a particular interface 310k, a device driver sends an interrupt request to the router's CPU, discussed earlier. If the CPU is idle, it will immediately start to process the new packet. If the CPU is busy with a low priority process, the Operating System (“OS”) performs a context switch to swap out the low priority process and starts to process the packet. Otherwise, a receiving (“RX”) interrupt may be set for input interface 310k while waiting to be handled. Later, when the CPU can service this interrupt, it polls all of the input interface queues 320a–320n in a static and predetermined sequence (e.g., in the order shown in FIG. 3).
During a typical polling process, for each input interface queue 320k having one or more packets stored in the queue at the time that input interface queue 320k is polled (or equivalently, for each input interface queue 320k having its RX interrupt set), all complete packets currently stored in the queue are read out of the queue and transferred to other storage locations in the router for further processing. In this typical example, the next input interface queue in the sequence is not polled until all pending packets in the previous input interface queue have been read out of the previous input interface queue. Before moving on to the next input interface queue, the RX interrupt for the previous input interface queue is cleared. This simple polling technique does not account for packet arrival order, as many packets could have arrived on other interfaces while one input interface queue is being polled, and thus these newly arrived packets on other interfaces may have to wait for a long time before being processed. However, this technique has certain performance advantages due to locality, since all packets arriving on the same input interface typically contain the same link layer header and are likely destined for the same next hop.
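By way of illustration only, the first polling process described above may be sketched as follows. The sketch assumes that each input interface queue can be modeled as a simple first-in, first-out container; the function name `poll_exhaustive` and the `(interface, arrival_time)` packet tuples are hypothetical and are not taken from any actual router implementation.

```python
from collections import deque


def poll_exhaustive(queues):
    """Poll each queue in a fixed order, draining it fully before moving on.

    `queues` is a list of deques standing in for input interface queues.
    Returns the packets in the order in which they were read out.
    """
    processed = []
    for q in queues:          # static, predetermined sequence
        while q:              # read out every packet pending in this queue
            processed.append(q.popleft())
        # The RX interrupt for this queue would be cleared at this point.
    return processed


# Hypothetical packets, written as (interface, arrival_time) pairs.
queues = [
    deque([("a", 1), ("a", 4)]),
    deque([("b", 2)]),
    deque([("c", 3)]),
]
order = poll_exhaustive(queues)
```

In this sketch the packet that arrived at time 4 on interface "a" is read out ahead of the packets that arrived at times 2 and 3 on the other interfaces, which is the departure from arrival order noted above.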
In a second typical polling process, for each input interface queue 320k having one or more packets stored in the queue at the time that input interface queue 320k is polled, only one packet is read out of each queue and transferred to other storage locations in the router for further processing each time an input interface queue is polled. In this example, the next input interface queue in the sequence is polled as soon as one pending packet in the previous input interface queue has been read out of the previous input interface queue (assuming that the previous input interface queue has at least one packet pending). This technique tends to be fair between interfaces, but does not necessarily process packets in their arrival order, since the packet arrival rate on one interface may be higher than on other interfaces. Also, this technique has a higher processing overhead due to excessive polling of input interfaces.
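The second polling process may likewise be sketched, for illustration only, under the same simplifying assumption that each input interface queue is a first-in, first-out container. The function name `poll_one_per_visit`, the poll counter, and the packet tuples are hypothetical.

```python
from collections import deque


def poll_one_per_visit(queues):
    """Cycle over the queues, reading at most one packet per queue per visit.

    Returns the packets in read-out order together with the total number
    of polls performed, including polls of queues that were found empty.
    """
    processed = []
    polls = 0
    while any(queues):            # an empty deque is falsy
        for q in queues:
            polls += 1
            if q:
                processed.append(q.popleft())
    return processed, polls


# Hypothetical packets, written as (interface, arrival_time) pairs.
# Interface "a" has a higher arrival rate than "b" or "c".
queues = [
    deque([("a", 1), ("a", 2), ("a", 5)]),
    deque([("b", 3)]),
    deque([("c", 4)]),
]
order, polls = poll_one_per_visit(queues)
```

Here five packets require nine polls, and the packet that arrived at time 2 on the busier interface is read out after the packets that arrived at times 3 and 4, illustrating both the polling overhead and the loss of arrival order described above.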
Regardless of the specific form of scheduling logic 350 used, when scheduling logic 350 determines that a particular data packet 340i should be processed from a particular input interface queue 320k, scheduling logic 350 transfers the data packet 340i to subsequent portions of the networking device (not shown) for further processing. During this period of packet processing, when a new packet arrives on any interface, the RX interrupt for that interface is set if it is not already set, and the new packet is written into the appropriate input interface queue. Eventually, data packet 340i is written into an output queue 360, at the output of which the data packet 340i is finally transmitted from the networking device on an output interface 370. There may be multiple output interfaces with corresponding output queues, although these are not shown so as not to overcomplicate the present discussion.
A common assumption is that packet processing delay is negligible, and that the router CPU has enough bandwidth to process packets as rapidly as they arrive on all interfaces. Consequently, care must be taken so that packets are not dropped in their input interface queues while waiting to be processed. This is partly the reason that most congestion control and traffic Quality of Service (“QoS”) mechanisms known to those skilled in the art, such as Weighted Fair Queuing (“WFQ”) and Random Early Detection (“RED”), have focused on managing traffic flows at output queues.
However, with the deployment of new QoS and policy-based networking techniques, packet processing is becoming more complicated. For instance, packet classification and policy-based routing require searching through an Access Control List (“ACL”), which can potentially be very time consuming and processor intensive. As is known to those skilled in the art, flow-based WFQ, on the other hand, may require searching through the queue list to determine the next packet to be sent. Moreover, as is known to those skilled in the art, routing information distribution and route calculation also take more time as the network topology becomes richer and as more complicated routing techniques, such as QoS routing, are deployed.
Thus, the combined increase in CPU overhead for packet processing and routing protocols naturally increases the waiting time of packets in their input interface queues. Once an input interface queue is full, a newly arriving data packet will be dropped. As is known to those skilled in the art, packet dropping can significantly change the router behavior and related QoS and congestion control features. For example, when RED is configured, dropping packets from the interface queues can dramatically change the RED behavior. As one way to prevent this from happening, a router may be configured with a relatively large input interface queue size.
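The dropping behavior of a full input interface queue may be illustrated with the following minimal sketch. It assumes a fixed queue capacity counted in packets; the class name `InputQueue` and its attributes are hypothetical and serve only to make the behavior concrete.

```python
from collections import deque


class InputQueue:
    """Bounded FIFO that drops newly arriving packets once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed limit, counted in packets
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Store the packet, or drop it if the queue is already full."""
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True


# Five packets arrive before the CPU polls a queue that holds three.
q = InputQueue(capacity=3)
results = [q.enqueue(n) for n in range(5)]
```

In this sketch the last two packets are dropped before any output-side mechanism has an opportunity to act on them. A larger capacity defers the drops at the cost of a longer potential wait in the queue.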
However, when the size of an input interface queue becomes large, the sequence in which input interfaces should be polled, as well as the number of packets that should be processed from each input interface in a given polling round, immediately become an issue. The drawback to the known polling techniques described earlier is that packets stored in the last input interface queue may have to wait for a long time. This situation is unacceptable for the following reasons. First, delay-sensitive packets, such as voice packets, may experience unexpectedly long delays in their input queues. Second, a long waiting time in the input queue can make complementary congestion control techniques such as WFQ less accurate, since some packets may have passed their virtual departure time before they even start being processed. Third, some packets may wait much longer in their interface queues than other packets. This unfair treatment of packets introduces a large delay variance. Finally, considering the increased number of interfaces and heterogeneous link capacity in a single router platform, packets arriving at a high rate interface may be easily dropped even if the size of the input interface queues is configured to be large.
To solve these and other problems, the present invention provides a rate-based polling congestion control technique, according to which when the CPU on a router enters the polling state, the goal is to average packet delay across input interfaces so as to process the packets in their approximate arrival order irrespective of the interface on which they arrive, thus enabling QoS policies to be more effective. In contrast with existing approaches, the technique according to aspects of the present invention polls input interface queues in a dynamically recalculated sequence that is determined based on the estimated data arrival rate on each input interface. This technique not only avoids long waiting times for some delay-sensitive packets and possible packet drops from the input interface queue, but also treats all input interfaces fairly with respect to their dynamic data arrival rate. Also, the technique according to aspects of the present invention may be combined with other complementary techniques focusing on output interface queues to significantly reduce the latency for delay-sensitive packets and to avoid packet loss. These features are important to providing end-to-end QoS to voice and video applications. These and other features and advantages of the present invention will be presented in more detail in the following specification of the invention and in the associated figures.
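For orientation only, the general idea of ordering polls by estimated arrival rate may be sketched as follows. This passage does not specify how the arrival rate is estimated or how the polling sequence is derived from it, so the sketch makes two assumptions: an exponentially weighted moving average is used as the rate estimator, and interfaces are polled in order of decreasing estimated rate. The function names, the smoothing factor `alpha`, and the sample values are all hypothetical and should not be read as the method of the invention.

```python
def update_rate(old_rate, bytes_arrived, interval, alpha=0.5):
    """Blend the newest rate sample into the running estimate.

    Assumed estimator: exponentially weighted moving average.
    """
    sample = bytes_arrived / interval
    return alpha * sample + (1 - alpha) * old_rate


def polling_sequence(rates):
    """Return interface names ordered from highest to lowest estimated rate.

    Assumed ordering rule, chosen only for illustration.
    """
    return sorted(rates, key=lambda name: rates[name], reverse=True)


# Hypothetical prior estimates (bytes per unit time) and new arrivals.
rates = {"a": 100.0, "b": 100.0, "c": 100.0}
arrivals = {"a": 50, "b": 400, "c": 200}   # bytes since the last poll
for name, nbytes in arrivals.items():
    rates[name] = update_rate(rates[name], nbytes, interval=1.0)
sequence = polling_sequence(rates)
```

Unlike the static sequences described earlier, the sequence in this sketch is recalculated whenever the estimates change, so an interface whose arrival rate rises moves forward in the polling order.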