1. Field of the Invention
The present invention relates to an improved queuing method for implementing deficit round-robin (DRR) scheduling for high-speed packet switching and routing.
2. Related Art
Queuing and scheduling are two critical function blocks that are used by today's packet switches and routers to support quality of service (QoS). A typical packet switch/router has multiple ports forwarding and receiving data packets. Usually a data packet enters a switch/router from one port and departs from another port. The switch/router ports can each receive and forward data packets simultaneously.
FIG. 1 is a block diagram of a high-level switch/router architecture 100 that includes switch ports 101-108 and switch/router 110. Switch/router 110 includes line cards 111-118 and switch fabric 120. Data packets arrive at the left hand side switch ports 101-104 (i.e., ingress ports) and are provided to the corresponding line cards 111-114. The data packets are then processed by switch fabric 120, and then depart from the right hand side ports 105-108 (i.e., egress ports) via line cards 115-118.
During normal operation, multiple packets may be received from several ingress ports and leave switch/router 110 on one egress port. These packets must be queued in front of the egress port to wait for an opportunity to be forwarded.
FIG. 2 is a block diagram of line card 115, which includes multiple queues 2011-201N for storing data packets received from ingress ports 101-104, and scheduler 210. Each of queues 2011-201N is controlled by scheduler 210. Based on QoS requirements, scheduler 210 selects one of queues 2011-201N to send data packets over the egress port 105.
Different queuing and scheduling algorithms are implemented in switches and routers to meet various QoS requirements. The simplest is the First-In-First-Out (FIFO) scheme, where packets are stored in one queue and sent in the same order as they are received. The drawback of the FIFO scheme is that bandwidth is not distributed fairly among all of the traffic flows. A few aggressive flows can seize most of the bandwidth. To solve this problem, a per-flow queue based scheduling algorithm, called the round-robin (RR) scheduling scheme, has been introduced. The idea of round-robin scheduling, in essence, is that traffic from different flows is queued separately, each flow in its own queue, and the switch/router port scheduler circularly and repeatedly “visits” all the packet queues, sending one packet from each queue during each visit. In terms of distributing bandwidth among different flows, round-robin scheduling is a fairer solution than FIFO scheduling because each flow is guaranteed the opportunity to send a packet in each round-robin scheduling cycle.
However, round-robin scheduling has two problems. The first problem is that this scheme cannot differentiate large packets from small packets. When a flow sends packets ten times larger than the packets sent by other flows, this flow uses ten times more bandwidth than the other flows. To be fair, the packet length has to be taken into consideration in the scheduling algorithm. The second problem is that, in real network environments, different flows can have different bandwidth requirements and should not be treated as equal. In other words, each flow should be given a weighting factor, and the bandwidth should be distributed among the flows in proportion to their weights. Round-robin scheduling is not capable of controlling bandwidth distribution in a weighted manner.
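The first problem can be illustrated with a minimal sketch. The flow names, packet sizes, and the helper round_robin_bytes below are hypothetical and chosen only for illustration; the sketch simply counts the bytes each flow sends when one packet per queue is served in each round.

```python
from collections import deque


def round_robin_bytes(queues, rounds):
    """Serve one packet per non-empty queue per round.

    queues maps a flow name to a list of packet lengths in bytes.
    Returns the total number of bytes sent by each flow.
    """
    sent = {name: 0 for name in queues}
    pending = {name: deque(sizes) for name, sizes in queues.items()}
    for _ in range(rounds):
        for name, q in pending.items():
            if q:
                sent[name] += q.popleft()
    return sent


# Hypothetical traffic: flow_a sends packets ten times larger than the others.
queues = {
    "flow_a": [1000] * 10,
    "flow_b": [100] * 10,
    "flow_c": [100] * 10,
}
sent = round_robin_bytes(queues, rounds=10)
for name, total in sent.items():
    print(name, total)
```

Under these assumed inputs, every flow sends the same number of packets, but flow_a sends ten times as many bytes as each of the other flows.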
To resolve these issues, different scheduling schemes have been proposed. One popular scheduling scheme is the so-called deficit round-robin (DRR) scheme described by M. Shreedhar and George Varghese, “Efficient Fair Queuing using Deficit Round-robin”, pp. 1-22, October 1995. However, the implementation technique proposed by Shreedhar et al. can only be used in switches/routers with either slow port speeds or a small number of queues. When the techniques described by Shreedhar et al. are used on a high-speed port with a large number of queues, packets are forwarded at a speed that is lower than the full line rate.
Deficit round-robin scheduling is based on round-robin scheduling. Suppose that N flows are configured to send traffic on a switch/router port with N queues to receive the incoming data packet traffic. Each flow “i” is allowed to send Quantumi worth of bytes of traffic in each round-robin cycle. Deficit round-robin takes the packet length into consideration when servicing a packet. During the first round-robin scheduling cycle, a packet in queue “i” can be sent if its length is not greater than Quantumi. When a packet is sent, the difference between the packet length and Quantumi is stored in a flow parameter called Crediti. For idle flows, the credit (Crediti) is reset to zero. If the packet length is greater than Quantumi, the packet is not forwarded in the present cycle, but the corresponding flow parameter, Crediti, is increased by Quantumi. During the next cycle of round-robin scheduling, the packet is processed again and is sent if the packet length is less than or equal to Crediti+Quantumi.
The following method is proposed by Shreedhar et al. to implement deficit round-robin scheduling. First, to avoid examining empty queues, a link list of active queues is maintained. The DRR scheduler only considers the queues on the active list. The queue at the head of the link list is processed first. Suppose that queue “i” is at the head of the active queue link list. If the queue head packet in queue “i” has a length Pkt_Lengthi that is no greater than Quantumi+Crediti, the packet is sent and Crediti is updated as Crediti=Quantumi+Crediti−Pkt_Lengthi. After the packet is sent, queue “i” is moved to the tail of the active queue link list (if there are more packets waiting in queue “i”). If Pkt_Lengthi>Quantumi+Crediti, then no packet is sent, Crediti is updated as Crediti=Crediti+Quantumi, and queue “i” is moved to the tail of the active link list. More discussion on the DRR scheme can be found in Shreedhar et al.
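The procedure described above can be sketched as follows. This is a minimal illustration of the steps as stated in this section, not a reproduction of the code of Shreedhar et al.; the names FlowQueue, drr_visit and run, the visit limit, and the example quantum and packet lengths are all assumptions introduced here. The sketch sends at most one packet per visit and resets the credit of a queue that becomes idle.

```python
from collections import deque


class FlowQueue:
    """One per-flow queue; packet lengths are in bytes (hypothetical type)."""

    def __init__(self, name, quantum, packets):
        self.name = name
        self.quantum = quantum
        self.credit = 0
        self.packets = deque(packets)


def drr_visit(active):
    """Process the queue at the head of the active list once.

    Returns (flow name, packet length) if a packet was sent, else None.
    """
    q = active.popleft()
    budget = q.quantum + q.credit
    head_len = q.packets[0]
    if head_len <= budget:
        # Eligible: send the packet and keep the unused allowance as credit.
        q.packets.popleft()
        q.credit = budget - head_len
        sent = (q.name, head_len)
    else:
        # Not eligible: accumulate the quantum and wait for a later cycle.
        q.credit = budget
        sent = None
    if q.packets:
        active.append(q)  # still backlogged: move to the tail
    else:
        q.credit = 0      # idle flow: credit is reset to zero
    return sent


def run(flows, max_visits=1000):
    """Visit active queues until all are empty or max_visits is reached."""
    active = deque(f for f in flows if f.packets)
    log = []
    while active and len(log) < max_visits:
        log.append(drr_visit(active))
    return log


flows = [FlowQueue("a", 500, [1200, 300]), FlowQueue("b", 500, [400])]
log = run(flows)
print(log)
```

With these assumed inputs, queue "a" is visited twice without sending anything before its 1200-byte packet becomes eligible on the third visit. Each such visit is one in which the scheduler does work but the port forwards nothing.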
The implementation method proposed by Shreedhar et al. is not scalable to forward packets at full line rate when used for high-speed ports with large numbers of queues. The main reason is that this technique can result in time gaps during which packets do not get forwarded. As a result, the switch/router port sends fewer packets than could be supported by the full line rate. For example, assume that there are 64K active flow queues in the active queue link list, and in each queue, the queue head packet length is greater than Crediti+Quantumi. No packet is forwarded while the dequeue machine visits each of the 64K queues one by one, updating the associated Crediti parameter and then moving the queue to the tail of the active queue link list. During this process, data packets are read from the queue memory and then written back. If it takes T seconds to process one queue, then the egress throughput is zero for a time interval of 64K*T seconds. If T=10 ns (for reading from and writing to the queue RAM block), then 64K*10 ns≈0.66 ms, or roughly 0.6 ms. If each data packet has a length of 40 bytes, an OC-48 port can forward approximately 4.7K packets in 0.6 ms. However, in this example no packets are sent during that interval by the implementation technique of Shreedhar et al.
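The arithmetic in this example can be checked with a short calculation. The OC-48 rate of 2.48832 Gbit/s used below is the nominal line rate and is an assumption added here (the text states only "OC-48"); framing overhead is ignored, and the variable names are hypothetical.

```python
QUEUES = 64 * 1024        # 64K active queues
T_PER_QUEUE_S = 10e-9     # T = 10 ns per queue visit
PACKET_BYTES = 40         # packet length used in the example
OC48_BPS = 2.48832e9      # assumption: nominal OC-48 line rate, no overhead

# Time spent visiting every queue once without forwarding a packet.
stall_s = QUEUES * T_PER_QUEUE_S

# Packets the port could have forwarded during a given interval.
packet_bits = PACKET_BYTES * 8
packets_exact = stall_s * OC48_BPS / packet_bits
packets_rounded = 0.6e-3 * OC48_BPS / packet_bits  # interval rounded to 0.6 ms

print(f"stall interval: {stall_s * 1e3:.3f} ms")
print(f"packets in exact interval: {packets_exact:.0f}")
print(f"packets in 0.6 ms: {packets_rounded:.0f}")
```

Under these assumptions, the rounded 0.6 ms interval corresponds to roughly 4.7K packets, in line with the figure above, while the unrounded interval of about 0.66 ms corresponds to roughly 5.1K packets.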
There are two problems with the implementation proposed by Shreedhar et al. The first problem is that a packet has to be read out of the queue memory before the process can determine whether the packet is eligible to be sent in the current cycle. If the packet is not eligible, the packet must be written back to the queue memory to wait to be considered in the next cycle. During that memory access cycle, no packet can be sent, so egress port bandwidth is wasted. The second problem is that a Head of Line (HOL) blocking condition can occur, because only one queue link list is used. The queue at the head of the link list must be processed even if the corresponding packet is not ready to be forwarded in the current cycle.
It would therefore be desirable to have an improved technique for solving the scalability problem of the existing DRR scheme provided by Shreedhar et al. It would further be desirable if this improved technique provides a solution to support line rate forwarding for high-speed ports with large numbers of queues.