There are different types of applications in an integrated service packet network such as an Asynchronous Transfer Mode (ATM) network. Some applications, such as voice or real-time media streams, need to be transmitted with little delay and little delay jitter. Similarly, applications such as remote log-in or on-line trading can tolerate only small amounts of delay. Other applications, such as e-mail or FTP, can tolerate longer delays, and therefore do not need to be transmitted within a strict delay or delay jitter constraint. Because of the diverse range of acceptable delays for various applications, it is very important to support different levels of quality of service (QoS) for these applications. It is also important to allocate bandwidth among the connections in a fair manner, so that a connection sending high-rate or bursty traffic cannot occupy more bandwidth than its share. Thus, a scheduler should fairly allocate bandwidth among multiple connections according to their weights, so that a connection sending excessively high-rate or bursty traffic will not adversely affect other connections.
In a system where a plurality of virtual connections (VCs) are competing to share a common resource (such as the same input port, output port, etc.), one way to control delay and fairly allocate that resource among the connections is to assign a weight to each VC and configure the system to serve the VCs according to their weights. Weight-based scheduling schemes have received a lot of attention, and many weight-based scheduling algorithms have been proposed, such as:
    The Generalized Processor Sharing (GPS) approach (See “A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case,” by Abhay K. Parekh and Robert G. Gallager, published in IEEE/ACM Transactions on Networking, Vol. 1, No. 3, June 1993, p. 344–357),
    The Weighted Fair Queuing (WFQ) approach (See the above-mentioned Parekh article),
    The Worst Case Weighted Fair Queuing (WF2Q) approach (See “WF2Q: Worst-case Fair Weighted Fair Queuing,” by J. C. Bennett and H. Zhang, published in Proceedings of IEEE INFOCOM '96, p. 120–128),
    The VirtualClock approach (See “VirtualClock: A New Traffic Control Algorithm for Packet Switching Networks,” by L. Zhang, published in Proceedings of ACM SIGCOMM 1990, p. 19–29 and “Leap Forward Virtual Clock: A new Fair Queuing Scheme with Guaranteed Delays and Throughput Fairness,” by Subhash Suri, G. Varghese and Girish Chandranmenon, published in Proceedings of IEEE INFOCOM '97),
    The Self-Clocked Fair Queuing (SCFQ) approach (See “A Self-Clocked Fair Queuing Scheme for Broadband Applications,” by S. J. Golestani, published in Proceedings of IEEE INFOCOM 1994, p. 636–646),
    The Delay Earliest Due Date (Delay-EDD) and Jitter Earliest Due Date (Jitter-EDD) approaches (See “Comparison of Rate-Based Service Disciplines,” by H. Zhang and S. Keshav, published in Proceedings of ACM SIGCOMM 1991, p. 113–121), and
    The Head-of-the-Line Earliest Due Date (HOL-EDD) approach (See “HOL-EDD: A Flexible Service Scheduling Scheme for ATM Networks,” by M. Vishnu and J. W. Mark, published in Proceedings of IEEE INFOCOM 1996, p. 647–654).
The entire disclosures of each of the above-cited publications are incorporated herein by reference.
The commonality of these algorithms is that each one is based on time stamps that are assigned to each incoming packet or to each packet queue. The packets are then sent according to a time stamp sorting result.
While the GPS approach is very flexible and provides perfect fairness (which thereby would give users widely different performance guarantees), GPS is an idealized approach which is not practical to implement because GPS assumes that the scheduler can serve multiple sessions simultaneously, and further assumes that packet traffic is infinitely divisible. The WFQ approach (which is also called packet-by-packet GPS, or PGPS) and the WF2Q approach are packet-by-packet disciplines which closely approximate the GPS approach. Both WFQ and WF2Q have attractive characteristics with respect to delay and fairness, but are impractical to implement because of the intensive computations that these approaches require. SCFQ and VirtualClock reduce the computation complexity of WFQ and WF2Q using an approximation algorithm to simplify the calculation of the virtual time. Because these two algorithms use the internally generated virtual time to reflect the progress of work in the system (rather than using virtual time generated in the hypothetical GPS system), the performance of these two algorithms is not as good as WFQ and WF2Q. Therefore, a need has existed within the art for a scheduling algorithm that can provide high performance with minimal computation complexity.
A major problem with these time stamp-based algorithms is the computation cost. FIG. 1 shows the basic configuration used in implementing time stamp-based algorithms. As packets arrive, a time stamp is attached thereto. Then, the packets are stored in the buffer. The algorithm needs to calculate virtual time for each packet or for each connection. Then, the algorithm requires that the packets or connections be sorted according to their time stamps in order to serve the packet or connection with the minimum time stamp. If no approximation method is employed, the computation cost for the time stamp-based algorithm is O(log2 N), where N is the number of backlogged queues. Because typical systems have thousands of VCs multiplexed into one link, the resultant high computation cost makes these algorithms impractical to implement in a high-speed system.
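By way of illustration only, the stamp-then-sort structure described above may be sketched in Python as follows. The sketch assumes a VirtualClock-style stamp (the later of the arrival time and the connection's previous stamp, plus packet length divided by the connection's rate) and keeps one heap entry per buffered packet. The class name, method names, and example rates are hypothetical and are not taken from any of the cited references. The heap push and pop are where the logarithmic sorting cost arises.

```python
import heapq


class TimeStampScheduler:
    """Illustrative time stamp-based scheduler (hypothetical names throughout)."""

    def __init__(self, rates):
        self.rates = dict(rates)                 # connection -> assumed service rate
        self.last_stamp = {c: 0.0 for c in rates}
        self.heap = []                           # entries: (stamp, seq, conn, length)
        self.seq = 0                             # arrival counter, breaks stamp ties

    def arrive(self, conn, length, now):
        # Assumed VirtualClock-style stamp: advance from the later of `now`
        # and the connection's previous stamp.
        stamp = max(now, self.last_stamp[conn]) + length / self.rates[conn]
        self.last_stamp[conn] = stamp
        heapq.heappush(self.heap, (stamp, self.seq, conn, length))  # O(log n)
        self.seq += 1

    def serve(self):
        # Serve the entry with the minimum time stamp; O(log n) per pop.
        if not self.heap:
            return None
        stamp, _, conn, _length = heapq.heappop(self.heap)
        return conn, stamp


sched = TimeStampScheduler({"A": 2.0, "B": 1.0})
for conn in ["A", "A", "A", "A", "B", "B"]:
    sched.arrive(conn, length=1, now=0.0)
order = "".join(sched.serve()[0] for _ in range(6))
print(order)  # AABAAB
```

In this example connection A, with twice the assumed rate of connection B, receives two services for each service given to B.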
Another traditional scheduling algorithm that is often implemented is the Weighted Round Robin (WRR) approach. FIG. 2(a) depicts a conventional WRR algorithm. The backlogged connections 100 (labeled A–F) are connected in a ring. As can be seen, each connection has a weight, W, assigned thereto. A connection with weight 2 will get served twice as often as a connection with weight 1. In the example of FIG. 2(a), the scheduler can serve connection A three times (e.g. dequeue 3 packets from connection A) before moving on to connection B, which gets served twice. Thereafter, connections C, D, E, and F get served once, three times, once, and twice respectively as the scheduler moves from connection to connection. With this conventional WRR algorithm, a high weight connection may block a low weight connection, thereby increasing the traffic delay.
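The conventional WRR serving order described above may be sketched in Python as follows. This is a minimal illustration that assumes every connection remains backlogged for the whole cycle; the function name and the dictionary of weights are hypothetical, with the weights chosen to match the serving counts recited for FIG. 2(a).

```python
def wrr_cycle(weights):
    """Return the serving order for one cycle of conventional WRR.

    `weights` maps connection name -> integer weight; dictionary order is
    taken as the ring order. Every connection is assumed to stay backlogged.
    """
    order = []
    for conn, weight in weights.items():
        # Serve the connection `weight` times in a row before moving on.
        order.extend([conn] * weight)
    return order


# Weights matching the serving counts described for FIG. 2(a).
FIG_2A_WEIGHTS = {"A": 3, "B": 2, "C": 1, "D": 3, "E": 1, "F": 2}
print("".join(wrr_cycle(FIG_2A_WEIGHTS)))  # AAABBCDDDEFF
```

The consecutive services to connections A and D in the output illustrate how a high weight connection can hold up the connections behind it.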
An improved WRR algorithm was proposed in “Weighted Round-Robin Cell Multiplexing in a General-Purpose ATM Switch Chip,” authored by M. Katevenis et al. and published in IEEE J. Select. Areas Commun., Vol. 9, No. 8, p. 1265–1279, October 1991, the entire disclosure of which is incorporated herein by reference.
FIG. 2(b) depicts the improved WRR algorithm. Rather than successively serving each connection by a number of packets equal to that connection's weight before moving on to the next connection, the improved WRR algorithm will move to the next connection after serving a previous connection by one packet. To serve the connections according to their weights, multiple “trips” around the connections may have to be made. As shown in FIG. 2(b), connection A (whose weight is 3) will get served by one packet. After such service, connection A's residual weight will be decremented by one. Next, connection B (whose weight is 2) will get served by one packet, and then its residual weight will be decremented by one. This pattern will repeat itself as the scheduler moves from connection to connection, and makes a number of “roundtrips” equal to the highest weight assigned to a connection. Service to a given connection will occur in a roundtrip so long as that connection's residual weight is greater than zero. Once all connections have a residual weight equal to zero, the serving cycle will be reset with each connection being accorded its maximum weight. In FIG. 2(b), there will be 3 roundtrips. In the first roundtrip, the serving order will be ABCDEF. In the second roundtrip, the serving order will be ABDF. In the third roundtrip, the serving order will be AD. Thus, the overall serving pattern for a full serving cycle will be ABCDEF-ABDF-AD.
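The roundtrip behavior of the improved WRR algorithm may be sketched in Python as follows. This is an illustrative sketch only, assuming every connection stays backlogged; the function name is hypothetical, and the weights are those implied by the description of FIG. 2(b).

```python
def interleaved_wrr_cycle(weights):
    """Return the per-roundtrip serving orders for one full serving cycle.

    `weights` maps connection name -> integer weight; dictionary order is
    taken as the ring order. Every connection is assumed to stay backlogged.
    """
    residual = dict(weights)  # residual weight, reset at the start of a cycle
    roundtrips = []
    while any(r > 0 for r in residual.values()):
        trip = []
        for conn in weights:
            if residual[conn] > 0:
                trip.append(conn)      # serve exactly one packet
                residual[conn] -= 1    # then decrement the residual weight
        roundtrips.append("".join(trip))
    return roundtrips


weights = {"A": 3, "B": 2, "C": 1, "D": 3, "E": 1, "F": 2}
print("-".join(interleaved_wrr_cycle(weights)))  # ABCDEF-ABDF-AD
```

The number of roundtrips equals the highest weight (3 in this example), and each connection is still served a number of times equal to its weight.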
The improved WRR algorithm serves connections more evenly in time, but still maintains the same serving frequencies. This algorithm can be easily implemented in hardware, and computing the serving position has only O(1) complexity. However, because it takes O(log2 N) time to search through the binary tree to find the next VC entry to serve, this scheme is not scalable to a high-speed system. Also, the round robin techniques of FIGS. 2(a) and 2(b) are unable to guarantee fairness among different flows if variable length packets are served, because the scheduler fails to take packet length into account when serving packets.
The Deficit Round Robin (DRR) technique shown in FIG. 3 addresses the unfairness using a deficit counter. As shown in FIG. 3, a weight W is assigned to each FlowID. Also, a deficit counter (D) is maintained for each FlowID. Initially, D is set equal to W. When a given FlowID is selected for service, the scheduler checks whether the D associated with that FlowID is larger than or equal to the length of the packet at the head of the FlowID's queue. If D is larger than or equal to that packet's length, the packet is served. Thereafter D is decremented by the packet's length, and the scheduler checks the next packet in FlowID's queue. As long as the packet at the head of FlowID has a length less than or equal to D, FlowID will continue to be served. If D drops below the packet's length, then D is incremented by W and the scheduler requeues the current FlowID at the end of the serving queue before proceeding to the next FlowID. The DRR technique shown in FIG. 3 can guarantee fairness in terms of throughput, but it allows burstiness, especially for FlowIDs having a large weight.
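The DRR procedure described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions: all packets are queued before scheduling begins, every weight is positive, and a flow whose queue becomes empty simply leaves the serving queue (the text above does not address that case). The function name and the example flows are hypothetical.

```python
from collections import deque


def drr(flows, weights):
    """Return the list of (flow_id, packet_length) in DRR serving order.

    `flows` maps flow_id -> list of packet lengths (head of queue first);
    `weights` maps flow_id -> positive weight W.
    """
    queues = {f: deque(pkts) for f, pkts in flows.items()}
    deficit = dict(weights)                      # D is initially set equal to W
    serving = deque(f for f in flows if queues[f])
    served = []
    while serving:
        f = serving.popleft()
        q = queues[f]
        # Keep serving while the head packet fits within the deficit counter.
        while q and q[0] <= deficit[f]:
            length = q.popleft()
            deficit[f] -= length
            served.append((f, length))
        if q:
            # Head packet no longer fits: top up D by W and requeue the flow.
            deficit[f] += weights[f]
            serving.append(f)
        # Assumption: a flow with an empty queue is dropped from the ring.
    return served


flows = {"X": [300, 300, 300], "Y": [200, 200]}
weights = {"X": 600, "Y": 200}
print("".join(f for f, _ in drr(flows, weights)))  # XXYXY
```

The two back-to-back services to the large-weight flow X at the start of the output illustrate the burstiness noted above.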
U.S. Pat. No. 6,101,193 issued to Ohba, the entire disclosure of which is incorporated herein by reference, discloses a modified version of DRR that seeks to improve short-term fairness. FIG. 4 generally depicts the modified DRR technique of the '193 patent. As shown in FIG. 4, the scheduler maintains two queues: a serving queue and a waiting queue. Backlogged flows are initially placed in the serving queue. The scheduler selects each flow in round robin fashion. The packet at the head of the selected flow is served if the flow's D value is not less than the length of that packet. When a packet from a flow is served, that flow's D value is decremented by the length of the served packet. Thereafter, that flow is moved to the end of the serving queue. If the packet at the head of the selected flow has a length greater than D, then that flow's D value is incremented by W and the flow is moved to the end of the waiting queue. No packet is served at this time. Once the serving queue becomes empty, the scheduler treats the waiting queue as the serving queue and the serving queue as the waiting queue, and the process continues. This modified DRR scheme provides fairness through interleaved service, but it requires the scheduler to constantly check whether a flow's allocated weight has been exhausted before deciding whether to serve a packet.
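The two-queue behavior described above may be sketched in Python as follows. This sketch is illustrative only and is not the implementation of the '193 patent. It assumes that D is initially equal to W, that all packets are queued before scheduling begins, that every weight is positive, and that a flow with no remaining packets leaves the scheduler. The function name and example flows are hypothetical.

```python
from collections import deque


def modified_drr(flows, weights):
    """Return the list of (flow_id, packet_length) in serving order.

    `flows` maps flow_id -> list of packet lengths (head of queue first);
    `weights` maps flow_id -> positive weight W.
    """
    queues = {f: deque(pkts) for f, pkts in flows.items()}
    deficit = dict(weights)          # assumption: D starts equal to W
    serving = deque(f for f in flows if queues[f])
    waiting = deque()
    served = []
    while serving or waiting:
        if not serving:
            # Serving queue is empty: the two queues exchange roles.
            serving, waiting = waiting, serving
        f = serving.popleft()
        q = queues[f]
        if deficit[f] >= q[0]:
            # Serve exactly one packet, then go to the back of the serving queue.
            length = q.popleft()
            deficit[f] -= length
            served.append((f, length))
            if q:
                serving.append(f)
        else:
            # No packet is served: top up D by W and move to the waiting queue.
            deficit[f] += weights[f]
            waiting.append(f)
    return served


flows = {"X": [300, 300, 300], "Y": [200, 200]}
weights = {"X": 600, "Y": 200}
print("".join(f for f, _ in modified_drr(flows, weights)))  # XYXYX
```

With the same hypothetical flows, one-packet-at-a-time service interleaves X and Y rather than serving X in a burst, at the cost of checking D on every selection.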
Time-Wheel is another scheduling approach used in ATM systems. As shown in FIG. 5, several queues 102 are organized in a round topology. Each queue 102 contains zero or more backlogged connections (denoted as A, B, C, D, F, and G). Two pointers, the time pointer 101 and the transmission pointer 103, are employed to control the scheduler. The time pointer 101 moves clockwise from one queue to the next for each cell time. All connections behind time pointer 101 are eligible to be served. The transmission pointer 103, which also moves clockwise, either lags or steps together with time pointer 101. The connections in the queues pointed to by the transmission pointer 103 get served one by one. External controller 104 places the connections into one of the queues.
U.S. Pat. No. 6,041,059 issued to Joffe et al., the entire disclosure of which is incorporated herein by reference, describes an improved Time-Wheel algorithm. The computation complexity of the Time-Wheel algorithm is independent of the number of VCs, and the scheduler can precisely pace the assigned bandwidth described in [i, m] (i cells in m cell times). However, several limitations hinder this approach. The first limitation is that the Time-Wheel scheme cannot serve more than m/i VCs. The second limitation is that fairness cannot be guaranteed when the bandwidth is oversubscribed. Therefore, the Time-Wheel approach also does not provide a scheduling algorithm suitable for high-speed systems.
Because WFQ has several analytically attractive characteristics relating to delay and fairness, many simplified WFQ algorithms have been implemented. For example, the article, “A Queue-Length Based Weighted Fair Queuing Algorithm in ATM Networks,” authored by Yoshihiro Ohba, and published in Proceedings of INFOCOM '97, 16th Annual Joint Conference of IEEE Computer & Communications Societies, the entire disclosure of which is incorporated herein by reference, discloses a Queue-Length Based WFQ algorithm (QLWFQ) which maintains one common scheduling queue to hold credits for the VCs.
FIG. 6 depicts this QLWFQ scheme. One scheduling queue 108 queues credits for the connection queues 106. Each connection queue 106 has a weight assigned thereto. Each connection queue also has a length measured by the number of cells stored therein. By virtue of having a credit(j) queued in the scheduling queue 108, connection queue(j) will have a cell dequeued therefrom when credit(j) reaches the head of scheduling queue 108. Thus, each cell time, the scheduler takes the first credit from scheduling queue 108, and serves one cell from the connection queue corresponding to that credit.
Credits are updated in two ways. First, when an incoming cell arrives, a connection(i) is determined for that cell. A credit for connection(i) is added to the scheduling queue 108 if the length of connection queue(i) is less than or equal to the weight assigned to connection queue(i). Second, when an outgoing cell is taken from connection queue(j), the credit for connection queue(j) that is at the head of the scheduling queue is removed. If the length of connection queue(j) after dequeuing a cell therefrom is greater than or equal to the weight assigned to connection queue(j), then a credit for connection queue(j) is added to the back of the scheduling queue.
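The two credit update rules described above may be sketched in Python as follows. This is a minimal sketch rather than the algorithm as published. It assumes that, on arrival, the queue length is measured after the arriving cell has been enqueued (the text above does not say), and the class and method names are hypothetical.

```python
from collections import deque


class QLWFQ:
    """Illustrative queue-length based credit scheduler (hypothetical names)."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.conn_queues = {c: deque() for c in weights}  # connection queues
        self.sched_queue = deque()                        # credits, as connection ids

    def arrive(self, conn, cell):
        q = self.conn_queues[conn]
        q.append(cell)
        # Rule 1 (assumed to use the length after enqueuing): add a credit
        # if the queue length does not exceed the connection's weight.
        if len(q) <= self.weights[conn]:
            self.sched_queue.append(conn)

    def serve(self):
        """Serve one cell per cell time; returns (conn, cell) or None."""
        if not self.sched_queue:
            return None
        conn = self.sched_queue.popleft()   # remove the credit at the head
        q = self.conn_queues[conn]
        cell = q.popleft()
        # Rule 2: re-add a credit if the remaining length is still >= weight.
        if len(q) >= self.weights[conn]:
            self.sched_queue.append(conn)
        return conn, cell


s = QLWFQ({"A": 2, "B": 1})
for cell in ["a1", "a2", "a3", "a4"]:
    s.arrive("A", cell)
for cell in ["b1", "b2"]:
    s.arrive("B", cell)
print("".join(s.serve()[0] for _ in range(6)))  # AABAAB
```

Under these assumptions the number of credits queued for a connection never exceeds its weight, which is why the scheduling queue must be sized according to the weights.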
While the QLWFQ scheme reduces the computation complexity to O(1), it may cause head-of-line blocking due to bursty traffic, wherein a large number of credits for the same connection queue are added continuously to the scheduling queue. The other limitation of the QLWFQ scheme is that the scheduling queue must hold credits for all backlogged connections. For each connection, a number of credits up to the value of that connection's weight may be stored in the scheduling queue. When the weight granularity is doubled, the memory needed to hold the credits in the scheduling queue is also doubled. Therefore, QLWFQ is not sufficiently flexible to support a wide range of granularities.
U.S. Pat. No. 6,014,367, issued to Joffe, the entire disclosure of which is incorporated herein by reference, discloses a method that can provide a minimum service rate to each VC on a small selected time scale, with no limitation on the number of VCs it can support. There are two kinds of queues in the scheme, a waiting queue and a serving queue. The waiting queue becomes the serving queue when the present serving queue is empty. Both queues are FIFOs. The connections buffered in the serving queue are served one by one, and the scheduler decides on the basis of credits whether the VCs should reenter the serving queue or be moved to the waiting queue. The credit value is increased by the weight when the connection is moved to the waiting queue, and is decreased when the connection gets served by the scheduler. Under this scheme, the interval between successive services to a connection does not depend on its weight, but rather on the number of backlogged connections in the serving queue. In a high-speed ATM network, where thousands of connections share one link, the latency under this scheme is quite high.
Another rate-based WFQ scheme is disclosed in U.S. Pat. No. 6,130,878 issued to Charny, the entire disclosure of which is incorporated herein by reference. This scheme, which operates in the frequency domain, maintains a counter called “relative error” for each VC. Each cell time, the scheduler searches for the VC with the largest relative error, and serves that VC. Because this method involves sorting, there is a limitation on the number of VCs that can be supported in a high-speed ATM switch.
Yet another WFQ algorithm is disclosed in Chen et al., “Design of a Weighted Fair Queuing Cell Scheduler for ATM Networks”, Proceedings of Globecom 1998, the disclosure of which is incorporated herein by reference. However, under this scheme for scheduling cells, each connection can only be assigned a weight that is a power of two. As such, the granularity of service provided to the connections is poor.
Therefore, a need exists within the art for a scheduler which can fairly allocate bandwidth among numerous connections without an excessive computation cost or a limit on how many connections the scheduler can support.