In the field of computers and data communications there are numerous situations in which a number of entities require access to a resource and there is a need for scheduling access to the resource. For example, consider the problem of providing per-flow or per traffic-class quality-of-service (QoS) guarantees in packet networks. In order to provide applications such as real-time communications and/or interactive applications on a network one needs to provide to individual flows guaranteed rates, bounded end-to-end delays, restricted packet loss, fairness, etc. This generally requires the ability to provide resource reservation and scheduling at the involved hosts and intermediate nodes.
A packet network comprises a number of nodes connected by data links. At each node is a packet handling device, most typically a router. Each packet handling device may be connected to a number of outgoing data links. Packets destined to be sent out over a particular data link may accumulate at an interface associated with the data link.
The interface determines an order in which packets should be sent out on the data link. In a simple case the interface may have a single FIFO (First In First Out) buffer. In this case the interface simply sends packets out in the same order that they are received. This interferes with providing QoS guarantees because it permits packets which are “urgent” to become enqueued behind a long line of non-urgent packets. It is not possible to guarantee bounded end-to-end delays for packets passing through such an interface. The current Internet is based on a best-effort service model that does not provide any QoS assurances to different applications. This lack of service differentiation has a serious impact on the types of applications that can be supported end-to-end on the Internet.
More recent router designs permit packets to be classified into a number of different classes. Each class can have its own queue. A scheduler selects packets from the heads of the various queues in an order intended to maintain QoS levels for the packets in the various queues. In general one assigns to each of the queues a service fraction, which may be expressed as a percentage of the bandwidth on the outgoing data connection to which the queue is entitled. The scheduler attempts to schedule the dispatch of packets over the outgoing data connection in such an order that each queue receives the bandwidth to which it is entitled. The scheduler also attempts to allocate any excess bandwidth fairly. Various schedulers have been proposed.
With the proper dimensioning of network resources, the most important performance attributes of a packet-scheduler become its delay and fairness bounds for each flow. Delay bounds are important for a wide range of time-sensitive or real-time services. Fairness bounds are important for providing a sufficient degree of isolation to a flow of packets, so that the service guaranteed to that flow is not affected by the behavior, or misbehavior, of other packet flows sharing the same link. To provide such guarantees it is normally assumed that packet flows have been conditioned using an appropriate traffic shaper, such as a leaky-bucket conditioner, and that policing is in effect at the network edges.
Generalized Processor Sharing (GPS) is an ideal scheduler that provides every flow its guaranteed bit-rate and distributes excess bandwidth fairly among flows according to their relative bandwidth weights. As a result, GPS can provide end-to-end delay and fairness guarantees to packet flows as long as the flows are well behaved. The flows may be made well behaved, for example, by shaping them using leaky-bucket traffic conditioners. GPS works by assigning a distinct queue to each flow (or session), then servicing an infinitesimal amount from each session according to a weighted cyclical schedule. Unfortunately, GPS is unrealizable in practice because it services a small part of a packet at a time. A real scheduler must complete the service of an entire packet from a session before it moves to the next session. The GPS algorithm is described in A. K. Parekh and R. G. Gallager, “A Generalized Processor Sharing Approach to Flow Control-The Single Node Case,” Proc. IEEE INFOCOM '92, vol. 2, May 1992, pp. 915-24.
In GPS it is assumed that every packet is infinitely divisible. That is, the packets are like a fluid. Assume that N sessions share an outgoing link of capacity r. The relative share of bandwidth reserved by session i is represented by a real number $W_i$ which may be called a “session weight”. The values of weights $W_i$ are chosen such that:
$$\frac{W_i}{\sum_{j=1}^{N} W_j} \times r \ge r_i \qquad (1)$$
The quantity $\rho_i$, which is given by:
$$\rho_i = \frac{W_i}{\sum_{j=1}^{N} W_j} \times r \qquad (2)$$
is the service share for session i. Here r is the rate of the outgoing data connection (or the server) and $r_i$ is the guaranteed bandwidth reservation of session i. Let B(τ,t) be the set of sessions that are backlogged in the interval (τ,t]. Then, under GPS, the service $S_i(\tau,t)$ offered to a session i that belongs to B(τ,t) is proportional to $W_i$. That is:
$$S_i(\tau,t) \ge \frac{W_i}{\sum_{j \in B(\tau,t)} W_j} \times r\,(t-\tau) \qquad (3)$$
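As an illustration of equation (3), the following minimal Python sketch computes the instantaneous rate that GPS would give to each backlogged session. The function name gps_rates, the session labels and the numeric weights and link rate are assumptions chosen for illustration only; they do not appear in the references cited above.

```python
def gps_rates(weights, backlogged, link_rate):
    """Instantaneous GPS service rate of each backlogged session.

    weights:    dict mapping session id -> weight W_i
    backlogged: set of session ids that currently have queued data
    link_rate:  capacity r of the outgoing link
    """
    # Only backlogged sessions compete for the link.
    total = sum(weights[j] for j in backlogged)
    return {i: weights[i] / total * link_rate for i in backlogged}


weights = {"a": 2.0, "b": 1.0, "c": 1.0}   # illustrative weights
r = 100.0                                  # illustrative link rate

all_busy = gps_rates(weights, {"a", "b", "c"}, r)   # a: 50, b: 25, c: 25
only_ab = gps_rates(weights, {"a", "b"}, r)         # c idle: a and b split its share 2:1
```

When session "c" has nothing to send, its bandwidth is redistributed to "a" and "b" in proportion to their weights, which is the fair allocation of excess bandwidth described above.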
GPS attains its bandwidth guarantees by servicing an infinitesimal amount from each backlogged session in proportion to each session's reservation. As a result, GPS provides perfect isolation, ideal fairness and low end-to-end session delays. However, because GPS is based on the fluid model, it cannot be implemented directly, since a real scheduling technique must serve each packet as a whole.
Packet-by-packet GPS, commonly known as weighted fair queuing (WFQ), is a GPS emulation method. In WFQ packets are transmitted according to their finish order under GPS. WFQ simulates a GPS fluid-model in parallel with an actual packet-based scheduler. The GPS simulation determines a virtual finish time (which is used as a timestamp) for packets arriving to the scheduler. To calculate the virtual finish time, WFQ maintains a virtual time function ν(t). The virtual time function is a piecewise linear function of real time t. Its slope changes depending on the number of backlogged sessions and their service rates.
More precisely, if B(τ,t) represents the set of backlogged sessions in the scheduler during the interval (τ,t], the slope of the virtual time function during the interval (τ,t] is given by:
$$\frac{\sum_{i=1}^{N} W_i}{\sum_{j \in B(\tau,t)} W_j} \qquad (4)$$
To simplify things one can normalize the weights $W_i$ such that:
$$\sum_{i=1}^{N} W_i = 1 \qquad (5)$$
One can call the normalized values of the weights $W_i$ “shares”. Each session has a share. Each share represents a fraction of the total capacity available which is assigned to the session. In the case of a communication link the total capacity available can be the available bandwidth on the communication link. In the balance of this specification the share of a session is referred to by the symbol $\Phi_i$. Those skilled in the art will realize that it is a matter of design convenience whether or not to normalize the values used to represent the weights $W_i$ in any particular implementation of the invention. It is the relative magnitudes of the weights $W_i$ that are significant. Equivalent implementations which use non-normalized weights $W_i$ could readily be provided.
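The normalization of equation (5), and the observation that only relative magnitudes matter, can be sketched as follows. This is an illustrative Python sketch only; the helper names normalize and fraction_of_link and the example weights are assumptions, not part of any cited scheduler.

```python
def normalize(weights):
    """Scale the weights W_i so that they sum to 1, giving shares Phi_i."""
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def fraction_of_link(weights, backlogged, i):
    """Fraction of the link that session i receives among backlogged sessions."""
    return weights[i] / sum(weights[j] for j in backlogged)


raw = {"a": 6.0, "b": 3.0, "c": 3.0}   # illustrative, non-normalized weights
shares = normalize(raw)                # a: 0.5, b: 0.25, c: 0.25

# The allocation is the same whether raw weights or shares are used.
busy = {"a", "b"}
same = abs(fraction_of_link(raw, busy, "a")
           - fraction_of_link(shares, busy, "a")) < 1e-12
```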
With $\Phi_i$ defined as above, the slope of the virtual time function becomes:
$$\frac{1}{\sum_{j \in B(\tau,t)} \Phi_j} \qquad (6)$$
At the arrival of a new packet, the virtual time is calculated. Then, the timestamp $TS_i^k$ associated with the kth packet of session i is calculated as:
$$TS_i^k \leftarrow \max\left(TS_i^{k-1},\, \nu(t)\right) + \frac{L_i^k}{\rho_i} \qquad (7)$$
where $L_i^k$ is the size of the arrived packet.
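The timestamp rule of equation (7) can be sketched in a few lines of Python. This is a minimal illustration that assumes the current virtual time ν(t) is supplied by the caller; the class name Session, the function name stamp and the numeric values are hypothetical.

```python
class Session:
    """Per-session state needed by the timestamp rule of equation (7)."""

    def __init__(self, rho):
        self.rho = rho        # service share rho_i of the session
        self.last_ts = 0.0    # TS_i^{k-1}, timestamp of the previous packet


def stamp(session, virtual_time, length):
    """Return TS_i^k for a packet of the given length arriving at virtual_time."""
    ts = max(session.last_ts, virtual_time) + length / session.rho
    session.last_ts = ts
    return ts


s = Session(rho=50.0)            # illustrative service share
ts1 = stamp(s, 0.0, 100.0)       # queue empty:      0 + 100/50 = 2
ts2 = stamp(s, 0.0, 100.0)       # behind packet 1:  2 + 100/50 = 4
ts3 = stamp(s, 10.0, 100.0)      # queue drained:   10 + 100/50 = 12
```

The max term distinguishes a packet that joins a backlogged queue (it starts after its predecessor finishes) from a packet that arrives at an empty queue (it starts at the current virtual time).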
To perform scheduling in real-time, WFQ must compute the value of the virtual time function before any packet arrival, so that every arriving packet is assigned the correct virtual finish time (as if it will be departing under GPS). The value of the virtual-time function is impacted by arrivals of packets to empty queues, as well as by departures of packets which result in empty session queues.
A severe problem with WFQ is that the GPS simulation may predict that an undetermined (and possibly large) number of session queues should become empty at the same time. Under GPS many packets can end up having the same virtual finish time. In the worst case, as many as N packets may have the same virtual finish time. Therefore, updating the virtual time function in between two consecutive packet arrivals may incur a large number of computations. In particular, if a link is shared by up to N active sessions, then updating the virtual time can incur a computation on O(N) sessions or queues. The number of active sessions can be very large. For example, there may be tens of thousands or even hundreds of thousands of queues feeding a single data link. This translates into a proportionally large number of computations per packet arrival. This problem is called iterated deletion. The iterated deletion problem is discussed in detail in A. Demers, S. Keshav, and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” Internetworking Res. and Experience, vol. 1, 1990. The iterated deletion problem has prevented WFQ from being successfully implemented in practice.
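To make the cost of iterated deletion concrete, the following Python sketch advances the virtual time between two packet arrivals using the slope of equation (6). It is an illustrative reconstruction under stated assumptions, not the procedure of any cited reference: each backlogged session is assumed to be represented only by the virtual finish time of its last queued packet, the function name advance_virtual_time is hypothetical, and the sessions are sorted up front purely for brevity.

```python
def advance_virtual_time(v, t_prev, t_now, finish, shares):
    """Advance the GPS virtual time from real time t_prev to t_now.

    v:      virtual time at t_prev
    finish: dict session -> virtual finish time of its last queued packet;
            a session is deleted when the simulated GPS empties its queue
    shares: dict session -> normalized share Phi_i
    Returns (virtual time at t_now, number of sessions deleted).
    """
    pending = sorted(finish.items(), key=lambda kv: kv[1])
    phi_sum = sum(shares[s] for s in finish)
    deleted = 0
    for session, f in pending:
        # Slope is 1 / phi_sum, so reaching virtual time f takes this long.
        dt = (f - v) * phi_sum
        if t_prev + dt > t_now:
            break
        # The session empties before t_now: the slope changes from here on.
        v, t_prev = f, t_prev + dt
        phi_sum -= shares[session]
        del finish[session]
        deleted += 1
    if finish:
        v += (t_now - t_prev) / phi_sum
    return v, deleted


shares = {s: 0.25 for s in range(4)}

# Worst case: every session has the same virtual finish time.
finish = {s: 2.0 for s in range(4)}
v, deleted = advance_virtual_time(0.0, 0.0, 5.0, finish, shares)

# Milder case: only one of two backlogged sessions empties before t_now.
finish2 = {0: 2.0, 1: 4.0}
v2, deleted2 = advance_virtual_time(0.0, 0.0, 1.25, finish2, shares)
```

In the first call all four sessions are deleted during a single update, one loop iteration per session. With N sessions sharing a finish time the loop runs N times, which is the O(N) per-arrival cost described above.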
Because of the high complexity associated with simulating the GPS system, WFQ has attracted much research over the past decade. Many techniques have been proposed to simplify the virtual time calculations. Some such techniques are Self-Clocked Fair Queuing (SCFQ), VirtualClock (VC), Start-time Fair Queuing (SFQ), Frame-Based Fair Queuing (FFQ), Minimum-Delay Self-Clocked Fair Queuing (MD-SCFQ) and Discrete-Rate (DR) scheduling. In general such simplifying approaches suffer from either a decrease in fairness (flow isolation) or an increase in the delay bound. In addition, some of these techniques fail to adequately address the iterated deletion problem.
There is a need for practical methods and systems for scheduling access to resources which provide fair access to the resource.