Many future applications of computer networks, such as distance education, remote collaboration, and teleconferencing, will rely on the ability of the network to provide Quality-of-Service (QoS) guarantees. These guarantees are usually in the form of bounds on end-to-end delay of the session, bandwidth, delay jitter (variation in delay), packet loss rate, or a combination of these parameters. Broadband packet networks based on ATM (Asynchronous Transfer Mode) are currently enabling the integration of traffic with a wide range of QoS requirements within a single communication network. QoS guarantees can also be provided in conventional packet networks by the use of appropriate methods in the packet switches (or routers).
Providing QoS guarantees in a packet network requires the use of traffic scheduling methods in the switches (or routers). The function of a scheduling algorithm is to select, for each outgoing link of the switch, the packet to be transmitted in the next cycle from the available packets belonging to the communication sessions sharing the output link. This selection must be performed such that the QoS guarantees for the individual traffic sessions, such as upper bounds on maximum delay, are satisfied. Implementation of the method may be in hardware or software. Because of the small size of the ATM cell, the scheduling method must usually be implemented in hardware in an ATM switch. In a packet network with larger packet-sizes, such as the current Internet, the method can be implemented in software.
In the following, a "packet" will be referred to in the general sense as a variable size protocol data unit generated by any protocol, and a "cell" as the 53-byte protocol data unit as defined in the ATM standards. As far as the scheduling method is concerned, the cell is a special case of a packet.
A traffic scheduling method should possess several desirable features to be useful in practice:
1. Isolation of flows: The method must isolate an end-to-end session from the undesirable effects of other (possibly misbehaving) sessions. That is, the method must be able to maintain the QoS guarantees for a session even in the presence of other misbehaving flows. Note that isolation is necessary even when policing mechanisms are used to shape the flows at the entry point of the network, as the flows may accumulate burstiness within the network. Here, burstiness refers to the behavior of session traffic where its actual rate of arrival during a specific interval of time is larger than its average rate. Thus, a high burstiness generally implies a large number of packets arriving close together in time, with long idle intervals in between.
2. Low end-to-end delays: The method must provide end-to-end delay guarantees for individual sessions. In particular, it is desirable that the end-to-end delay of a session depends only on its bandwidth reservation, and is independent of the behavior of other sessions.
3. Efficient control of end-to-end delays and buffer requirements through the allocated bandwidth: It must be possible to adjust the per-link queueing delays and buffer requirements in the network nodes by adjusting the bandwidth reserved by an application. This enables applications to trade off the Quality of Service needed with the cost, thus allowing the network to support a wide range of applications with different QoS/cost requirements.
4. Utilization: The method must utilize the link bandwidth efficiently.
5. Fairness: The available link bandwidth must be divided among the connections sharing the link in a fair manner. Two scheduling methods with the same maximum delay guarantee may have significantly different fairness characteristics. An unfair scheduling method may offer widely different service rates to two connections with the same reserved rate over a given interval of time.
6. Simplicity of implementation: The scheduling method must have a simple implementation. In an ATM network, the available time for completing a scheduling decision is very short. At SONET OC-3 speed (approximately 155 Mbits/second), the transmission time of a cell is less than 3 μs. For higher speeds the available time is even less. This forces a hardware implementation. In packet networks with larger packet sizes and/or lower speeds, a software implementation may be adequate, but scheduling decisions must still be made at a rate close to the arrival rate of packets.
7. Scalability: The method must perform well in switches with a large number of connections, as well as over a wide range of link speeds.
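The time budget mentioned in the simplicity requirement can be checked with a short calculation. The sketch below assumes the nominal OC-3 line rate of 155.52 Mbits/second and ignores SONET framing overhead. The function name `transmission_time_us` is illustrative and does not appear in the original text.

```python
ATM_CELL_BYTES = 53  # cell size as defined in the ATM standards


def transmission_time_us(size_bytes: int, link_mbps: float) -> float:
    """Time to transmit size_bytes on a link of link_mbps, in microseconds."""
    # bits divided by (Mbit/s) gives microseconds directly
    return size_bytes * 8 / link_mbps


# Assumption: nominal OC-3 line rate, no framing overhead accounted for.
oc3_cell_time = transmission_time_us(ATM_CELL_BYTES, 155.52)
print(f"{oc3_cell_time:.2f} us per cell")  # below the 3 us bound quoted above
```

Doubling the link speed halves this budget, which is why the scheduling decision time becomes the limiting factor at higher rates.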
Several methods have been proposed in the literature for traffic scheduling in packet switches. In general, schedulers can be classified as work-conserving or non-work-conserving. A scheduler is work-conserving if the server is never idle when a packet is buffered in the system. A non-work-conserving server may remain idle even if there are available packets to transmit. A scheduler may, for example, postpone the transmission of a packet when it expects a higher-priority packet to arrive soon, even though the server is currently idle. When the transmission time of a packet is short, as is typically the case in an ATM network, however, such a policy is seldom justified. Non-work-conserving methods are also used to control delay jitter by delaying packets that arrive early. Work-conserving schedulers always have lower average delays than non-work-conserving schedulers and are therefore preferred for most applications.
Examples of work-conserving schedulers include Generalized Processor Sharing (GPS), Weighted Fair Queueing, VirtualClock, Delay-Earliest-Due-Date (Delay-EDD), Weighted Round Robin, and Deficit Round Robin. Examples of non-work-conserving schedulers include Stop-and-Go Queueing, Jitter-Earliest-Due-Date, and Hierarchical Round Robin.
Another classification of traffic schedulers is based on their internal architecture. This classification gives rise to two types of schedulers: sorted-priority and frame-based. Sorted-priority schedulers maintain a global variable, usually referred to as the virtual time, associated with each outgoing link of the switch. Each time a packet arrives or gets serviced, this variable is updated. A timestamp, computed using this variable, is associated with each packet in the system. Packets are sorted based on their timestamps, and are transmitted in that order. VirtualClock, Weighted Fair Queueing, and Delay-EDD follow this architecture.
Two factors determine the implementation complexity of any sorted-priority algorithm. The first is the complexity of updating the priority list and selecting the packet with the highest priority, which is at least O(log V), where V is the number of connections sharing the outgoing link. The second is the complexity of calculating the timestamp associated with each packet; this factor depends heavily on the algorithm. For example, maintaining the virtual time in Weighted Fair Queueing requires the processing of a maximum of V events during the transmission of a single packet, whereas timestamps in VirtualClock can be calculated in constant time, that is, O(1).
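The first cost factor, maintaining the priority list, can be illustrated with a binary heap. The following is a minimal sketch, not the structure used by any particular scheduler named above. The class name `SortedPriorityQueue` is hypothetical. For simplicity it keeps every queued packet in the heap, so its operations are logarithmic in the number of queued packets; an implementation that keeps only the head packet of each connection in the heap would reach the O(log V) figure.

```python
import heapq
import itertools


class SortedPriorityQueue:
    """Packets ordered by timestamp; smallest timestamp is transmitted first."""

    def __init__(self):
        self._heap = []
        # Sequence number breaks ties so equal timestamps leave in arrival order.
        self._seq = itertools.count()

    def push(self, timestamp, packet):
        # O(log n) insertion into the binary heap
        heapq.heappush(self._heap, (timestamp, next(self._seq), packet))

    def pop(self):
        # O(log n) removal of the packet with the smallest timestamp
        timestamp, _, packet = heapq.heappop(self._heap)
        return timestamp, packet

    def __len__(self):
        return len(self._heap)


queue = SortedPriorityQueue()
for ts, name in [(3.0, "c"), (1.0, "a"), (2.0, "b"), (1.0, "a2")]:
    queue.push(ts, name)
order = [queue.pop()[1] for _ in range(len(queue))]
print(order)
```

The second cost factor, computing the timestamp itself, is independent of this structure and is what distinguishes the individual algorithms.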
In a frame-based scheduler, time is split into frames of fixed or variable length. Reservations of sessions are made in terms of the maximum amount of traffic the session is allowed to transmit during a frame period. Hierarchical Round Robin and Stop-and-Go Queueing are frame-based schedulers that use a constant frame size. As a result, the server may remain idle if sessions transmit less traffic than their reservations over the duration of a frame, making them non-work-conserving. In contrast, Weighted Round Robin and Deficit Round Robin schedulers allow the frame size to vary within a maximum. Thus, if the traffic from a session is less than its reservation, a new frame can be started early. Therefore, both Weighted Round Robin and Deficit Round Robin are work-conserving schedulers.
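As an illustration of a variable-length frame, the sketch below follows the commonly published form of Deficit Round Robin, in which each session receives a fixed quantum of credit per round and may send packets as long as its credit covers them. It is not taken from this text; the function name `deficit_round_robin` and its parameters are assumptions made for the example.

```python
from collections import deque


def deficit_round_robin(queues, quanta, rounds):
    """Serve per-session queues of packet sizes for at most `rounds` rounds.

    queues: dict session -> deque of packet sizes (modified in place)
    quanta: dict session -> credit added per round
    Returns the list of (session, size) pairs in transmission order.
    """
    deficit = {s: 0 for s in queues}
    sent = []
    for _ in range(rounds):
        if not any(queues.values()):
            break  # nothing left to send
        for s, q in queues.items():
            if not q:
                deficit[s] = 0  # idle sessions do not accumulate credit
                continue
            deficit[s] += quanta[s]
            # Send head packets while the accumulated credit covers them.
            while q and q[0] <= deficit[s]:
                size = q.popleft()
                deficit[s] -= size
                sent.append((s, size))
            if not q:
                deficit[s] = 0
    return sent


demo_queues = {"a": deque([300, 300]), "b": deque([500])}
demo_sent = deficit_round_robin(demo_queues, {"a": 500, "b": 500}, rounds=10)
print(demo_sent)
```

A session with nothing queued is skipped without waiting, so the round ends as soon as the backlogged sessions have used their credit. This is the sense in which the frame can end early and the scheduler stays work-conserving.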
Many different scheduling methods have been proposed to approximate the theoretical scheduling discipline known as Generalized Processor Sharing (GPS). The GPS discipline is defined with respect to a "fluid model," where data transmitted by each session is considered to be infinitely divisible and multiple sources may transmit their data simultaneously through a single physical communication link. This allows tight control of the bandwidth allocated to each session on a link.
Unfortunately, GPS is only a hypothetical scheduling discipline: In practice, the packets transmitted by each session cannot be divided further, and data from multiple sessions can be interleaved only at packet boundaries. Thus, the GPS discipline cannot be implemented in practice in a packet-switched network.
However, the GPS discipline provides a sound theoretical basis for the design of practical scheduling methods. A number of such practical methods have been designed based on GPS. These methods vary in their end-to-end delay bounds (that is, the maximum delays seen by packets from a particular session in the network between its end nodes), the level of fairness achieved in allocating bandwidth to different sessions sharing the same communication links, and the complexity of implementing them in a switch or router. An outline of the GPS scheduling discipline is given below.
Assume that the GPS discipline is used to schedule traffic on an outgoing link of a switch. The share of bandwidth reserved by session i on the outgoing link is represented by a real number $\phi_i$. Let $B(\tau, t)$ denote the set of sessions that are backlogged during the interval $(\tau, t)$. That is, $B(\tau, t)$ is the set of sessions that have at least one packet in the switch at all times during the interval $(\tau, t)$. If r is the bandwidth capacity of the outgoing link, the service offered to a session i, denoted by $W_i(\tau, t)$, is proportional to $\phi_i$. That is,

$$W_i(\tau, t) = \frac{\phi_i}{\sum_{j \in B(\tau, t)} \phi_j}\, r\,(t - \tau).$$

The minimum service that a connection can receive in any interval of time is

$$\frac{\phi_i}{\sum_{j=1}^{V} \phi_j}\, r\,(t - \tau),$$

where V is the maximum number of connections that can be backlogged in the server at the same time. Thus, GPS serves each backlogged session with a minimum rate equal to its reserved rate at each instant; in addition, the excess bandwidth available from sessions not using their reservations is distributed among all the backlogged connections at each instant in proportion to their individual reservations. This results in perfect isolation, ideal fairness, and low end-to-end session delays. What is meant by end-to-end session delay is the delay experienced by a packet between the time a packet leaves a source and the time it is received at its destination.
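The proportional sharing rule can be made concrete with a small calculation. The sketch below computes the instantaneous GPS service rate of each backlogged session and the minimum guaranteed rate of a session. The function names `gps_rates` and `gps_min_rate` and the sample values are illustrative assumptions.

```python
def gps_rates(phi, backlogged, r):
    """Instantaneous GPS rate of each backlogged session.

    phi: dict session -> reserved share phi_i
    backlogged: sessions that currently have packets queued
    r: capacity of the outgoing link
    """
    backlogged = list(backlogged)
    total = sum(phi[j] for j in backlogged)
    return {i: phi[i] / total * r for i in backlogged}


def gps_min_rate(phi, i, r):
    """Rate session i receives when every session is backlogged."""
    return phi[i] / sum(phi.values()) * r


shares = {"a": 1, "b": 2, "c": 1}
# Session "c" is idle, so its share is split between "a" and "b" in a 1:2 ratio.
rates = gps_rates(shares, ["a", "b"], r=120)
print(rates, gps_min_rate(shares, "a", r=120))
```

With all three sessions backlogged, session "a" would receive 30 units of the 120-unit link. With "c" idle it receives about 40, which shows how the excess bandwidth is redistributed in proportion to the reservations.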
A packet-by-packet version of GPS, known as PGPS or Weighted Fair Queueing, was defined in A. Demers, S. Keshav, and S. Shenker, "Analysis and simulation of a fair queueing algorithm," Internetworking: Research and Experience, vol. 1, no. 1, pp. 3-26, 1990. Since our method attempts to overcome a serious shortcoming of Weighted Fair Queueing, this algorithm will be explained first. It can be assumed that each traffic session i sharing the output link controlled by the scheduling algorithm is assigned a value $\phi_i$ corresponding to the reserved bandwidth of the session. The values $\phi_i$ are computed such that the reserved bandwidth of session i on the link is given by

$$\frac{\phi_i}{\sum_{j=1}^{V} \phi_j}\, r,$$

where the denominator computes the sum of the $\phi_j$ values for all the sessions sharing the link.
In the Weighted Fair Queueing algorithm, a GPS fluid-model system is simulated in parallel with the actual packet-by-packet system, in order to identify the set of connections that are backlogged at each instant of time and their service rates. Based on this information, a timestamp is calculated for each arriving packet, and the packets are inserted into a priority queue based on their timestamp values. To accomplish the timestamp calculation, a virtual time function v(t) is maintained by the scheduler. The virtual time v(t) is a piecewise linear function of the real time t, and its slope changes depending on the number of busy sessions and their service rates. More precisely, if $B(\tau, t)$ represents the set of backlogged connections in the scheduler during the interval $(\tau, t)$, the slope of the virtual time function during the interval $(\tau, t)$ is given by

$$\frac{dv}{dt} = \frac{r}{\sum_{j \in B(\tau, t)} \phi_j},$$

where r is the capacity of the outgoing link; with the link rate normalized to 1, this reduces to $1 / \sum_{j \in B(\tau, t)} \phi_j$.
The term "backlogged" here means that the session has one or more packets buffered in the switch throughout the time interval under consideration.
Upon the arrival of a new packet, the virtual time v(t) must first be calculated. Then, the timestamp associated with the k-th packet of session i is calculated as

$$TS_i^{k} = \max\left(TS_i^{k-1},\, v(t)\right) + \frac{L}{\phi_i},$$

where $TS_i^{k-1}$ is the timestamp of the previous packet of session i, L is the size of the k-th packet, and $\phi_i$ is the share of the bandwidth reserved by session i.
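The timestamp rule can be sketched as a small helper that remembers the last timestamp of each session. The virtual time is passed in by the caller, since maintaining v(t) is the expensive part of Weighted Fair Queueing and is not modeled here. The class name `TimestampAssigner` and the initial timestamp of 0 are assumptions made for the example.

```python
class TimestampAssigner:
    """Applies TS_i^k = max(TS_i^(k-1), v(t)) + L / phi_i for each session."""

    def __init__(self, phi):
        self.phi = dict(phi)  # session -> reserved share phi_i
        # Assumption: the timestamp before the first packet is taken as 0.
        self.last_ts = {s: 0.0 for s in self.phi}

    def stamp(self, session, length, v_now):
        """Return the timestamp of a packet of `length` arriving at virtual time v_now."""
        ts = max(self.last_ts[session], v_now) + length / self.phi[session]
        self.last_ts[session] = ts
        return ts


assigner = TimestampAssigner({"a": 0.5})
first = assigner.stamp("a", 100, v_now=0.0)       # session starts at virtual time 0
second = assigner.stamp("a", 100, v_now=50.0)     # still backlogged: builds on `first`
third = assigner.stamp("a", 100, v_now=1000.0)    # after an idle period: builds on v(t)
print(first, second, third)
```

The `max` term is what distinguishes the two cases: a backlogged session continues from its previous timestamp, while a session returning from idle restarts from the current virtual time and cannot claim service it did not use.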
A serious limitation of the Weighted Fair Queueing algorithm is its computational complexity arising from the simulation of the fluid-model GPS scheduler that is required for computation of the virtual time v(t). If there is a total of V sessions sharing the outgoing link, a maximum of V events may be triggered in the simulation during the transmission time of a single packet. Thus, the time for completing a scheduling decision is O(V). When the number of sessions sharing the outgoing link is large, this computation time can be prohibitive. In particular, the algorithm is difficult to apply in an ATM switch where the transmission time of a cell is small (approximately 2.7 microseconds with 155.5 Mbits/second link speed). The method disclosed here provides results similar to those of Weighted Fair Queueing, but its implementation complexity is O(1).
A method to reduce the complexity of Weighted Fair Queueing, using an approximate implementation of GPS multiplexing, was proposed in J. Davin and A. Heybey, "A simulation study of fair queueing and policy enforcement," Computer Communication Review, vol. 20, pp. 23-29, October 1990, and was later analyzed in S. Golestani, "A self-clocked fair queueing scheme for broadband applications," in Proceedings of IEEE INFOCOM '94, pp. 636-646, April 1994, under the name Self-Clocked Fair Queueing (SCFQ).
In this implementation, the virtual time function v(t) is approximated using the timestamp of the packet currently in service. Let $TS_{current}$ denote the timestamp of the packet currently in service when a new packet arrives, and let the new packet be the k-th packet of session i. Then, the timestamp of the new packet is calculated as

$$TS_i^{k} = \max\left(TS_i^{k-1},\, TS_{current}\right) + \frac{L}{\phi_i},$$

where $TS_i^{k-1}$ is the timestamp of the previous packet of session i, L is the size of the k-th packet, and $\phi_i$ is the share of the bandwidth reserved by session i. This approach reduces the complexity of the method greatly. However, the price paid is the reduced level of isolation among the sessions, causing the end-to-end delay bounds to grow linearly with the number of sessions that share the outgoing link. Recall that the end-to-end session delay is the delay experienced by a packet between the time the packet leaves the source and the time it is received at its destination. Thus, the worst-case delay of a session can no longer be controlled just by controlling its reservation, as is possible in Weighted Fair Queueing. The higher end-to-end delay also affects the burstiness of sessions within the network, increasing the buffer requirements.
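The self-clocked approximation can be sketched as a complete, if simplified, scheduler. In the sketch below the timestamp of the most recently dequeued packet stands in for the timestamp of the packet currently in service. The class name `SCFQScheduler` is hypothetical, and resetting the state when the system goes idle is left out for brevity.

```python
import heapq
import itertools


class SCFQScheduler:
    """Self-clocked timestamping: TS = max(previous TS of session, TS_current) + L / phi."""

    def __init__(self, phi):
        self.phi = dict(phi)                       # session -> reserved share
        self.last_ts = {s: 0.0 for s in self.phi}  # previous timestamp per session
        self.ts_current = 0.0                      # timestamp of the packet in service
        self._heap = []
        self._seq = itertools.count()              # FIFO tie-break for equal timestamps

    def enqueue(self, session, length):
        ts = max(self.last_ts[session], self.ts_current) + length / self.phi[session]
        self.last_ts[session] = ts
        heapq.heappush(self._heap, (ts, next(self._seq), session, length))
        return ts

    def dequeue(self):
        ts, _, session, length = heapq.heappop(self._heap)
        self.ts_current = ts  # no fluid-model simulation is needed
        return session, length, ts


sched = SCFQScheduler({"a": 1.0, "b": 2.0})
sched.enqueue("a", 100)   # timestamp 100
sched.enqueue("b", 100)   # timestamp 50
sched.enqueue("a", 100)   # timestamp 200
served = [sched.dequeue()[0]]
sched.enqueue("b", 100)   # max(50, 50) + 50 = 100
served += [sched.dequeue()[0] for _ in range(3)]
print(served)
```

Each timestamp is computed in constant time from values the scheduler already holds, which is the source of the complexity reduction. The virtual time estimate only advances when a packet is dequeued, which is the kind of coarseness that weakens the isolation between sessions.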
Another well-known scheduling method is VirtualClock, described in L. Zhang, "VirtualClock: a new traffic control algorithm for packet switching networks," ACM Transactions on Computer Systems, vol. 9, pp. 101-124, May 1991. This method provides the same end-to-end delay and burstiness bounds as those of Weighted Fair Queueing with a simple timestamp computation algorithm, but the price paid is in terms of fairness. A backlogged session in the VirtualClock server can be starved for an arbitrary period of time as a result of excess bandwidth it received from the server when other sessions were idle.
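The starvation effect can be reproduced with a small slot-based simulation. The sketch assumes the commonly cited VirtualClock rule, in which a packet's timestamp is the larger of its arrival time and the session's previous timestamp, plus the packet length divided by the reserved rate; this rule is not spelled out in the text above. Packets have unit length, the link sends one packet per slot, and both sessions reserve half the link. The function name `virtualclock_order` and all parameters are illustrative.

```python
def virtualclock_order(a_packets, b_start, b_packets, rate=0.5):
    """Return which session is served in each slot ("A", "B", or None if idle).

    Session A queues a_packets at slot 0; session B queues b_packets at slot b_start.
    """
    # Assumed rule: TS = max(arrival time, previous TS) + 1 / rate for unit packets.
    a_ts = [(k + 1) / rate for k in range(a_packets)]
    b_ts = [b_start + (k + 1) / rate for k in range(b_packets)]
    served, ia, ib, t = [], 0, 0, 0
    while ia < a_packets or ib < b_packets:
        a_ready = ia < a_packets
        b_ready = ib < b_packets and t >= b_start
        if b_ready and (not a_ready or b_ts[ib] < a_ts[ia]):
            served.append("B")
            ib += 1
        elif a_ready:
            served.append("A")
            ia += 1
        else:
            served.append(None)  # link idle until B arrives
        t += 1
    return served


slots = virtualclock_order(a_packets=300, b_start=100, b_packets=200)
starved = slots[100:].index("A")  # slots A waits after B becomes active
print(starved)
```

Session A uses the whole link while B is idle, so its timestamps run ahead of real time. When B arrives, all of its early packets carry smaller timestamps, and A receives no service until B catches up, even though A never exceeded what the idle link could offer.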
What is needed, therefore, is a scheduling system and method for packet-switched networks that provides both fairness and low latency for the packets in the network. The system and method should be easily implemented utilizing existing architecture, and should be less complex than existing scheduling methods. The present invention addresses such a need.