1. Field of the Invention
This invention pertains generally to traffic scheduling systems for packet-switched communications networks and, more particularly, to a system and method for traffic scheduling that provides low end-to-end delay bounds, low buffer requirements, and fairness.
2. Description of the Background Art
As computer networks become more and more widespread, methods for controlling traffic efficiently in these networks are becoming more important. Early data networks were based on circuit switching where the peak bandwidth demand of a communication session was allocated to it for the entire duration of the session. When the session traffic was bursty, that is when the peak rate of the session exceeded its average rate, circuit switching resulted in under-utilization of the network resources. Packet switching was developed to overcome this disadvantage, thus improving the network utilization for bursty traffic.
Packet switched networks dynamically allocate bandwidth according to demand. By segmenting the input flow of information into units called "packets," and processing each packet as a self-contained unit, packet switched networks allow scheduling of network resources on a per-packet basis. This enables multiple sessions to share the network resources dynamically by allowing their packets to be interleaved across the communication network.
Along with the introduction of packet switched networks came a desire for Quality of Service (QoS) guarantees. Many future applications of computer networks such as distance education, remote collaboration, and teleconferencing will rely on the ability of the network to provide QoS guarantees. These guarantees are usually in the form of bounds on end-to-end delay of the session, bandwidth, delay jitter (variation in delay), packet loss rate, or a combination of these parameters. Broadband packet networks based on ATM (Asynchronous Transfer Mode) are currently enabling the integration of traffic with a wide range of QoS requirements within a single communication network. QoS guarantees can also be provided in conventional packet networks by the use of proper methods in the packet switches (or routers).
Providing QoS guarantees in a packet network requires the use of traffic scheduling methods in the switches (or routers). The function of a scheduling method is to select, for each outgoing link of the switch, the packet to be transmitted in the next cycle from the available packets belonging to the communication sessions sharing the output link. This selection must be performed such that the QoS guarantees for the individual traffic sessions, such as upper bounds on maximum delay, are satisfied. Implementation of the method may be in hardware or software. Because of the small size of ATM cells, the scheduling method must usually be implemented in hardware in an ATM switch. In a packet network with larger packet-sizes, such as the current Internet, the method can be implemented in software.
Several methods have been proposed for traffic scheduling in packet switches. In general, schedulers can be classified as work-conserving or non-work-conserving. A scheduler is work-conserving if the server is never idle when a packet is buffered in the system. A non-work-conserving server may remain idle even if there are available packets to transmit. A server may, for example, postpone the transmission of a packet when it expects a higher-priority packet to arrive soon, even though it is currently idle. When the transmission time of a packet is short, as is typically the case in an ATM network, however, such a policy is seldom justified. Non-work-conserving methods are also used to control delay jitter (variation in delay) by delaying packets that arrive early. Work-conserving servers always have lower average delays than non-work-conserving servers and are therefore preferred for most applications.
Examples of work-conserving schedulers include Generalized Processor Sharing (GPS), Weighted Fair Queueing, VirtualClock, Delay-Earliest-Due-Date (Delay-EDD), Weighted Round Robin, and Deficit Round Robin. Examples of non-work-conserving schedulers include Stop-and-Go Queueing, Hierarchical Round Robin, and Jitter-Earliest-Due-Date.
Another classification of traffic schedulers is based on their internal architecture. This classification gives rise to two types of schedulers: sorted-priority and frame-based. Sorted-priority schedulers compute a time-stamp for each packet in the system. Packets are sorted based on their time-stamps and are transmitted in that order. VirtualClock, Weighted Fair Queueing, and Delay-EDD follow this architecture. To aid in the computation of time-stamps, sorted-priority schedulers usually maintain a global function that keeps track of the progress of work in the system. This global function is often referred to as "virtual time." Two factors determine the implementation complexity of all sorted-priority methods. The first is the complexity of updating the priority list and selecting the packet with the highest priority, which is at least O(log V), where V is the number of connections sharing the outgoing link. The second is the complexity of calculating the time-stamp associated with each packet; this factor depends heavily on the method. For example, maintaining the virtual time in Weighted Fair Queueing requires the processing of a maximum of V events during the transmission of a single packet, whereas time-stamps in VirtualClock can be calculated in constant time, that is, O(1).
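The sorted-priority architecture described above can be sketched with a binary heap. This is only an illustration, not any particular scheduler: the class and method names are hypothetical, the time-stamps are supplied by the caller, and the heap holds every queued packet, so the logarithmic cost here is in the number of queued packets. A switch that keeps only the head-of-line packet of each connection in the priority list would obtain the O(log V) figure quoted in the text.

```python
import heapq
import itertools


class SortedPriorityQueue:
    """Packets ordered by time-stamp; the smallest time-stamp is served first."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # arrival order breaks time-stamp ties

    def enqueue(self, timestamp, session, length):
        # Heap insertion is logarithmic in the number of queued packets.
        heapq.heappush(self._heap, (timestamp, next(self._tie), session, length))

    def dequeue(self):
        # Removing the minimum time-stamp is also logarithmic.
        timestamp, _, session, length = heapq.heappop(self._heap)
        return timestamp, session, length

    def __len__(self):
        return len(self._heap)
```

How the time-stamp itself is computed is the second complexity factor named in the text and is deliberately left outside this class.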
In a frame-based scheduler, a virtual time is not calculated. Frame-based schedulers split time into frames of fixed or variable length. Reservations of sessions are made in terms of the maximum amount of traffic the session is allowed to transmit during a frame period. Hierarchical Round Robin and Stop-and-Go Queueing are frame-based schedulers that use a constant frame size. As a result, the server may remain idle if sessions transmit less traffic than their reservations over the duration of a frame, making them non-work-conserving. In contrast, Weighted Round Robin and Deficit Round Robin schedulers allow the frame size to vary within a maximum. Thus, if the traffic from a session is less than its reservation, a new frame can be started early. Therefore, both Weighted Round Robin and Deficit Round Robin are work-conserving schedulers.
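To make the variable-frame idea concrete, the following is a minimal sketch of Deficit Round Robin as it is commonly described. The function name, the dictionary-based interface, and the fixed number of rounds are assumptions made for illustration. Each session receives a per-round credit (its quantum) and may send packets while its accumulated credit covers the head-of-line packet; a session with nothing queued is skipped at once, which is what keeps the scheduler work-conserving.

```python
from collections import deque


def deficit_round_robin(queues, quantum, rounds):
    """Serve per-session packet queues with Deficit Round Robin.

    queues:  dict session -> deque of packet lengths (mutated in place)
    quantum: dict session -> credit added per round (the session's reservation)
    rounds:  number of round-robin passes to run
    Returns the list of (session, length) pairs in transmission order.
    """
    deficit = {s: 0 for s in queues}
    sent = []
    for _ in range(rounds):
        for s, q in queues.items():
            if not q:
                deficit[s] = 0  # idle sessions do not accumulate credit
                continue
            deficit[s] += quantum[s]
            # Send while the credit covers the head-of-line packet.
            while q and q[0] <= deficit[s]:
                length = q.popleft()
                deficit[s] -= length
                sent.append((s, length))
            if not q:
                deficit[s] = 0  # leftover credit is dropped once the queue empties
    return sent
```

For example, with two sessions of equal quantum 200, one sending 300-unit packets and the other 100-unit packets, the first session must wait one round to accumulate enough credit, after which both receive equal long-run service.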
Many different scheduling methods have been proposed to approximate the theoretical scheduling discipline known as Generalized Processor Sharing (GPS). The GPS discipline is defined with respect to a "fluid model," where data transmitted by each session is considered to be infinitely divisible and multiple sources may transmit their data simultaneously through a single physical communication link. This allows tight control of the bandwidth allocated to each session on a link. Unfortunately, GPS is only a hypothetical scheduling discipline. In practice, the packets transmitted by each session cannot be divided further, and data from multiple sessions can be interleaved only at packet boundaries. Thus the GPS discipline cannot be implemented in practice in a packet-switched network. However, the GPS discipline provides a sound theoretical basis for the design of practical scheduling methods. A number of such practical methods have been designed based on GPS. These methods vary in their end-to-end delay bounds (that is, the maximum delays seen by packets from a particular session in the network between its end nodes), the level of fairness achieved in allocating bandwidth to different sessions sharing the same communication link, and the complexity of implementing them in a switch or router. An outline of the GPS scheduling discipline is given below, before describing previous methods based on GPS.
Assume that the GPS discipline is used to schedule traffic on an outgoing link of a switch. The share of bandwidth reserved by session i on the outgoing link is represented by a real number φ_i. Let B(τ, t) denote the set of sessions that have at least one packet in the switch at all times during the interval (τ, t). If r is the bandwidth capacity of the outgoing link, the service offered to a connection i, denoted by W_i(τ, t), is proportional to φ_i. That is: ##EQU1## The minimum service that a session can receive in any interval of time is: ##EQU2## where V is the maximum number of sessions that can be backlogged in the server at the same time. Thus, GPS serves each backlogged session with a minimum rate equal to its reserved rate at each instant; in addition, the excess bandwidth available from sessions not using their reservations is distributed among all the backlogged sessions at each instant in proportion to their individual reservations. This results in perfect isolation, ideal fairness, and low end-to-end session delays. Recall that the end-to-end session delay is the delay experienced by a packet between the time it leaves a source and the time it is received at its destination.
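The proportional sharing described above can be illustrated with a short sketch. The equations themselves are not reproduced in this text, so the code assumes the standard GPS definition, in which a backlogged session i is served at the instantaneous rate r multiplied by its share and divided by the sum of the shares of all currently backlogged sessions. The function name and its arguments are hypothetical.

```python
def gps_rates(phi, backlogged, r):
    """Instantaneous GPS service rate of each backlogged session.

    phi:        dict session -> reserved share (assumed positive)
    backlogged: iterable of the sessions that currently have packets queued
    r:          capacity of the outgoing link (for example, bits per second)
    """
    backlogged = list(backlogged)
    total = sum(phi[j] for j in backlogged)
    # Each backlogged session receives capacity in proportion to its share.
    return {i: phi[i] / total * r for i in backlogged}
```

With shares 1, 1, and 2 on a link of capacity 100, all three sessions backlogged gives rates 25, 25, and 50. If the third session goes idle, the remaining two each rise to 50, which shows the excess bandwidth being redistributed in proportion to the reservations.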
As indicated above, GPS is only a theoretical system and is not directly implementable in practice. Therefore, systems have been designed to approximate the GPS system as closely as possible. An example is a packet-by-packet version of the GPS method, known as PGPS or Weighted Fair Queueing, as defined in A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation of a Fair Queueing Algorithm," Internetworking: Research and Experience, Vol. 1, No. 1, pp. 3-26, 1990. The Weighted Fair Queueing method has a serious shortcoming, however, which will be apparent after the following brief discussion of the method.
In the Weighted Fair Queueing method, we assume that each traffic session i sharing the output link controlled by the scheduling method is assigned a value φ_i corresponding to the reserved bandwidth of the session. The values φ_i are computed such that the reserved bandwidth of session i on the link is given by ##EQU3## where the denominator computes the sum of the φ_i values for all the sessions sharing the link.
In the Weighted Fair Queueing method, a GPS fluid-model system is simulated in parallel with the actual packet-by-packet system, in order to identify the set of connections that are backlogged in the GPS system and their service rates. Based on this information, a time-stamp is calculated for each arriving packet, and the packets are inserted into a priority queue based on their time-stamp values. To accomplish the time-stamp calculation, a virtual time v(t) is maintained by the scheduler. This virtual time v(t) is a piece-wise linear function of the real time t, and its slope changes depending on the number of busy sessions and their service rates. More precisely, let B(τ, t) represent the set of backlogged connections in the scheduler during the interval (τ, t), where "backlogged" means that the session has one or more packets buffered in the switch throughout the interval under consideration. The slope of the virtual time function v(t) during the interval (τ, t) is then given by: ##EQU4##
On the arrival of a new packet, the virtual time v(t) must first be calculated. Then, the time-stamp TS_i^k associated with the k-th packet of session i is calculated as: ##EQU5## where TS_i^(k-1) is the time-stamp of the previous packet of session i, L is the size of the k-th packet, and φ_i is the share of the bandwidth allocated to session i.
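The time-stamp equation is not reproduced in this text. A minimal sketch, assuming it has the form usually given for Weighted Fair Queueing (the larger of the session's previous time-stamp and the current virtual time, plus the packet size divided by the session's share), is shown below. The function and parameter names are hypothetical, and the virtual time is taken as an input because computing it is the expensive part discussed next.

```python
def wfq_timestamp(prev_ts, virtual_time, length, share):
    """Time-stamp for the k-th packet of a session under the assumed WFQ rule.

    prev_ts:      time-stamp of the session's previous packet
    virtual_time: v(t) at the arrival instant, supplied by the caller
    length:       size L of the arriving packet
    share:        bandwidth share allocated to the session (assumed positive)
    """
    # A packet cannot start before the session's previous packet finishes,
    # nor before the current virtual time.
    return max(prev_ts, virtual_time) + length / share
```

For a session with share 0.5 that was idle (previous time-stamp 0) when v(t) = 5, a packet of size 100 is stamped 205. If the session's previous time-stamp is already 300, the new packet is stamped 500 regardless of v(t).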
A serious limitation of the Weighted Fair Queueing method is its computational complexity arising from the parallel simulation of the fluid-model GPS scheduler that is required for computation of virtual time v(t). If there is a total of V sessions sharing the outgoing link, a maximum of V events may be triggered in the simulation during the transmission time of a single packet. Thus, the time for completing a scheduling decision is O(V). When the number of sessions sharing the outgoing link is large, this computation time can be prohibitive. In particular, the method is difficult to apply in an ATM switch where the transmission time of a cell is small (approximately 2.7 microseconds with 155.5 Mbits/second link speed).
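The cell transmission time quoted above can be checked with a short calculation, assuming the standard 53-byte ATM cell. The session count used for the per-event budget is a hypothetical figure chosen only to show how quickly the available time shrinks.

```python
CELL_BITS = 53 * 8            # a standard ATM cell is 53 bytes = 424 bits
LINK_RATE_BPS = 155.5e6       # link speed quoted in the text

# Time to transmit one cell, in microseconds.
cell_time_us = CELL_BITS / LINK_RATE_BPS * 1e6

# If up to V simulation events must be handled within one cell time,
# each event has only cell_time / V available (V is a hypothetical value).
V = 1000
per_event_ns = cell_time_us * 1e3 / V
```

The result is about 2.73 microseconds per cell, consistent with the approximate figure in the text, and under 3 nanoseconds per event for a thousand sessions.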
A method to reduce the complexity of Weighted Fair Queueing, using an approximate implementation of GPS multiplexing, was proposed in J. Davin and A. Heybey, "A Simulation Study of Fair Queueing and Policy Enforcement," Computer Communications Review, Vol. 20, pp. 23-29, Oct. 1990, and was later analyzed in S. Golestani, Proceedings of INFOCOM '94, pp. 636-646, IEEE, April 1994 under the name "Self-Clocked Fair Queueing" (SCFQ). In this implementation, the virtual time function v(t) is approximated using the time-stamp of the packet currently in service. Let TS_current denote the time-stamp of the packet currently in service when a new packet arrives, and let the new packet be the k-th packet of session i. Then, the time-stamp of the new packet is calculated as: ##EQU6## where TS_i^(k-1) is the time-stamp of the previous packet of session i, L is the size of the k-th packet, and φ_i is the share of the bandwidth reserved by session i. This approach greatly reduces the complexity of the method. However, the price paid is a reduced level of isolation among the sessions, which causes the end-to-end delay bounds to grow linearly with the number of sessions that share the outgoing link. This is a serious limitation because the worst-case delay of a session can no longer be controlled just by controlling its reservation, as is possible in Weighted Fair Queueing. The higher end-to-end delay also affects the burstiness of sessions within the network, increasing the buffer requirements. Here, burstiness refers to the behavior of session traffic where its actual rate of arrival during a specified interval of time is larger than its average rate. Thus, a high burstiness generally implies a large number of packets arriving close together in time, with long idle intervals in between.
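The self-clocked approximation can be sketched as follows. The equation is not reproduced in this text, so the code assumes the usual form: the larger of the session's previous time-stamp and the time-stamp of the packet in service, plus the packet size divided by the session's share. The class and method names are hypothetical, and the handling of idle periods (what the reference time-stamp should be when the link goes idle) is omitted for brevity.

```python
import heapq
import itertools


class SCFQScheduler:
    """Sketch of a self-clocked fair queueing scheduler for one output link."""

    def __init__(self, shares):
        self.shares = dict(shares)               # session -> bandwidth share
        self.last_ts = {s: 0.0 for s in self.shares}
        self.ts_current = 0.0                    # time-stamp of packet in service
        self._heap = []
        self._tie = itertools.count()            # arrival order breaks ties

    def arrive(self, session, length):
        # Constant-time stamp: no fluid-model simulation is needed.
        ts = max(self.last_ts[session], self.ts_current) + length / self.shares[session]
        self.last_ts[session] = ts
        heapq.heappush(self._heap, (ts, next(self._tie), session, length))
        return ts

    def start_service(self):
        # The packet with the smallest time-stamp is transmitted next, and
        # its time-stamp becomes the new reference for later arrivals.
        ts, _, session, length = heapq.heappop(self._heap)
        self.ts_current = ts
        return session, length, ts
```

The only per-packet state beyond the priority queue is one time-stamp per session and the single value `ts_current`, which is why the time-stamp computation itself takes constant time.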
Another well-known scheduling method is VirtualClock, described in L. Zhang, "VirtualClock: a new traffic control algorithm for Packet switching networks," ACM Transactions on Computer Systems, vol. 9, pp. 101-124, May 1991. This method provides the same end-to-end delay and burstiness bounds as those of Weighted Fair Queueing with a simple time-stamp computation method, but the price paid is in terms of fairness. A backlogged session in the VirtualClock server can be starved for an arbitrary period of time as a result of excess bandwidth it received from the server when other sessions were idle.
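The starvation effect can be seen in a small numerical sketch. It assumes the commonly described VirtualClock rule, in which a packet is stamped with the larger of the session's previous time-stamp and the real arrival time, plus the packet size divided by the session's reserved rate. The function name and the scenario (two sessions, each reserving half of a unit-capacity link) are hypothetical.

```python
def virtualclock_timestamp(prev_ts, now, length, rate):
    """Assumed VirtualClock rule: advance from the later of the session's clock and real time."""
    return max(prev_ts, now) + length / rate


rate = 0.5  # each session reserves half of a unit-capacity link

# Session A sends 10 unit-size packets at time 0 while B is idle.
# The link serves them at full capacity, so all are gone by time 10,
# but A's clock has advanced to 20.
ts_a = 0.0
for _ in range(10):
    ts_a = virtualclock_timestamp(ts_a, now=0.0, length=1.0, rate=rate)

# At time 10 both sessions become backlogged.
a_next = virtualclock_timestamp(ts_a, now=10.0, length=1.0, rate=rate)

ts_b = 0.0
b_stamps = []
for _ in range(5):
    ts_b = virtualclock_timestamp(ts_b, now=10.0, length=1.0, rate=rate)
    b_stamps.append(ts_b)
```

Session A's next packet is stamped 22, while session B's five packets are stamped 12 through 20. All five of B's packets are therefore sent before A's, so A receives no service for five time units even though it is backlogged; a longer burst of excess service would lengthen this period without bound.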
What is needed is a method and apparatus that calculates and maintains a global parameter for keeping track of the progress of the system in a distinct and more accurate manner than the virtual time based methods described above. This global parameter should provide implementation complexity similar to that of Self-Clocked Fair Queueing, but still maintain the delay bounds of Weighted Fair Queueing. What is further needed is a method and apparatus that provides for maximum fairness among all incoming connections to the system.