1. Field of the Invention
The present invention relates to a method and apparatus for scheduling transmission of data packets in a packet network. More particularly, the present invention relates to a scheduler that can be used in high-speed traffic management equipment.
2. State of the Art
In packet networks, data is transmitted in variable-length packets from source to destination through network elements such as routers and switches. In general, data must be buffered at these intermediate network elements because there may be a mismatch between the offered bandwidth (the bandwidth presented to the network element) and the available bandwidth on the links attached to these network elements. The handling of packets in and out of these buffers is referred to as Traffic Management.
Part of Traffic Management is a scheduling discipline. This is an algorithm that selects a packet out of a pool of buffered packets to be transmitted. In a packet network where one wishes to guarantee minimum bandwidth and delay to certain traffic flows, or put “firewalls” between competing traffic flows, the choice of an appropriate scheduling discipline is essential. Other equally important components of traffic management, such as congestion control, need to complement the scheduling discipline to avoid the unfair monopolization of buffering resources.
The Internet Engineering Task Force (IETF) has proposed an Integrated Services framework (IETF RFC 1633) which provides the ability to guarantee minimum bandwidth and delay to a traffic flow in an IP network. While deployment of such a service may not be desired, practical or feasible on a wide area network such as the Internet, because of scalability, stability and robustness, it may be a useful tool in smaller networks or in networks that aggregate packet data traffic with other traditional telecom services.
With the advent of Multiprotocol Label Switching (MPLS), there is a genuine opportunity for IP networks to be deployed as multi-service networks. MPLS considerably simplifies per-flow classification in a data path compared to the original Integrated Services (IntServ) specification. IntServ is an architecture developed by the IETF to provide Quality of Service (QoS) over the Internet. Quality of Service is the idea that transmission rates, error rates, and other characteristics can be measured, improved, and, to some extent, guaranteed in advance of transmission. QoS is of particular concern for the continuous transmission of high-bandwidth video and multimedia information.
In the book “An engineering approach to computer networking”, ISBN 0201634422, S. Keshav describes general principles relating to scheduling operations. One known scheduling discipline is Generalized Processor Sharing (GPS). GPS is an ideal (and often unimplementable) work-conserving scheduling discipline that achieves a max-min weighted fair share (WFS) allocation of bandwidth. Quite a number of other scheduling disciplines have attempted to approximate GPS. The most noteworthy are Weighted Fair Queueing (WFQ), Self-Clocked Fair Queueing (SCFQ), Start-time Fair Queueing (SFQ), Virtual Clock (VC), Worst-case Fair Weighted Fair Queueing (WF2Q) and Deficit Round Robin (DRR).
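By way of illustration only, the max-min weighted fair share allocation that GPS achieves can be sketched with a progressive-filling computation. The function name, the argument names and the example figures below are assumptions introduced for this illustration and do not appear in the cited references.

```python
def weighted_max_min(capacity, demands, weights):
    """Illustrative weighted max-min fair allocation (progressive filling).

    capacity: total bandwidth to share
    demands:  bandwidth requested by each flow
    weights:  relative share of each flow
    """
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = float(capacity)
    while active and remaining > 1e-12:
        total_w = sum(weights[i] for i in active)
        # Flows whose outstanding demand fits within their weighted share
        # of the remaining capacity can be satisfied completely.
        satisfied = {i for i in active
                     if demands[i] - alloc[i] <= remaining * weights[i] / total_w}
        if not satisfied:
            # Every remaining flow is bottlenecked: split what is left
            # in proportion to the weights and stop.
            for i in active:
                alloc[i] += remaining * weights[i] / total_w
            break
        for i in satisfied:
            remaining -= demands[i] - alloc[i]
            alloc[i] = demands[i]
        active -= satisfied
    return alloc


# Hypothetical example: a link of capacity 10 shared by three flows.
shares = weighted_max_min(10, demands=[2, 8, 8], weights=[1, 1, 2])
```

In this hypothetical example the first flow receives its full demand of 2, and the remaining capacity of 8 is divided between the other two flows in the ratio 1:2.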
The first proposed approximation of GPS was WFQ, which is described in “Analysis and simulation of a fair-queueing algorithm”, by A. Demers, S. Keshav and S. Shenker in proceedings of ACM SIGCOMM '89, p. 1-12, Austin, Tex., September 1989. This approximation computes, based on a virtual time, the finish times for packets and then sorts packets so that the packet with the smallest finish time is sent out first. To calculate the finish times, it keeps track of the state of the equivalent GPS system.
SCFQ, as described by S. J. Golestani in “A self-clocked fair queueing scheme for broadband applications”, in proceedings of IEEE INFOCOM '94, vol. 2, p. 643-646, June 1994, and SFQ, as described by P. Goyal, H. M. Vin and H. Chen in “Start-time fair queueing: a scheduling algorithm for integrated service access”, in proceedings of ACM SIGCOMM '96, August 1996, avoid this latter complexity of WFQ by approximating the way the finish time is computed. Although a Relative-Fairness-Bound has been derived for SCFQ, it has a worst-case latency that is proportional to the number of sessions. SFQ improves on SCFQ by having a bounded worst-case latency.
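The self-clocked simplification can be sketched as follows. This is a minimal illustration that assumes the commonly stated SCFQ tagging rule, in which the virtual time is taken to be the tag of the packet most recently selected for service. The class name, method names and example weights are hypothetical, and the heap shows where the O(log N) sorting cost arises.

```python
import heapq


class SCFQSketch:
    """Illustrative self-clocked fair queueing scheduler (not a reference design)."""

    def __init__(self, weights):
        self.weights = weights                       # session id -> weight
        self.last_tag = {s: 0.0 for s in weights}    # last finish tag per session
        self.vtime = 0.0       # tag of the packet most recently selected for service
        self.heap = []         # packets ordered by finish tag
        self.seq = 0           # arrival counter used to break ties

    def enqueue(self, session, length):
        # Finish tag: start from the later of the session's previous tag and
        # the current virtual time, then add the weighted packet length.
        tag = max(self.last_tag[session], self.vtime) + length / self.weights[session]
        self.last_tag[session] = tag
        heapq.heappush(self.heap, (tag, self.seq, session, length))
        self.seq += 1

    def dequeue(self):
        # Serve the packet with the smallest finish tag; that tag becomes
        # the virtual time seen by subsequent arrivals.
        tag, _, session, length = heapq.heappop(self.heap)
        self.vtime = tag
        return session, length


# Hypothetical example: session "a" has twice the weight of session "b".
sched = SCFQSketch({"a": 2, "b": 1})
for _ in range(4):
    sched.enqueue("a", 100)
for _ in range(2):
    sched.enqueue("b", 100)
order = [sched.dequeue()[0] for _ in range(6)]
```

With equal packet lengths, session "a" is served twice for every packet of session "b" in this example, which reflects the 2:1 weights.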
Virtual Clock (VC), as described by L. Zhang in “Virtual clock: a new traffic control algorithm for packet switching networks”, is an algorithm similar to WFQ, but the finish-time computation is simplified by using the real time instead of a virtual time (tracking the state of the GPS system). Because of this, however, the long-term unfairness of VC is not bounded.
WF2Q, as described by Jon C. R. Bennett and Hui Zhang in “WF2Q: Worst-case Fair Weighted Fair Queueing”, INFOCOM '96, pages 120-128, March 1996, is the highest quality approximation of WFQ, but requires sorting of packets with respect to both start time and finish time. WF2Q+ retains all properties of WF2Q, but without the explicit need to track the state of the GPS system. It provides both firm Relative-Fairness-Bounds and Absolute-Fairness-Bounds.
Each of the algorithms discussed hereinabove requires a way to sort sessions, which is an operation of the order O(log N), where N is the number of sessions. In “Implementing scheduling algorithms in high speed networks” by D. Stephens, J. Bennett and H. Zhang, IEEE Journal on Selected Areas in Communications, p. 1145-1158, 17(6), 1999, a framework is discussed for high-speed implementations of various WFQ approximations, including WF2Q+. There, the idea of a logarithmic scale of groups is introduced, where each group holds sessions that have a similar service interval. However, within each group, there is a sorting mechanism (using calendar queues). In “Hardware-efficient fair queueing architectures for high-speed networks”, INFOCOM (2), pages 638-646, 1996, Jennifer Rexford, Albert G. Greenberg and Flavio Bonomi followed a similar approach for SCFQ algorithms; however, sorting bins are required within each group.
DRR, described by M. Shreedhar and George Varghese in “Efficient fair queueing using Deficit Round Robin”, SIGCOMM '95, pages 231-242, 1995, approximates GPS in a low-cost way. It extends Round-Robin (which is a GPS approximation for fixed-size packets) to work correctly for variable-size packets. DRR, like RR, does not require sorting logic. Also, it does not need to calculate a complex virtual-time function. However, DRR only offers long-term fairness, and has a delay-bound that is proportional to the number of sessions.
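The deficit mechanism can be sketched as follows. This is a simplified illustration under stated assumptions: it visits every session in each round rather than maintaining a list of active sessions, and the function name, quantum values and packet sizes are hypothetical.

```python
from collections import deque


def drr_sketch(queues, quanta, rounds):
    """Illustrative Deficit Round Robin over a fixed number of rounds.

    queues: session id -> deque of packet sizes (head of queue on the left)
    quanta: session id -> quantum added to the deficit counter per round
    Returns the list of (session, size) pairs in transmission order.
    """
    deficit = {s: 0 for s in queues}
    sent = []
    for _ in range(rounds):
        for s, q in queues.items():
            if not q:
                continue                  # idle sessions earn no credit
            deficit[s] += quanta[s]
            # Send packets while the head packet fits in the deficit counter.
            while q and q[0] <= deficit[s]:
                size = q.popleft()
                deficit[s] -= size
                sent.append((s, size))
            if not q:
                deficit[s] = 0            # an emptied queue forfeits leftover credit
    return sent


# Hypothetical example: large packets on "a", small packets on "b".
example = drr_sketch(
    {"a": deque([300, 300]), "b": deque([100, 100, 100, 100])},
    {"a": 200, "b": 200},
    rounds=3,
)
```

In this example session "a" cannot send in the first round because its 300-byte head packet exceeds the 200-byte quantum; the unused credit carries over to the next round. No sorting is performed, but a session may wait for every other session to be visited before it is served, which is consistent with the delay bound noted above.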
Hemant Chaskar and Upamanyu Madhow take an alternative approach in “Fair scheduling with tunable latency: a round robin approach”, http://citeseer.nj.nec.com/cheskar99fair.html, to implement a fair scheduler for fixed-size packets. Starting from Hierarchical Round Robin, a low-cost fair scheduler is derived with properties as good as WF2Q, called Multiclass WRR. Chaskar and Madhow also suggest that a variable-length packet scheduler can be developed along the same lines as DRR. Hierarchical Deficit Round Robin (HDRR) can be considered as a native packet variant of Multiclass WRR, since the concept of service-groups, a hierarchical round-robin selection, and a mechanism to make sure that faster classes are served within a bounded delay are also introduced.
The Generalized Processor Sharing (GPS) discipline and its packet approximations, Fair Queueing (FQ), Weighted Fair Queueing (WFQ) and variants, are popular scheduling disciplines because of their useful properties with respect to bandwidth, proportional fairness, and delay. However, they are very complex to implement at high speeds. For example, an OC-192c SONET fiber carrying a single channel of Packet-over-SONET (PoS) IP packets can transmit up to 25 million packets per second if all the packets are the smallest 40-byte IP packets. This means that, in a worst-case situation, the scheduler must make a new scheduling decision every 40 ns.
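The 40 ns figure follows directly from the stated packet rate, as the short calculation below illustrates. The function name is hypothetical; the packet rate is the one given above.

```python
def decision_interval_ns(packets_per_second):
    """Time available for each scheduling decision, in nanoseconds."""
    return 1e9 / packets_per_second


# 25 million packets per second leaves 40 ns per decision.
interval = decision_interval_ns(25e6)
print(interval)
```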
The complexity of implementing WFQ-like disciplines arises mainly from two aspects: computing the timestamps, and sorting the sessions based on the timestamps. Some GPS approximations such as SCFQ and SFQ successfully avoid the complex timestamp computation, but all existing solutions require sorting of sessions. Since sorting is difficult and/or expensive to implement in hardware, cheaper and simpler alternatives such as DRR are favored for high-speed hardware designs. Deficit Round Robin does not require a sorting operation, but has no way to tune the latency given to individual flows, and does not provide short-term fairness.
There is a need for a packet scheduling method and apparatus that provides both long-term and short-term fairness and, at the same time, is inexpensive to implement.