The present invention relates to a system for scheduling packets in packet networks and, more particularly, to guaranteeing data transfer delays from data sources to destinations.
FIG. 1 shows a packet network in which a plurality of switches 2 are connected to each other by communication links 8. A number of data sources 4 and destinations 6 are connected to the communication switches 2. From time to time, a network connection is established or torn down from each of these data sources 4 to a corresponding destination. The connection establishment process involves one of the data sources 4 sending a packet including control information that indicates one of the destinations 6 to which it desires the connection and the desired envelope of the data traffic it agrees to send on the connection, along with the desired delay bounds at each of the communication switches 2 on the path to the destination. The above desired connection and envelope are specified in terms of leaky-bucket parameters as disclosed in R. Cruz, "A Calculus for Network Delay, Part II: Network Analysis," IEEE Transactions on Information Theory, pp. 121-141, January 1991. For the tear-down of a connection, the data source sends a packet including control information indicating that the connection needs to be torn down.
When one of the switches 2 in the network receives a data packet indicating that a connection needs to be established, the switch executes a call admission control (CAC) procedure to determine whether or not the delay required by the connection can be guaranteed by the network. If the result of such a procedure in every switch on the path of the connection in the network indicates that the delay can be guaranteed by the network, then the connection is accepted in the network. On the other hand, if the result of such a procedure in at least one of the switches 2 on the path of the connection in the network indicates that the delay cannot be guaranteed by the network, then the connection is not accepted in the network.
The provision of quality-of-service (QoS) guarantees, such as bandwidth, delay, jitter, and cell loss, to applications of widely different characteristics is a primary objective in emerging broadband packet-switched networks. In such networks, packet-scheduling disciplines are necessary to satisfy the QoS requirements of delay-sensitive applications, and they ensure that real-time traffic and best-effort traffic can coexist on the same network infrastructure. Among the scheduling algorithms that have been proposed in the literature, two classes of schemes have become popular: those based on generalized processor sharing (GPS) and those based on earliest deadline first (EDF). For a survey of these algorithms, see S. Keshav, An Engineering Approach to Computer Networking. ATM Networks, the Internet, and the Telephone Network, Addison-Wesley, Ithaca, N.Y., 1996; H. Zhang, "Service Disciplines for Guaranteed Performance Service in Packet-Switching Networks," Proceedings of the IEEE, pp. 1374-1396, October 1995.
EDF scheduling has been known for many years in the context of processor scheduling as disclosed in C. L. Liu and J. W. Layland, "Scheduling algorithms for multiprogramming in a hard real time environment," Journal of ACM, pp. 46-61, January 1973. Furthermore, it has been more recently proposed as a possible packet-scheduling discipline for broadband networks as disclosed in D. Ferrari and D. Verma, "A Scheme for Real-Time Channel Establishment in Wide-Area Networks," IEEE Jour. Sel. Areas Commun., pp. 368-379, April 1990; D. Verma, H. Zhang, D. Ferrari, "Guaranteeing Delay Jitter Bounds in Packet Switching Networks," Proc. TRICOMM, pp. 35-46, Chapel Hill, N.C., October 1991. The EDF scheduling discipline generally works as follows: each connection i at a switch k is associated with a local delay deadline $d_i^k$; an incoming packet of connection i arriving at the scheduler at time t is then stamped with a deadline $t + d_i^k$, and packets in the scheduler are served in increasing order of their deadlines.
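The deadline-stamping rule described above can be illustrated with a minimal Python sketch. It assumes a software priority queue (a binary heap) stands in for the scheduler; the class name `EDFScheduler`, its methods, and the tie-breaking rule are hypothetical and are not taken from the cited references.

```python
import heapq
import itertools


class EDFScheduler:
    """Serve packets in increasing order of deadline (arrival time + delay bound)."""

    def __init__(self, delay_bounds):
        # delay_bounds maps a connection id i to its local delay bound d_i^k.
        self.delay_bounds = dict(delay_bounds)
        self._heap = []
        # Monotonic counter: breaks deadline ties in arrival order (an assumption).
        self._seq = itertools.count()

    def enqueue(self, conn_id, arrival_time, packet):
        """Stamp the packet with deadline t + d_i^k and store it."""
        deadline = arrival_time + self.delay_bounds[conn_id]
        heapq.heappush(self._heap, (deadline, next(self._seq), conn_id, packet))
        return deadline

    def dequeue(self):
        """Return (deadline, conn_id, packet) with the earliest deadline, or None."""
        if not self._heap:
            return None
        deadline, _, conn_id, packet = heapq.heappop(self._heap)
        return deadline, conn_id, packet


def demo_edf():
    sched = EDFScheduler({"a": 5.0, "b": 1.0})
    sched.enqueue("a", 0.0, "a0")  # deadline 5.0
    sched.enqueue("b", 2.0, "b0")  # deadline 3.0
    sched.enqueue("a", 1.0, "a1")  # deadline 6.0
    order = []
    while True:
        item = sched.dequeue()
        if item is None:
            break
        order.append(item[2])
    return order  # ["b0", "a0", "a1"]
```

In this sketch the packet of connection "b" is served first even though it arrived last, because its tighter delay bound gives it the earliest deadline.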
For a single switch, EDF is known to be the optimal scheduling policy as disclosed in L. Georgiadis, R. Guerin, and A. Parekh, "Optimal Multiplexing on a Single Link: Delay and Buffer Requirements," RC 19711 (97393), IBM T. J. Watson Research Center, August 1994; J. Liebeherr, D. Wrege, and D. Ferrari, "Exact Admission Control for Networks with a Bounded Delay Service," IEEE/ACM Trans. Networking, pp. 885-901, December 1996. Optimality is defined in terms of the schedulable region associated with the scheduling policy. Given N connections with traffic envelopes $\bar{A}_i(t)$ ($i = 1, 2, \dots, N$) sharing an output link, and given a vector of delay bounds $\vec{D} = (d_1, d_2, \dots, d_N)$, where $d_i$ is an upper bound on the scheduling delay that packets of connection i can tolerate, the schedulable region of a scheduling discipline $\pi$ is defined as the set of all vectors $\vec{D}$ that are schedulable under $\pi$. EDF has the largest schedulable region of all scheduling disciplines, and its non-preemptive version (NPEDF) has the largest schedulable region of all the non-preemptive policies.
The schedulable region of the NPEDF policy consists of those vectors that satisfy the following constraints:

$$\frac{L}{r} \le d_1 \qquad (1)$$

$$L + \sum_{i=1}^{N} \bar{A}_i(t - d_i) \le rt, \quad \frac{L}{r} \le t \le d_N \qquad (2)$$

$$\sum_{i=1}^{N} \bar{A}_i(t - d_i) \le rt, \quad t \ge d_N \qquad (3)$$

where $d_1 \le d_2 \le \dots \le d_N$, L is the packet size (if the packet size is variable, then L is the maximum packet size), r is the link rate, and $\bar{A}_i(t) = 0$ for $t < 0$. Within a single node, once the traffic envelopes are known, a 100% link utilization can be achieved (at least in principle) with this characterization.
The difficulties arise in a multi-switch or multi-node network where the traffic envelopes are no longer determined at the inputs of the nodes inside the network, and the interactions that distort the traffic are not easily characterizable. This problem is not peculiar to EDF, but is common to any scheduling discipline. As a general framework to handle the multi-node problem, H. Zhang and D. Ferrari, "Rate-Controlled Service Disciplines," Jour. High Speed Networks, pp. 389-412, 1994, propose a class of schemes called rate-controlled service (RCS) disciplines which reshape the traffic at each hop within the network. As schematically shown in FIG. 2, an RCS server 10 has two components: a shaper 12 which reshapes the traffic of each connection and a scheduler 14 which receives packets released by the shaper and schedules them according to a specific scheduling discipline, such as EDF. L. Georgiadis, R. Guerin, V. Peris, and K. Sivarajan, "Efficient Network QoS Provisioning Based on per Node Traffic Shaping," IEEE/ACM Trans. Networking, pp. 482-501, August 1996 ("Georgiadis et al."), build upon this model and derive expressions for the end-to-end delay bounds in terms of the shaper envelope and scheduling delay at each node. They also show the following useful properties of RCS.
Identical shapers at each switch along the path of a connection i (i.e., shapers having identical shaper envelopes for connection i) produce end-to-end delays that are no worse than those produced by different shapers at each switch. Therefore, for any given connection, identical shapers can be used at each node. This shaper envelope common to all shapers for connection i is denoted as $\bar{E}_i(t)$.
The end-to-end delay bound for connection i is given by:

$$\bar{D}_i = D(\bar{A}_i \| \bar{E}_i) + \sum_{k=1}^{K_i} d_i^k \qquad (4)$$

where $D(\bar{A}_i \| \bar{E}_i)$ denotes the maximum shaper delay, $K_i$ is the number of switches on the path of connection i, and $d_i^k$ is the bound on the scheduler delay for packets of connection i at the k-th switch on its path. The maximum shaper delay is incurred only once and is independent of the number of nodes on the path. The total scheduler delay is the sum of the individual scheduling delays $d_i^k$ at each node. When EDF scheduling is used together with per-node reshaping (this combination is referred to as RC-EDF), and the delay components in Equation (4) are properly chosen, the same delay bounds as GPS can be achieved.
The above properties combined with Equations (1-3) enable a call admission control (CAC) framework that decides if connections may or may not enter the network while ensuring that end-to-end performance constraints are met. How CAC works is first analyzed for an isolated EDF scheduler, and then the analysis proceeds to the multi-node case.
In a single switch, Equations (2) and (3) immediately lead to a CAC scheme. With RC-EDF, given the traffic envelope $\bar{E}_i(t)$ (enforced by the shaper) and local delay bound for each of the connections being multiplexed on a link of the switch, the equations are combined into:

$$L + \sum_{i=1}^{N} \bar{E}_i(t - d_i) \le rt, \quad t \ge \frac{L}{r} \qquad (5)$$

where $d_i \ge L/r$, $i = 1, 2, \dots, N$.
Equation (5) can be graphically interpreted to yield a simple single-switch CAC scheme. $\bar{E}_i(t - d_i)$ is the curve obtained by shifting the connection i arrival envelope curve $\bar{E}_i(t)$ to the right by its delay bound $d_i$, and denotes the minimal service $\hat{S}_i(t)$ required by connection i in order to meet its local delay bound. The aggregate service demand of all connections at the scheduler is thus given by

$$S(t) = L + \sum_{i=1}^{N} \hat{S}_i(t)$$

(where the term L accounts for the non-preemptive nature of the scheduler). If the aggregate service demand $S(t)$ never exceeds the server capacity given by $R(t) = rt$ for $t \ge L/r$, the packets can be scheduled such that none misses its deadline. FIG. 3 illustrates the service capacity and service demand curves for a simple example with two leaky-bucket constrained connections. Since the relationship $S(t) = L + \bar{E}_1(t - d_1) + \bar{E}_2(t - d_2) \le R(t)$ holds for all $t \ge L/r$, the two calls can be admitted with guaranteed delay bounds $d_1$ and $d_2$.
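The single-switch test of Equation (5) can be sketched in Python for the special case where every shaper envelope is a plain leaky bucket, i.e. sigma + rho*t for t >= 0 and zero otherwise. That envelope shape, the function names, and the numerical tolerance are assumptions made for illustration. Because the aggregate demand is then piecewise linear and only jumps upward at t = d_i, the sketch checks the inequality at t = L/r, at each d_i, and on the long-run slope.

```python
EPS = 1e-9  # tolerance for floating-point comparisons (an assumption)


def envelope(sigma, rho, t):
    """Leaky-bucket envelope: sigma + rho*t for t >= 0, and 0 for t < 0."""
    return sigma + rho * t if t >= 0 else 0.0


def admissible(connections, link_rate, max_packet):
    """Check L + sum_i E_i(t - d_i) <= r*t for all t >= L/r (Equation (5)).

    connections: list of (sigma, rho, d) tuples, e.g. bits, bits/s, seconds.
    """
    t_min = max_packet / link_rate
    # Each local delay bound must satisfy d_i >= L/r.
    if any(d < t_min for _, _, d in connections):
        return False
    # Beyond the last breakpoint the demand grows with slope sum(rho_i),
    # which must not exceed the link rate r.
    if sum(rho for _, rho, _ in connections) > link_rate:
        return False
    # Between breakpoints both sides are linear, so testing the breakpoints
    # (where the demand jumps upward) and the start point t = L/r suffices.
    check_points = {t_min} | {d for _, _, d in connections}
    for t in check_points:
        demand = max_packet + sum(envelope(s, p, t - d) for s, p, d in connections)
        if demand > link_rate * t + EPS:
            return False
    return True
```

For example, with r = 1e6, L = 1000 and connections (2000, 200000, 0.005) and (4000, 300000, 0.010), the call `admissible(...)` returns True; tightening the second delay bound to 0.006 makes the demand at t = 0.006 exceed the capacity, and the check returns False.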
The extension of this scheme to the multi-node case is as follows. For an incoming connection i with traffic arrival envelope at the edge of the network $\bar{A}_i(t)$ and end-to-end delay requirement $\bar{D}_i$, the end-to-end CAC algorithm performs the following steps to determine if the connection can be accommodated in the network:
1. It chooses an appropriate shaper with envelope $\bar{E}_i(t)$ for the connection, and computes the corresponding delay $D(\bar{A}_i \| \bar{E}_i)$. The delay computation is described in Georgiadis et al.
2. The k-th switch on the path is assigned a delay bound $d_i^k$ such that $D(\bar{A}_i \| \bar{E}_i) + \sum_k d_i^k = \bar{D}_i$. A single-node schedulability check according to the schedulability criterion of Equation (5) is performed (using envelope $\bar{E}_i(t)$ and delay bound $d_i^k$) at each switch on the path.
3. The connection is admitted only if every switch on the path can accommodate the connection.
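The three steps above can be sketched as a small Python function. This is an illustration under stated assumptions rather than the procedure of Georgiadis et al.: the shaper delay is passed in as a precomputed number, the scheduling-delay budget is split equally among the switches (one possible split among many), and each switch is represented by a hypothetical callable that performs its own single-node check.

```python
def admit_connection(e2e_delay, shaper_delay, switch_checks):
    """Return the per-switch delay bounds if every switch accepts, else None.

    e2e_delay:     end-to-end delay requirement of the connection.
    shaper_delay:  maximum shaper delay from step 1, computed elsewhere.
    switch_checks: one callable per switch on the path; each takes the local
                   delay bound offered to the connection and returns True if
                   that switch's single-node schedulability test passes.
    """
    budget = e2e_delay - shaper_delay
    if budget <= 0 or not switch_checks:
        return None
    # Step 2 (ASSUMPTION): split the scheduling-delay budget equally, so that
    # shaper_delay + sum(bounds) equals the end-to-end requirement.
    local_bound = budget / len(switch_checks)
    bounds = [local_bound] * len(switch_checks)
    # Step 3: admit only if every switch on the path can accommodate the call.
    if all(check(d) for check, d in zip(switch_checks, bounds)):
        return bounds
    return None
```

A caller would typically bind each callable to the state of one switch, for instance a function that adds the candidate connection to the switch's current set and evaluates Equation (5).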
The results in Georgiadis et al. can be used for choosing the shaper envelope and for splitting the total scheduling delay among the schedulers on the path. For leaky-bucket-constrained sources with traffic arrival envelope $\bar{A}_i(t) = \sigma_i + \rho_i t$, generalized processor sharing performance can be matched by choosing shaper envelope $\bar{E}_i(t) = \min(L + g_i t, \sigma_i + \rho_i t)$ and assigning local delay bound $d_i^k = L/g_i + L/r_k$ to the k-th switch on the path, where $r_k$ is the link rate and $g_i$ is the rate allocated to the connection at each switch.
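As a numerical illustration of this parameter choice, the Python sketch below evaluates the shaper envelope min(L + g*t, sigma + rho*t), the local delay bounds L/g + L/r_k, and the resulting end-to-end bound of Equation (4). The closed form used for the maximum shaper delay, max(0, (sigma - L)/g) for g >= rho, is an assumption of this sketch (the largest horizontal distance between the two envelopes); it is not stated in the text above and should be checked against Georgiadis et al. All function names are hypothetical.

```python
def shaper_envelope(t, sigma, rho, g, max_packet):
    """E_i(t) = min(L + g_i*t, sigma_i + rho_i*t) for t >= 0, and 0 for t < 0."""
    if t < 0:
        return 0.0
    return min(max_packet + g * t, sigma + rho * t)


def local_delay_bounds(g, max_packet, link_rates):
    """d_i^k = L/g_i + L/r_k for each switch k on the path."""
    return [max_packet / g + max_packet / r_k for r_k in link_rates]


def end_to_end_bound(sigma, rho, g, max_packet, link_rates):
    """Equation (4): shaper delay plus the sum of the per-switch delay bounds."""
    if g < rho:
        raise ValueError("allocated rate g must be at least the token rate rho")
    # ASSUMPTION (not from the text): closed-form shaper delay for g >= rho.
    shaper_delay = max(0.0, (sigma - max_packet) / g)
    return shaper_delay + sum(local_delay_bounds(g, max_packet, link_rates))
```

With sigma = 10000, rho = 100000, g = 200000, L = 1000 and link rates 1e6, 1e6 and 2e6, the shaper delay is 0.045 and the three local bounds are 0.006, 0.006 and 0.0055, giving an end-to-end bound of 0.0625. Under the stated assumption this equals sigma/g + (K - 1)L/g + the sum of L/r_k over the K = 3 switches.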
A potentially serious problem is that the implementation of an RC-EDF server, which consists of a traffic shaper and an EDF scheduler, can be quite complex. Without techniques to reduce this complexity, the scheme would be unaffordable in practice for application to current packet switches.
It is an object of the present invention to provide a method and an apparatus to implement an EDF-related packet server of minimum complexity, comprising a shaper and a scheduler, which guarantees a small value of the maximum data transfer delay to each connection.
The EDF-related packet server (alternately called the RC-EDF server) provides a shaper and a scheduler. In accordance with a first aspect of the invention, the shaper holds and releases packets such that the traffic belonging to each connection exiting the shaper conforms to a pre-specified envelope, while the scheduler at each scheduling instant selects from among the packets released by the shaper the one to be transmitted next on the outgoing link. To reduce the implementation complexity of both the shaper and the scheduler, the RC-EDF server supports a (relatively small) discrete set of delay classes. Each delay class is associated with a delay value, and this delay value corresponds to the delay guarantee provided by the scheduler to the packets of each connection contained in the delay class.
The shaper provides one shaping structure per delay class (each such structure is referred to as a delay class shaper), each of which shapes all connections corresponding to a specific delay class. Each delay class shaper supports a discrete number of shaping rates, each of which is associated with a first-in-first-out (FIFO) queue. Each shaping rate corresponds to a possible value of the slope corresponding to a piece of a piecewise-linear envelope. Each connection which has at least one packet in the shaper is associated with exactly one of the above-mentioned FIFO queues. The delay class shaper has a two-level hierarchical structure. The lower level of the hierarchy uses FIFO queues, one per rate. At the higher level of the hierarchy, the timestamps associated with each FIFO queue are used to select from among the different FIFO queues for that delay class. Based on the timestamps, the delay class shaper selects one FIFO queue and sends the first packet of the connection at the head of the selected FIFO to the scheduler.
The scheduler maintains a queue of packets for each delay class. Each packet is associated with a timestamp derived from its release time by the shaper and the delay bounds guaranteed to the connection. A sorter selects from among the packets at the head of the FIFOs the one with the minimum timestamp for transmission on the outgoing link.
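The scheduler organization just described can be sketched in Python as one FIFO per delay class plus a sorter that inspects only the head of each FIFO. The sketch assumes the timestamp is the release time plus the delay value of the class, and that the shaper releases packets in nondecreasing time order, so each FIFO is already ordered by timestamp. The class name and methods are hypothetical, and delay-class identifiers are assumed to be mutually comparable (for example, all integers) so that ties can be broken.

```python
from collections import deque


class DelayClassScheduler:
    """One FIFO per delay class; the sorter compares only the FIFO heads."""

    def __init__(self, class_delays):
        # class_delays maps a delay-class id to the delay value of that class.
        self.class_delays = dict(class_delays)
        self.queues = {c: deque() for c in self.class_delays}

    def enqueue(self, delay_class, release_time, packet):
        """Timestamp = shaper release time + class delay; append to the class FIFO."""
        timestamp = release_time + self.class_delays[delay_class]
        self.queues[delay_class].append((timestamp, packet))
        return timestamp

    def dequeue(self):
        """Serve the head packet with the minimum timestamp, or return None."""
        heads = [(q[0][0], c) for c, q in self.queues.items() if q]
        if not heads:
            return None
        _, chosen = min(heads)
        return self.queues[chosen].popleft()


def demo_delay_classes():
    sched = DelayClassScheduler({0: 1.0, 1: 4.0})
    sched.enqueue(1, 0.0, "p1")  # timestamp 4.0
    sched.enqueue(0, 2.0, "p2")  # timestamp 3.0
    sched.enqueue(1, 0.5, "p3")  # timestamp 4.5
    sched.enqueue(0, 3.8, "p4")  # timestamp 4.8
    served = []
    while True:
        item = sched.dequeue()
        if item is None:
            break
        served.append(item[1])
    return served  # ["p2", "p1", "p3", "p4"]
```

The point of the design is that each selection compares one entry per delay class rather than one entry per queued packet, which is what keeps the sorter small when the set of delay classes is small.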
In accordance with another aspect of the invention, the EDF-related packet server provides a different shaper from the embodiment according to the first aspect of the invention. The scheduler is identical to the embodiment according to the first aspect of the invention. The shaper provides one shaping structure per delay class (referred to as a delay class shaper), each of which shapes all connections corresponding to the delay class. Each delay class shaper supports a discrete number of shaping rates, each of which is associated with a FIFO queue. Each connection that has at least one packet in the shaper could be associated with a multiplicity of the FIFO queues. This is in contrast to the first aspect of the invention, where each connection with at least one packet in the shaper is associated with exactly one FIFO queue. The delay class shaper has a two-level hierarchical structure. The lower level of the hierarchy uses FIFO queues, one per rate. At the higher level of the hierarchy, the timestamps associated with each FIFO queue are used to select from among the different FIFO queues for that delay class.
These and other aspects of the invention will become apparent in the ensuing detailed description taken in conjunction with the accompanying figures, which disclose a number of preferred embodiments of the invention.