1. Field of the Invention
The invention relates generally to the forwarding of data packets through a high bandwidth multiport switch. More particularly, the invention describes a weight-based switch scheduling algorithm for scheduling variable-length data packet streams.
2. Description of the Related Art
Traditional input-queued switch scheduling algorithms based on maximal size matching do not perform well at high line rates, because they must schedule at the rate of the smallest packet size in the system. The present invention provides a weight-based and highly parallelizable scheduling algorithm which is stable for various traffic patterns and can offer strong QoS guarantees.
To appreciate the weight-based switching algorithm of the present invention more fully, it is instructive to first consider the benefits and drawbacks of prior art maximal size and weight-based switching algorithms. Although the term “maximal match” (or, alternatively, “maximal matching”) is well understood by those skilled in the art, a definition may be had with reference to papers by N. McKeown et al. and Stiliadis et al., as well as U.S. Pat. No. 5,517,495 to Lund et al. In maximal size matching, a scheduling algorithm attempts to maximize the number of connections made in each cell time, and hence to maximize the instantaneous allocation of bandwidth. A drawback of this approach is that when traffic is non-uniform, the algorithm cannot sustain very high throughput. This occurs because the algorithm does not consider the backlog of cells in the virtual output queues (VOQs), or the time that cells have been waiting in line to be served.
To overcome these drawbacks, a well known maximal size matching algorithm, referred to in the literature as Islip, has been developed. The Islip scheduling algorithm achieves high throughput (i.e., keeps the backlog low), is starvation free (i.e., does not allow a non-empty virtual output queue (VOQ) to remain unserved indefinitely), and is fast and simple to implement in hardware. Virtual output queueing is a buffering strategy used at each input port of an input-queued switch whereby, instead of maintaining a single FIFO queue for all cells, each input port maintains a separate queue for each output port of the switch. In this manner, head-of-line blocking problems are eliminated. The Islip algorithm is based on the Parallel Iterative Matching algorithm (PIM) developed by DEC Systems Research Center for the 16-port, 1 Gb/s AN2 switch. PIM attempts to converge quickly on a conflict-free maximal match in multiple scheduling iterations, where each scheduling iteration includes the three steps described below. In the PIM approach, all inputs and outputs are initially unmatched, and only those inputs and outputs not matched at the end of one scheduling iteration are eligible for matching in the next. The three steps operate in parallel on each output and input and are as follows:
Step 1: Request—each unmatched input sends a request to every unmatched output for which it has a queued cell.
Step 2: Grant—if an unmatched output receives any requests, it grants to one request by randomly selecting a request uniformly over all requests.
Step 3: Accept—if an input receives multiple grants, it accepts one grant by selecting an output randomly from among those outputs from which it receives grants.
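The three PIM steps above can be sketched in Python as follows. This is only an illustrative software model, not a hardware design: the names `pim`, `pim_iteration`, `voq`, and `match` are hypothetical, and `voq[i][j]` is assumed to hold the number of cells queued at input i for output j.

```python
import random

def pim_iteration(voq, match, rng):
    """Run one request/grant/accept iteration of PIM.

    voq[i][j]: cells queued at input i for output j (assumed representation).
    match: dict mapping already-matched inputs to outputs; updated in place.
    """
    n = len(voq)
    taken_outputs = set(match.values())

    # Step 1: Request. Each unmatched input requests every unmatched
    # output for which it has a queued cell.
    requests = {j: [] for j in range(n) if j not in taken_outputs}
    for i in range(n):
        if i in match:
            continue
        for j in requests:
            if voq[i][j] > 0:
                requests[j].append(i)

    # Step 2: Grant. Each output with requests grants to one of them,
    # chosen uniformly at random.
    grants = {}  # input -> outputs that granted to it
    for j, inputs in requests.items():
        if inputs:
            grants.setdefault(rng.choice(inputs), []).append(j)

    # Step 3: Accept. Each input with grants accepts one at random.
    for i, outputs in grants.items():
        match[i] = rng.choice(outputs)
    return match

def pim(voq, iterations, seed=0):
    """Run several PIM iterations starting from an empty match."""
    rng = random.Random(seed)
    match = {}
    for _ in range(iterations):
        pim_iteration(voq, match, rng)
    return match
```

Whenever some unmatched input still has a cell for an unmatched output, an iteration adds at least one connection, so in this model N iterations suffice to reach a maximal match on an N-port switch.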
By considering only unmatched inputs and outputs, each scheduling iteration only considers connections not made by earlier scheduling iterations. The PIM approach has several drawbacks. First, random selection is difficult and expensive to implement at high speed, because each arbiter must make a random selection among the members of a time-varying set. Second, when the switch is oversubscribed, PIM can lead to unfairness between connections. Third, PIM does not perform well with a single scheduling iteration: throughput is limited to approximately 63%, only slightly higher than for a FIFO switch.
Islip overcomes two problems in PIM, namely complexity and unfairness, by utilizing a simple variation of a round-robin matching (RRM) algorithm. The Islip algorithm, like PIM, consists of three steps of arbitration:
Step 1: Request—each unmatched input sends a request to every unmatched output for which it has a queued cell.
Step 2: Grant—if an output receives any requests, it chooses the one request that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. In the first iteration, the pointer to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input, if and only if the grant is accepted in Step 3.
Step 3: Accept—if an input receives multiple grants, it accepts the one grant that appears next in a fixed, round-robin schedule starting from the highest priority element.
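The round-robin arbitration described in these three steps can be sketched as follows. This is a minimal software model under stated assumptions: the names `islip`, `islip_iteration`, `voq`, `grant_ptr`, and `accept_ptr` are hypothetical, and the update of the accept pointer (moved one location beyond the accepted output, in the first iteration only) is an assumption that mirrors the grant pointer rule; the steps above state only the grant pointer update.

```python
def islip_iteration(voq, match, grant_ptr, accept_ptr, first_iteration):
    """Run one request/grant/accept iteration with round-robin arbiters.

    voq[i][j]: cells queued at input i for output j (assumed representation).
    grant_ptr[j]: highest priority input at output j.
    accept_ptr[i]: highest priority output at input i.
    """
    n = len(voq)
    taken_outputs = set(match.values())

    # Step 1: Request. Unmatched inputs request unmatched outputs
    # for which they have a queued cell.
    requests = {
        j: {i for i in range(n) if i not in match and voq[i][j] > 0}
        for j in range(n) if j not in taken_outputs
    }

    # Step 2: Grant. Each output grants to the first requesting input
    # at or after its pointer, in round-robin order.
    grants = {}  # input -> set of outputs that granted to it
    for j, inputs in requests.items():
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if i in inputs:
                grants.setdefault(i, set()).add(j)
                break

    # Step 3: Accept. Each input accepts the first granting output at
    # or after its pointer. Pointers move only in the first iteration,
    # and only for grants that were accepted.
    for i, outputs in grants.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outputs:
                match[i] = j
                if first_iteration:
                    grant_ptr[j] = (i + 1) % n
                    accept_ptr[i] = (j + 1) % n  # assumed rule, see lead-in
                break
    return match

def islip(voq, grant_ptr, accept_ptr, iterations):
    """Run several iterations for one scheduling decision."""
    match = {}
    for it in range(iterations):
        islip_iteration(voq, match, grant_ptr, accept_ptr, it == 0)
    return match
```

With a fully backlogged 3-port switch and all pointers at 0, every output grants to input 0 in the first iteration, input 0 accepts output 0, and the remaining inputs are matched in later iterations. Because only accepted grants move a pointer, the other outputs keep input 0 as their highest priority element for the next scheduling decision.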
Islip offers the advantages of high throughput, starvation free inputs, and easy implementation in hardware, which overcome the disadvantages associated with PIM. However, both Islip and PIM are members of a class of traditional input-queued maximal size switching algorithms which are based on the premise that scheduling is performed at the granularity of the smallest packet size in the network. For IP network applications, where the size of the smallest packet is around 50 bytes (400 bits), each scheduling step must complete in around 10 nanoseconds for line speeds on the order of 40 Gbps. This time will get smaller as line speeds increase. By comparison, for a typical present generation high speed switch, the line speed at each port is around 2.5 Gbps (OC-48). As hardware speeds are scaling more slowly than optical line speeds, it becomes increasingly impractical to deploy maximal size matching algorithms to schedule at the granularity of 50 bytes.
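The scheduling budget quoted above follows from dividing the size of the smallest packet by the line rate. The short sketch below reproduces that arithmetic; the function name `scheduling_time_ns` is hypothetical, and the 2.5 Gbps result is derived here for comparison rather than stated in the text.

```python
def scheduling_time_ns(packet_bytes, line_rate_gbps):
    """Time to transmit one packet, which bounds one scheduling step.

    Bits divided by Gbit/s yields nanoseconds directly.
    """
    return packet_bytes * 8 / line_rate_gbps

# 50-byte packet at 40 Gbps: the 10 ns budget cited above.
print(scheduling_time_ns(50, 40))   # 10.0
# Same packet at 2.5 Gbps (OC-48): a far more relaxed budget.
print(scheduling_time_ns(50, 2.5))  # 160.0
```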
One proposed solution to the problems associated with scheduling at the granularity of 50 bytes as line speeds increase into the terabit range is envelope scheduling. In this approach, a scheduler attempts to schedule as many arriving packets as possible in fixed time intervals, where each interval is referred to as an envelope time. The scheduler waits for an envelope at the head of a virtual queue to receive all arriving packets in each envelope time. A typical fixed envelope time may be, for example, two microseconds. Given the variability in packet arrivals at an input port, an envelope will under certain conditions contain fewer packets than it could accommodate. This is a drawback because the available bandwidth is underutilized: fewer packets are transmitted in an envelope time than could otherwise be transmitted if the envelope were full.
Another class of scheduling algorithms, referred to as maximum weight matching algorithms, assigns a weight to every input-output pair based on some criterion such as the size of the VOQ for that pair, or the delay of the head of line packet in that VOQ. One drawback associated with maximum weight matching is that it is computationally expensive. Several heuristics exist to approximate its weight. One approximation is the Greedy algorithm. In the Greedy algorithm, the unmatched input-output pair of largest weight is repeatedly found and matched. The weight of the resulting matching is at least half the weight of the maximum weight matching. Up until this point, maximum weight matching algorithms, such as the Greedy algorithm, have not been considered for a number of reasons: (1) line speeds have been slow enough that it was practical to run scheduling algorithms like Islip at the granularity of the smallest packet size in the system; (2) weight-based schemes require computation of weights and maintenance of state which is more than one bit for every queue; and (3) the algorithms are inherently sequential in nature, which is undesirable for a hardware implementation.
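The Greedy approximation described above can be sketched as follows. This is an illustrative model only: the names `greedy_matching`, `matching_weight`, `max_weight`, and `weights` are hypothetical, `weights[i][j]` is assumed to be the weight of the pair formed by input i and output j (for example, the VOQ length), and the brute-force `max_weight` helper is included purely to compare against the optimum on a tiny example.

```python
from itertools import permutations

def greedy_matching(weights):
    """Repeatedly match the unmatched input-output pair of largest weight.

    Sorting all pairs once and scanning in descending order selects the
    same pairs as searching for the largest remaining pair each time.
    """
    n = len(weights)
    pairs = sorted(
        ((weights[i][j], i, j)
         for i in range(n) for j in range(n) if weights[i][j] > 0),
        reverse=True,
    )
    used_in, used_out, match = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_in and j not in used_out:
            match[i] = j
            used_in.add(i)
            used_out.add(j)
    return match

def matching_weight(weights, match):
    """Total weight of a matching given as an input -> output dict."""
    return sum(weights[i][j] for i, j in match.items())

def max_weight(weights):
    """Brute-force maximum weight over all full matchings (tiny n only)."""
    n = len(weights)
    return max(sum(weights[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

weights = [[10, 9],
           [9, 0]]
greedy = greedy_matching(weights)
print(matching_weight(weights, greedy), max_weight(weights))  # 10 18
```

In this example the greedy choice of the weight-10 pair blocks both weight-9 pairs, so the greedy matching weighs 10 against an optimum of 18, consistent with the half-of-maximum bound. The descending scan also illustrates the sequential nature noted above: each selection depends on all earlier ones.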
It is no longer practical to run scheduling algorithms like Islip at the granularity of the smallest packet size in the network, as this approach will not scale for the next generation of IP networks running at terabit speeds. While fixed size envelope scheduling is more feasible in such situations, Islip and its variants employing envelope scheduling are either unstable or have very poor delay properties.
While maximum weight matching algorithms offer a partial solution, they do not readily lend themselves to hardware implementations as they are sequential in nature.
It is noted that the paper by Nick McKeown, “Scheduling Algorithms for Input-Queued Cell Switches”, discusses a parallel implementation of a maximum weight matching algorithm. However, the paper does not address the parallel implementation of a weight-based technique to large envelopes as disclosed by the present invention.
Accordingly, there remains a need for a parallel implementation of a maximum weight matching algorithm that is similar to Islip and is simple to implement in hardware. Further, the maximum weight matching algorithm should be stable (i.e., have good delay properties) and scalable to operate in next generation IP networks running at terabit speeds.