In high-speed packet data networks, data traffic is handled by switches that receive data at multiple inputs and transmit data at multiple outputs. Particular outputs typically correspond to particular destinations or routes to destinations. Many switch fabric designs have been developed to deal with a series of challenges, such as higher port counts, faster speeds, and greater attention to individual traffic flows. Switch fabrics are used in many applications, including Internet routers, packet data networks, and storage-area networks. The requirements and goals of these applications are similar. A packet switch in a packet data network will be used as an example of prior data switches.
Data packets arrive at an input of the packet switch with a request to be transmitted to a particular output of the packet switch. Several challenges are presented to those who design packet switches for increasingly high data rate systems. One challenge is providing sufficient switch throughput given very high data rates. Another challenge is providing some fairness among packets or inputs competing for particular outputs so that each packet or input gets an adequate opportunity to access a requested output. Yet another challenge in some cases is providing weighted fairness so that data considered to have relatively higher priority is given preferential access to requested outputs. The latter two challenges relate to quality of service (“QoS”). QoS refers to performance properties of a network service, possibly including throughput, transit delay, and priority. Some protocols allow packets or data streams to include QoS requirements.
Packet switches typically include a scheduler, or arbiter, that decides which requests for outputs are granted. The terms "scheduler" and "arbiter" will be used interchangeably herein. One prior type of packet switch, referred to as an output queued switch, is illustrated in FIG. 1. Output queued switch 100 includes multiple inputs 102, multiple outputs 106, and multiple output queues 108. Since each output queue can receive all the incoming data, there is no need for a centralized scheduler. More than one input may have data requesting the same output at one time. For example, in FIG. 1, all inputs 102A-102D are requesting output 106A. To handle this contention, the output queued switch queues the data at the outputs 106 in output queues 108. In an attempt to achieve fairness, algorithms such as weighted fair queuing ("WFQ") or weighted round robin ("WRR") are applied to the data at the outputs. Although acceptable results were achieved with output queued switches and fairness algorithms in prior smaller systems, these methods do not scale to today's larger systems.
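The weighted round robin discipline mentioned above can be sketched in a few lines. This is a minimal software model written to illustrate the general idea, not the specific arbiter of any prior switch; the queue names and the `rounds` parameter are illustrative assumptions.

```python
from collections import deque

def weighted_round_robin(queues, weights, rounds):
    # Serve up to weights[i] cells from queue i per round, visiting
    # queues in a fixed cyclic order (a simplified WRR discipline).
    served = []
    for _ in range(rounds):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    served.append(q.popleft())
    return served

# Two inputs contending for one output queue, weighted 2:1.
q_a = deque(["a1", "a2", "a3", "a4"])
q_b = deque(["b1", "b2"])
print(weighted_round_robin([q_a, q_b], [2, 1], 2))
# → ['a1', 'a2', 'b1', 'a3', 'a4', 'b2']
```

Even in this toy form, each pass over every queue costs time proportional to the port count, which hints at why such per-output processing becomes costly in larger systems.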
Another class of packet switch is known as an input queued switch with virtual output queues (VOQs). Input queued switches queue data at each input, and do not queue data at the outputs. VOQs are arranged such that each input holds data in a separate queue per output. FIG. 2 is a block diagram of a prior input queued switch 200. Switch 200 includes multiple inputs 202, multiple input queues 204, arbiter 206, crossbar 208, and multiple outputs 210. The crossbar 208 includes one input connection 212 for each switch input 202 and one output connection 214 for each switch output 210. The crossbar 208 is configured so that it can physically connect data signals on any of the switch inputs 202 to any of the switch outputs 210. The number of inputs and outputs may or may not be the same. In a common example, data is segmented into fixed-length cells before being switched. A cell is usually transferred from a switch input to a switch output in one unit of time called a cell time. Once each cell time, the arbiter 206 configures the crossbar to make certain input-to-output connections for that cell time. Data traffic often has the characteristic that any input can request any output at any time. Therefore, multiple inputs may request the same output in the same cell time. The arbiter 206 receives requests for outputs 210 from the input queues 204 and applies an algorithm to determine the configuration of the crossbar 208 each cell time. Usually, a key goal in designing an arbiter for the switch 200 is to achieve a throughput rate as close to 100% as possible. 100% throughput means that none of the input queues 204 are unstable for non-oversubscribed traffic. Non-oversubscribed traffic is traffic where no input 202 or output 210 receives more than its line rate.
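The VOQ arrangement and the per-cell-time crossbar configuration described above can be modeled as follows. This is a sketch under stated assumptions: a 3-port switch, queue and function names invented for illustration, and an arbiter decision supplied by hand rather than computed.

```python
from collections import deque

N = 3  # number of switch ports (illustrative)

# Virtual output queues: voq[i][j] holds cells at input i destined
# for output j, so one congested output never blocks another.
voq = [[deque() for _ in range(N)] for _ in range(N)]

def request_matrix(voq):
    # Entry (i, j) is 1 when input i has at least one cell for output j.
    return [[1 if voq[i][j] else 0 for j in range(N)] for i in range(N)]

def transfer(voq, matching):
    # Apply one crossbar configuration for one cell time. `matching`
    # is a set of (input, output) pairs in which no port appears twice.
    delivered = {}
    for i, j in matching:
        if voq[i][j]:
            delivered[j] = voq[i][j].popleft()
    return delivered

voq[0][1].append("cell-A")
voq[2][1].append("cell-B")   # contends with input 0 for output 1
voq[1][0].append("cell-C")
print(request_matrix(voq))   # → [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
# The arbiter must pick a conflict-free matching; input 2 must wait.
print(transfer(voq, [(0, 1), (1, 0)]))
```

The sketch makes the arbiter's job concrete: each cell time it receives a fresh request matrix and must choose a matching in which every input and every output appears at most once.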
Various algorithms have been developed for arbiter 206 to provide fairness and achieve good throughput. These algorithms include maximum weight matching, maximum size matching, and maximal size matching. "Weight" is an arbitrary assignment of relative priority for a particular data cell. Weight is assigned independently of the packet switching process. Maximum weight matching tries to maximize the instantaneous total weight of a matching by looking at the weights assigned to various input queues. It has been shown to achieve 100% throughput. Maximum size matching merely attempts to make the greatest number of connections each cell time. Both of these algorithms have proven to be impractical because they are computationally expensive and slow.
One way to address the throughput issue is to run the arbiter at some multiple of the system speed. This is referred to as "speed up", such as speed up of 1.5 (in which the arbiter, or scheduler, operates at 1.5 times the line rate), or speed up of 2 (in which the arbiter, or scheduler, operates at twice the line rate). This alternative has its own disadvantages, such as additional power consumption. Another limitation of speed up is that the switch may achieve 100% throughput, but a bottleneck simply occurs somewhere else in the system, such as at the traffic manager.
Maximal size matching is easier to implement and can yield acceptable results with minimal speed up. Maximal size algorithms include wavefront arbitration ("WFA") and wrapped wavefront arbitration ("WWFA"). Such algorithms are discussed in more detail in a paper by Hsin-Chou Chi and Yuval Tamir (Proceedings of the International Conference on Computer Design, Cambridge, Mass., pp. 233-238, October 1991). These arbiters receive and operate on data in the form of a request matrix. The request matrix represents requests from respective inputs for respective outputs. Both WFA and WWFA arbiters have the disadvantage of being unfair. By their nature, these arbiters consider requests from certain inputs ahead of others in sequence a disproportionate amount of the time. Also, these arbiters grant requests for outputs only once. Thus, inputs whose requests are usually considered later in sequence face a greater likelihood of having their requests refused because a requested output is already assigned. Attempts have been made to improve on the degree of fairness provided by WFA and WWFA. For example, the request matrix may be rearranged before being operated on by the arbiter. Conventional methods of matrix rearrangement, such as those described by Hsin-Chou Chi and Yuval Tamir, increase fairness somewhat. However, a significant disadvantage of these schemes is their poor throughput performance under certain benign traffic patterns.
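A simplified software model of wrapped wavefront arbitration helps show both how it computes a maximal matching and where the unfairness arises. The real arbiter evaluates each wrapped diagonal in parallel in hardware; this sequential sketch is an assumption-laden illustration, not the circuit described by Chi and Tamir.

```python
def wwfa(req, start=0):
    # Grant at most one request per input (row) and output (column).
    # Cells with (i + j) % n == k lie on one wrapped diagonal and never
    # share a row or column, so all of them can be granted together.
    n = len(req)
    row_free = [True] * n
    col_free = [True] * n
    grants = []
    for d in range(n):
        k = (start + d) % n
        for i in range(n):
            j = (k - i) % n
            if req[i][j] and row_free[i] and col_free[j]:
                grants.append((i, j))
                row_free[i] = False   # a grant blocks its row...
                col_free[j] = False   # ...and its column thereafter
    return grants

# All three inputs request output 0: with a fixed starting diagonal,
# input 0 always wins, illustrating the positional unfairness.
req = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(wwfa(req))           # → [(0, 0)]
print(wwfa(req, start=1))  # → [(1, 0)]: rotating the start favors another input
```

The `start` parameter models one rearrangement idea (rotating which diagonal is considered first); as the text notes, such fixes improve fairness only somewhat.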
Notice that the existing methods have focused primarily on dividing output port bandwidth equally among contending inputs. However, the increased focus on providing quality of service guarantees requires dividing the output port bandwidth among contending inputs in a weighted fair manner. For example, an output might need to divide its link bandwidth so that one input receives ⅔ of the bandwidth and a second input receives ⅓ of the bandwidth.
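One simple way to realize such a 2:1 split is a credit-based grant counter, sketched below. This is a hedged illustration of the general weighted-sharing idea, not the scheduling method of any particular switch; the function name and credit scheme are assumptions.

```python
def weighted_grants(weights, n_cells):
    # Each cell time, every input accrues credit equal to its weight;
    # the input with the most credit wins the grant and pays back the
    # total weight, so long-run grant counts track the weight ratio.
    credit = [0.0] * len(weights)
    grants = [0] * len(weights)
    total = sum(weights)
    for _ in range(n_cells):
        for i, w in enumerate(weights):
            credit[i] += w
        winner = max(range(len(weights)), key=lambda i: credit[i])
        credit[winner] -= total
        grants[winner] += 1
    return grants

# Weights 2:1 over 30 cell times: input 0 receives 2/3 of the grants.
print(weighted_grants([2, 1], 30))  # → [20, 10]
```

The point of the example is only that weighted fairness is a well-defined target per output; integrating such weighting into a fast crossbar arbiter is the harder problem the text identifies.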
Thus, there is a need for a switch fabric with a scheduler that (1) achieves good throughput and (2) provides improved quality of service.