Networks are widely used to transfer voice, video, and data between various network devices such as telephones, televisions, and computers. Data transmitted through a network is typically segmented into packets, and under some network protocols data is segmented into fixed-length cells. For example, the Asynchronous Transfer Mode (ATM) protocol requires 53-byte cells, with 5 bytes of each cell designated for a header and 48 bytes of each cell designated for payload. Other network protocols, such as Ethernet or Internet Protocol, carry data in variable-size packets.
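The fixed-length cell format described above can be illustrated with a minimal sketch. The function name `segment_into_cells`, the all-zero placeholder header, and the zero-padding of the final payload are assumptions made for illustration only; they are not taken from the ATM specification or from any particular implementation.

```python
ATM_CELL_BYTES = 53      # total cell size stated in the text
ATM_HEADER_BYTES = 5     # header portion of each cell
ATM_PAYLOAD_BYTES = ATM_CELL_BYTES - ATM_HEADER_BYTES  # 48 bytes of payload


def segment_into_cells(data: bytes, header: bytes = bytes(ATM_HEADER_BYTES)) -> list:
    """Split data into fixed-length 53-byte cells.

    Assumption: the header is an opaque 5-byte placeholder and the last
    payload is padded with zero bytes; real adaptation layers differ.
    """
    if len(header) != ATM_HEADER_BYTES:
        raise ValueError("header must be exactly 5 bytes")
    cells = []
    for offset in range(0, len(data), ATM_PAYLOAD_BYTES):
        payload = data[offset:offset + ATM_PAYLOAD_BYTES]
        payload = payload.ljust(ATM_PAYLOAD_BYTES, b"\x00")  # pad the final cell
        cells.append(header + payload)
    return cells


cells = segment_into_cells(b"x" * 100)
print(len(cells), {len(cell) for cell in cells})
```

A 100-byte message does not divide evenly into 48-byte payloads, so the last cell in this sketch carries padding, which is one cost of a fixed-length format compared with variable-size packets.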
Switches are integral parts of most networks. Switches receive packets from input channels and direct packets to the appropriate output channels of the switch. Typical switches have three components: a physical switch fabric to provide the connections from input channels to output channels, a scheduling mechanism to direct traffic when multiple packets arrive on different input channels destined for the same output channel, and a buffering or queuing mechanism at the switch input or output to accommodate traffic fluctuations without undue packet loss. FIG. 1 is a diagram of a prior art switch 10 that has four input channels 12, 14, 16 and 18 and four output channels 20, 22, 24 and 26. The switch has serial input queues 28, 30, 32 and 36 for each input channel, a crossbar physical switch 38, and a crossbar scheduler 40. The crossbar scheduler receives a signal, referred to as a request, from an input queue. The request dictates the output channel or channels that will receive the queued packet. The scheduler arbitrates between competing requests and sends a signal, referred to as a grant, back to the input buffers that have been selected to deliver a packet.
In switches such as the switch 10 described in reference to FIG. 1, each input queue 28-36 provides requests to the scheduler 40 one at a time on a first-in-first-out (FIFO) basis, and the scheduler arbitrates among the four requests received from the four input queues, with a goal of maximizing utilization of the input channels 12-18 and output channels 20-26 of the switch. As a grant is issued to a particular input channel to access a target output channel or channels, the next request in that input queue becomes accessible to the scheduler in place of the granted request.
A problem known as head-of-line (HOL) blocking is created when one of the requests at the head of a queue line is a request for an output channel that is not available. HOL blocking is common when a multicast request is made because there is a lower probability that all of the output channels for the multicast request will be available immediately. When a request from a particular input channel is forced to wait until all output channels are available, all of the packets associated with the particular input channel are also forced to wait, thereby slowing the transfer of data from that input channel.
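HOL blocking with a single FIFO queue per input channel can be sketched as follows. This is an illustrative model only: the function name `schedule_slot`, the fixed-priority arbitration by channel number, and the representation of a request as a set of output channels are assumptions, not the scheduler of FIG. 1.

```python
from collections import deque


def schedule_slot(input_queues, busy_outputs):
    """Grant head-of-line requests for one time slot.

    input_queues: dict mapping input channel -> deque of frozensets, each
                  frozenset naming the output channel(s) a request targets.
    busy_outputs: set of output channels that are unavailable in this slot.
    Only the request at the head of each queue is visible to the scheduler.
    """
    taken = set(busy_outputs)
    grants = {}
    for channel in sorted(input_queues):      # assumed fixed-priority order
        queue = input_queues[channel]
        if not queue:
            continue
        head = queue[0]
        if head.isdisjoint(taken):            # every requested output must be free
            grants[channel] = head
            taken |= head
            queue.popleft()                   # the next request becomes visible
    return grants


# Input channel 1 holds a multicast request for outputs 1 and 2,
# followed by a unicast request for output 3.
queues = {1: deque([frozenset({1, 2}), frozenset({3})])}
print(schedule_slot(queues, busy_outputs={2}))
```

In this sketch output channel 3 is idle, yet the request for it receives no grant because the multicast request ahead of it is waiting on output channel 2.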
As one remedy for HOL blocking, parallel input queues have been implemented. Parallel input queues provide a separate FIFO queue for each output channel of the switch, with each queue providing a corresponding request to the scheduler. Referring to FIG. 2, an N input channel by N output channel switch requires N input queues 46 for each input channel, for a total of N² input queues. With an N² scaling factor, the number of input queues connected to the crossbar scheduler 50 may be very high. For example, in a 16×16 switch, 256 separate queues are required. In spite of the added complexity, the advantage of the parallel design is that, with respect to any one of the input channels, a series of requests for available output channels is not held up by a single request for in-use output channels.
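The parallel queue arrangement and its quadratic growth can be sketched as follows. The helper names `make_parallel_queues` and `eligible_requests`, and the 0-based channel numbering, are hypothetical choices for this example rather than elements of FIG. 2.

```python
from collections import deque


def make_parallel_queues(n):
    """Create one FIFO queue per (input channel, output channel) pair."""
    return {(i, o): deque() for i in range(n) for o in range(n)}


def eligible_requests(queues, busy_outputs):
    """Return the (input, output) pairs that have a queued cell and a free output."""
    return sorted(
        key for key, queue in queues.items()
        if queue and key[1] not in busy_outputs
    )


print(len(make_parallel_queues(16)))          # number of queues in a 16x16 switch

queues = make_parallel_queues(4)
queues[(0, 1)].append("cell A")               # input 0 -> output 1 (busy below)
queues[(0, 3)].append("cell B")               # input 0 -> output 3 (free)
print(eligible_requests(queues, busy_outputs={1}))
```

Because each output channel has its own queue, the cell bound for the free output remains eligible even though another cell from the same input channel is waiting on a busy output. The price is that the scheduler must examine up to one request per queue, which grows with the square of the channel count.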
A variety of arbitration techniques can be used with parallel input queues to provide efficient throughput through a switch. For example, maximum matching algorithms attempt to assign output channels to input channels in such a way that a maximum number of transfers occur simultaneously. However, under heavy load conditions, maximum matching algorithms can prevent some requests from being granted, creating a new blocking problem. For example, referring to FIG. 3, input channel 1 is represented as requesting to transfer cells from its output-distributed queue 54 to output channel 1 only, while input channel 2 is requesting to transfer cells from its output-distributed queue 56 to output channels 1 and 2. Under a maximum matching approach, input channel 1 transmits cells to output channel 1 and input channel 2 transmits cells to output channel 2. However, input channel 2 will be blocked from transferring cells destined for output channel 1, since doing so would require the cell transfer from input channel 1 to output channel 1 to stop, leaving output channel 1 as the only output channel in use. As shown in FIG. 4, sending cells from input channel 2 to output channel 1 causes input channel 1 and output channel 2 to remain idle and does not achieve maximum matching.
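The starvation effect in the FIG. 3 and FIG. 4 example can be checked with a small brute-force sketch. The function `maximum_matchings` is a hypothetical helper that enumerates every possible matching and is practical only for very small switches; it is not the arbitration method of any cited reference.

```python
def maximum_matchings(requests):
    """Return every maximum-size matching as a dict of input -> output.

    requests: dict mapping input channel -> set of requested output channels.
    Each input is matched to at most one output, and each output to at most
    one input. Exhaustive search, intended only for tiny examples.
    """
    inputs = sorted(requests)
    found = []

    def extend(index, matching, used_outputs):
        if index == len(inputs):
            found.append(dict(matching))
            return
        channel = inputs[index]
        extend(index + 1, matching, used_outputs)        # leave this input idle
        for output in sorted(requests[channel]):
            if output not in used_outputs:
                matching[channel] = output
                extend(index + 1, matching, used_outputs | {output})
                del matching[channel]

    extend(0, {}, frozenset())
    best = max(len(m) for m in found)
    return [m for m in found if len(m) == best]


# Input 1 requests output 1 only; input 2 requests outputs 1 and 2.
print(maximum_matchings({1: {1}, 2: {1, 2}}))
```

For these requests the only maximum-size matching pairs input channel 1 with output channel 1 and input channel 2 with output channel 2. If the same requests persist from slot to slot, a scheduler that always selects a maximum matching never serves the cells queued at input channel 2 for output channel 1.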
Arbitration methods developed to optimize performance of high speed switches utilizing parallel input queues are disclosed in U.S. Pat. No. 5,500,858, entitled "Method and Apparatus for Switching Cells in an Input-Queued Switch," issued to McKeown and in U.S. Pat. No. 5,517,495, entitled "Fair Prioritized Scheduling in an Input-Buffered Switch," issued to Lund et al. Although these arbitration approaches are effective for their intended purpose, they both require that an N×N switch have N² distinct FIFO input queues. Since there are N² distinct FIFO input queues, there will also be N² requests delivered to the scheduler. As the number of input and output channels increases, the complexity of providing N² input queues and sending N² requests to the scheduler becomes costly and difficult to implement.
In addition to the problem of added complexity, the output-distributed queue architecture does not easily support multicast requests, which are more common in network protocols such as Ethernet than in network protocols such as ATM. For example, in order to utilize the output-distributed architecture of FIG. 2 to satisfy a multicast request, either the cell that is to be multicast must be replicated into all of the output channel queues that are indicated by the request, or a separate multicast queue must be established in addition to the N² queues already present.
As a result of the shortcomings of conventional output-distributed queue architecture, what is needed is a method and apparatus that limit the number of input queues and the complexity of sending requests to a scheduler, while still maintaining fair and efficient scheduling.