§3.1 Field of the Invention
The present invention concerns switches used in communications networks. More specifically, the present invention concerns scheduling of cells sent through the switching fabric of such a switch.
§3.2 Background Information
The fast-growing traffic demand in the Internet requires packet switches that are simple, fast and efficient. Because of memory speed limits, most current switches use input queuing (“IQ”) or combined input and output queuing (“CIOQ”), with a bufferless crossbar switching fabric. The scheduler must find a matching between inputs and outputs. Such switches require centralized, sometimes complex, algorithms to achieve good performance, such as maximal matching (See, e.g., the article, J. G. Dai and B. Prabhakar, “The Throughput of Data Switches with and without Speedup,” Proc. of IEEE INFOCOM (Tel Aviv, Israel, March 2000), incorporated herein by reference.) and maximum weight matching (See, e.g., the article, N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” IEEE Transactions on Communications, vol. 47, pp. 1260-1267 (August 1999), incorporated herein by reference.). Maximum weight matching can achieve 100% throughput for any admissible arrival traffic, but it is not practical to implement due to its high complexity. Maximal matching, on the other hand, cannot achieve as high a throughput as maximum weight matching.
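The difference between a maximal matching and a maximum weight matching can be illustrated with a minimal Python sketch. The sketch assumes that the weight of an input-output pair is the length of the corresponding virtual output queue (one common choice, not the only one), and it finds the maximum weight matching by brute force, which is practical only for very small switches. The function names and the example queue lengths are hypothetical and are used for illustration only.

```python
from itertools import permutations

def greedy_maximal_matching(voq):
    """Scan input/output pairs in index order and add every pair that has
    queued cells and whose input and output are both still unmatched.
    The result is maximal: no further pair can be added to it."""
    n = len(voq)
    used_in, used_out, match = set(), set(), {}
    for i in range(n):
        for j in range(n):
            if voq[i][j] > 0 and i not in used_in and j not in used_out:
                match[i] = j
                used_in.add(i)
                used_out.add(j)
    return match

def maximum_weight_matching(voq):
    """Brute force over all N! input-to-output permutations (illustration
    only). Weight of pair (i, j) is assumed to be the queue length voq[i][j]."""
    n = len(voq)
    best = max(permutations(range(n)),
               key=lambda p: sum(voq[i][p[i]] for i in range(n)))
    # Drop pairs with empty queues; they carry no cell.
    return {i: j for i, j in enumerate(best) if voq[i][j] > 0}

def matching_weight(voq, match):
    """Total queue length served by a matching."""
    return sum(voq[i][j] for i, j in match.items())

# Hypothetical 2x2 example: voq[i][j] = cells queued at input i for output j.
voq = [[1, 9],
       [8, 0]]
maximal = greedy_maximal_matching(voq)
max_weight = maximum_weight_matching(voq)
```

In this example the greedy scan matches only input 0 to output 0 (weight 1) and can add nothing further, whereas the maximum weight matching pairs input 0 with output 1 and input 1 with output 0 (weight 17).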
A number of practical iterative algorithms have been proposed, such as iSLIP (See, e.g., the article, N. McKeown, “The iSLIP Scheduling Algorithm for Input-Queued Switches,” IEEE/ACM Transactions on Networking, vol. 7, pp. 188-201 (April 1999), incorporated herein by reference.) and dual round robin matching (“DRRM”) (See, e.g., the article, Y. Li, S. Panwar, and H. J. Chao, “On the Performance of a Dual Round-Robin Switch,” Proc. of IEEE INFOCOM (April 2001), incorporated herein by reference.). iSLIP uses multiple iterations to converge to a maximal matching. DRRM can achieve 100% throughput only under independent and identically distributed (“i.i.d.”), uniform traffic. Exhaustive match with Hamiltonian walk (“EMHW”) (See, e.g., the article, Y. Li, S. Panwar, and H. J. Chao, “Exhaustive Service Matching Algorithms for Input Queued Switches,” Proc. of IEEE HPSR (Phoenix, Ariz., April 2004), incorporated herein by reference.) has been proven to stabilize the system for any admissible traffic, but it is still centralized and has a complexity of O(log N).
With application specific integrated circuit (“ASIC”) technology, it is now possible to add small buffers at each crosspoint inside the crossbar. This makes the buffered crossbar or combined input and crossbar queueing (“CICQ”) switch a much more attractive architecture since its scheduler is potentially much simpler. Each input (or output) knows the state of all crosspoint buffers to (or from) which it can send (or receive) packets. The input and output schedulers can be independent. First, each input picks a crosspoint buffer to send a packet to. Then, each output picks a crosspoint buffer to transmit a packet from, as shown in FIG. 1. A centralized scheduler is not needed since the processing can be distributed at each input and output. It has been shown that simple algorithms such as round robin at both the inputs and outputs (“RR-RR”) (See, e.g., the article, R. Rojas-Cessa, E. Oki, and H. J. Chao, “On the Combined Input-Crosspoint Buffered Packet Switch with Round-Robin Arbitration,” IEEE Transactions on Communications, vol. 53, pp. 1945-1951 (November 2005), incorporated herein by reference.), or longest queue first at the inputs, and round robin at the outputs (“LQF-RR”) (See, e.g., the article, T. Javidi, R. Magill, and T. Hrabik, “A High Throughput Scheduling Algorithm for a Buffered Crossbar Switch Fabric,” Proc. of IEEE ICC, (Helsinki, Finland, June 2001), incorporated herein by reference.), can provide 100% throughput under uniform traffic. SQUISH and SQUID (See, e.g., the article, Y. Shen, S. S. Panwar, and H. J. Chao, “Providing 100% Throughput in a Buffered Crossbar Switch,” Proc. of IEEE HPSR, (Brooklyn, N.Y., May-June 2007), incorporated herein by reference.) can achieve 100% throughput for any admissible traffic, but these are centralized algorithms which do not scale with the increase in the number of ports due to the communication complexity and latency. 
Thus, these algorithms are generally not implemented in large scale high-speed switching systems.
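The independent input and output scheduling of a buffered crossbar described above, with round robin arbitration at both the inputs and the outputs, can be sketched as follows. This is a minimal illustration, not the implementation of any cited article: the class and function names are hypothetical, and the sketch assumes fixed-size cells, one-cell crosspoint buffers, and a pointer that advances to just past the most recently served position.

```python
def rr_pick(candidates, pointer, n):
    """Return the first index at or after `pointer` (wrapping around)
    that is in `candidates`, or None if there is no candidate."""
    for k in range(n):
        idx = (pointer + k) % n
        if idx in candidates:
            return idx
    return None

class BufferedCrossbar:
    """Hypothetical N x N crosspoint buffered switch with RR-RR arbitration."""

    def __init__(self, n, xb_capacity=1):
        self.n = n
        self.cap = xb_capacity                     # assumed crosspoint buffer size
        self.voq = [[0] * n for _ in range(n)]     # voq[i][j]: cells at input i for output j
        self.xb = [[0] * n for _ in range(n)]      # xb[i][j]: cells in crosspoint buffer (i, j)
        self.in_ptr = [0] * n                      # round robin pointer per input
        self.out_ptr = [0] * n                     # round robin pointer per output

    def step(self):
        """Run one time slot; return the (input, output) pairs that departed."""
        n = self.n
        # Input phase: each input independently picks one non-full crosspoint
        # buffer for which it has a queued cell.
        for i in range(n):
            eligible = {j for j in range(n)
                        if self.voq[i][j] > 0 and self.xb[i][j] < self.cap}
            j = rr_pick(eligible, self.in_ptr[i], n)
            if j is not None:
                self.voq[i][j] -= 1
                self.xb[i][j] += 1
                self.in_ptr[i] = (j + 1) % n
        # Output phase: each output independently picks one non-empty
        # crosspoint buffer in its column and transmits a cell from it.
        departed = []
        for j in range(n):
            occupied = {i for i in range(n) if self.xb[i][j] > 0}
            i = rr_pick(occupied, self.out_ptr[j], n)
            if i is not None:
                self.xb[i][j] -= 1
                self.out_ptr[j] = (i + 1) % n
                departed.append((i, j))
        return departed

# Hypothetical example: two inputs contend for output 0.
switch = BufferedCrossbar(2)
switch.voq[0][0] = 2
switch.voq[1][0] = 1
```

In this example both inputs can send a cell toward output 0 in the same time slot because each writes into its own crosspoint buffer. No input-output matching is computed, and output 0 drains the two crosspoint buffers in round robin order over successive calls to `step()`.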
In view of the foregoing, it would be useful to improve scheduling in switches, such as crosspoint buffered switches.