1. Field of the Invention
Embodiments of the present invention relate to schedulers in computer systems. More specifically, embodiments of the present invention relate to the design of a parallel wrapped wave-front arbiter.
2. Related Art
One of the main challenges in designing a large switch is to design a scheduler that provides an efficient matching between input and output ports in every “slot,” where a slot is defined as the ratio of the cell size to the line rate (i.e., the transmission time of one cell). As line rates continue to increase while cell sizes remain constant, the slot size (cell time) decreases. As a result, the scheduler has less time to produce the matching for cells arriving on multiple ports. Calculating a schedule for a switch with a large number of ports is further complicated because the computation time grows with the number of ports.
Some schedulers, such as the PIM scheduler (described by T. Anderson, S. Owicki, J. Saxe, and C. Thacker in “High Speed Switch Scheduling for Local Area Networks,” ACM Trans. Comput. Syst., vol. 11, no. 4, pp. 319-352, November 1993), the iSLIP scheduler (described by N. McKeown in “The iSlip Scheduling Algorithm for Input-Queued Switches,” IEEE/ACM Transactions on Networking, vol. 7, no. 2, April 1999), or the DRRM scheduler (described by H. J. Chao and J. S. Park in “Centralized Contention Resolution Schemes for a Large-Capacity Optical ATM Switch,” Proc. IEEE ATM Workshop '97, Fairfax, Va., May 1998), find a maximal matching by iterative, input/output round-robin arbitration. In each iteration, inputs send access request messages to outputs. The scheduler then grants the requests so that collisions are avoided. Inputs and outputs that are not scheduled in a given iteration get another chance in the next iteration.
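The iterative request/grant arbitration described above can be illustrated with a minimal sketch. It is a simplified model, not the exact PIM, iSLIP, or DRRM algorithm: the function name `iterative_match`, the fixed round-robin pointers, and the set-based request representation are assumptions made for illustration only.

```python
def iterative_match(requests, n, iterations):
    """Simplified iterative round-robin matching for an n x n switch.

    requests: set of (input, output) pairs that have a queued cell.
    Returns a dict mapping each matched input to its output.
    The pointers below stay at 0 for the single slot modelled here;
    a real scheduler would advance them between slots (assumption).
    """
    grant_ptr = [0] * n    # per-output round-robin pointer
    accept_ptr = [0] * n   # per-input round-robin pointer
    match = {}             # input -> output
    matched_outputs = set()

    for _ in range(iterations):
        # Grant phase: each unmatched output grants one unmatched requester.
        grants = {}        # input -> outputs that granted it
        for out in range(n):
            if out in matched_outputs:
                continue
            candidates = [i for i in range(n)
                          if i not in match and (i, out) in requests]
            if candidates:
                chosen = min(candidates,
                             key=lambda i: (i - grant_ptr[out]) % n)
                grants.setdefault(chosen, []).append(out)
        if not grants:
            break          # no further progress is possible

        # Accept phase: each input accepts one grant, so no collisions remain.
        for inp, outs in grants.items():
            chosen = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
            match[inp] = chosen
            matched_outputs.add(chosen)
    return match


requests = {(0, 0), (0, 1), (1, 0), (1, 1)}
one_pass = iterative_match(requests, n=2, iterations=1)
two_pass = iterative_match(requests, n=2, iterations=2)
```

In this example a single iteration leaves input 1 and output 1 unmatched because both outputs grant input 0; the second iteration gives them another chance and completes the matching. Each extra iteration costs another exchange of requests and grants.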
In a large switch that supports multi-terabit-per-second throughput, schedulers that use iterative algorithms do not provide sufficient performance. Such schedulers require multiple exchanges of requests and grants, and the bandwidth and time overhead they incur in doing so is simply too large to support high data rates.
Some switch designers have proposed dealing with this problem by pipelining the iterative schemes. Two examples of such pipelined schemes are the systems described in C. Minkenberg, I. Iliadis and F. Abel, “Low-Latency Pipelined Crossbar Arbitration,” IEEE Global Telecommunications Conference 2004 (GLOBECOM '04), vol. 2, pp. 1174-1179, November 2004, and in E. Oki, R. Rojas-Cessa, and H. J. Chao, “A Pipeline-Based Maximal-Sized Matching Scheme for High-Speed Input-Buffered Switches,” IEICE Transactions on Communications, vol. E85-B, no. 7, pp. 1302-1311, July 2002 (hereinafter [Rojas]). In these pipelined schemes, a given scheduler includes a number of sub-schedulers that process several sets of cells concurrently, such that in every slot one of the sub-schedulers produces a match. If a switch is sufficiently large, though, these schemes require many sub-schedulers, which increases latency and makes it difficult to decide which sub-scheduler should consider a particular request.
Other switch designers have proposed using centralized arbiters. One example of a centralized arbiter is the Wrapped Wave Front Arbiter (WWFA) described by Y. Tamir and H. C. Chi, “Symmetric Crossbar Arbiters for VLSI Communication Switches,” IEEE Transactions on Parallel and Distributed Systems, vol. 4, issue 1, pp. 13-27, January 1993 (hereinafter [Tamir]). Although the WWFA achieves arbiter centralization, the WWFA does not scale well for large switches. For example, assume a 5 Tbps switch with 512 10 Gbps ports and a cell size of 128 bytes (i.e., number of ports “N”=512, cell size “L”=128 bytes=1024 bits, and line rate “C”=10 Gbps). The scheduling period for a WWFA is NT (where T is the amount of time required to process a “wave” within the WWFA). Since at least one schedule has to be calculated in every slot, NT must be no greater than the slot time L/C=102.4 ns. This means that T≤L/(NC)=0.2 ns. In other words, the arbiter must process one “wave” that includes N requests in no more than 0.2 ns. This is a problem because a reasonable hardware implementation of transfer elements based on 90 nm technology would require at least T=2 ns per wave, ten times the available budget. Therefore, the WWFA is unsuitable for a switch of this size.
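The timing budget in the example above can be checked with a short calculation. The sketch below assumes a cell size of 1024 bits (128 bytes), which is the value that reproduces the 0.2 ns bound; the function names are illustrative and not part of any cited design.

```python
def slot_time_ns(cell_bits, line_rate_gbps):
    """Slot time L/C. Bits divided by Gbit/s yields nanoseconds."""
    return cell_bits / line_rate_gbps


def max_wave_time_ns(num_ports, cell_bits, line_rate_gbps):
    """Upper bound on T from N*T <= L/C, i.e. T <= L/(N*C)."""
    return slot_time_ns(cell_bits, line_rate_gbps) / num_ports


N = 512              # number of ports
L_BITS = 128 * 8     # cell size in bits (assumed: 128 bytes)
C_GBPS = 10          # line rate per port in Gbps
T_HW_NS = 2.0        # per-wave time quoted for a 90 nm implementation

slot = slot_time_ns(L_BITS, C_GBPS)             # 102.4 ns
budget = max_wave_time_ns(N, L_BITS, C_GBPS)    # 0.2 ns
shortfall = T_HW_NS / budget                    # hardware is about 10x too slow

print(f"slot time       : {slot:.1f} ns")
print(f"max wave time T : {budget:.3f} ns")
print(f"shortfall factor: {shortfall:.0f}x")
```

Under these assumptions the required per-wave time is an order of magnitude below what the quoted hardware can deliver, which is the scaling problem the paragraph describes.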
Hence, what is needed is a switch without the above-described problems.