§ 1.1. Field of the Invention
The present invention concerns methods and apparatus for fairly servicing queues at input ports of a switch (for switching ATM packets for example).
§ 1.2. Related Art
Since the present invention concerns a packet switch which may be used in a communications network, a brief history of communications networks, and the emergence of packet switching, is introduced in § 1.2.2 below. First, however, circuit switching and its limitations are introduced in § 1.2.1 below.
§ 1.2.1 Circuit Switching
The public switched telephone network (or “PSTN”) was developed to carry voice communications to permit geographically remote people to communicate. Modems then came along, permitting computers to communicate data over the PSTN. Voice and modem communications over the PSTN use “circuit switching”. Circuit switching inherently involves maintaining a continuous real time communications channel at the full channel bandwidth between two (2) points to continuously permit the transport of information throughout the duration of the call. Unfortunately, due to this inherent characteristic of circuit switching, it is inefficient for carrying “bursty” data traffic. Specifically, many services have relatively low information transfer rates—information transfer may occur as periodic bursts. Bursty communications do not require full channel bandwidth at all times during the duration of the call. Thus, when a circuit switched connection is used to carry bursty traffic, available communication bandwidth occurring between successive bursts is simply wasted.
Moreover, circuit switching is inflexible because the channel width is always the same. Thus, for example, a wide (e.g., 140 Mbit/second) channel would be used for all transmissions, even those requiring a very narrow bandwidth (e.g., 1 Kbit/second). In an attempt to solve the problem of wasted bandwidth occurring in circuit switching, multi-rate circuit switching was proposed. With multi-rate circuit switching, connections can have a bandwidth of a multiple of a basic channel rate (e.g., 1 Kbit/second). Although multi-rate circuit switching solves the problem of wasted bandwidth for services requiring only a narrow bandwidth, for services requiring a wide bandwidth, a number of multiple basic rate channels must be synchronized. This synchronization becomes extremely difficult for wide bandwidth services. For example, a 140 Mbit/second channel would require synchronizing 140,000 1-Kbit/second channels. Moreover, multi-rate circuit switching does not solve the inherent inefficiencies of a circuit switch, discussed above, when bursty data is involved.
Multi-rate circuit switching having multiple “basic rates” has also been proposed. Unfortunately, switches for multi-rate circuit switching are complex. Furthermore, the channel bandwidths are inflexible to meet new transmission rates. Moreover, most of the bandwidth might be idle when it is needed. Lastly, multiple basic rate circuit switching includes the inherent inefficiencies of a circuit switch, discussed above, when bursty data is involved.
In view of the above described problems with circuit switching, packet switched communications have become prevalent and are expected to be used extensively in the future. Two (2) communications protocols—TCP/IP and ATM—have become popular. Although one skilled in the art is familiar with the ATM protocol, it is introduced in § 1.2.3.1 below for the reader's convenience.
§ 1.2.2 The Emergence of Packet Switching
In recent decades, and in the past five to ten years in particular, computers have become interconnected by networks to an ever increasing extent, initially via local area networks (or “LANs”), and more recently via LANs, wide area networks (or “WANs”) and the Internet. In 1969, the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense (DoD) deployed Arpanet as a way to explore packet switching technology and protocols that could be used for cooperative, distributed computing. Early on, Arpanet was used by the TELNET application, which permitted a single terminal to work with different types of computers, and by the file transfer protocol (or “FTP”), which permitted different types of computers to transfer files to and from one another. In the early 1970s, electronic mail became the most popular application which used Arpanet.
This packet switching technology was so successful, that the ARPA applied it to tactical radio communications (Packet Radio) and to satellite communications (SATNET). However, since these networks operated in very different communications environments, certain parameters, such as maximum packet size for example, were different in each case. Thus, methods and protocols were developed for “internetworking” these different packet switched networks. This work led to the transmission control protocol (or “TCP”) and the internet protocol (or “IP”) which became the TCP/IP protocol suite.
§ 1.2.3 High Speed Packet Switched Networks
As just introduced above, there has been a trend from circuit switched networks towards packet switched networks. For example, packet switched communications presently appear to be the preferred mode of communication over a Broadband-Integrated Services Digital Network (or “B-ISDN”) service. Packet switching includes normal packet switching (e.g., X.25) and fast packet switching (e.g., Asynchronous Transfer Mode or “ATM”). Normal packet switching assumes that certain errors at each data link are probable enough to require complex protocols so that such errors can be controlled at each link. Link errors were a valid assumption and concern at one time. However, today data links are very reliable, such that the probability of errors being introduced by data links is no longer of any real concern. Hence, fast packet switching is becoming more prominent. One such fast packet switching protocol, ATM, is introduced in § 1.2.3.1 below.
§ 1.2.3.1 The Asynchronous Transfer Mode (ATM) Protocol
Since data links are very reliable and the probability of errors being introduced by data links is no longer of any great concern, ATM fast packet switching does not correct errors or control flow within the network (i.e., on a link-by-link basis). Instead, ATM is only concerned with three (3) types of errors, namely, bit errors, packet loss, and packet insertion. Bit errors are detected and/or corrected using end-to-end protocols. Regarding packet loss and insertion errors, ATM only uses prophylactic actions when allocating resources during connection set-up. That is, ATM operates in a connection-oriented mode such that when a connection is requested, a line terminal first checks whether sufficient resources (i.e., sufficient bandwidth and buffer area) are available. When the transfer of information is complete, the resources are “released” (i.e., are made available) by the line terminal. In this way, ATM reduces the number of overhead bits required with each cell and reduces the operations performed at each link in the network between the communicating terminals, thereby permitting ATM to operate at high data rates.
The ATM protocol transfers data in discrete sized chunks called “cells”. The use of fixed sized cells simplifies the processing required at each network node (e.g., switch) thereby permitting ATM to operate at high data rates. The structure of ATM cells is described in more detail below.
Finally, the ATM protocol permits multiple logical (or “virtual”) connections to be multiplexed over a single physical interface. As shown in FIG. 1, logical connections in ATM are referred to as virtual channel connections (or “VCCs”) 110. A VCC 110 is the basic unit of switching in an ATM network. A VCC 110 is established between two (2) end users, through the network. A variable-rate, full-duplex flow of ATM cells may be exchanged over the VCC 110. VCCs 110 may also be used for control signaling, network management and routing.
A virtual path connection (or “VPC”) 120 is a bundle of VCCs 110 that have the same end points. Accordingly, all of the cells flowing over all VCCs 110 in a single VPC 120 may be switched along the same path through the ATM network. In this way, the VPC 120 helps contain network control costs by grouping connections sharing common paths through the network. That is, network management actions can be applied to a small number of virtual paths 120 rather than a large number of individual virtual channels 110.
Finally, FIG. 1 illustrates that multiple virtual paths 120 and virtual channels 110 (i.e., logical connections) may be multiplexed over a single physical transmission path 130.
FIG. 2 illustrates the basic architecture for an interface between a user and a network using the ATM protocol. The physical layer 210 specifies a transmission medium and a signal-encoding (e.g., data rate and modulation) scheme. Data rates specified at the physical layer 210 may be 155.52 Mbps or 622.08 Mbps, for example. The ATM layer 220 defines the transmission of data in fixed sized cells and also defines the use of logical connections, both introduced above. The ATM adaptation layer (or “AAL”) 230 supports information transfer protocols not based on ATM. It maps information between a high layer 240 and ATM cells.
Recall that the ATM layer 220 places data in fixed sized cells (also referred to as packets). An ATM packet includes a header field (generally five (5) bytes) and a payload (or information) field (generally 48 bytes). The main function of the header is to identify a virtual connection to guarantee that the ATM packet is properly routed through the network. Switching and/or multiplexing is first performed on virtual paths and then on virtual channels. The relatively short length of the payload or information field reduces the size required for internal buffers at switching nodes, thereby reducing delay and delay jitter.
More specifically, FIG. 3A illustrates an ATM cell 300 having a header 310 as formatted at a user-network interface, while FIG. 3B illustrates the ATM cell 300′ having a header 310′ as formatted internal to the network.
Referring first to the header 310 as formatted at the user-network interface, a four (4) bit generic flow control field 312 may be used to assist an end user in controlling the flow of traffic for different qualities of service. The eight (8) bit virtual path identifier field 314 contains routing information for the network. Note that this field 314′ is expanded to twelve (12) bits in header 310′ as formatted in the network. In both headers 310 and 310′, a sixteen (16) bit virtual channel identifier field 316 contains information for routing the cell to and from the end users. A three (3) bit payload type field 318 indicates the type of information in the 48 octet payload portion 350 of the packet. (The coding of this field is not particularly relevant for purposes of the present invention.) A one (1) bit cell loss priority field 320 contains information to let the network know what to do with the cell in the event of congestion. A value of 0 in this field 320 indicates that the cell is of relatively high priority and should not be discarded unless absolutely necessary. A value of 1 in this field indicates that the network may discard the cell. Finally, an eight (8) bit header error control field 322 contains information used for error detection and possibly error correction as well. The remaining 48 octets 350 define an information field.
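For illustration only, the field widths of the user-network interface header 310 described above can be sketched as a pack/parse pair in Python. The names `UniHeader`, `pack_uni_header` and `parse_uni_header` are hypothetical, the most-significant-bit-first field ordering is an assumption, and the header error control field 322 is carried as an opaque byte rather than computed.

```python
from typing import NamedTuple


class UniHeader(NamedTuple):
    """Hypothetical container for the UNI header fields 312-322."""
    gfc: int  # 4-bit generic flow control (field 312)
    vpi: int  # 8-bit virtual path identifier (field 314)
    vci: int  # 16-bit virtual channel identifier (field 316)
    pt: int   # 3-bit payload type (field 318)
    clp: int  # 1-bit cell loss priority (field 320)
    hec: int  # 8-bit header error control (field 322), not computed here


def pack_uni_header(h: UniHeader) -> bytes:
    """Pack the fields into five bytes, assuming MSB-first field order."""
    word = (h.gfc << 28) | (h.vpi << 20) | (h.vci << 4) | (h.pt << 1) | h.clp
    return word.to_bytes(4, "big") + bytes([h.hec])


def parse_uni_header(raw: bytes) -> UniHeader:
    """Split a five-byte header back into its fields."""
    if len(raw) != 5:
        raise ValueError("an ATM header is five bytes long")
    word = int.from_bytes(raw[:4], "big")
    return UniHeader(
        gfc=(word >> 28) & 0xF,
        vpi=(word >> 20) & 0xFF,
        vci=(word >> 4) & 0xFFFF,
        pt=(word >> 1) & 0x7,
        clp=word & 0x1,
        hec=raw[4],
    )
```

The first four bytes hold the 4 + 8 + 16 + 3 + 1 = 32 bits of fields 312 through 320, and the fifth byte holds field 322. A header 310′ as formatted in the network would instead use the twelve (12) bit field 314′ in place of fields 312 and 314.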
Fast packet switching, such as ATM switching, has three (3) main advantages. First, ATM switching is flexible and is therefore safe for future transfer rates. Second, no resources are specialized and consequently, all resources may be optimally shared. Finally, ATM switches permit economies of scale for such a universal network.
Having introduced the ATM protocol, the basic components of an ATM switch, known ATM switches, and the limits of known ATM switches are described in § 1.2.3.2 below.
§ 1.2.3.2 Asynchronous Transfer Mode (ATM) Switches
ATM packets (cells) are routed through a network by means of a series of ATM switches. An ATM switch performs three (3) basic functions for point-to-point switching, namely, (i) routing the ATM cell, (ii) updating the virtual channel identifier (VCI) and virtual path identifier (VPI) in the ATM cell header (recall fields 314, 314′ and 316), and (iii) resolving output port contention (also referred to as “arbitration” or “scheduling”). The first two (2) functions, namely routing and updating, are performed by a translation table belonging to the ATM switch. The translation table converts an incoming link (input port) and VCI/VPI to an outgoing link (output port) and VCI/VPI. An arbiter is used to resolve output port contention among two or more ATM cells destined for the same output port. The arbiter chooses an ATM cell which “wins” contention (i.e., which is applied to the output port). Other ATM cells contending for the output port “lose” contention (i.e., they must wait before being applied to the output port).
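The three functions just described may be sketched, under stated assumptions, as follows. The table entries, the `Cell` type and the rule that the lowest-numbered input port wins contention are all hypothetical choices made for this example. An actual arbiter may use any of the policies discussed later in this section.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cell(NamedTuple):
    vpi: int
    vci: int
    payload: bytes


# (input port, VPI, VCI) -> (output port, new VPI, new VCI).
# The entries are made up for illustration.
Key = Tuple[int, int, int]
TRANSLATION_TABLE: Dict[Key, Key] = {
    (0, 1, 32): (2, 7, 40),
    (1, 1, 33): (2, 7, 41),
    (3, 5, 50): (0, 9, 60),
}


def translate(in_port: int, cell: Cell) -> Tuple[int, Cell]:
    """Functions (i) and (ii): pick the output port and rewrite VPI/VCI."""
    out_port, vpi, vci = TRANSLATION_TABLE[(in_port, cell.vpi, cell.vci)]
    return out_port, Cell(vpi, vci, cell.payload)


def arbitrate(arrivals: Dict[int, Cell]) -> Tuple[Dict[int, Cell], List[int]]:
    """Function (iii) for one cell slot.

    Returns the winning cell per output port and the input ports that lost.
    Assumed policy: the lowest-numbered contending input port wins.
    """
    winners: Dict[int, Cell] = {}
    losers: List[int] = []
    for in_port in sorted(arrivals):
        out_port, new_cell = translate(in_port, arrivals[in_port])
        if out_port in winners:
            losers.append(in_port)  # must wait for a later cell slot
        else:
            winners[out_port] = new_cell
    return winners, losers
```

In this sketch, cells arriving at input ports 0 and 1 both map to output port 2, so one of them loses contention and would have to be buffered, which motivates the queuing strategies discussed below.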
The switch fabric on which a switch architecture is built can be classified into three (3) types: (i) Banyan network; (ii) crossbar network; and (iii) Clos network. The “Starlite” switch (See the article A. Huang and S. Knauer, “STARLITE: A Wideband Digital Switch,” Proc. IEEE GLOBECOM'84, pp. 121-125 (December 1984)), Turner's broadcast switch (See the article J. S. Turner, “Design of a Broadcast Packet Switching Network,” IEEE Trans. on Commun., Vol. 36, pp. 734-743 (June 1988)) and Lee's multicast switch (See the article T. T. Lee, “Nonblocking Copy Networks for Multicast Packet Switching,” IEEE J. on Select. Areas in Commun., Vol. 6, pp. 1445-1467 (December 1988)) are typical multicast ATM switches based on a Banyan network. Those switches have the advantage of reduced hardware complexity. However, internal path conflicts and head-of-line (HOL) blocking have limited the performance and scalability of those switches.
One of the switches built on a crossbar network is the “Knockout Multicast” switch (See the article K. Y. Eng, M. G. Hluchyj, Y. S. Yeh, “Multicast and Broadcast Services in a Knockout Packet Switch,” Proc. of INFOCOM'88, pp. 29-34 (1988)), which utilizes a concentrator in every output port to resolve output contention. Following the “Knockout Multicast” switch, SCOQ (See the article M. H. Guo, R. S. Chang, “Multicast ATM Switches: Survey and Performance Evaluation,” Computer Communication Review, Vol 28, No. 2, pp. 98-131 (April 1998)), MOBAS (See the article H. J. Chao, B. S. Choe, “Design and Analysis of A Large-Scale Multicast Output Buffered ATM Switch,” IEEE/ACM Trans. on Networking, Vol. 3, No. 2, pp. 126-138 (April 1995)), Abacus (See the article H. J. Chao, B. S. Choe, J. S. Park, N. Uzun, “Design and Implementation of Abacus Switch: A Scalable Multicast ATM Switch,” IEEE J. on Select. Areas in Commun., Vol. 15, No. 5, pp. 830-843 (June 1997)), and a growable multicast switch (See the article K. Wang, M. H. Cheng, “Design and Performance Analysis of a Growable Multicast ATM Switch,” Proc. of INFOCOM'97, pp. 934-940 (1997)) were proposed. Crossbar switches can achieve high performance because of output queuing and output contention resolution. The tradeoff is the cost of the hardware complexity and speedup required.
The growable packet switch (See the article D. J. Marchok, C. E. Rohrs, R. M. Schafer, “Multicasting in a Growable Packet (ATM) Switch,” INFOCOM'91, pp. 850-858 (1991)) and the ring sandwich network (See the article Y. Yang, G. M. Masson, “Broadcast Ring Sandwich Networks,” IEEE Trans. on Computers, Vol 44, pp. 1169-1180 (October 1995)) are multicast ATM switches based on a Clos network. In fact, a Clos network is a multistage interconnection network (MIN), but one with only three (3) stages. Since a Clos network can provide multiple paths from an input port to an output port, internal path conflicts are relaxed. However, Clos network-based switches still suffer from head-of-line (HOL) blocking because an output port only accepts one cell in a cell slot, as was the case in Banyan network-based switches.
Existing packet switches, including the above-mentioned multicast switches, can achieve Gigabit/sec capacity. Unfortunately, however, few of them provide further scalability to Terabit/sec. Besides switch fabric constraints, the queuing strategy and the accompanying scheduling scheme have a great impact on switch scalability.
To prevent the ATM cells not winning contention for an output port from being lost, buffering is required. There are three (3) basic buffering strategies—namely, pure input queuing, pure output queuing and central queuing. Pure input queuing provides a dedicated buffer at each input port. Arbitration logic is used to decide which input buffer will be next served. The arbitration logic may be simple (e.g., round robin in which the input buffers are served in order, or random in which the input buffers are served randomly) or complex (e.g., state dependent in which the most filled buffer is served next, or delay dependent in which the globally oldest cell is served next).
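The simple and complex arbitration policies named above may be sketched as selection functions over a set of input buffers. This is a minimal sketch: each buffer is assumed to hold the arrival timestamps of its queued cells, oldest first, every non-empty buffer is assumed to be contending for the same output, and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional

# Each input buffer holds arrival timestamps of queued cells, oldest first.
Buffers = List[Deque[int]]


def round_robin(buffers: Buffers, last_served: int) -> Optional[int]:
    """Simple policy: serve the next non-empty buffer after the last one served."""
    n = len(buffers)
    for step in range(1, n + 1):
        i = (last_served + step) % n
        if buffers[i]:
            return i
    return None


def most_filled(buffers: Buffers) -> Optional[int]:
    """State dependent policy: serve the buffer holding the most cells."""
    candidates = [i for i, b in enumerate(buffers) if b]
    return max(candidates, key=lambda i: len(buffers[i]), default=None)


def oldest_cell(buffers: Buffers) -> Optional[int]:
    """Delay dependent policy: serve the buffer whose head cell arrived first."""
    candidates = [i for i, b in enumerate(buffers) if b]
    return min(candidates, key=lambda i: buffers[i][0], default=None)
```

The round robin policy needs only the index of the last buffer served, whereas the state dependent and delay dependent policies must inspect every buffer in each cell slot, which is one reason they are described as complex.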
Input-queued (IQ) switches have become more attractive because the switch fabric and input memory only need to run as fast as the line rate. An input-queued (IQ) switch with first-in-first-out (FIFO) queues is known to suffer head-of-line (HOL) blocking, which limits the throughput to 58.6% (= 2 − √2). To overcome HOL blocking, virtual output queues (VOQs) are applied in every switch input together with scheduling algorithms like Longest Queue First (LQF) (See the article N. McKeown, V. Anantharam, J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” Proc. of IEEE INFOCOM'96, (March 1996)), Oldest Cell First (OCF) (See the article A. Mekkittikul, N. McKeown, “A Starvation-free Algorithm For Achieving 100% Throughput in an Input-Queued Switch,” Proc. of ICCCN96, (1996)), and Longest Port First (LPF) (See the article A. Mekkittikul, N. McKeown, “A Practical Scheduling Algorithm to Achieve 100% Throughput in Input-Queued Switches,” Proc. of IEEE INFOCOM98 (April 1998)) to achieve up to 100% throughput. To support multicast traffic, TATRA and WBA were proposed for input-queued (IQ) switches (See the article B. Prabhakar, N. McKeown, R. Ahuja, “Multicast Scheduling for Input-Queued Switches,” IEEE J. on Select. Areas in Commun., Vol. 6, (May 1996)). A combined input output queued (CIOQ) switch has been proposed (See, e.g., S-T. Chuang, A. Goel, N. McKeown, B. Prabhakar, “Matching Output Queuing with Combined Input/Output-Queued Switch,” IEEE J. on Select. Areas in Commun., Vol. 17, No. 6, pp. 1030-1039 (June 1999)). It has been demonstrated that the CIOQ switch can precisely emulate the output queued (OQ) switch when the speedup S satisfies S ≥ 2 − 1/N.
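The throughput limit of FIFO input queuing can be explored with a small simulation. The sketch below assumes a saturated switch (every input always has a HOL cell), uniformly random destinations, and a random choice among the inputs contending for each output. The function name and parameters are hypothetical, and the model is a simplification rather than the analysis used in the cited articles.

```python
import random


def saturated_fifo_throughput(n_ports: int, n_slots: int, seed: int = 0) -> float:
    """Estimate per-port throughput of a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # hol[i] is the output port wanted by the head-of-line cell at input i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group the inputs by the output their HOL cell is contending for.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each contended output accepts exactly one cell per cell slot.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            delivered += 1
        # Losing inputs keep their HOL cell and block the cells behind it.
    return delivered / (n_ports * n_slots)
```

For a two-port switch this model yields a throughput near 0.75, and the estimate falls towards 2 − √2 (about 0.586) as the number of ports grows, which is consistent with the 58.6% figure stated above.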
Though input-queued (IQ) switches can support a high speed line rate without any speedup in hardware, a scheduling complexity of at least O(N^2.5) is a big obstacle when input-queued (IQ) switches grow to a large size (i.e., a large number N of input or output ports). The reason is that most scheduling algorithms (See the articles: N. McKeown, V. Anantharam, J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” Proc. of IEEE INFOCOM96, (March 1996); A. Mekkittikul, N. McKeown, “A Starvation-free Algorithm For Achieving 100% Throughput in an Input-Queued Switch,” Proc. of ICCCN96, (1996); A. Mekkittikul, N. McKeown, “A Practical Scheduling Algorithm to Achieve 100% Throughput in Input-Queued Switches,” Proc. of IEEE INFOCOM98, (April 1998); S-T. Chuang, A. Goel, N. McKeown, B. Prabhakar, “Matching Output Queuing with Combined Input/Output-Queued Switch,” IEEE J. on Select. Areas in Commun., Vol. 17, No. 6, pp. 1030-1039 (June 1999); and B. Prabhakar, N. McKeown, R. Ahuja, “Multicast Scheduling for Input-Queued Switches,” IEEE J. on Select. Areas in Commun., Vol. 6, (May 1996)) proposed for input-queued (IQ) switches employ a centralized scheduler, which needs to collect traffic information from N switch inputs in every cell slot and consumes multiple iterations to determine the final input-output matching. The situation may become more complex under multicast traffic. As scheduling complexity increases with switch size N, an input-queued (IQ) switch using a centralized scheduler has difficulties in growing to a large switch size and terabit/sec capacity.
Unfortunately, with input queuing, an ATM cell at the front of the queue waiting for an occupied output channel to become available may block other ATM cells behind it which do not need to wait. This is known as head-of-line blocking. A post office metaphor has been used to illustrate head-of-line blocking in the book, M. dePrycker, Asynchronous Transfer Mode: Solution for Broadband ISDN, pp. 133-137 (Ellis Horwood Ltd., 1991). In the post office metaphor, people (representing ATM cells) are waiting in a line (representing an input buffer) for either a stamp window (representing a first output port) or an airmail window (representing a second output port). Assume that someone (an ATM cell) is already at the stamp window (the first output port) and that the first person in line (the HOL cell of the input buffer) needs to go to the stamp window (the first output port). Assume further that no one is presently at the airmail window (the second output port) and that the second and third people in line (the ATM cells behind the HOL cell in the input queue) want to go to the airmail window (the second output port). Although the airmail window (the second output port) is available, the second and third people (ATM cells behind the HOL cell) must wait for the first person (the HOL cell) who is waiting for the stamp window (the first output port) to become free. Therefore, as the post office metaphor illustrates, the head-of-line (HOL) cell waiting for an output port to become free often blocks ATM cells behind it which would otherwise not have to wait. Simulations have shown that such head-of-line (HOL) blocking decreases switch throughput.
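The post office metaphor can be restated as a short sketch. The window names are taken from the metaphor, while the function names and the list representation of the line are hypothetical.

```python
from typing import List, Set, Tuple


def fifo_served(line: List[str], busy_windows: Set[str]) -> List[str]:
    """With a single FIFO line, only the person at the head may be served."""
    if line and line[0] not in busy_windows:
        return [line[0]]
    return []


def blocked_behind_head(line: List[str], busy_windows: Set[str]) -> List[Tuple[int, str]]:
    """People behind a stalled head whose own window is actually free."""
    if not line or line[0] not in busy_windows:
        return []
    return [
        (position, window)
        for position, window in enumerate(line)
        if position > 0 and window not in busy_windows
    ]


# The situation described above: the stamp window is occupied, the head of
# the line wants the stamp window, and the next two people want airmail.
line = ["stamp", "airmail", "airmail"]
busy = {"stamp"}
```

Here `fifo_served(line, busy)` returns an empty list even though the airmail window is idle, and `blocked_behind_head(line, busy)` lists the second and third people as waiting unnecessarily.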
Pure output buffering solves the head-of-line (HOL) blocking problems of pure input buffering by providing only the output ports with buffers. Since the ATM cells buffered at an output port are output in sequence (i.e., first in, first out, or “FIFO”), no arbitration logic is required. In the post office metaphor, the stamp window (first output port) has its own line (first output buffer) and the airmail window (second output port) has its own line (second output buffer).
Although pure output buffering clearly avoids the HOL blocking that may occur in pure input port buffering, it does have some disadvantages. Specifically, to avoid cell loss, assuming N input ports, the system must be able to write N ATM cells into any one of the queues (or output buffers) during one cell time (i.e., within 2.8 microseconds, where 2.8 microseconds is (53 bytes*8 bits/byte)/155.52 Mbit/second). Such a high memory write rate is needed because it is possible that each of the ATM cells arriving at each of the input ports will require the same output port. This requirement on the memory speed of the output buffer becomes a problem as the size of the switch (i.e., the number N of input ports and output ports) increases. Accordingly, for some large switches, pure output buffering is not feasible because the output port buffers would have to be large enough and/or fast enough to handle N cells in each cell slot.
Output-queued (OQ) switches (See e.g., the articles: K. Y. Eng, M. G. Hluchyj, Y. S. Yeh, “Multicast and Broadcast Services in a Knockout Packet Switch,” Proc. of INFOCOM'88, pp. 29-34 (1988); H. J. Chao, B. S. Choe, “Design and Analysis of A Large-Scale Multicast Output Buffered ATM Switch,” IEEE/ACM Trans. on Networking, Vol. 3, No. 2, pp. 126-138 (April 1995); H. J. Chao, B. S. Choe, J. S. Park, N. Uzun, “Design and Implementation of Abacus Switch: A Scalable Multicast ATM Switch,” IEEE J. on Select. Areas in Commun., Vol. 15, No. 5, pp. 830-843 (June 1997); K. Wang, M. H. Cheng, “Design and Performance Analysis of a Growable Multicast ATM Switch,” Proc. of INFOCOM'97, pp. 934-940 (1997); M. R. Hashemi, A. Leon-Garcia, “The Single-Queue Switch: A Building Block for Switches with Programmable Scheduling,” IEEE J. on Select. Areas on Commun., Vol. 15, No. 5, pp. 785-793 (June 1997); and A. K. Choudhury, E. L. Hahne, “A New Buffer Management Scheme for Hierarchical Shared Memory Switches,” IEEE/ACM Transactions on Networking, Vol. 5, No. 5, pp. 728-738 (October 1997)) maximize throughput and optimize latency. Hence, output-queued (OQ) switches can provide quality of service (QoS) guarantees. Unfortunately, however, the switch fabric and output buffer have to run N (where N is the switch size) times as fast as the line rate, because cells arriving at switch inputs have to be delivered to and stored in output queues in the same cell slot. It may be practical to implement an output-queued switch or router with an aggregated bandwidth of several tens of Gbps. However, at this time, building an output-queued (OQ) switch with a large number of ports and a fast line rate is impractical because sufficient memory bandwidth to provide N times speedup is not yet available.
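The cell time and the N-fold memory speed requirement can be worked through numerically. In the sketch below the function names are hypothetical. Evaluated exactly, (53 bytes*8 bits/byte)/155.52 Mbit/second is roughly 2.73 microseconds. The quoted 2.8 microseconds is closer to what a 149.76 Mbit/second payload rate would give, which is an assumption about the origin of the rounded figure and is not stated in the text.

```python
CELL_BITS = 53 * 8  # 5-byte header plus 48-byte payload


def cell_time_us(line_rate_mbps: float) -> float:
    """Duration of one cell slot; bits divided by Mbit/second gives microseconds."""
    return CELL_BITS / line_rate_mbps


def output_buffer_write_rate_mbps(n_ports: int, line_rate_mbps: float) -> float:
    """Worst case: all N inputs send a cell to the same output in one cell slot."""
    return n_ports * line_rate_mbps


def cell_write_time_ns(n_ports: int, line_rate_mbps: float) -> float:
    """Time available to write one cell when N cells must fit in a cell slot."""
    return 1000.0 * cell_time_us(line_rate_mbps) / n_ports
```

For example, with 32 ports at 155.52 Mbit/second, `output_buffer_write_rate_mbps(32, 155.52)` is about 4977 Mbit/second, and each cell must be written in roughly 85 nanoseconds. Both figures scale linearly with N, which is the scaling problem described above.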
Input-output-queued (IOQ) switches are combinations of input-queued (IQ) switches and output-queued (OQ) switches. One of the few input-output-queued (IOQ) switch designs is the CIOQ switch (See, e.g., the article S-T. Chuang, A. Goel, N. McKeown, B. Prabhakar, “Matching Output Queuing with Combined Input/Output-Queued Switch,” IEEE J. on Select. Areas in Commun., Vol. 17, No. 6, pp. 1030-1039 (June 1999)). The CIOQ switch in the Chuang article adopted both input queuing and output queuing to provide QoS in input-queued (IQ) switches. As speedup is required in input-queued (IQ) switches for QoS purposes, output queuing is needed to avoid cell loss. The CIOQ switch, in fact, can be classified as an input-queued (IQ) switch. The centralized scheduler sustains an arbitration complexity of O(N^2.5). Accordingly, the CIOQ switch is not feasible for a large scale switch.
Central queuing includes a queue not assigned to any inlet (input port) or outlet (output port). Each outlet will select ATM cells destined for it in a first in, first out (or “FIFO”) manner. However, the outlets must be able to know which cells are destined for them. Moreover, the read and write discipline of the central queue cannot be a simple FIFO because ATM cells destined for different outlets are all merged into a single queue. Turning again to the post office metaphor, a single line (central queue) of people (ATM cells) are waiting to visit the stamp window (the first output port) or the airmail window (the second output port). As a window opens up (as an output port becomes available), a server searches the line (central queue) for the next person (ATM cell) needing the available window (requiring the available output port). The server brings that person (ATM cell) to the open window (available output port) regardless of whether the person (the ATM cell) is at the front of the line (HOL). As the post office metaphor illustrates, the central queue requires a complex memory management system given the random accessibility required. Of course, the memory management system becomes more complex and cumbersome when the number of output ports (i.e., the size of the switch) increases.
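The non-FIFO read discipline of a central queue may be sketched as follows. The class name `CentralQueue` and its methods are hypothetical, and a linear search over a Python list stands in for the memory management system, whose actual design is not described here.

```python
from typing import List, Optional, Tuple

QueuedCell = Tuple[int, str]  # (destination output port, payload)


class CentralQueue:
    """A single shared queue serving every output port."""

    def __init__(self) -> None:
        self._cells: List[QueuedCell] = []

    def enqueue(self, out_port: int, payload: str) -> None:
        """Cells from all inputs are merged into one queue in arrival order."""
        self._cells.append((out_port, payload))

    def dequeue_for(self, out_port: int) -> Optional[str]:
        """Return the oldest cell destined for out_port, wherever it sits."""
        for index, (destination, payload) in enumerate(self._cells):
            if destination == out_port:
                del self._cells[index]  # random access removal, not a FIFO pop
                return payload
        return None

    def __len__(self) -> int:
        return len(self._cells)
```

Each output still receives its own cells in first in, first out order, but the queue as a whole must support removal from arbitrary positions, which is the random accessibility referred to above.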
Thus, conceptually, an ATM switch may include input port controllers for accepting ATM cells from various physical (or logical) links (Recall FIG. 1), a switching fabric for forwarding cells to another link towards their destination, and output port controllers for delivering ATM cells to various physical (or logical) links.