Fast switching of information, be it samples of analog signals or alphanumeric data, is an important task in a communication network. The network nodes in which lines or transmission links from various directions are interconnected for exchanging information between them are often the cause of delay in the transmission. If much traffic is concentrated in a node, and in particular if most of the traffic passes through only a few of the links, increased delays or even loss of information are often encountered. It is therefore desirable to have switching nodes which allow fast routing.
EP 312628 describes a switching apparatus for interconnecting a plurality of incoming and outgoing transmission links of a communication network, or for exchanging data between incoming and outgoing computer and workstation connection links. Known packet formats are also described there.
In the article “Input vs. output queuing on a space-division packet switch” by Karol et al., IEEE Global Telecommunications Conference, Houston, Tex., December 1986, pp. 0659-0665, a comparison of the two queuing models is performed.
In the article “A 622-Mb/s 8×8 ATM Switch Chip Set with Shared Multibuffer Architecture” by Kondoh et al., IEEE Journal of Solid-State Circuits, Vol. 28, No. 7, July 1993, an asynchronous transfer mode switch chip set, which employs a shared multibuffer architecture, and its control method are described.
In “Input Queuing of an Internally Non-Blocking, Packet Switch with Two Priority Classes” by Chen and Guérin, in IEEE Infocom 89 Proceedings, Volume II, April 1989, the concept of input queuing in combination with different packet priorities is contemplated.
An overview of prior art switching technology is given on the Internet page www.zurich.ibm.com/Technology/ATM/SWOCPWP, which provides an introduction to the PRIZMA chip. Another source of information about this topic is the publication “A flexible shared-buffer switch for ATM at Gbit/s rates” by W. E. Denzel, A. P. J. Engbersen, I. Iliadis in Computer Networks and ISDN Systems, (0169-7552/94), Elsevier Science B.V., Vol. 27, No. 4, pp. 611-624.
The PRIZMA chip comprises a shared common output buffer and has 16 input ports and 16 output ports which provide a port speed of 300-400 Mbit/s. The switch's principle is first to route incoming packets through a fully parallel I/O routing tree and then to queue the routed packets in the output buffer. In addition, the chip separates the data (payload) flow from the control (header) flow. Only the payloads are stored in a dynamically shared output buffering storage. With this architecture, head-of-the-line queuing is avoided. The PRIZMA chip has a scalable architecture and hence offers multiple expansion capabilities with which the port speed, the number of ports and the data throughput can be increased. These expansions can be realized based on a modular use of the PRIZMA chip. Single-stage or multi-stage switch fabrics can also be constructed in a modular way.
The PRIZMA chip is especially suited for broadband telecommunications based on ATM, i.e. the Asynchronous Transfer Mode. However, the concept is not restricted to ATM-oriented architectural environments. ATM is based on short, fixed-length packets, often called cells, and is supposed to be applied as the integrated switching and transmission standard for the future public Broadband Integrated Services Digital Network (BISDN). PRIZMA's topology and queuing arrangement for contention resolution employ a high degree of parallelism. The routing function is performed in a distributed way at the hardware level, referred to as self-routing. ATM packets are classified into several packet types, particularly packet types with different payload sizes, and the PRIZMA chip is dedicated to handling packets with a payload of up to 64 bytes. However, packet payloads of 12, 16, 32 or 48 bytes are also often to be transported.
The fanout F of a multicast packet is defined to be the number of output ports it is destined to. An arriving input packet is distinguished from the output packets it generates; an input packet with a fanout of F generates F output packets.
In a purely output-queued switch, multicast can be performed almost trivially. Upon arrival of a multicast packet, it is simply duplicated to every output queue it is destined for. However, there is a significant drawback to this approach, as each incoming packet may have to be duplicated up to N times, which is a waste of internal memory bandwidth. This problem can be solved by adopting a shared-memory switch architecture, where the output queues handle only pointers to the actual data stored in a memory shared by all output queues. Thus, the packet data need only be stored once, while the pointer to the data is duplicated. This scheme is also referred to as replication at sending (RAS).
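The pointer-duplication idea described above can be illustrated with a minimal sketch. It is not taken from any of the cited designs: the class name SharedMemorySwitch, its methods, and the per-address reference counter used to decide when a memory location may be reused are all assumptions made for illustration only.

```python
from collections import deque


class SharedMemorySwitch:
    """Hypothetical model: payload stored once, one pointer per destination."""

    def __init__(self, num_ports, num_cells):
        self.memory = [None] * num_cells       # shared packet memory
        self.refcount = [0] * num_cells        # assumed bookkeeping, not from the text
        self.free = deque(range(num_cells))    # free addresses in the shared memory
        self.output_queues = [deque() for _ in range(num_ports)]

    def enqueue(self, payload, destinations):
        """Store the payload once and queue its address at every destination."""
        dests = set(destinations)
        if not dests:
            raise ValueError("a packet needs at least one destination")
        if not self.free:
            return False                       # shared memory full: packet is lost
        addr = self.free.popleft()
        self.memory[addr] = payload
        self.refcount[addr] = len(dests)       # equals the fanout F of the packet
        for port in dests:
            self.output_queues[port].append(addr)  # only the pointer is duplicated
        return True

    def dequeue(self, port):
        """Send the next packet on a port; free the cell after the last copy."""
        if not self.output_queues[port]:
            return None
        addr = self.output_queues[port].popleft()
        payload = self.memory[addr]
        self.refcount[addr] -= 1
        if self.refcount[addr] == 0:
            self.memory[addr] = None
            self.free.append(addr)
        return payload
```

In this sketch a multicast packet with a fanout of 3 occupies a single memory cell until the third output port has read it, which is the memory-bandwidth saving the paragraph above refers to.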
The bandwidth through the shared memory of an output-queued switch must equal N times the individual port speed, which poses significant implementation concerns at high line rates. Because of this, input-queued switches have gained popularity in recent years. The performance limitations of FIFO-queued crossbar-based switches have been largely overcome by applying techniques such as virtual output queuing (VOQ), combined with centralized scheduling to achieve good throughput.
VOQ entails the sorting of incoming packets at the input side based on the packet's destination output. This arrangement is fine for unicast traffic, but does not fit well with multicast; for example, in which queue would one store an incoming multicast packet that has F different destinations? The generally accepted solution is to add an (N+1)-th queue at each input that is dedicated to multicast traffic. This raises two new problems: (a) how to schedule packets from the N multicast queues, and (b) how to integrate multicast with unicast traffic in a fair way.
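As a rough sketch of the queue arrangement just described, one input of an N-port switch can be modeled with N virtual output queues plus one additional multicast queue. The class and attribute names below are hypothetical, and treating a packet with exactly one destination as unicast is an assumption of this sketch.

```python
from collections import deque


class InputPortQueues:
    """Hypothetical model of one input: N VOQs plus an (N+1)-th multicast queue."""

    def __init__(self, num_outputs):
        self.voq = [deque() for _ in range(num_outputs)]  # one queue per output
        self.multicast = deque()                          # the (N+1)-th queue

    def enqueue(self, packet, destinations):
        dests = set(destinations)
        if not dests:
            raise ValueError("a packet needs at least one destination")
        if len(dests) == 1:
            (out,) = dests
            self.voq[out].append(packet)            # unicast: sorted by destination
        else:
            self.multicast.append((packet, dests))  # multicast: single FIFO queue
```

The sketch makes the two open problems visible: the multicast queue is a plain FIFO that is scheduled separately from the VOQs, and nothing in the structure relates the service of multicast packets to that of unicast packets.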
Concerning multicast data packets, the classical VOQ arrangement requires special handling. The switching device in this case does not contain any buffer. This means that in order to send a multicast data packet, all output ports to which this data packet is destined have to be free. This results in additional complexity in the routing controller: it has to recognize that this is a multicast data packet, then ensure that no other input adapter sends a data packet to one of the output ports to which the multicast data packet is destined, then grant the sending allowance to the input adapter which will send the multicast packet, and finally set the path in the switching device. As long as the routing controller is simple logic, this is doable, but once routing controllers become pipelined and run sophisticated algorithms which try to ensure the best fairness and handling of priorities, this becomes a really complex task. A known current practice is to build separate multicast queues into which all adapters put their multicast data packets. This totally disrupts the relation between non-multicast and multicast traffic and is hence considered a suboptimal solution. It is not possible to send two multicast data packets, one from a first input adapter and one from a different input adapter, when the destinations of these two multicast data packets overlap in at least one output port. This severely degrades the throughput performance.
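The overlap restriction can be made concrete with a small sketch of a grant decision for one timeslot of a bufferless switching device. The function name grant_one_shot and the greedy, first-come order in which requests are examined are assumptions for illustration; an actual routing controller may arbitrate differently.

```python
def grant_one_shot(requests):
    """Return the input adapters that may send in this timeslot.

    requests: iterable of (input_id, destinations) pairs, examined in order.
    A request is granted only if ALL of its output ports are still free,
    because the bufferless switching device cannot hold a partial copy.
    """
    busy_outputs = set()
    granted = []
    for input_id, destinations in requests:
        dests = set(destinations)
        if dests and busy_outputs.isdisjoint(dests):
            granted.append(input_id)
            busy_outputs |= dests   # reserve every output of this packet
    return granted
```

For example, with requests from input 0 to outputs {1, 2} and from input 1 to outputs {2, 3}, only input 0 is granted, since the two destination sets share output 2; input 1 has to wait for a later timeslot even though output 3 stays idle.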
In “Queueing Strategies for Multicast Packet Switching”, IEEE Globecom '90, San Diego Calif., USA, 1990, pp. 1431-1437, Hui and Renner provide an overview of multicast scheduling strategies for input-buffered switches with FIFO queues. They distinguish between unicast service and multicast service, the former entailing sequential transmission to each of a multicast packet's destinations, while in the latter case multiple destinations can be served at once. They also introduce the notion of fanout splitting for the multicast service case; this means that a multicast packet may be transmitted over the course of multiple timeslots, until all of its destinations have been served. The opposite is one-shot scheduling, where all destinations have to be served simultaneously. Fanout splitting has a clear advantage over one-shot scheduling because head-of-line blocking is reduced. Multicast service is clearly preferable to unicast service for a multitude of reasons, the main one being that unicast service is wasteful of bandwidth towards the switch: a packet with a fanout of F must be transmitted F times across the input link, resulting in poor utilization and large delays. The authors come to the conclusion that an FCFS service with fanout splitting is best in terms of throughput, delay, and fairness.
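The difference between fanout splitting and one-shot scheduling can be sketched for a single head-of-line multicast packet in a single timeslot. The function below is an illustrative assumption, not code from the cited paper; its name and parameters are hypothetical.

```python
def serve_head_of_line(remaining, free_outputs, fanout_splitting=True):
    """Serve one head-of-line multicast packet for one timeslot.

    remaining:    output ports the packet still has to reach
    free_outputs: output ports available in this timeslot
    Returns (served, residue): the ports served now and those still pending.
    """
    remaining = set(remaining)
    free = set(free_outputs)
    if fanout_splitting:
        served = remaining & free   # serve whatever subset is free now
    else:
        # one-shot: all destinations at once, or none at all
        served = remaining if remaining <= free else set()
    return served, remaining - served
```

With destinations {0, 1, 2} and free outputs {0, 2, 3}, fanout splitting serves outputs 0 and 2 immediately and leaves only output 1 for a later timeslot, whereas one-shot scheduling serves nothing and the packet keeps blocking the queue behind it.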
Despite the advances made with respect to multicast scheduling, the input-queued architectures presented above face several problems.
The FIFO organization of the multicast queue is prone to head-of-line blocking, in particular under heavy multicast load. Although concentrating algorithms try to minimize the impact by quickly serving entire head-of-line packets, they can never eliminate it. If it were possible to somehow apply the VOQ arrangement used for unicast traffic also to multicast traffic, head-of-line blocking would be eliminated completely.
Integration of multicast and unicast traffic, and fairness between multicast and unicast traffic are issues that have gone largely untouched. Ideally, no artificial distinction should be made between the two types of traffic with respect to either the queuing discipline or the scheduling discipline in order to ensure fairness among all input/output pairs regardless of traffic type.
Packet switches that rely solely on output queuing do not scale well to high data rates because of the high memory bandwidth requirement. Implementations that use a high degree of parallelism can achieve the desired bandwidth, but limit the amount of memory that can be integrated on a single chip, thus potentially leading to high packet loss rates and highly traffic-dependent performance.