Switches are an important factor in determining the performance of any modern interconnect network designed for high-performance computing and signal processing. Over the past years, a number of switch architectures have made their mark in packet-switched applications. However, switches still represent a bottleneck in high-performance, low-latency networks.
In general, a number of switch architectures have been proposed and implemented for packet-switched networks. The major factors that distinguish the different switch architectures are the organization of the storage medium and the switch fabric. The following is an explanation of space division switches, shared memory switches and shared medium switches.
In space division switches, multiple concurrent paths are established from the inputs to the outputs, each with the same data rate as an individual line. Examples of space division switches include crossbar fabrics and Banyan-based space division switches. Switch architectures that use crossbars are simple to build, but are not scalable and suffer from contention when several inputs try to connect to a single output. When two or more packets need to access the same output port, the need for buffering arises. There are several possibilities for the location of the buffers in a crossbar switch, two common ones being at the cross points of the switching array and at the input of the switching array. When the buffers are located at the cross points of the switch, the buffer memories are required to run at a minimum of twice the line speed. In this situation, the buffer memory requires a much larger amount of real estate than the switching array itself, and combining these components on the same ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array) would limit the size of the switching fabric that can be implemented on a single chip.
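The output contention described above can be illustrated with a minimal sketch. The function name and the fixed-priority rule (the lowest-numbered input wins) are assumptions made for illustration only; the text does not specify an arbitration policy.

```python
from collections import defaultdict

def arbitrate_crossbar(requests):
    """Resolve one time slot of crossbar requests.

    requests maps an input port to the output port it wants.
    Returns (grants, blocked): grants maps each requested output to the
    winning input; blocked lists the inputs whose packets must be buffered.
    The lowest-numbered input wins (a hypothetical fixed-priority policy).
    """
    contenders = defaultdict(list)
    for inp, out in sorted(requests.items()):
        contenders[out].append(inp)

    grants = {}
    blocked = []
    for out, inputs in contenders.items():
        grants[out] = inputs[0]       # one input per output per slot
        blocked.extend(inputs[1:])    # the rest contend and must wait
    return grants, sorted(blocked)

grants, blocked = arbitrate_crossbar({0: 2, 1: 2, 2: 0, 3: 2})
```

Here inputs 0, 1 and 3 all request output 2, so only one of them is granted and the packets of the other two need buffer space.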
Input queuing consists of placing a separate buffer at each input to the switch. This configuration is useful in the sense that it separates the buffering and switching functions and proves very desirable from the chip implementation point of view. It is important that the buffers are constructed to prevent head-of-line blocking. This is accomplished by passing relevant information about each packet present within the buffer space to the scheduler or by making the memories run faster than the link rate. Combined input-output buffered architectures that avoid head-of-line blocking are commonly used in switches, because they help ease contention and also provide a way of implementing Quality of Service (QoS).
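A small sketch can make head-of-line blocking concrete. It compares a strict FIFO input buffer, where only the head packet is eligible, with a buffer whose full contents are visible to the scheduler. Both function names and the greedy, input-order scheduling rule are assumptions for illustration, not a scheduler described in the text.

```python
def fifo_slot(queues):
    """Serve one time slot when only the head of each input queue is eligible.

    queues is a list (one entry per input) of lists of destination ports,
    with the head of each queue at index 0. Returns (input, output) pairs sent.
    """
    taken = set()
    sent = []
    for inp, q in enumerate(queues):
        if q and q[0] not in taken:
            taken.add(q[0])
            sent.append((inp, q.pop(0)))
    return sent

def lookahead_slot(queues):
    """Serve one time slot when the scheduler sees every buffered packet.

    Each input sends its oldest packet whose output is still free
    (a hypothetical greedy policy).
    """
    taken = set()
    sent = []
    for inp, q in enumerate(queues):
        for i, dest in enumerate(q):
            if dest not in taken:
                taken.add(dest)
                sent.append((inp, q.pop(i)))
                break
    return sent

blocked_case = fifo_slot([[0, 1], [0, 2]])
relieved_case = lookahead_slot([[0, 1], [0, 2]])
```

With strict FIFO service, input 1 stays idle because its head packet wants the busy output 0, even though the packet behind it is headed for the free output 2. When the scheduler can see the whole buffer, that second packet is sent in the same slot.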
Banyan switches are based on 2×2 basic switches that have been built into a binary tree topology. To make the routing algorithm easy, the destination address is expressed in terms of bits (b0, b1, ..., bn). Each switching element decides its state depending upon the value of the corresponding bit, starting from the most significant bit of the address. If the bit is 0, the packet is switched to the upper port; otherwise it is switched to the lower port. This makes the switch fabric self-routing.
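The self-routing rule can be sketched in a few lines. The function name and the string labels are illustrative assumptions; the rule itself (bit 0 selects the upper port, bit 1 the lower port, most significant bit first) is the one described above.

```python
def self_route(dest, stages):
    """Return the port chosen at each stage for a destination address.

    dest is the destination port number and stages is the number of
    switching stages (one address bit is consumed per stage, starting
    with the most significant bit).
    """
    decisions = []
    for stage in range(stages):
        bit = (dest >> (stages - 1 - stage)) & 1
        decisions.append("upper" if bit == 0 else "lower")
    return decisions

path = self_route(5, 3)  # destination 5 is binary 101
```

For destination 5 (binary 101) in a three-stage fabric, the packet takes the lower, upper and lower ports in turn, with no routing table involved.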
In the situation of output conflicts, the packets have to be buffered and then serviced when the appropriate port is free. An alternative to this solution is to use input buffers along with a sorter network before the Banyan switch fabric. The sorter network sorts the input packets and presents the Banyan switch fabric with permutations that are guaranteed to pass without conflicts. Sorting is generally done using a ‘Batcher sorter’ and the resulting switch fabric is called a Batcher-Banyan switching fabric. An example of such a switch architecture is the Sunshine network (E. E. Witt, A Quantitative Comparison of Architectures for ATM Switching Systems, 1991).
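As a rough illustration of the sorting stage, the sketch below implements a bitonic sorting network in software, which is one of the sorter constructions associated with Batcher. The text does not say which sorter variant a given fabric uses, so this choice is an assumption, and the function names are hypothetical. The input length must be a power of two, as in a hardware network of 2×2 compare-exchange elements.

```python
def bitonic_merge(keys, ascending):
    """Merge a bitonic sequence into sorted order with compare-exchange steps."""
    if len(keys) <= 1:
        return list(keys)
    half = len(keys) // 2
    lo, hi = list(keys[:half]), list(keys[half:])
    for i in range(half):
        # Compare-exchange: the same role a 2x2 sorting element plays.
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)

def bitonic_sort(keys, ascending=True):
    """Sort destination addresses; len(keys) must be a power of two."""
    if len(keys) <= 1:
        return list(keys)
    half = len(keys) // 2
    first = bitonic_sort(keys[:half], True)    # ascending half
    second = bitonic_sort(keys[half:], False)  # descending half
    return bitonic_merge(first + second, ascending)

sorted_dests = bitonic_sort([6, 1, 7, 3, 0, 5, 2, 4])
```

The packets leave the sorter ordered by destination address, which is the form in which they are presented to the Banyan fabric.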
In a shared buffer switch, packets arriving on all input lines are multiplexed into a single stream that is fed to the common memory for storage. Inside the memory, packets are organized into separate output queues, one for each output line. An example of the shared buffer architecture is the Hitachi shared buffer switch (N. Endo, T. Kozaki, T. Ohuchi, H. Kuwahara, S. Gohara, “Shared buffer memory switch for an ATM exchange”, IEEE Transactions on Communications, Vol. 41 Issue 1, January 1993, Page(s): 237-245).
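The organization of a common memory into per-output queues can be sketched as follows. The class name, the packet-count capacity model and the drop-when-full behavior are assumptions for illustration and are not taken from the cited Hitachi design.

```python
from collections import deque

class SharedBufferSwitch:
    """One memory pool shared by all outputs, organized as per-output queues."""

    def __init__(self, num_outputs, capacity):
        self.capacity = capacity  # total packets the common memory can hold
        self.used = 0             # packets currently stored, across all queues
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, output):
        """Store a packet for an output; return False if the memory is full."""
        if self.used >= self.capacity:
            return False
        self.queues[output].append(packet)
        self.used += 1
        return True

    def dequeue(self, output):
        """Remove and return the oldest packet for an output, or None if empty."""
        if not self.queues[output]:
            return None
        self.used -= 1
        return self.queues[output].popleft()

switch = SharedBufferSwitch(num_outputs=2, capacity=3)
switch.enqueue("a", 0)
switch.enqueue("b", 0)
switch.enqueue("c", 1)
```

Because the capacity limit applies to the pool rather than to each queue, a busy output can temporarily use space that idle outputs do not need.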
In shared medium packet switches, all packets arriving on the input lines are synchronously multiplexed onto a common high-speed medium, typically a parallel bus with a bandwidth equal to N times the rate of a single input line. An example of such a switch is the ATOM (ATM output buffer modular) switch constructed by NEC (H. Suzuki, et al., “Output-buffer switch architecture for asynchronous transfer mode”, Proc. Int. Conf. on Communications, pp 4.1.1-4.1.5, June 1989). Shared memory switches and shared medium switches have also been shown to improve contention avoidance.
Shared memory packet switches make use of a single dual-ported memory shared by all input and output lines. In this type of architecture, two main design constraints must be considered. First, the processing time required to determine where to enqueue the packets and to issue the required control signals should be sufficiently small to keep up with the flow of incoming packets. Second, the memory size, access time, and bandwidth are important design parameters. If N is the number of ports and V is the port speed, then the memory bandwidth should be at least 2NV, since in the worst case all N ports write to the memory and all N ports read from it at full speed at the same time.
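The 2NV bound translates directly into a memory access-time budget. The sketch below works through the arithmetic; the function names and the fixed word-width model are assumptions for illustration only.

```python
def shared_memory_bandwidth(num_ports, port_speed_bps):
    """Minimum memory bandwidth in bits/s: N writes plus N reads at speed V."""
    return 2 * num_ports * port_speed_bps

def max_access_time_s(num_ports, port_speed_bps, word_width_bits):
    """Longest time one memory access may take, assuming each access
    moves a single word of word_width_bits (a simplifying assumption)."""
    return word_width_bits / shared_memory_bandwidth(num_ports, port_speed_bps)

bandwidth = shared_memory_bandwidth(8, 10_000_000_000)
access_time = max_access_time_s(8, 10_000_000_000, 64)
```

For example, 8 ports at 10 Gb/s call for at least 160 Gb/s of memory bandwidth, which leaves roughly 0.4 ns per access for a 64-bit memory word.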
Others have used a time-slotted switch fabric and built various configurations of input buffering, output buffering and shared buffering around it to compare the buffering techniques (M. Hluchyj and M. Karol, “Queuing in high performance packet switching”, IEEE J. Selected Areas in Communications, Vol. 6, no. 9, December 1988, Page(s): 1587-1597). Simulations were performed to show that the packet loss characteristics improve dramatically with complete sharing of the memory, thus requiring a much smaller memory size than the first configuration.
In memory management, fine-tuned switch architectures can attain high throughput, but these require either complex scheduling algorithms, which are difficult to realize in hardware, or memories working at speeds greater than the link rate. Others propose a parallel packet switch (PPS) to overcome the need for memories working at line rate (S. Iyer, A. Awadallah, N. McKeown, “Analysis of a packet switch with memories running slower than the line-rate”, INFOCOM 2000, Proceedings of the Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. IEEE, Vol. 2, 2000, Page(s): 529-537).
Most of these high-speed switch architectures require large amounts of buffer space or high-speed memory access. If this buffer space is external to the switch IC, in the form of SDRAMs, gigabits of data can be stored in a single external chip. However, the performance will be limited by the access time to the memory, and highly efficient memory access control is required as well. For these reasons, when the memory is external to the switch chip, there is a lower bound on the latency through the switch. Internal memory is faster than external memory, but it is expensive in terms of chip real estate and limited in size. One of the logic designer's biggest challenges is effectively utilizing the memory present within an ASIC or FPGA. In addition to being a limited resource, memory blocks internal to the chip are not capable of running at the speeds required by shared medium switches and other fast packet switch architectures, which require a minimum speed of twice the link rate.
The following describes the RapidIO interconnect protocol. While the basic switch construction is applicable to a wide variety of protocols (PCI Express Switched, InfiniBand, etc.), it was modeled and implemented using the RapidIO standard protocol. The RapidIO standard defines a high-speed, low-latency interconnect technology. It has a three-layer hierarchy: the physical layer, the transport layer, and the logical layer. RapidIO supports data rates up to 10 Gb/s. It has well-defined parallel (8/16) and serial (1x/4x) versions for the physical layer. It has a flat address space (supporting up to 65,000 nodes), enabling simple programming models. Error management and flow control are performed using embedded control symbols. The control symbols used for flow control and error correction can be embedded at any point within a packet, contributing to lower latency.
There exists a need in the art for a new low-latency switch architecture that substantially lowers the latency and delays the onset of saturation in packet-switched networks. This switch architecture would allow all input ports to write simultaneously to a single output port provided there is buffer space available in the history buffer. Further, an ideal low-latency switch architecture would improve packet scheduling and routing, thus ensuring a lower latency across the network. In addition, the switch construction would permit compliance with a wide variety of protocols.