1. Field of the Invention
This invention relates generally to data switches and, more specifically, to a high speed multi-stage switching network formed from stacked switching layers for use in routers and the like.
2. Description of the Related Art
This invention relates to switches. A switch, in the most general sense relevant here, is a communications device that controls the routing of a signal path. Switches are generally categorized as packet switches or as circuit switches. Packet switches, also sometimes called datagram switches, switch packets containing both data and meta-data (control information). Some well-known packet switching devices are IP routers and asynchronous transfer mode (ATM) switches. As stated in RFC 1812, “An IP router can be distinguished from other sorts of packet switching devices in that a router examines the IP protocol header as part of the switching process. It generally removes the Link Layer header a message was received with, modifies the IP header, and replaces the Link Layer header for retransmission.”
Circuit switches are devices that establish a dedicated channel for the duration of the transmission, thereby allowing data that is not accompanied by meta-data to be transmitted in real time. The public switched telephone network (PSTN) is a circuit-switched network. A telephone switch that is part of the PSTN is a prototypical circuit switch. This patent application will focus on packet switching devices, but a switch made in accordance with this invention is applicable to a circuit switching device as well.
In a data communications network that uses packet switching technology, data to be sent from one network interface to another is broken up into small chunks for transmission over the network. The individual data chunks are typically combined with suitable control information to form transmission units called “packets.” The packets are usually self-contained in the sense that the packet itself carries the information needed for routing the packet to its intended destination. The destination information is part of the packet's control information.
Each packet generally has a header containing its source and destination, a block of data content sometimes called a payload, and an error-checking code. All the data packets related to a message may or may not take the same route to get to their destination; they may pass through different packet switches on the way to their final destination and they are all reassembled once they have arrived.
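The packet anatomy just described (a header with source and destination, a payload, and an error-checking code, with reassembly at the destination) can be sketched as a small data structure. This is a minimal illustration only: the `Packet` class, the `seq` sequence-number field used for reassembly, the 4-byte chunk size, and the choice of CRC-32 (via Python's `zlib.crc32`) as the error-checking code are all assumptions, not features of any particular protocol.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str        # header: source address
    dst: str        # header: destination address
    seq: int        # header: sequence number (assumed field, used for reassembly)
    payload: bytes  # block of data content
    crc: int        # error-checking code over the payload (CRC-32 assumed)

    def is_intact(self) -> bool:
        """Recompute the checksum and compare it with the stored code."""
        return zlib.crc32(self.payload) == self.crc


def packetize(src: str, dst: str, message: bytes, chunk: int = 4) -> list:
    """Break a message into small chunks, each wrapped with control information."""
    return [
        Packet(src, dst, i, message[off:off + chunk],
               zlib.crc32(message[off:off + chunk]))
        for i, off in enumerate(range(0, len(message), chunk))
    ]


def reassemble(packets: list) -> bytes:
    """Rebuild the message from packets that may have arrived in any order."""
    if not all(p.is_intact() for p in packets):
        raise ValueError("corrupted packet")
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))
```

Because each packet carries its own destination and sequence number, the packets can travel by different routes and still be put back in order on arrival, e.g. `reassemble(list(reversed(packetize("A", "B", b"hello world"))))` yields the original bytes.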
Some packet-switching protocols refer to the transmission units as “datagrams” or “frames” or “messages” or “cells”. This application, however, will generically refer to all such transmission units as packets without regard to the actual format or specific name used by any particular protocol.
In the context of a packet switch, therefore, a switch is a networking device which can send packets directly to a port associated with a given network address, or destination address, contained in the packet.
FIG. 1 is a simplified block diagram of a switch 10 that forwards data arriving at one of its inputs 11 to one of its outputs 13. The core of a data switch, as shown by FIG. 1, is a so-called “switch fabric” 12 that routes data from an input port to an output port.
A “router” is a device that finds the best path for a data packet to be sent from one network to another. A router stores and forwards electronic messages between networks. A router generally picks the most expedient route to the destination address from among all possible paths based on the traffic load and the number of hops.
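Choosing the "most expedient route" from hop count and traffic load can be illustrated with a shortest-path search. The sketch below is a hypothetical example, not a description of any particular routing protocol: it assumes each link costs one hop plus a non-negative load figure and runs Dijkstra's algorithm with Python's `heapq`; the `best_path` function and the cost metric are illustrative assumptions.

```python
import heapq


def best_path(links: dict, src: str, dst: str):
    """Least-cost path where each link costs 1 hop plus its load (assumed metric).

    links: maps node -> list of (neighbor, load) pairs, with load >= 0.
    Returns (cost, path), or (inf, []) if dst is unreachable.
    """
    frontier = [(0.0, src, [src])]   # (cost so far, node, path to node)
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nbr, load in links.get(node, []):
            if nbr not in settled:
                # Hypothetical cost: one hop plus the link's current load.
                heapq.heappush(frontier, (cost + 1 + load, nbr, path + [nbr]))
    return float("inf"), []
```

With `links = {"A": [("B", 0.0), ("D", 3.0)], "B": [("D", 0.0)]}`, the lightly loaded two-hop route A, B, D (cost 2.0) is preferred over the heavily loaded direct link A, D (cost 4.0), showing how load can outweigh hop count.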
A router commonly incorporates a data switch and combines that switch with other components such as input buffers, output buffers, port mappers, schedulers for generating “switch commands”, sorters, and so on. FIG. 2 shows a simplified block diagram of a router 20 consisting of (1) a plurality of “line cards” 21 that each have one or more network interfaces 22 to the attached networks, (2) an internal interconnection unit or data switch 10 that contains a “switch fabric” 12 as discussed above, and (3) a processing module 23.
The most common switch fabric technologies in use today are buses, shared memories, and crossbars.
Buses and Shared Memories
The simplest switch fabric 12 is a shared bus that operates in a time-division manner. In such case, multiple interface cards are connected to the bus and a microprocessor executes suitable software for performing the routing function. The microprocessor reads data from an input port connected to the bus, determines a “next hop” address by reading the packet's destination address and performing a look up operation in a routing table that is updated pursuant to suitable protocols, and then writes the data to the appropriate output port based on the next hop determination. The data is usually buffered in a common memory connected to the bus such that it must cross the bus twice in going from an input port to an output port.
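The software forwarding step in this shared-bus design (read the destination address, look it up in the routing table, write the data to the chosen output port) can be sketched as follows. This is a minimal illustration under stated assumptions: the lookup is taken to be a longest-prefix match, which is how IP forwarding tables are normally searched; the table entries, next-hop addresses, and port numbers are hypothetical; and the `forward` function only counts the two bus crossings described above rather than modeling a real bus.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> (next-hop address, output port).
ROUTING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): ("192.0.2.1", 1),
    ipaddress.ip_network("10.1.0.0/16"): ("192.0.2.2", 2),
    ipaddress.ip_network("0.0.0.0/0"): ("192.0.2.254", 0),  # default route
}


def next_hop(destination: str):
    """Return (next_hop, output_port) for the most specific matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTING_TABLE if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return ROUTING_TABLE[best]


def forward(packet: dict, common_memory: list):
    """Model the shared-bus data path; returns (output_port, bus_crossings)."""
    crossings = 0
    common_memory.append(packet)        # input port -> bus -> common memory
    crossings += 1
    _, port = next_hop(packet["dst"])   # lookup performed by the microprocessor
    common_memory.pop()                 # common memory -> bus -> output port
    crossings += 1
    return port, crossings
```

Every packet costs two bus transfers plus a processor lookup, which is why the shared bus, the memory bandwidth, and the processor clock become the bottlenecks discussed next.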
While this simple bus-based, software controlled architecture is useful for a router with 10 megabits per second (Mbps) ports, and perhaps for a router with relatively few 100 Mbps ports, its capacity is limited in terms of data rate and port count. It is difficult to achieve wire-speed routing at higher data rates with this architecture because of bottlenecks associated with the shared bus, the memory's data transfer bandwidth, and the processor's clock speed. According to one author, “it is almost impossible to build a bus arbitration scheme fast enough to provide nonblocking performance at multi-gigabit speeds.” Aweya, James, IP Router Architectures: An Overview, Nortel Networks, p. 30.
There are, of course, more efficient ways of operating with a bus-based switch fabric 12. For example, some designers have put “satellite” processors, route caches, and memory on the interface cards themselves to allow the cards to process packets locally and make their own routing decisions whenever possible.
Other bus-based architectures have used multiple parallel “forwarding engines” that operate only on the packet's destination header, with the packet data forwarded directly from an input interface card to an output interface card under the control of the forwarding engines. The packets' data payloads, in other words, are transferred directly from interface card to interface card.
Crossbars
A more advanced generation of routers was designed with a parallel connection switch fabric that operated in a space-division manner rather than a time-division manner. Such switch fabrics allowed data throughput to be increased by several orders of magnitude. A popular switch fabric of parallel connection construction is known as a crossbar switch.
FIG. 3 is a simplified block diagram of an N×N crossbar switch 112 implemented in a crosspoint arrangement with switching elements located at each node or crosspoint 113. Data arriving on an input row is placed on an output column if the corresponding crosspoint 113 is active. FIG. 4 is a simplified block diagram of an N×N crossbar switch 212 that uses multiple N-to-1 multiplexers 213, one for each of the N outputs. A full crossbar switch is desirable because every input port has a path to every output port, so there is no blocking at the input ports or inside the switch. Blocking occurs only when two packets compete for the same output port.
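The multiplexer view of FIG. 4 can be sketched in a few lines: each output's N-to-1 multiplexer selects at most one input, so the only possible conflict is two packets requesting the same output. The `crossbar_schedule` function, its first-come selection rule, and the 0-based port numbering are assumptions made for illustration; real schedulers arbitrate output contention in more elaborate ways.

```python
def crossbar_schedule(requests: list, n: int):
    """Configure an n x n crossbar built from n output multiplexers.

    requests: list of (input_port, output_port) pairs, one packet per input.
    Returns (mux_select, blocked): mux_select[o] is the input selected by
    output o's multiplexer (or None), and blocked lists requests that lost
    contention for an output port.
    """
    mux_select = [None] * n
    blocked = []
    for inp, out in requests:
        if not (0 <= inp < n and 0 <= out < n):
            raise ValueError("port out of range")
        if mux_select[out] is None:
            mux_select[out] = inp        # activate crosspoint (inp, out)
        else:
            blocked.append((inp, out))   # output contention is the only blocking
    return mux_select, blocked
```

Any permutation of inputs to outputs, such as `[(0, 3), (1, 2), (2, 1), (3, 0)]`, passes with nothing blocked; only requests like `[(0, 1), (2, 1)]`, which share output 1, produce a blocked packet.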
Crossbar switches are conceptually desirable, but they have generally been regarded as physically impractical for large switches.
Crossbars usually have very low blocking probabilities, but they have a key defect: they require a lot of circuitry (proportional to n² or worse) in each output port. Because costs grow quadratically with the number of ports, crossbar designs are generally suitable only for comparatively small switches.
Partridge, Craig. Gigabit Networking, Massachusetts: Addison-Wesley Publishing Company, 1994. Page 100.
In other words, prior art approaches to crossbar switches do not scale well such that they are generally regarded as useful only for small switches:
A cross bar is internally nonblocking (i.e., no sample is blocked in the switch waiting for an output line).
Unfortunately, an N×N crossbar uses N² elements and therefore is expensive for large N, such as N=100,000 *** However, crossbars are an excellent solution for building smaller (say, 8×8 or 64×64 switches).
Keshav, S., An Engineering Approach to Computer Networking: ATM Networks, the Internet and the Telephone Network, Massachusetts: Addison-Wesley Publishing Company, 1997. Page 168.
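The quadratic growth quoted above can be made concrete by counting crosspoints. The sketch below compares a single P×P crossbar (P² crosspoints) with a multistage Banyan network of k×k crossbar elements, using the element count log_k(P)·P/k that this document gives for FIG. 5, and assuming (for illustration only) that each k×k element costs k² crosspoints. It is a cost comparison only: the multistage network is not internally nonblocking.

```python
def stages(ports: int, k: int) -> int:
    """Number of columns, log_k(ports); ports must be an exact power of k."""
    s, p = 0, ports
    while p > 1:
        if p % k:
            raise ValueError("ports must be a power of k")
        p //= k
        s += 1
    return s


def crossbar_crosspoints(ports: int) -> int:
    """A single full crossbar needs one crosspoint per (input, output) pair."""
    return ports * ports


def banyan_crosspoints(ports: int, k: int) -> int:
    """log_k(P) columns of P/k elements, each assumed to cost k*k crosspoints."""
    elements = stages(ports, k) * ports // k
    return elements * k * k


for p in (16, 256, 4096):
    print(p, crossbar_crosspoints(p), banyan_crosspoints(p, 4))
```

For 16 ports the difference is modest (256 versus 128 crosspoints), but for 4096 ports a single crossbar needs 16,777,216 crosspoints while the 4×4-element Banyan needs 98,304, which is why larger switches have been built as multistage networks.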
Prior art switches of larger dimension have generally been implemented as multistage switches comprising at least two stages and an interconnection from stage to stage according to a desired interconnection topology. A multistage switch, in other words, divides the inputs into groups that are internally switched by a column of switching elements, each a smaller, full crossbar switch, and ultimately delivers them to the outputs from a second column of such smaller, full crossbar switching elements.
FIG. 5, for example, is a simple 16-port Banyan switching network 312 formed from two stages or “columns” of 4×4 crossbar switching elements 313. As is well known, if the total number of ports is $P$ (here 16) and the crossbar switching elements 313 are $N \times N$ (here 4×4), the switch fabric requires $\log_N(P) \cdot P/N$ crossbar switching elements 313 ($\log_4(16) \cdot 16/4 = 2 \cdot 4 = 8$), organized as $\log_N(P)$ columns ($\log_4(16) = 2$) of $P/N$ elements each ($16/4 = 4$). FIG. 5 further illustrates how a switch scheduler (not shown) may control the switching network by attaching a “switch command” or switch address header 314 to each arriving message 315. On each cycle of the switching network, as the messages traverse each stage of the switching network 312, the switch address header 314 of each message locally controls the switching element 313 at which the message arrives. The router, in other words, includes suitable means for responding to the switch command and routing the data packet through the multi-stage switching network to a second line card corresponding to the desired route. In the first stage, for example, the first two bits (“11”) of the switch address header 314 instruct the switching element 313 to output the message 315 on port 3. The address bits for the first stage are then deleted from the front of the switch address header 314. At the next stage, therefore, the first two bits (“10”) instruct the switching element 313 in the second column to output the message on port 2. The final stage deletes the final two bits of the switch address header 314, leaving only the message 315.
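The header-driven (self-routing) behavior described for FIG. 5 can be sketched as a loop that reads the leading bits of the switch address header at each column, uses them as the output port, and strips them before the next column. The sketch assumes a full-mesh wiring in which output port j of any element feeds element j of the next column, 0-based element and port numbers, and a final output numbered `element * k + port`; the `route` function and these numbering conventions are hypothetical, not taken from the figure.

```python
def route(header: str, first_element: int, k: int = 4):
    """Follow a switch address header through a Banyan of k x k elements.

    header: address bits, log2(k) bits per column, first column first.
    first_element: 0-based first-column element the message enters.
    Returns (hops, output_port, remaining_header); hops lists
    (column, element, out_port) for each element traversed.
    """
    bits = k.bit_length() - 1            # 2 bits per column for 4x4 elements
    if bits == 0 or k != 1 << bits:
        raise ValueError("k must be a power of two >= 2")
    if not header or len(header) % bits:
        raise ValueError("header must hold whole port selections")
    hops = []
    element = first_element
    for column in range(len(header) // bits):
        out_port = int(header[:bits], 2)  # leading bits select the output port
        header = header[bits:]            # delete the bits this column consumed
        hops.append((column, element, out_port))
        if header:
            element = out_port            # assumed full mesh: port j -> element j
    return hops, element * k + out_port, header
```

With the header “1110” from the text, a message entering element 0 leaves the first column on port 3, leaves the second column on port 2, and emerges with an empty header, i.e. only the message remains: `route("1110", 0)` gives `([(0, 0, 3), (1, 3, 2)], 14, "")` under the numbering assumed here.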
The simple switching network 312 of FIG. 5 can have interior blocking, i.e., two messages addressed to different outputs can require the same interior connection. For example, in FIG. 5, if two messages addressed to outputs 1 and 2 were presented to different inputs on the upper-left switching element 313, they would both require the single connection between the upper-left element 313 and the upper-right element 313. Assuming that the interior paths run at the same speed as the external input and output ports, one of the two messages would have to be deferred in a suitable buffer or dropped, even though there is no contention for the same output port.
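Interior blocking of this kind can be detected by mapping each message to the single interior link it needs and looking for links requested more than once. This sketch assumes the full-mesh wiring between the two columns (one link from each first-column element to each second-column element) and 0-based port numbers, so input port i enters element i // k and output port o leaves element o // k; the `interior_conflicts` function is hypothetical.

```python
def interior_conflicts(messages: list, k: int = 4) -> list:
    """Find messages that contend for the same interior link.

    messages: (input_port, output_port) pairs on a two-stage network of
    k x k elements with one link between each first-column element and
    each second-column element (assumed full-mesh wiring).
    Returns (link, [messages]) for every link requested more than once.
    """
    demand = {}
    for inp, out in messages:
        link = (inp // k, out // k)   # (first-column element, second-column element)
        demand.setdefault(link, []).append((inp, out))
    return [(link, msgs) for link, msgs in demand.items() if len(msgs) > 1]
```

Two messages entering the upper-left element (inputs 0 and 2) and addressed to outputs 1 and 2 both need link (0, 0) and conflict even though their outputs differ; if the second message enters a different first-column element instead (input 5), there is no conflict.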
FIG. 6 shows a three-stage switching network 412 that reduces the internal blocking problem associated with the two-stage switching network 312 of FIG. 5. The three-stage switching network 412 includes a third column 323 of switching elements 313 that, in combination with the first and second columns 321, 322, provides several additional paths to reach the same switching element 313 associated with different outputs. As shown in FIG. 6, for example, two messages addressed to outputs 1 and 2 that would be blocked in FIG. 5 can reach the upper-right switching element 313 through different intermediate elements 313 in the intermediate column 322. The scheduler (not shown), of course, must compute and then add an additional pair of bits to the switch address header 314 in order to traverse the switching elements 313 in the extra column 323 and “route around” the blocking.
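How a scheduler might use the extra column to route around blocking can be sketched with a simple greedy search: for each message, try each intermediate element until both interior links it would need are free, then emit one pair of header bits per column. This is only an illustration; the document does not describe its scheduler, and the greedy rule, the full-mesh wiring (port j feeds element j of the next column), the 0-based numbering, and the `route_three_stage` function are all assumptions. A greedy pass can fail on some traffic patterns that a rearranging scheduler would handle.

```python
def route_three_stage(messages: list, k: int = 4):
    """Greedy path assignment in a three-stage network of k x k elements.

    Each (input_port, output_port) message picks an intermediate element m
    such that link (first element -> m) and link (m -> last element) are
    both unused. Returns (headers, blocked): headers maps each routed
    message to its switch address bits, one port selection per column.
    """
    bits = k.bit_length() - 1            # 2 bits per column for 4x4 elements
    used_first, used_second = set(), set()
    headers, blocked = {}, []
    for inp, out in messages:
        a, b = inp // k, out // k        # first- and last-column elements
        for m in range(k):               # try each intermediate element in turn
            if (a, m) not in used_first and (m, b) not in used_second:
                used_first.add((a, m))
                used_second.add((m, b))
                ports = (m, b, out % k)  # output port chosen in each column
                headers[(inp, out)] = "".join(format(p, f"0{bits}b") for p in ports)
                break
        else:
            blocked.append((inp, out))   # no intermediate element had free links
    return headers, blocked
```

The two messages that conflict in the two-stage sketch, `[(0, 1), (2, 2)]`, are both routed here: the first through intermediate element 0 (header “000001”) and the second through intermediate element 1 (header “010010”), each header being one bit pair longer than in the two-stage case.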
The particular two- and three-stage switching networks 312, 412 of FIGS. 5 and 6 have “full-mesh” interconnection patterns 331 between the columns. Other interconnection patterns are possible with two- and three-column switch networks. Moreover, switch fabrics with even more columns are possible, but the return on investment for each additional column is marginal.
The methodologies of design and operation of a scheduler suitable for implementing a multi-stage switching network are well known and are not discussed herein for the sake of brevity.
Multistage switching networks make it more practical to construct larger switches from smaller, readily available off-the-shelf parts. It is also possible, and usually desirable, as shown by FIGS. 5 and 6, to use fewer switching elements than are required to implement a 100% nonblocking network. The number of 4×4 switching elements needed to implement a nonblocking 16-port switching network is sixteen, arranged in four stages of four. Routing signals through fewer stages, with fewer than the nonblocking number of switching elements, introduces some probability of internal blocking, but that probability is relatively small. The exact probability of blocking varies as a function of traffic. A two-stage embodiment like that shown in FIG. 5 has a 25% probability of blocking with a random traffic pattern. A three-stage embodiment like that shown in FIG. 6 has only a 0.02% probability of blocking with the same pattern.
The inventors believe that designers have come to regard crossbar switches as impractical for creating large switches because the implementing electronics would occupy a large area and have long interconnects. Even sub-100% multistage switching networks like those exemplified by FIGS. 5 and 6 have typically been built in such larger sizes from smaller building-block switching elements packaged as discrete chips, and those building-block chips have heretofore been arranged as discrete components on a relatively large printed circuit board assembly (PCBA), generally in a planar, two-dimensional manner. The problem is that the long interconnects exhibit parasitic losses that tend to make the switch relatively slow and inefficient, while the switch must also consume more power to overcome those interconnect-related losses.
Superconducting switching elements have been used to make switches because they offer relatively fast switching speeds and extremely low power consumption (e.g. those using Josephson junctions), as compared with switching elements of conventional electronic construction. It has not been practical until now, however, to use superconducting elements to make large switches with a large number of ports. Switching elements manufactured with conventional electronics are better operated in a distributed, large area environment when it comes to cooling. It is very impractical, however, to cool such a large area to superconducting temperatures of 120K (−243.67 degrees Fahrenheit) required for so-called “high-temperature superconductors” or, for that matter, to even lower temperatures such as 4K (−452.47 degrees Fahrenheit) required for other superconducting technologies.
A large multi-stage switch constructed from a planar arrangement of switching elements on a PCBA, therefore, is impractical because the assembly is physically large, operationally slow and, were it desired to do so, difficult to cool to superconducting temperatures. A large switch of conventional construction consumes excessive space because the physical size of the PCBA grows quadratically with the number of inputs and outputs. A large switch formed from building blocks of conventional construction would operate at less than optimal clock speeds because of increased signal latency due to parasitic loads present over long lines. A large switch of conventional construction would be difficult to implement with superconducting electronics with individual building blocks distributed over a relatively large PCBA because the relatively large size of the PCBA is not amenable to being cooled to superconducting temperatures, and because the PCBA layers and dissimilar materials are in contact with one another.
In summary, as a conventional crossbar switch grows with electronics of conventional construction, it becomes slower and burns more power due to the parasitic losses associated with the growing length of interconnects. At the same time, the growing switch area becomes increasingly difficult to cool to the superconducting temperatures needed to implement the switch with high speed, low power electronics of superconducting construction.
There is a need, therefore, for a data switch that offers many ports (hundreds or thousands) while being compactly constructed with short interconnects and, preferably, for a data switch that operates at very high data rates (e.g. 15 Gb/s per port) and at very low power by being implemented with superconducting electronics and cooled to superconducting temperatures. There is also a need for a router that incorporates such a switch.