The complexity of the switch fabric and some related functions in a switch often grows significantly faster than linearly, frequently as N squared, where N is the number of lines. This is especially a problem in large broadband ATM switches, where the resulting growth in size and power makes the required high-speed performance even more difficult to achieve.
An ATM switch distinguishes itself from a circuit switch in that it must reconfigure itself essentially every cell period. Furthermore, it must deal with a cell stream from each of its input ports wherein each cell may be destined for a different output port. This leads to contention among cells for output ports, since it is entirely possible for cells from two input ports to be destined for the same output port at the same time. This implies the need for storage somewhere in the switch so that all cells can eventually reach their intended output port. In some architectures, this also means that a contention resolution device (CRD) is required to act as a traffic cop, determining which contending cells have access to an output port.
In many architectures, contention that occurs for an output port means that some portion of the switch is idle while a cell waits, implying degradation in the throughput of the switch. Because of the statistical nature of the arrivals of cells at the input ports and of the destinations, there usually exists some small but finite probability of cell loss, which must be minimized. Finally, even if there is no cell loss, periods of considerable contention lead to large numbers of cells being instructed to wait somewhere in the storage media of the switch, implying long delays through the switch for some cells some of the time, leading to variations in transport delay or cell jitter.
Input-Buffered Crosspoint Switches
A simple ATM switch can be constructed by preceding a crosspoint array with a FIFO input buffer on each of its input ports, as shown in FIG. 1. A contention resolution device (CRD) then examines all of the output port requests, comparing them against one another, and decides which FIFOs may empty a cell into the switch core, permitting only one cell to be routed to any given output port. Cells that contend and lose will get a chance to leave their FIFO during the next cell period. If none of these input buffers overflows, then there will be no cell loss. A losing contender at the head of one of these queues or "lines" forces all cells behind it to wait, even if they are destined for an output port that is free. This is called "Head of Line" (HOL) blocking.
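The behavior described above can be illustrated with a minimal sketch of one cell period of contention resolution. The function name `resolve_contention`, the representation of each FIFO as a queue of destination port numbers, and the fixed-priority rule (lowest-numbered contending input wins) are assumptions made for illustration; the text does not specify how the CRD chooses a winner.

```python
from collections import deque


def resolve_contention(fifos):
    """Route at most one head-of-line cell to each output port.

    fifos: list of deques, one per input port; each entry is the
           destination output port of a queued cell.
    Returns the (input_port, output_port) pairs routed this cell period.
    Assumption: the lowest-numbered contending input wins.
    """
    granted = {}  # output port -> winning input port
    for inp, fifo in enumerate(fifos):
        # Only the cell at the head of each FIFO may request an output.
        if fifo and fifo[0] not in granted:
            granted[fifo[0]] = inp

    routed = []
    for out, inp in sorted(granted.items(), key=lambda kv: kv[1]):
        fifos[inp].popleft()  # the winner leaves its FIFO
        routed.append((inp, out))
    return routed


# Inputs 0 and 1 both hold a head-of-line cell for output 2.
fifos = [deque([2, 0]), deque([2, 1]), deque([3])]
first = resolve_contention(fifos)   # input 1 loses the contention
second = resolve_contention(fifos)  # input 1 retries in the next period
```

In the first period, output port 1 stays idle even though input 1 holds a cell for it, because that cell sits behind the losing head-of-line cell. This is the HOL blocking described above.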
This type of architecture is called an "Input Buffered Switch". The switch described above is a single-stage switch. In a multi-stage switch, it is possible for there to be contention for the intermediate output ports of each stage, leading to the possibility of blocking or the need for storage and perhaps contention resolution at each stage.
Although the input buffered switch employing a crosspoint switching element is conceptually straightforward, this architecture has the following disadvantages:
1. The complexity of the crosspoint grows as N squared (if a single-stage crosspoint fabric is employed).
2. The contention resolution device must resolve contention over all N input ports, and its complexity eventually tends to grow as N squared as N gets large.
3. The throughput of the switch is only 58% with uniformly distributed random input traffic due to contention and HOL blocking. Further degradation can occur with bursty input traffic as discussed in the paper entitled "Performance Of A Non-blocking Space-Division Packet Switch In A Time Variant Non-Uniform Traffic Environment" by M.J. Lee and S. Li in the 1990 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, Atlanta, April 1990.
4. It is difficult to maintain cell order from a given tributary to the same output port when efficient input concentration schemes are employed that can apply input cells to more than a single dedicated input queue. This is because the cell delay for cells entering different queues can be different because the various queues may experience different amounts of cell contention and HOL blocking.
Throughput Degradation
For the purposes of this specification, the capacity of a switch is the number of ports times the bandwidth of each port. The throughput of the switch is the sustained maximum amount of information that can flow through the switch while maintaining an acceptable level of cell loss probability, such as less than 10^-9. Throughput is less than capacity due to throughput degradation caused by contention. Peak throughput equals capacity.
The Terabit Switch loosely derives its name from the scenario of serving 64,000 SONET STS-3 155 Mb/s lines, each with a duty cycle of 10%. This amounts to a throughput of approximately 1.0 Tb/s (64,000 × 155 Mb/s × 0.10 ≈ 0.99 Tb/s). Such a Terabit Switch would need a capacity large enough that, under uniformly distributed random traffic, it would still be left with 1.0 Tb/s of throughput. Furthermore, it is strongly desirable that it be able to maintain 1.0 Tb/s of throughput under the further degraded conditions of some moderately bursty traffic.
In the simple architecture above, throughput degradation occurs because there is only one path to each output port. Unless the input traffic destinations are perfectly organized, the output ports cannot be utilized 100 percent of the time. As mentioned above, uniformly distributed random input traffic has a frequency of occurrence of contention for output ports such that throughput is only 58 percent of capacity. In principle, this can be dealt with by overbuilding the switch by a factor of about two.
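The 58 percent figure can be checked roughly with a short Monte Carlo sketch. It assumes saturated inputs (every FIFO always has a cell waiting), uniformly random destinations, and a contention winner picked at random; the function name `hol_saturation_throughput` and the choice of 32 ports are illustrative only and do not come from the text.

```python
import random


def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate the fraction of output capacity used under HOL blocking.

    Assumptions: every input FIFO is never empty, destinations are
    uniform and independent, and each output picks one contender at random.
    """
    rng = random.Random(seed)
    # Destination of the head-of-line cell at each input port.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group the inputs by the output port their HOL cell requests.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # One winner per requested output; losers keep their HOL cell.
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            delivered += 1
    return delivered / (n_ports * n_slots)


throughput = hol_saturation_throughput(n_ports=32, n_slots=5000)
print(f"saturation throughput: {throughput:.3f}")
```

For a finite number of ports the estimate sits slightly above the large-N limit of 2 − √2 ≈ 0.586, which is the source of the "58 percent" value quoted for uniformly distributed random traffic.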
However, the problem is much more serious for traffic that is correlated and bursty. For example, while the average information rate directed to a given output port may be within its bandwidth, it may happen that several sources direct long bursts of data to that port during approximately the same time interval. In this case, contention can extend over long periods of time. More serious throughput degradation can result because many input queues can suffer from extended HOL blocking, forcing many otherwise-routable cells to sit and wait. This in turn leads to the need for much larger buffers to hold the cells until they can be routed. Even if the buffers are large enough to avoid cell loss, the implied peak cell delay translates to larger amounts of cell transport delay variation, or cell jitter.
An input-buffered switch with 16,384 ports (after concentration) operating at 155 Mb/s would have a capacity of 2.54 Tb/s and a throughput under uniformly distributed random traffic of 1.47 Tb/s (58 percent of capacity). Such a switch is referred to as a 2:1 overbuilt switch. It can withstand an additional 32 percent throughput degradation due to bursty traffic and still maintain a throughput of 1.0 Tb/s.
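The arithmetic behind these figures can be reproduced in a few lines. The constant and variable names are illustrative; the line rate, port counts, duty cycle, and 58 percent efficiency are the values stated in the text.

```python
LINE_RATE_BPS = 155e6    # STS-3 line rate used in the text
HOL_EFFICIENCY = 0.58    # throughput / capacity under uniform random traffic
TARGET_BPS = 1.0e12      # the 1.0 Tb/s throughput goal

# Terabit scenario: 64,000 lines, each with a 10% duty cycle.
offered_load = 64_000 * LINE_RATE_BPS * 0.10

# Overbuilt switch: 16,384 ports after concentration.
capacity = 16_384 * LINE_RATE_BPS
uniform_throughput = capacity * HOL_EFFICIENCY

# Further fractional loss the switch can absorb and still carry 1.0 Tb/s.
burst_margin = 1.0 - TARGET_BPS / uniform_throughput

print(f"offered load:       {offered_load / 1e12:.2f} Tb/s")
print(f"capacity:           {capacity / 1e12:.2f} Tb/s")
print(f"uniform throughput: {uniform_throughput / 1e12:.2f} Tb/s")
print(f"burst margin:       {burst_margin:.0%}")
```

The ratio of capacity to the 1.0 Tb/s target is about 2.5, which is consistent with the description of the switch as roughly 2:1 overbuilt.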
Other Architectures
Because of the disadvantages cited with reference to the switch of FIG. 1, many researchers have turned their attention from input-buffered crosspoint architectures to other architectures, such as multi-stage architectures whose complexity grows as N log N, like the Batcher-Banyan switches.
Another architecture is the output-buffered switch illustrated in FIG. 2. In this switch, each input port has its own bus that has access to the buffer of every output port. The buffer is constructed so that it can store all inputs applied to it simultaneously. One way to construct such a buffer is to precede a FIFO with a time division multiplexer (TDM) and operate the FIFO at m times the rate of the individual input lines, where m is the number of input lines. If an input cell has the address of a given output port, it simply enters that port's buffer. This idealized version of the output-buffered switch is obviously impractical for large switches, as its complexity and the demands on the output buffer grow rapidly with the number of ports. Practical output-buffered switches pare down this idealized switch in a way that results in an acceptable combination of size and cell loss performance.
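The idealized output-buffered switch can be sketched as follows. The class name `OutputBufferedSwitch`, the tuple format of the arrivals, and the unbounded buffers are assumptions made for illustration; a practical switch would bound the buffer size and accept some cell loss, as noted above.

```python
from collections import deque


class OutputBufferedSwitch:
    """Idealized output-buffered switch with unbounded buffers (assumption)."""

    def __init__(self, n_ports):
        # One FIFO per output port.
        self.queues = [deque() for _ in range(n_ports)]

    def cell_period(self, arrivals):
        """Process one cell period.

        arrivals: list of (input_port, output_port, payload), with at most
                  one cell per input port.
        Returns a dict mapping output_port -> payload transmitted.
        """
        # Write phase: a buffer accepts every cell addressed to it in the
        # same period. This models the FIFO running at m times the line
        # rate behind the TDM, so no input ever has to wait.
        for _inp, out, payload in arrivals:
            self.queues[out].append(payload)

        # Read phase: each output line transmits at most one cell per period.
        sent = {}
        for out, queue in enumerate(self.queues):
            if queue:
                sent[out] = queue.popleft()
        return sent


switch = OutputBufferedSwitch(n_ports=4)
sent_1 = switch.cell_period([(0, 2, "a"), (1, 2, "b"), (3, 2, "c")])
sent_2 = switch.cell_period([(0, 1, "d")])
sent_3 = switch.cell_period([])
```

Three cells arrive for output 2 in the same period and all are stored at once; they then drain one per period while cell "d" for output 1 passes without delay. No input queue or contention resolution device appears anywhere in the sketch.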
An important characteristic of the output buffered switch is that it requires no input queuing and no contention resolution. It can thus "combine" the cell streams from a multiplicity of sources in a simple fashion if the number of sources is not too large. This observation is a key to the architecture proposed here.