1. Introduction
Recently, substantial research effort has been devoted to the development of ATM (asynchronous transfer mode) or fast packet switches due to their capability of supporting diverse traffic requirements with high performance. In general, two classification schemes have been identified to characterize the numerous ATM switching architectures. One classification scheme is based on the buffering strategies used for temporarily storing the cells to be switched, and the other is based on the architectural design of the switches themselves.
According to the classification scheme based on buffering strategies, four types of queuing disciplines have been identified, which differ by the physical location of the buffers: the input queuing switches with buffers at the input ports of the switches (FIG. 1), the output queuing switches with buffers at the output ports of the switches (FIG. 2), the shared queuing switches with buffers at the center of the switches (FIG. 3), and various combinations of the above (not shown).
Briefly, the input queuing approach, illustrated in FIG. 1, was developed to solve possible contention problems at the input. A contention occurs when two or more cells are destined for the same port during one cycle. For example, if two cells contend for the same input or output port of the switch, there is an input or output contention, respectively. To solve such contention problems, each inlet 0-3 of the switch 10 is provided with a dedicated buffer 15 which stores the incoming cells so that multiple cells will not be transferred through the same outlets of the switching elements β. Note that each element includes two inlets and two outlets. However, input queuing switches have the disadvantage that their maximum throughput is limited to approximately 58% due to head-of-the-line (HOL) blocking.
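The approximately 58% figure can be illustrated with a minimal simulation sketch. It rests on assumptions that the text above does not state: every inlet is saturated (a new cell is always waiting behind the head-of-the-line cell), destinations are drawn uniformly at random, and one contending cell per outlet is chosen at random in each cycle. The function name and parameters are hypothetical. Under these assumptions the measured throughput approaches the commonly cited limit of 2 - sqrt(2), or about 0.586, as the number of ports grows.

```python
import random


def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate the saturation throughput of a FIFO input-queued switch.

    Assumptions (illustrative only): saturated inlets, uniformly random
    destinations, random winner selection among contending inlets.
    """
    rng = random.Random(seed)
    # Destination of the head-of-the-line (HOL) cell at each inlet.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inlets by the outlet their HOL cell is destined for.
        contenders = {}
        for inlet, dest in enumerate(hol):
            contenders.setdefault(dest, []).append(inlet)
        # Each requested outlet accepts exactly one cell per cycle.
        for inlets in contenders.values():
            winner = rng.choice(inlets)
            # The winner's next cell moves to the head of its queue;
            # the losers keep their HOL cell and block the cells behind it.
            hol[winner] = rng.randrange(n_ports)
            delivered += 1
    return delivered / (n_ports * n_slots)


if __name__ == "__main__":
    print(round(hol_saturation_throughput(32, 2000), 3))
```

For small switches the throughput is higher (a 2×2 switch reaches about 75% under the same assumptions), so the limit matters mainly for large port counts.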
The output queuing approach, illustrated in FIG. 2, was developed to solve possible contention problems at the output. Each outlet 0-3 of the switch 20 is provided with a dedicated buffer 25 which allows it to store multiple cells which may arrive during one cell cycle. Although output queuing switches can achieve 100% throughput efficiency, they have the disadvantage that the cell transfer must be performed at N times the speed of the inlets. In other words, in a 64×64 network, the cell transfer must be performed at a bandwidth of 64 times the inlet bandwidth.
With respect to the central queuing approach, shown in FIG. 3, the single queue 30 is not dedicated to a single inlet or outlet 0-3 but is shared between all inlets and outlets. In this scheme, each incoming cell will be stored in the central buffer, while every outlet will select the cells which are destined therefor. However, while central queuing switches can achieve 100% throughput efficiency, this discipline requires a complex memory management system to manage the stored cells.
Shared queuing disciplines (not shown) combine any of the input, output and central queuing techniques, and accordingly share the same respective disadvantages, discussed above, with the added disadvantage of a larger size.
Switching architectures can also be classified, according to their architectural design, into three categories: shared-memory switches, shared-medium switches, and space-division switches. The primary implementation constraint of building a shared-memory switch comes from the necessity of having a very high memory bandwidth. The memory control logic has to be able to process N incoming cells and route N outgoing cells to their destinations. Thus, a bandwidth of 2N cells into and out of the memory, within a timeslot or cycle, has to be maintained. The so-called "bit-sliced" technique has been used to help achieve the required memory bandwidth.
With respect to shared-medium switches, the primary implementation constraint relates to the bandwidth of the output filters. The filters, in the worst case, have to process N cells within a timeslot, and thus have to sustain a flow of N cells per timeslot. Here too, the bit-sliced technique is a common solution to help achieve the required bandwidth. Regarding space-division switches, although there are multiple paths from the inlet to the outlet ports, output contention and internal switch conflicts can still occur. To solve the contention problems, the so-called "speed-up" technique is often employed. The speed-up technique basically runs each switching element in the switching fabric at an increased speed proportionate to the number of inlets, so that each cell can transfer in successive switch cycles. In other words, if there are 64 inlets, the switches are sped up by a factor of 64 so that each inlet can operate cyclically during one cycle T.
Accordingly, the internal switching fabric in all of the above switching architectures must employ either bit-slicing or speed-up to achieve the required bandwidth and to resolve output contention. However, due to the small size of the ATM cells that typically transfer through the switching network, the bit-sliced technique will prove inadequate. Further, with respect to conventional applications of the speed-up technique, since all of the input cells must be buffered, the speed-up is limited to the speed of the buffer. For example, SRAMs are typically used for such a buffer. As is known, most SRAMs have an access time of 10 ns at best, thereby limiting the possible speed of the network.
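The interaction between the 2N-cells-per-timeslot requirement of a shared-memory switch and the 10 ns SRAM figure can be made concrete with a short arithmetic sketch. The following values are assumptions chosen only for illustration and do not come from the description above: a 53-byte (424-bit) ATM cell, line rates of 155.52 Mb/s and 622.08 Mb/s, a timeslot equal to one cell time at the line rate, and a memory wide enough to read or write a whole cell in one access. The function name is hypothetical.

```python
ATM_CELL_BITS = 53 * 8  # one ATM cell: 53 bytes = 424 bits


def required_access_time_ns(n_ports, line_rate_bps, accesses_per_cell=1):
    """Longest memory access time (in ns) that still sustains 2N cell
    transfers (N writes plus N reads) within one timeslot.

    Assumes a timeslot equals one cell time at the given line rate and
    that each cell needs `accesses_per_cell` memory accesses.
    """
    slot_seconds = ATM_CELL_BITS / line_rate_bps
    accesses_per_slot = 2 * n_ports * accesses_per_cell
    return slot_seconds / accesses_per_slot * 1e9


if __name__ == "__main__":
    sram_access_ns = 10.0  # figure quoted in the text above
    for rate in (155.52e6, 622.08e6):  # assumed line rates
        t = required_access_time_ns(64, rate)
        verdict = "feasible" if t >= sram_access_ns else "too fast for the SRAM"
        print(f"64 ports at {rate / 1e6:.2f} Mb/s: {t:.2f} ns per access ({verdict})")
```

Under these assumptions a 64-port switch needs roughly 21 ns per access at the lower rate, which a 10 ns SRAM can meet, but roughly 5 ns at the higher rate, which it cannot. A narrower memory (more accesses per cell) tightens the requirement proportionally.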
2. Multistage Interconnection Networks (MINs)
A MIN is typically formed of a large number of inlets and outlets (e.g. up to the tens of thousands) coupled together in a switching fabric comprising numerous identical switching building blocks or elements. An example of an 8×8 MIN is shown in FIG. 4. Specifically, the MIN of FIG. 4 comprises 3 stages 0-2 of multiple switching elements β, where each stage includes 4 such elements β. If two cells appear at the same outlet of any element during one cell cycle, the element is "in the conflict state". A conflict element occurring in stage 0 or 1 would be known as an internal blocking or input contention, while a conflict element occurring in stage 2 would be known as an output contention.
a. Batcher-Banyan based MINs
A Banyan network is the most common type of MIN. Types of Banyan networks include baseline, generalized cube, shuffle-exchange, indirect binary n-cube, and omega networks. As will be described later, the major property of a Banyan network is that the switching fabric is self-routing, i.e. there exists exactly one path from any input to any output. Further, cells that appear at the input of the Banyan network can route through the network based on the binary representation of the output. As with the switching fabric of a typical MIN, the basic building block of the Banyan network is a 2×2 β switching element. These switching elements are used to form an N×N Banyan network 50 which is typically built using two (N/2)×(N/2) Banyan subnetworks, as shown in FIG. 5. FIG. 6 illustrates a 16×16 Banyan network 52, i.e. a Banyan network having 16 inputs and 16 outputs and comprising two 8×8 subnetworks.
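Routing by the binary representation of the output can be sketched for one member of the Banyan family, the omega network. The sketch below assumes the usual omega construction (a perfect shuffle before each stage of 2×2 elements) and assumes that stage k examines bit k of the destination address, most significant bit first; the function name and the link numbering are hypothetical and are not taken from the figures.

```python
def omega_path(src, dest, n_stages):
    """Return the link number occupied by a cell after each stage of an
    N x N omega network, where N = 2 ** n_stages.

    Each stage applies a perfect shuffle (rotate the link number left by
    one bit) and then a 2x2 element that sends the cell to its upper
    outlet (0) or lower outlet (1) according to one destination bit.
    """
    size = 1 << n_stages
    pos = src
    path = []
    for stage in range(n_stages):
        # Perfect shuffle: rotate the n-bit link number left by one.
        pos = ((pos << 1) | (pos >> (n_stages - 1))) & (size - 1)
        # The element's state is set by this stage's destination bit.
        bit = (dest >> (n_stages - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


if __name__ == "__main__":
    # A cell entering at input 5 and destined for output 2 in an 8x8 network.
    print(omega_path(5, 2, 3))  # [2, 5, 2]
```

After the last stage every bit of the source position has been shifted out and replaced by a destination bit, which is why the cell arrives at the addressed output regardless of where it entered.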
Banyan networks have been categorized as both self-routing and blocking networks. In other words, each β element can determine its switching state from the binary destination address of its own input. Nevertheless, some of the connections may be blocked as multiple concurrent paths are being established from the inputs to the outputs. However, it is known that if the output addresses of the active inputs are arranged in a monotonically ascending or descending order, the Banyan networks become non-blocking. Thus, it is desirable to have the cells sorted according to their addresses. For example, a Batcher network or running adder network is typically added before the Banyan network (the combination being called a Batcher-Banyan network) to create a non-blocking environment for ATM and fast packet switches. In particular, the Batcher sorter places the cells with the lower destination addresses at the upper outlets of the sorter. However, although the Batcher-Banyan network solves internal switch contention problems, output contention concerns remain.
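The effect of sorting can be illustrated with a small conflict checker for an omega network, used here as a representative Banyan network. This is a sketch under stated assumptions: the network is modeled as a perfect shuffle followed by a destination-bit decision at each stage, the active cells occupy the uppermost inputs (as they would after a sorter that places the lower addresses at the upper outlets), and conflicting cells are simply reported rather than dropped or buffered. The function name and data layout are hypothetical.

```python
def omega_conflicts(cells, n_stages):
    """Find internal link conflicts in an N x N omega network (N = 2 ** n_stages).

    `cells` is a list of (input_port, destination) pairs. The result is a
    list of (stage, link) pairs where two or more cells need the same link.
    """
    size = 1 << n_stages
    positions = [src for src, _ in cells]
    conflicts = []
    for stage in range(n_stages):
        load = {}
        for idx, (_, dest) in enumerate(cells):
            # Perfect shuffle, then route on this stage's destination bit.
            pos = positions[idx]
            pos = ((pos << 1) | (pos >> (n_stages - 1))) & (size - 1)
            bit = (dest >> (n_stages - 1 - stage)) & 1
            pos = (pos & ~1) | bit
            positions[idx] = pos
            load[pos] = load.get(pos, 0) + 1
        conflicts.extend(
            (stage, link) for link, count in sorted(load.items()) if count > 1
        )
    return conflicts


if __name__ == "__main__":
    unsorted_cells = [(0, 0), (1, 4), (2, 1)]  # destinations out of order
    sorted_cells = [(0, 0), (1, 1), (2, 4)]    # same destinations, ascending
    print(omega_conflicts(unsorted_cells, 3))  # [(1, 0)]
    print(omega_conflicts(sorted_cells, 3))    # []
```

The three cells carry the same set of destinations in both runs; only their arrangement across the inputs differs, which is exactly what a sorting network placed in front of the Banyan network controls.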
b. Starlite Switch
The Starlite switch, shown in FIG. 7, was the first fast packet switch that adopted the Batcher-Banyan architecture. Besides having the standard Batcher-Banyan network as its routing network (i.e., the Batcher sorter 60 and Banyan Network 50), the Starlite switch includes trap and concentrator networks, 70 and 80, respectively. Further, input port controllers IPC 55 transfer the incoming cells to the switch.
In short, to overcome the output contention problem of the Batcher-Banyan network, the Starlite switch adds trap network 70 between the Batcher sorter 60 and the Banyan network 50. The trap network detects cells which arrive simultaneously at the output of the Batcher network with the same output destination. These conflicting cells are fed through concentrator 80 and back to the entrance of the Batcher sorter 60, via shared recirculating queue 90, to try again in the next cycle.
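The trap step can be sketched in a few lines. After sorting, cells with the same destination are adjacent, so a conflict can be found by comparing each cell with its upper neighbor. The sketch below is a software analogy only, under assumptions not stated above: a library sort stands in for the Batcher sorter, cells are modeled as (destination, payload) pairs, and the first cell of each group with the same destination is the one forwarded. Which cell wins in the actual switch is not specified in the text. The function name is hypothetical.

```python
def trap_and_recirculate(cells):
    """Split one cycle's cells into those forwarded to the routing network
    and those sent back through the recirculating queue.

    `cells` is a list of (destination, payload) pairs.
    """
    # Stand-in for the Batcher sorter: order cells by destination address.
    ordered = sorted(cells, key=lambda cell: cell[0])
    forwarded, recirculated = [], []
    previous_dest = None
    for cell in ordered:
        if cell[0] == previous_dest:
            # Same destination as the neighbor above: trap this cell.
            recirculated.append(cell)
        else:
            forwarded.append(cell)
            previous_dest = cell[0]
    return forwarded, recirculated


if __name__ == "__main__":
    arrivals = [(3, "a"), (1, "b"), (3, "c"), (0, "d"), (1, "e")]
    forwarded, recirculated = trap_and_recirculate(arrivals)
    print(forwarded)     # [(0, 'd'), (1, 'b'), (3, 'a')]
    print(recirculated)  # [(1, 'e'), (3, 'c')]
```

In the next cycle the recirculated cells would be merged with the new arrivals and sorted again, so a cell may need several attempts when an output is heavily contended.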
c. St. Louis switching fabric
Turner proposed a buffered Banyan switch that was designed not specifically for fixed length ATM cells but for switching variable length packets. As shown in FIG. 8, the switch includes a copy network (CN) 92, a plurality of broadcast and group translators (BGTs) 94, a distribution network (DN) 96, and a self-routing network (RN) 98 (such as a Banyan network). The CN 92 generates copies of packets at its outputs according to a pre-defined request. The BGTs 94 perform the header translation to determine the proper destination addresses of the cells. The DN 96 randomizes the incoming packets over all its outlets. This is done so that on its outlets (the inlets of RN 98), the traffic is uniformly distributed over all the links to prevent internal contention (and thus internal cell loss) within the RN. Finally, packets are routed through the RN 98 to their destinations using the header information. If two packets contend for an output of a switching element, one packet has to be buffered internally.
d. Omega Multinet switch
As shown in FIG. 9, the Omega N×N Multinet switch consists of log2 N stages of concentrators with FIFO buffers. Each stage is labeled from 0 to log2 N - 1 and the ith stage of the N×N switch is composed of N 1×2 demultiplexers and 2^(i+1) concentrators of size 2^(n-i+1)×2^(n-i+i).
Arriving cells are divided into two groups at each stage according to the first bit of their destination addresses. Each concentrator consists of a reverse Banyan network and a FIFO buffer. The fetch-and-add technique is used to create a non-blocking environment for the reverse Banyan network.
However, the Batcher-Banyan, St. Louis and Omega switch architectures all require a large number of switching elements and switching stages to ensure that a non-blocking switch, free of input, internal and output contention, is achieved.
3. Objectives
It is therefore an object of the present invention to provide a switching architecture having a non-blocking network that is free of input, internal and output cell contention.
Another object of the present invention is to provide a switching architecture that uses the speed-up technique which divides the input ports into a plurality of modulo groups.
A further object of the present invention is to provide a switching architecture that utilizes fewer switching stages, and therefore, fewer switching elements to save chip space.
An additional object of the present invention is to provide a switching architecture that switches cells in accordance with the Universal Packet TimeSlot (UPTS) technique.
Yet another object of the present invention is to provide a switching architecture for moderate large dimension (MLD) and very large dimension (VLD) switches.