In recent years, the explosive demand for bandwidth over private (e.g., enterprise) and public (e.g., the Internet) communications networks has driven the development of very high-speed switching fabric devices. Such devices have allowed the practical implementation of network switching nodes capable of handling aggregate data traffic in the Gigabit (1.0E+09 bits) to Terabit (1.0E+12 bits) per second range. Even though many different approaches are theoretically possible to carry out switching at network nodes, a contemporary preferred solution is to employ fixed-size packet (also known as “cell”) switching devices, irrespective of the higher communications protocols actually in use to link end-users. These devices, which are said to be “protocol agnostic”, have been found to be simpler and more easily tunable for performance than other solutions, especially those handling variable-length packets. Thus, N×N switches, which can be viewed as black boxes with N inputs and N outputs, have been made capable of moving short, fixed-size packets (typically 64 bytes) from any incoming link to any outgoing link. Many types of switching architectures have been proposed to implement the core of the switching fabric. One solution is to build the switching fabric around a very high speed switch crossbar device 100, as shown in FIG. 1, capable of establishing, at a given instant, connections between any of its inputs and any of its outputs, thus potentially allowing any data packet to be transferred from any switch input adapter 110 to any switch output adapter 120. However, such a device has no storage capability. Thus, a packet may be admitted through the switch crossbar only if there is a provision to receive it in the destination switch output adapter 120 and no two input packets contend for the same output.
Such a device requires a central scheduler or arbiter 130 to decide which set of paths may be established at a given packet cycle so as to resolve conflicts in the use of the switch crossbar. Despite this difficulty, many commercial products using this approach have been made available. To make good decisions, the central scheduler has to acquire complete knowledge of what is going on in the adapters interfacing with the crossbar. As a consequence, a high speed communications bus 140 must also exist between the adapters and the scheduler. Moreover, an algorithm to schedule the departure of cells at each packet cycle is far from trivial, and its decisions must be re-assessed at each cycle for the whole switch. Although much research work has been conducted in this area (see, e.g., “iSLIP: A scheduling algorithm for input-queued switches”, IEEE/ACM Transactions on Networking, vol. 7, no. 2, pp. 188-201, April 1999, by N. McKeown) and many algorithms have been proposed, the remaining issue is their complexity of implementation when applied to very high capacity switch fabrics.
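The per-cycle decision described above can be illustrated with a minimal sketch. It is not the iSLIP algorithm cited in the text, only a naive fixed-priority greedy arbiter, and the names `schedule_cycle`, `requests` and `ready_outputs` are hypothetical. It assumes the scheduler already knows, for each input adapter, which outputs its head-of-line packets request and which output adapters can currently receive a packet.

```python
from typing import Dict, List, Set


def schedule_cycle(requests: Dict[int, List[int]],
                   ready_outputs: Set[int]) -> Dict[int, int]:
    """Pick a conflict-free set of crossbar paths for one packet cycle.

    requests maps an input adapter to the outputs it would like to reach,
    in order of preference. ready_outputs holds the output adapters that
    can receive a packet in this cycle. Lower-numbered inputs win ties,
    which is unfair; real schedulers rotate priorities.
    """
    grants: Dict[int, int] = {}
    taken: Set[int] = set()
    for inp in sorted(requests):
        for out in requests[inp]:
            # A path is allowed only if the output can receive the packet
            # and no other input has been granted that output this cycle.
            if out in ready_outputs and out not in taken:
                grants[inp] = out
                taken.add(out)
                break
    return grants


# Inputs 0 and 1 contend for output 2; output 3 cannot receive this cycle.
requests = {0: [2, 1], 1: [2], 2: [0], 3: [3]}
grants = schedule_cycle(requests, ready_outputs={0, 1, 2})
print(grants)  # {0: 2, 2: 0}
```

Even this simple arbiter needs the full request state of every adapter at every cycle, which is the information exchange that bus 140 would have to carry.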
Yet another problem with a crossbar implementation is its inability to natively support multicast traffic. When a packet must be sent to more than one destination, the central scheduler must wait until all corresponding crossbar outputs can be freed in the same cycle. This is a serious drawback which makes the central scheduler even more complex to design and may sometimes require that multicast be supported only from the input adapters themselves. In this case, the input adapters have to send the same packet through the switch fabric as many times as required by the scope of the multicast (possibly to all output ports in the case of a broadcast).
A typical example of a commercially available crossbar architecture switch, and how the aforementioned problems are actually handled can be found in data sheets (especially in ‘High Performance 16×16 Serial Crosspoint Switch’, G52191 Rev. 4.2, dated Jan. 5, 2001) and application notes (especially in AN-32 G530030 Rev. 4.0, dated Jul. 5, 1999) for VSC870 and VSC880 that are commercially available building blocks by VITESSE Semiconductor Corporation (741 Calle Plano, Camarillo, Calif. 93012, the USA) and are intended to be used to build a switch fabric of the type discussed above.
Another approach to building a switch fabric is shown in FIG. 2. It differs from that shown in FIG. 1 in that all entering packets 200, coming into the switch fabric through any input port, are temporarily stored in shared memory 210 before exiting the switch fabric as exiting packets 220 over the output ports. This approach does not have the drawback of the switch fabric shown in FIG. 1, as a packet may be admitted into the shared memory (i.e. the switching medium) even though the corresponding output is not yet available. This provides a great deal of freedom in the admission of incoming packets. Accordingly, there is no longer a strong requirement for a central scheduler. Unlike in the approach shown in FIG. 1, each input adapter may decide on its own to let a packet in, as long as it is globally permitted to do so through a granting and/or back-pressure mechanism from switch core 230, and as long as there is enough room left in the shared memory. The shared memory therefore has a controller 235, whose role, however, is limited to allocating and releasing buffers depending on the observed movement of incoming and outgoing packets. The decision to let a packet go out is made by individual output queues 240, which only need to contain a pointer to the buffer where a particular packet was stored upon entering the switch fabric. This scheme also works well for multicast, as a single copy of a packet may be temporarily stored in the shared memory while copies of the pointer are placed in the various output queues. In this case, the corresponding buffer may not be released until the last copy of the packet has been made. However, there is no longer the drastic requirement that all copies of the packet exit the switch fabric in the same packet cycle. Each output queue may freely schedule the departure of a packet depending on its own load.
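The buffer and pointer handling described above can be sketched as follows. This is an illustrative model only, not the design of any particular device: the class `SharedMemorySwitch`, its methods, and the use of a per-buffer reference count to decide when a multicast buffer may be released are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, List, Optional


class SharedMemorySwitch:
    """Toy model: one shared buffer pool, one pointer queue per output."""

    def __init__(self, num_ports: int, num_buffers: int) -> None:
        self.buffers: List[Optional[bytes]] = [None] * num_buffers
        self.free: Deque[int] = deque(range(num_buffers))
        # Assumed mechanism: count of output queues still holding a pointer.
        self.refcount: List[int] = [0] * num_buffers
        self.out_queues: List[Deque[int]] = [deque() for _ in range(num_ports)]

    def admit(self, packet: bytes, dests: List[int]) -> bool:
        """Store one copy of the packet; return False as back-pressure."""
        if not dests or not self.free:
            return False
        idx = self.free.popleft()
        self.buffers[idx] = packet
        self.refcount[idx] = len(dests)
        for port in dests:
            # Only the pointer is replicated, never the packet itself.
            self.out_queues[port].append(idx)
        return True

    def depart(self, port: int) -> Optional[bytes]:
        """Let an output queue send its next packet whenever it chooses."""
        if not self.out_queues[port]:
            return None
        idx = self.out_queues[port].popleft()
        packet = self.buffers[idx]
        self.refcount[idx] -= 1
        if self.refcount[idx] == 0:
            # Last copy has left: the controller may release the buffer.
            self.buffers[idx] = None
            self.free.append(idx)
        return packet


sw = SharedMemorySwitch(num_ports=4, num_buffers=2)
sw.admit(b"A", [1, 3])         # multicast to outputs 1 and 3
sw.admit(b"B", [1])            # unicast to output 1
full = sw.admit(b"C", [0])     # False: no buffer left, input must hold it
first = sw.depart(1)           # b"A"; buffer kept, output 3 still needs it
second = sw.depart(3)          # b"A"; buffer released now
```

Note that outputs 1 and 3 send their copies of packet A in different calls, mirroring the point that multicast copies need not leave in the same packet cycle.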
However, the scheme shown in FIG. 2 has some practical limitations when trying to implement the large and very high speed switch fabrics of the terabit class switching equipment now in demand. Because the memory is shared, it must be shared spatially, by providing multiple ports, and/or in time, by all input and output ports trying to access a common resource substantially simultaneously. A typical contemporary design point for a switch fabric is a 64×64 port switch with each port capable of sustaining full duplex 40 Gigabits/second traffic (for example, an OC-768 of the SONET hierarchy), such that the committed aggregate bandwidth is 64×40=2,560 Gigabits/second, or about 2.5 Terabits/second, in full duplex mode. In practice, however, ports must be designed with an over-speed factor so as to absorb bursts of traffic, and therefore have an actual speed at least 50% higher, such that the true port speed is about 64 Gigabits/second. For 64-byte (512-bit) packets, this means that every 8 Nanoseconds (hereinafter referred to as “Ns”) a packet must unconditionally enter and leave each of the 64 ports. In the case of time sharing, with one write and one read per port in each 8 Ns interval, this would imply a memory cycle of 8 Ns/(2×64)=62.5 picoseconds.
Even if the memory can be implemented with several sets of ports, for example two write ports and two read ports, a sub-nanosecond cycle timing requirement is still very difficult to achieve with current technology, for example CMOS. Consequently, the concept of shared memory, while very attractive, does not scale up to the terabit class of switches.
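The figures quoted above for the 64×64 design point can be reproduced with a short calculation. The constants come from the text; the function `memory_cycle_ps` and its assumption that accesses are spread evenly over the available write and read ports are illustrative only.

```python
PORTS = 64             # 64x64 switch
LINE_RATE_GBPS = 40    # committed rate per port (e.g. OC-768)
PORT_SPEED_GBPS = 64   # true port speed including over-speed
PACKET_BYTES = 64      # fixed packet (cell) size

# Committed aggregate bandwidth: 64 x 40 Gb/s = 2,560 Gb/s.
aggregate_tbps = PORTS * LINE_RATE_GBPS / 1000

# 1 Gb/s is 1 bit per nanosecond, so 512 bits at 64 Gb/s take 8 ns.
packet_time_ns = PACKET_BYTES * 8 / PORT_SPEED_GBPS


def memory_cycle_ps(write_ports: int = 1, read_ports: int = 1) -> float:
    """Memory cycle needed so every port can write and read once per packet time.

    Assumption for illustration: accesses are divided evenly among the
    memory ports, and writes and reads on separate ports overlap in time.
    With a single time-shared port pair the text's 2 x 64 accesses apply.
    """
    if write_ports == 1 and read_ports == 1:
        accesses = 2 * PORTS              # pure time sharing, as in the text
    else:
        accesses = max(PORTS / write_ports, PORTS / read_ports)
    return packet_time_ns * 1000 / accesses


print(aggregate_tbps)          # 2.56
print(packet_time_ns)          # 8.0
print(memory_cycle_ps())       # 62.5
print(memory_cycle_ps(2, 2))   # 250.0
```

Under these assumptions, even two write ports and two read ports leave a 250 picosecond cycle, which is still well inside the sub-nanosecond range the text describes as very difficult to achieve.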
On the other hand, if using a crossbar approach as described hereinabove, the challenge is reconfiguring the array once every 8 Ns. If the task of reconfiguring the switching matrix is considered alone, this is more easily achievable than the above challenge of time sharing a common memory at a sub-nanosecond cycle. However, the problem with a crossbar is that of making a decision in a central scheduler every 8 Ns as to how the 64×64 switch crossbar should best be reconfigured. This presents another very difficult challenge because of the complexity of the algorithms to be carried out and the huge exchange of information required between the central scheduler and all the adapters.
Finally, it should also be pointed out that implementing a packet switching function imposes another difficult challenge, namely the overall control of all the flows of data entering and leaving a switch. Whichever method is adopted, an assumption is made that packets may be temporarily held at various stages of the switching function so as to handle priority flows supporting Quality of Service (hereinafter referred to as “QoS”) and to prevent congestion from occurring. Many schemes have been proposed to achieve such a result. These assume that traffic may be held in input queues (i.e. in adapters before entering the switch fabric), in output queues (i.e. upon leaving the switch fabric), within the switch fabric itself, or in a combination such as the Combined Input/Output Queuing (also known as “CIOQ”) scheme utilized in many contemporary switch architectures. Irrespective of any particular solution, it may be said that it always helps to have ample storage to prevent cell discarding in case of congestion, and that this greatly eases flow control. This remark applies not only to input or output queues in switch adapters but also to the switch fabric itself. A switch fabric should be capable of holding a significant number of packets when necessary, especially with dramatic increases in port speed, as many more cells are likely to be received before control of a particular flow entering a switch port becomes effective. Thus, using a storage-less crossbar as a switch fabric, in addition to the problems mentioned hereinabove, does not help with flow control either.
It is believed, therefore, that a data packet switch which provides the many advantages taught herein would obviate many of the problems and limitations described hereinabove, and would constitute a significant advancement in the art.