The use of a shared-memory switch core equipped with port Output Queues (OQ's), whose filling levels are monitored so that incoming packets can be held in ingress VOQ's to prevent output congestion, is known in the prior art. FIG. 1 and the related description of the first cross-referenced patent application set forth above give a detailed description of the prior-art shared-memory switch system and are fully incorporated herein by reference.
Algorithms to select which of the ingress queues should be served at each packet cycle, so as to maximize the use of the available switching resources, are known in the art. However, they have been devised to operate with a crossbar type of switch, i.e., with a memoryless matrix of switches that can establish solid connections between a set of inputs and outputs of a switch core for a time long enough to allow the transfer of a packet from every IA that has something to forward and has been selected. These algorithms tend to optimize the use of the matrix, thus resolving the contention between inputs contending for the same output. Typically, the purpose of this type of algorithm is to compute a new match at each packet cycle. The best known of these algorithms is referred to as iSLIP. A description of it can be found in “The iSLIP Scheduling Algorithm for Input-Queued Switches” by Nick McKeown, IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 7, NO. 2, April 1999. Hence, iSLIP and its many variants, which have been studied and sometimes implemented in commercial products, are essentially designed for crossbar switches and do not fit the type of switch core considered by the invention, where switching is achieved through the use of a shared memory (112), which is known to be much more flexible than a simple crossbar.
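By way of illustration only, the per-cycle matching performed by a crossbar scheduler may be sketched as follows. This is a simplified, single-iteration round-robin request-grant-accept match in the style of iSLIP, not the algorithm as specified in the cited paper and not part of the invention. The function name, the square N×N request matrix and the pointer arrays are assumptions introduced for the example.

```python
def single_iteration_match(requests, grant_ptr, accept_ptr):
    """One request-grant-accept pass over an N x N crossbar (illustrative).

    requests[i][j] is True when input i holds a packet for output j.
    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)

    # Grant phase: each output grants to the first requesting input
    # found at or after its round-robin pointer.
    grants = {}  # input -> list of outputs that granted to it
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j]:
                grants.setdefault(i, []).append(j)
                break

    # Accept phase: each input accepts the first granting output found
    # at or after its own pointer. Pointers advance only on acceptance.
    match = {}
    for i, granted_outputs in grants.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in granted_outputs:
                match[i] = j
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
                break
    return match


if __name__ == "__main__":
    # Two inputs, both with traffic for both outputs.
    requests = [[True, True], [True, True]]
    grant_ptr, accept_ptr = [0, 0], [0, 0]
    print(single_iteration_match(requests, grant_ptr, accept_ptr))
    print(single_iteration_match(requests, grant_ptr, accept_ptr))
```

In this two-port example both outputs initially grant to the same input, so only one connection is set up in the first cycle; the pointer update then separates the outputs, and the second cycle matches both inputs. A match of this kind has to be recomputed centrally at every packet cycle, which is the constraint a shared-memory core does not have.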
Indeed, with a shared-memory core, a packet may be admitted even though the output through which it must exit the fabric is not currently available. In this architecture each IA is implicitly authorized to forward the received packets (105, FIG. 1 of referenced application 1. cited above) to the switch core as soon as they arrive. Obviously, the central shared memory is not an infinite resource, and backpressure may have to be applied to all IA's in order to prevent the admission of further packets if the central resource becomes exhausted because one or more outputs are congested. This is generally done on a per-priority basis, the backpressure mechanism stopping lower priorities first. Contrary to the crossbar, this mode of operation does not require any form of scheduling of the packets forwarded by the IA's, and no central scheduler is needed.
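The per-priority backpressure described above may be sketched, purely as an illustration, as a set of occupancy thresholds on the shared memory. The class name, the threshold values and the convention that priority 0 is the highest are assumptions made for the example; the text states only that backpressure is applied per priority and that lower priorities are stopped first.

```python
class SharedMemoryCore:
    """Toy model of a shared-memory core with per-priority backpressure."""

    def __init__(self, capacity, stop_thresholds):
        # capacity: total number of packet buffers (hypothetical figure).
        # stop_thresholds[p]: occupancy at which priority p is stopped;
        # index 0 is assumed to be the highest priority, so the values
        # decrease with the index and lower priorities are stopped first.
        self.capacity = capacity
        self.stop_thresholds = stop_thresholds
        self.occupancy = 0

    def backpressured(self, priority):
        """True when IA's must hold packets of this priority in their VOQ's."""
        return self.occupancy >= self.stop_thresholds[priority]

    def admit(self, priority):
        """Admit one packet unless its priority is stopped or memory is full."""
        if self.backpressured(priority) or self.occupancy >= self.capacity:
            return False
        self.occupancy += 1
        return True

    def release(self):
        """One packet has left the core through an output port."""
        if self.occupancy > 0:
            self.occupancy -= 1


if __name__ == "__main__":
    core = SharedMemoryCore(capacity=8, stop_thresholds=[8, 6, 4])
    for _ in range(4):
        core.admit(2)                 # lowest priority fills four buffers
    print(core.backpressured(2))      # lowest priority is now stopped
    print(core.backpressured(0))      # highest priority is still admitted
```

No matching between inputs and outputs is computed here: each IA simply forwards packets until the core asserts backpressure for the priority concerned.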
This scheme works well as long as the time to feed the information back to the source of traffic, i.e., the VOQ's of the IA's (100, referenced application 1. cited above), is short when expressed in packet-times. However, packet-time reduces dramatically in the most recent implementations of switch fabrics, where the demand for performance is such that aggregate throughput must be expressed in tera (10^12) bits per second. As an example, packet-time can be as low as 8 nanoseconds (10^-9 sec.) for 64-byte packets received on an OC-768, or 40 Gbps (10^9 bps), switch port having a 1.6 speedup factor, thus actually operating at 64 Gbps. As a consequence, the round trip time (RTT) of the flow control information is far from negligible, as used to be the case with lower-speed ports. As an example of a worst-case traffic scenario, all input ports of a 64-port switch may have to forward packets to the same output port, eventually creating a hot spot. It will take one RTT to detect and block the incoming traffic in all VOQ's involved. If the RTT is, e.g., 16 packet-times, then 64×16=1024 packets may have to accumulate for the same output in the switch core. An RTT of 16 packet-times corresponds to the case where, for practical considerations, mainly packaging constraints, distribution of power, reliability and maintainability of a large system, port adapters cannot be located in the same shelf and have to interface with the switch core ports through cables. Then, if the cables (150) are 10 meters long, because light travels through them at about 5 nanoseconds per meter, it takes 100 nanoseconds, or about 12 packet-times of 8 ns each, to go twice through the cables. Adding the internal processing time of the electronic boards, this may easily add up to the 16 packet-times used in the above example.
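The figures quoted in this example can be checked with a few lines of arithmetic. The sketch below merely restates the numbers given above; the function names are illustrative and are not taken from the referenced applications.

```python
def packet_time_ns(packet_bytes, port_gbps, speedup):
    """Time to transmit one packet, in ns (1 Gbps moves 1 bit per ns)."""
    return packet_bytes * 8 / (port_gbps * speedup)


def cable_round_trip_ns(cable_m, ns_per_m):
    """Propagation delay for going twice through a cable of the given length."""
    return 2 * cable_m * ns_per_m


def hot_spot_backlog(num_inputs, rtt_packet_times):
    """Packets that may accumulate for one output before backpressure acts."""
    return num_inputs * rtt_packet_times


if __name__ == "__main__":
    pt = packet_time_ns(64, 40, 1.6)      # 64-byte packet at 64 Gbps: 8 ns
    rt = cable_round_trip_ns(10, 5)       # 10 m cable at 5 ns/m: 100 ns
    print(pt, rt, rt / pt)                # 100 / 8 = 12.5 packet-times
    print(hot_spot_backlog(64, 16))       # 64 x 16 = 1024 packets
```

The cable delay alone accounts for 12.5 packet-times, which the text rounds to about 12; the remaining packet-times of the 16 assumed for the RTT are attributed to board processing.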