Packet switching involves the transmission of data in packets through a data network. Fixed-size packets are referred to as cells. Each block of end-user data that is to be transmitted is divided into cells. A unique identifier, a sequence number and a destination address are attached to each cell. The cells are independent and may traverse the data network by different routes. The cells may incur different levels of propagation delay, or latency, caused by physical paths of different lengths. The cells may be held for varying amounts of time in buffers in intermediate switches in the network. The cells also may be switched through different numbers of packet switches as they traverse the network, and the switches may have unequal processing delays caused by error detection and correction.
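The segmentation and independent routing described above can be illustrated with a short Python sketch. The 48-byte payload size, the `Cell` field layout, the `segment`/`reassemble` helpers, and the assumption that the receiver knows the original block length are all hypothetical choices made for illustration. The sketch only shows how a per-cell sequence number can let the destination restore the original order when cells arrive out of order.

```python
import random
from dataclasses import dataclass

CELL_PAYLOAD_BYTES = 48  # hypothetical fixed payload size, for illustration only


@dataclass(frozen=True)
class Cell:
    block_id: int   # unique identifier of the end-user data block
    seq: int        # sequence number of the cell within its block
    dest: str       # destination address
    payload: bytes  # fixed-size payload; the last cell is zero-padded


def segment(block_id: int, data: bytes, dest: str) -> list[Cell]:
    """Divide one block of end-user data into fixed-size cells."""
    cells = []
    for seq, start in enumerate(range(0, len(data), CELL_PAYLOAD_BYTES)):
        chunk = data[start:start + CELL_PAYLOAD_BYTES]
        cells.append(Cell(block_id, seq, dest, chunk.ljust(CELL_PAYLOAD_BYTES, b"\0")))
    return cells


def reassemble(cells: list[Cell], length: int) -> bytes:
    """Rebuild a block from its cells, whatever order they arrived in.

    Assumes the receiver knows the original block length, so the padding
    on the last cell can be stripped.
    """
    ordered = sorted(cells, key=lambda c: c.seq)
    return b"".join(c.payload for c in ordered)[:length]


if __name__ == "__main__":
    data = bytes(range(256)) * 2        # a 512-byte block -> 11 cells
    cells = segment(7, data, "host-B")
    random.Random(1).shuffle(cells)     # different routes, different arrival order
    assert reassemble(cells, len(data)) == data
    print(f"{len(cells)} cells reassembled in order")
```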
A general model of an N×N switch includes N input ports, N output ports, a time-division (or space-division) interconnecting network (or switching fabric), and a scheduler. Operations of the switch are synchronized over fixed-size time slots. Packets arrive at the switch on input links and depart the switch via output links. An arriving packet may be of variable or fixed length, and may be unicast or multicast. A packet is multicast if it has more than one destination output port; otherwise, it is a unicast packet.
Because variable-length or multicast packets can be converted into fixed-length unicast packets by methods well known in the art, it is assumed herein, without loss of generality, that only fixed-length unicast packets are considered. Consistent with the literature, the term “cell” is used hereafter to refer to a fixed-length packet. Each cell consists of two fields: the header and the payload. The destination output port number of a cell is carried in its header.
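A minimal Python sketch of one common way to obtain fixed-length unicast cells, assumed here for illustration: replicate a multicast packet once per destination output port, then cut each copy into zero-padded fixed-length payloads. The payload size and the `to_unicast_cells` helper are hypothetical; the header in this sketch carries only the destination output port number, as described above.

```python
from dataclasses import dataclass

CELL_PAYLOAD_BYTES = 48  # hypothetical fixed payload length


@dataclass(frozen=True)
class Cell:
    dest_port: int  # header: destination output port number
    payload: bytes  # payload: exactly CELL_PAYLOAD_BYTES bytes


def to_unicast_cells(data: bytes, dest_ports: set[int]) -> list[Cell]:
    """Convert one variable-length packet with one or more destination ports
    into fixed-length unicast cells: one copy per destination (multicast
    fan-out), each copy cut into zero-padded fixed-length payloads."""
    cells = []
    for port in sorted(dest_ports):  # one unicast copy per destination output
        for start in range(0, len(data), CELL_PAYLOAD_BYTES):
            chunk = data[start:start + CELL_PAYLOAD_BYTES]
            cells.append(Cell(port, chunk.ljust(CELL_PAYLOAD_BYTES, b"\0")))
    return cells


if __name__ == "__main__":
    # A 60-byte packet multicast to outputs 2 and 5 becomes 2 cells per output.
    for cell in to_unicast_cells(b"a" * 60, {2, 5}):
        print(cell.dest_port, len(cell.payload))
```

Replication by the source is only one option; a switch could instead replicate inside the fabric, which this sketch does not model.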
Without loss of generality, each input-output link is assumed to transmit data at a speed of one cell per time slot. However, a link of the interconnecting network (connecting to an input or output port) need not operate at the same speed as an input-output link. If each link of the interconnecting network operates at S times the speed of each input-output link, the switch is said to have an internal speed-up of S. It is noted that S may be equal to 1. In a switch with an internal speed-up of S, each input can transmit up to S cells to the interconnecting network, and each output can receive up to S cells from it, in each time slot. During each time slot, the scheduler can configure the interconnecting network to simultaneously set up transmission paths between any set of input-output pairs, provided that no input transmits more than S cells and no output receives more than S cells.
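The speed-up constraint can be expressed as a feasibility check on the set of transfers the scheduler requests in one time slot. In this hypothetical Python sketch, ports are numbered 0 to N−1 and a configuration is a list of (input, output) pairs, one pair per cell. With S = 1, the feasible configurations are exactly the matchings between inputs and outputs.

```python
from collections import Counter


def is_feasible(transfers: list[tuple[int, int]], n: int, s: int) -> bool:
    """Return True if the (input, output) cell transfers requested for one
    time slot use valid ports and respect an internal speed-up of s:
    no input sends, and no output receives, more than s cells."""
    if any(not (0 <= i < n and 0 <= j < n) for i, j in transfers):
        return False
    sent = Counter(i for i, _ in transfers)       # cells leaving each input
    received = Counter(j for _, j in transfers)   # cells arriving at each output
    return all(c <= s for c in sent.values()) and all(c <= s for c in received.values())


if __name__ == "__main__":
    # Two inputs sending to the same output need a speed-up of at least 2.
    print(is_feasible([(0, 1), (2, 1)], n=4, s=1))
    print(is_feasible([(0, 1), (2, 1)], n=4, s=2))
```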
Because cells arrive at the different input ports without coordination, cells destined for the same output port may arrive at the switch simultaneously from many input ports. Consequently, to avoid cell loss, buffers must be provided in a switch to hold incoming cells until they can be relayed to the next hop. Depending on where cells are buffered, the queuing strategy may be based on output queuing, shared queuing, input queuing, or combined input-output queuing, as follows.
Output queuing (OQ): During each time slot, a cell arriving at any input port is immediately stored in a buffer that resides at its destination output port. In the worst case, a single OQ buffer must perform N write operations and 1 read operation during each time slot.
Shared queuing (SQ): A single buffer is shared by all of the input ports and output ports of the switch. Cells are written into and read from the SQ buffer upon their arrival and departure, respectively. In the worst case, N write operations plus N read operations occur at the buffer in a time slot, imposing a more stringent bandwidth requirement than that of an OQ buffer.
Input queuing (IQ): To avoid the high-bandwidth buffers of the OQ and SQ schemes, each input port maintains a buffer for its incoming cells. In each time slot, a properly designed scheduling algorithm selects, from the buffered cells, a set of cells free of input and output contention for transmission to their destination output ports. With this queuing scheme, the bandwidth demand of each input buffer can be reduced to the minimum of one write operation and one read operation per time slot.
Combined input-output queuing (CIOQ): In a CIOQ scheme, buffers are provided at both input and output ports. Compared to a pure IQ switch, the output buffers of a CIOQ switch allow more freedom in the design of scheduling algorithms and permit an internal speed-up between the two extremes of 1 for IQ switches and N for OQ switches. A CIOQ switch can therefore strike a good compromise between the high performance of OQ switches and the good scalability of IQ switches. As a result, the CIOQ scheme has been widely regarded as the most promising candidate for building scalable switches.
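The worst-case buffer access counts given for each queuing scheme can be tabulated directly. This Python sketch is illustrative only: it counts cell writes and reads per buffer per time slot under the one-cell-per-slot link assumption, and extends the counts to the input and output buffers of a CIOQ switch with speed-up S, where each input buffer sends up to S cells into the interconnecting network and each output buffer receives up to S cells from it. The scheme labels and the `buffer_ops_per_slot` helper are hypothetical.

```python
def buffer_ops_per_slot(scheme: str, n: int, s: int = 1) -> tuple[int, int]:
    """Worst-case (writes, reads) one buffer must perform in a single time slot.

    n is the number of ports; s is the internal speed-up (used only for CIOQ).
    """
    table = {
        "OQ": (n, 1),           # all N inputs may send to the same output; one departure
        "SQ": (n, n),           # N arrivals and N departures share one buffer
        "IQ": (1, 1),           # one arrival; one cell sent into the fabric
        "CIOQ-input": (1, s),   # one arrival; up to S cells sent into the fabric
        "CIOQ-output": (s, 1),  # up to S cells received from the fabric; one departure
    }
    if scheme not in table:
        raise ValueError(f"unknown scheme: {scheme}")
    return table[scheme]


if __name__ == "__main__":
    for n in (16, 64, 256):
        row = {name: sum(buffer_ops_per_slot(name, n, s=2))
               for name in ("OQ", "SQ", "IQ", "CIOQ-input", "CIOQ-output")}
        print(f"N={n:4d}: accesses per slot per buffer = {row}")
```

Summing writes and reads shows the OQ and SQ access rates growing linearly with N, while the IQ and CIOQ rates stay constant for a fixed speed-up.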
Among the above-mentioned switch architectures, OQ and SQ switches achieve the best performance. However, OQ and SQ switches have the worst scalability, since the required bandwidth of an OQ or SQ buffer grows linearly with the aggregate input-output link rate. The best scalability is achieved by an IQ or CIOQ switch in which each input buffer maintains a single FIFO for all incoming cells. Despite its architectural simplicity, however, an IQ switch with FIFO queuing has a maximum throughput of only about 58.6% (2 − √2, as N grows large) for uncorrelated (Bernoulli) traffic with uniformly distributed destination outputs, and even lower throughput for correlated (on/off bursty) traffic. The poor performance is caused by the well-known head-of-line (HOL) blocking problem: a cell queued behind the HOL cell of a FIFO cannot participate in scheduling, even if both its input and its destination output are idle.
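The HOL blocking effect can be reproduced with a small Monte Carlo sketch in Python. It assumes a saturated switch (every input always has a cell waiting), destinations drawn uniformly at random, and random selection among HOL cells contending for the same output; the function name and these modeling choices are assumptions for illustration, not a description of any particular switch.

```python
import random


def fifo_saturation_throughput(n: int, slots: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the saturation throughput of an N x N
    input-queued switch with a single FIFO per input.

    Assumptions: every input always has a cell waiting (saturation), each new
    HOL cell picks its destination output uniformly at random, and an output
    with several contending HOL cells serves one of them chosen at random.
    """
    rng = random.Random(seed)
    hol = [rng.randrange(n) for _ in range(n)]  # destination of each input's HOL cell
    delivered = 0
    for _ in range(slots):
        contenders: dict[int, list[int]] = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)      # only one HOL cell per output gets through
            hol[winner] = rng.randrange(n)   # the next cell in that FIFO reaches the head
            delivered += 1                   # losers stay blocked, with every cell behind them
    return delivered / (n * slots)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, round(fifo_saturation_throughput(n, slots=5_000), 3))
```

For larger N the estimate should settle a little above 0.586; small switches give somewhat higher values.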
Numerous alternatives for organizing the input buffer of an IQ or CIOQ switch have been proposed over the years to overcome the HOL blocking problem of single-FIFO queuing. One alternative attracting great interest is the virtual output queuing (VOQ) scheme, also known as the multiple input queuing scheme. In a VOQ scheme, the cells waiting in an input buffer are organized into N separate queues according to their destination output ports. Such a queuing scheme has been shown to achieve the best performance possible for an IQ switch (i.e., a VOQ switch can achieve 100% throughput for any admissible offered traffic). However, scheduling the queued cells of a VOQ switch is highly complex: the scheduling algorithms known to achieve 100% throughput have a complexity of O(N³ log N), which is impractical in high-speed environments. Consequently, the key to a practical, high-performance VOQ switch is reducing the scheduling complexity.
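The difference between a single FIFO and VOQ can be sketched in Python. The sketch keeps one queue per destination output at each input and, in each time slot, pairs inputs with outputs using a maximum-size bipartite matching found by augmenting paths. This matching rule is a stand-in chosen for brevity; it is not the O(N³ log N) algorithm referred to above and is not claimed to guarantee 100% throughput. `VOQInput`, `max_size_match` and `run_slot` are hypothetical names.

```python
from collections import deque


class VOQInput:
    """An input buffer organized as N virtual output queues."""

    def __init__(self, n: int) -> None:
        self.queues = [deque() for _ in range(n)]  # queues[j] holds cells for output j

    def enqueue(self, cell: str, dest: int) -> None:
        self.queues[dest].append(cell)


def max_size_match(inputs: list[VOQInput], n: int) -> dict[int, int]:
    """Maximum-size bipartite matching by augmenting paths: input i may be
    paired with output j whenever its VOQ for output j is non-empty."""
    owner = [-1] * n  # owner[j] = input currently matched to output j, or -1

    def augment(i: int, seen: list[bool]) -> bool:
        for j in range(n):
            if inputs[i].queues[j] and not seen[j]:
                seen[j] = True
                # Take a free output, or re-route the input that holds it.
                if owner[j] == -1 or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(len(inputs)):
        augment(i, [False] * n)
    return {owner[j]: j for j in range(n) if owner[j] != -1}


def run_slot(inputs: list[VOQInput], n: int) -> list[tuple[str, int]]:
    """Transfer one cell per matched input-output pair (speed-up 1)."""
    return sorted((inputs[i].queues[j].popleft(), j)
                  for i, j in max_size_match(inputs, n).items())


if __name__ == "__main__":
    ins = [VOQInput(2), VOQInput(2)]
    ins[0].enqueue("a", 0)   # with a single FIFO, "b" would wait behind "a"
    ins[0].enqueue("b", 1)
    ins[1].enqueue("c", 0)
    print(run_slot(ins, 2))  # [('b', 1), ('c', 0)]: both outputs busy in one slot
```

With a single FIFO at input 0, cells "a" and "c" would contend for output 0 and "b" would be blocked; with VOQ, "b" is eligible and both outputs carry a cell in the same slot.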
In summary, research has identified scheduling complexity as the bottleneck for scaling IQ and CIOQ switches, and the speed-up-N buffer as the bottleneck for OQ and SQ switches. To date, the best known results for an N×N switch are:
Input queuing: A throughput of 100% may be achieved with a scheduling complexity of O(N³ log N) using input buffers with a speed-up of 1. The scheduling complexity constitutes the bottleneck.
Output queuing and shared queuing: A throughput of 100% may be achieved with a scheduling complexity of O(N log N) using output (or shared) buffers with a speed-up of N. These speed-up-N output (or shared) buffers constitute the bottleneck.
Combined input-output queuing: A throughput of 100% may be achieved with a scheduling complexity of O(N²) using input and output buffers with a speed-up of 2. The scheduling complexity constitutes the bottleneck.
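To make the trade-off in this summary concrete, the stated complexity and buffer speed-up terms can be evaluated for a few switch sizes. This Python sketch merely evaluates those asymptotic expressions with constant factors dropped (base-2 logarithms assumed), so its numbers indicate scaling trends, not actual operation counts; the `SCHEMES` table is a hypothetical name.

```python
import math

# Order-of-growth terms from the summary above; constant factors are ignored,
# so the values compare how each cost scales with N, not actual operation counts.
SCHEMES = {
    # scheme: (scheduling complexity term, buffer speed-up)
    "IQ": (lambda n: n**3 * math.log2(n), lambda n: 1),
    "OQ/SQ": (lambda n: n * math.log2(n), lambda n: n),
    "CIOQ": (lambda n: n**2, lambda n: 2),
}

if __name__ == "__main__":
    for n in (16, 64, 256):
        for name, (schedule, speedup) in SCHEMES.items():
            print(f"N={n:4d} {name:6s} scheduling ~{schedule(n):>14,.0f}"
                  f"  buffer speed-up {speedup(n)}")
```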
The good scalability of an IQ or CIOQ switch is thus offset by the considerable effort required to schedule the access of buffered cells to the interconnecting network.
Therefore, there exists a need in the art for improved apparatuses and methods for high-speed data switching. In particular, there is a need for high-speed switches and routers that attain high throughput and scalability while relying on simple scheduling algorithms.