Packet switching involves the transmission of data in packets through a data network. Fixed-size packets are referred to as cells. Each block of end-user data that is to be transmitted is divided into cells. A unique identifier, a sequence number and a destination address are attached to each cell. The cells are independent and may traverse the data network by different routes. The cells may incur different levels of propagation delay, or latency, caused by physical paths of different lengths. The cells may be held for varying amounts of time in buffers in intermediate switches in the network. The cells also may be switched through different numbers of packet switches as the cells traverse the network, and the switches may have unequal processing delays caused by error detection and correction.
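By way of illustration only, the division of a block of end-user data into cells may be sketched as follows. The sketch is not taken from any particular implementation. The names Cell, segment and reassemble, the 48-byte payload size, and the reading of the unique identifier as an identifier of the originating data block are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    ident: int      # identifier (assumed here to identify the originating block)
    seq: int        # sequence number of this cell within the block
    dest: str       # destination address
    payload: bytes  # fixed-size payload, zero-padded if the block runs short


def segment(block: bytes, ident: int, dest: str, payload_size: int = 48) -> List[Cell]:
    """Divide a block of end-user data into fixed-size cells."""
    cells = []
    for seq, start in enumerate(range(0, len(block), payload_size)):
        chunk = block[start:start + payload_size].ljust(payload_size, b"\x00")
        cells.append(Cell(ident, seq, dest, chunk))
    return cells


def reassemble(cells: List[Cell], original_length: int) -> bytes:
    """Restore the block from cells that may have arrived out of order."""
    ordered = sorted(cells, key=lambda c: c.seq)
    return b"".join(c.payload for c in ordered)[:original_length]
```

Because the cells may take different routes and arrive in any order, the receiver in this sketch relies on the sequence number alone to restore the original order.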
When the demand for switch throughput is not high, shared queuing (SQ) switches are well known as a cost-effective and efficient way of providing the buffering needed to sustain temporary egress (output) port congestion caused by simultaneously arriving traffic addressed to a common egress port. Without loss of generality, FIG. 2 illustrates conventional N×N shared queuing (SQ) switch 200, which implements a typical architecture according to an exemplary embodiment of the prior art. Shared queuing switch 200 comprises N input ports, N output ports, frame deserializer (FD) 205, shared buffer 210, and frame serializer (FS) 215. Timing in shared queuing switch 200 is synchronized over time slots, and data packets going through shared queuing switch 200 are encapsulated as fixed-size cells.
FIG. 3 illustrates conventional fixed size cell 300 for use in N×N shared queuing switch 200. Cell 300 comprises two fields: cell header 305, which carries control information, and cell payload 310, which carries end-user data. The least significant bit (LSB) of cell 300 is transmitted first and begins header 305. The most significant bit (MSB) of cell 300 is transmitted last and ends payload 310. The destination output port of cell 300 is encoded in cell header 305. Roughly speaking, the task of shared queuing switch 200 is to transfer an incoming cell to its destination output port as fast as possible.
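The bit ordering of cell 300 described above may be sketched as follows, purely as an illustration. The field widths HEADER_BITS and PAYLOAD_BITS, the function names, and the assumption that the header holds nothing but the destination output port are hypothetical and are not specified for cell 300.

```python
from typing import List

HEADER_BITS = 8    # assumed header width (not specified for cell 300)
PAYLOAD_BITS = 32  # assumed payload width (not specified for cell 300)


def pack_cell(dest_port: int, payload: int) -> int:
    """Place the header in the low-order bits so that it is transmitted first."""
    if not 0 <= dest_port < (1 << HEADER_BITS):
        raise ValueError("destination port does not fit in the header")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in the payload field")
    return (payload << HEADER_BITS) | dest_port


def serialize(cell: int) -> List[int]:
    """Return the bits of the cell in transmission order, LSB first."""
    return [(cell >> i) & 1 for i in range(HEADER_BITS + PAYLOAD_BITS)]


def dest_port_from_bits(bits: List[int]) -> int:
    """Recover the destination port from the first HEADER_BITS received bits."""
    return sum(bit << i for i, bit in enumerate(bits[:HEADER_BITS]))
```

With this layout a receiver can read the destination output port as soon as the header bits have arrived, before the payload has been received.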
Without loss of generality, each input/output port is assumed to have an external link rate of one cell per time slot. Moreover, shared buffer 210 is assumed to have a bus width equal to the width of cell 300, so that each cell 300 can be stored or read as a whole unit by a single buffer access. Each incoming cell 300 arrives serially at shared queuing switch 200 via the external link of an input port. Frame deserializer 205 deserializes each serially arriving cell 300 into the bus width of shared buffer 210. Once arriving cell 300 is completely deserialized, it is forwarded in parallel from frame deserializer 205 to shared buffer 210. Shared buffer 210 is capable of writing N cells and reading N cells in a single time slot. Each cell 300 read from shared buffer 210 is immediately forwarded in a whole unit to frame serializer 215, where cell 300 is transmitted serially to the corresponding destination output port.
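The per-time-slot behavior described above may be modeled, in simplified form, as follows. This is a sketch under stated assumptions and not the design of switch 200: the class name SharedQueueSwitch, the organization of the shared buffer as one logical queue per output port, the dropping of cells that arrive when the buffer is full, and the ordering of writes before reads within a time slot are all assumptions of the example. Serialization and deserialization are not modeled.

```python
from collections import deque
from typing import Any, List, Optional, Tuple


class SharedQueueSwitch:
    """Toy time-slotted model of an N x N shared queuing switch."""

    def __init__(self, n_ports: int, capacity: int):
        self.n = n_ports
        self.capacity = capacity  # total cells the shared buffer can hold
        # Assumption: one logical FIFO per output port inside the shared buffer.
        self.queues = [deque() for _ in range(n_ports)]
        self.occupancy = 0
        self.dropped = 0

    def step(self, arrivals: List[Optional[Tuple[int, Any]]]) -> List[Optional[Any]]:
        """Advance one time slot.

        arrivals[i] is None or (dest_port, payload) for input port i.
        Returns, for each output port, the departing payload or None.
        """
        # Write phase: up to N cells are stored in the shared buffer.
        for cell in arrivals:
            if cell is None:
                continue
            dest, payload = cell
            if self.occupancy >= self.capacity:
                self.dropped += 1  # buffer full: the cell is lost
                continue
            self.queues[dest].append(payload)
            self.occupancy += 1
        # Read phase: at most one cell per output port (one cell per slot per link).
        departures = []
        for queue in self.queues:
            if queue:
                departures.append(queue.popleft())
                self.occupancy -= 1
            else:
                departures.append(None)
        return departures
```

When two cells addressed to the same output port arrive in the same time slot, one departs immediately and the other waits in the shared buffer for the next slot, which is the temporary congestion that the buffer is intended to absorb.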
From a theoretical point of view, the architecture of shared queuing switch 200 is ideal in the sense that it is the most cost-effective and achieves the best performance in terms of cell throughput, mean cell delay, and other important parameters. In practice, however, the maximum achievable throughput of shared queuing switch 200 is limited by the bandwidth of shared buffer 210. To avoid frequent cell losses, shared buffer 210 is generally required to have a large storage capacity. As a result, random access memory (RAM) chips are commonly used in the shared buffer of a shared queuing switch.
Generally, there are two ways to increase the bandwidth of shared buffer 210: 1) speeding up the access rate of buffer 210, or 2) enlarging the bus width of buffer 210 for each single access. The access times of modern RAM chips are already so low that little room is left for further improvement. In other words, for a given bus width, it is difficult for even state-of-the-art semiconductor technologies to dramatically improve the bandwidth of a RAM chip. This constitutes a bottleneck for using the first method to scale up the throughput of a shared queuing switch. As a result, the second method, enlarging the bus width, seems to be the best choice for boosting the throughput of a shared queuing switch (i.e., enlarging the cell size and, at the same time, increasing the bus width of the shared buffer accordingly).
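The two methods can be compared with simple arithmetic, sketched below. The function names and the numerical values used in the example are hypothetical. The second function further assumes a single-ported buffer whose bus width equals the cell width, so that the N writes and N reads of each time slot are performed as 2N sequential accesses; this is an assumption of the sketch and is not stated for shared buffer 210.

```python
def buffer_bandwidth_bps(bus_width_bits: int, access_time_ns: float) -> float:
    """Bits per second moved by a buffer that transfers one bus word per access."""
    accesses_per_second = 1e9 / access_time_ns
    return bus_width_bits * accesses_per_second


def switch_throughput_bps(cell_bits: int, access_time_ns: float) -> float:
    """Aggregate throughput when every cell is written once and read once.

    Assumption: single-ported buffer, bus width equal to the cell width, so a
    time slot spans 2N accesses and delivers N cells, i.e. half of the raw
    buffer bandwidth, independent of N.
    """
    return buffer_bandwidth_bps(cell_bits, access_time_ns) / 2
```

For example, with an assumed access time of 2 ns, a 512-bit bus gives a raw bandwidth of 256 Gbit/s, and doubling the bus width to 1024 bits doubles that figure without any change in access time.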
For example, with respect to the shared queuing switch in FIG. 2, if the cell size is doubled, an N×N shared queuing switch with double the throughput can be constructed as shown in FIG. 4. FIG. 4 illustrates conventional N×N shared queuing switch 400 with two shared buffer banks according to one embodiment of the prior art. Shared queuing switch 400 comprises N input ports, N output ports, frame deserializer (FD) 405, shared buffer 410, and frame serializer (FS) 415. Shared buffer 410 comprises two buffer banks, namely shared bank 411 and shared bank 412, each with a bandwidth equal to that of shared buffer 210 in N×N shared queuing switch 200 shown in FIG. 2. However, scaling the throughput of a shared queuing switch by enlarging the cell size has two inherent drawbacks: 1) enlarging the cell size causes a greater delay in encapsulating data into the larger cells; and 2) a larger cell size coarsens the granularity of service provided to data traffic.
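The first drawback can be illustrated with a short calculation. The sketch assumes a constant-rate source that must fill an entire cell payload before the cell can be sent; the function name and the example figures (a 64 kbit/s source and a 384-bit payload) are hypothetical and are chosen only to make the arithmetic concrete.

```python
def encapsulation_delay_s(payload_bits: int, source_rate_bps: float) -> float:
    """Time for a constant-rate source to fill one cell payload."""
    return payload_bits / source_rate_bps


# Hypothetical figures: a 64 kbit/s source filling a 384-bit payload.
small_cell_delay = encapsulation_delay_s(384, 64_000)      # about 6 ms
doubled_cell_delay = encapsulation_delay_s(768, 64_000)    # about 12 ms
```

Under these assumptions, doubling the cell size doubles the time that the first bit of data waits before its cell can enter the switch, which is the encapsulation delay referred to above.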
In principle, if the delay that a cell experiences in passing through the switch is ignored, a shared queuing switch with 100% throughput can be scaled up to any size. However, the mean cell delay increases as the frame size increases, which imposes a limit on scaling up the throughput of a switch that supports delay-sensitive applications.
Proposals have been made to assemble cells into frames in such a way that a frame contains only cells on the same channel, where a channel is the switching path between a pair of input and output ports. However, the result has been that the mean frame assembly delay for an N×N shared queuing switch is upper bounded by O(N²) time slots. This upper bound is not scalable, since the frame assembly delay grows quadratically with the switch size.
Therefore, there is a need in the art for improved fixed-size packet switches. In particular, there is a need for a highly scalable switch architecture in which frame assembly is performed with a delay that is acceptable in practice.