ATM is emerging as a universal standard for network communication and has been designated by the CCITT as the multiplexing and switching technology for the Broadband Integrated Services Digital Network (B-ISDN). ATM was designed to allow interchange of various types of information irrespective of the type of information or of the system which issues or receives it. The ATM technology must also accommodate various types of end-networks operating at speeds ranging from megabits per second to gigabits per second. The ability to accommodate different speeds, data types, and physical media makes ATM multiplexing and switching a flexible technology which will accommodate future modifications of transmission media and data structures. Present-day LAN applications of the ATM communication architecture that are becoming increasingly widespread in business and academic circles include workgroup ATM, involving client-server computing with high-end workstations and servers; backbone ATM, involving connection of existing hubs and routers through a network; and connective ATM, involving connection of a LAN to a WAN.
In ATM, all information is digitized, formed into small fixed-length packets, called cells, and transmitted over a network. Each cell includes a data portion and a header portion including error codes and routing vectors. Fast switches using efficient architectures, or switching fabrics, are required to achieve practical widespread implementation of ATM technology.
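The fixed-length cell described above is, in the standard, 53 bytes: a 5-byte header followed by a 48-byte payload. The following sketch unpacks the standard UNI header fields; the function name is illustrative, but the field widths follow the ATM standard.

```python
# Minimal sketch of the standard 53-byte ATM cell layout (UNI format):
# a 5-byte header followed by a 48-byte payload.

HEADER_LEN = 5
PAYLOAD_LEN = 48
CELL_LEN = HEADER_LEN + PAYLOAD_LEN  # 53 bytes

def parse_uni_header(header: bytes) -> dict:
    """Unpack the GFC/VPI/VCI/PT/CLP/HEC fields of a UNI cell header."""
    assert len(header) == HEADER_LEN
    b0, b1, b2, b3, b4 = header
    return {
        "gfc": b0 >> 4,                                      # generic flow control (4 bits)
        "vpi": ((b0 & 0x0F) << 4) | (b1 >> 4),               # virtual path id (8 bits)
        "vci": ((b1 & 0x0F) << 12) | (b2 << 4) | (b3 >> 4),  # virtual channel id (16 bits)
        "pt":  (b3 >> 1) & 0x7,                              # payload type (3 bits)
        "clp": b3 & 0x1,                                     # cell loss priority (1 bit)
        "hec": b4,                                           # header error control (8 bits)
    }
```

The VPI/VCI pair constitutes the routing information used by the switch; the HEC byte is the header error code.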
The end goal of an ATM switch or network of switches is transmission of the cells from a source to a destination. In an ATM LAN architecture, the physical connections comprise point-to-point links between switches and/or hosts. Host ATM interfaces allow hosts to connect to the network, and local switches act as nodes of the network. In the local ATM switch, cells received from particular ports must be routed to designated destination ports. Two steps are required in the routing: queuing of received cells pending their scheduling to a destination port, and the scheduling of queued cells.
A performance-degrading bottleneck often arises during cell routing in the ATM switch. Conventional ATM switching architectures fail to take full advantage of the bandwidth provided by modern physical transmission media, such as optical fibers. The bandwidth of the memory used as the cell queue is typically the bottleneck: most such ATM architectures cannot be scaled to provide aggregate throughput exceeding 100 Gb/s in a 32-port configuration.
Typically, the conventional ATM switch is an output-buffered or shared-memory structure. Both structures have the disadvantage of imposing significant memory bandwidth requirements on the cell buffering memory and the switch fabric. Input-queued ATM switches, in contrast, impose minimal memory bandwidth requirements on the cell queue, allowing the potential bandwidth of the queue memory to be better utilized.
In each of these structures, the ATM switch includes a memory for cell queuing, and the location of the cell storage memory classifies the switch as input-queued, output-queued, or shared-memory. In a shared-memory switch, throughput is limited by the access speed of the RAM, since two memory operations (a write and a read) are required per cell. Accordingly, the memory bandwidth is the primary limitation on total throughput. Throughput in output-queued switches is also limited by the memory bandwidth, because an output buffer must have a bandwidth equal to the aggregate throughput of the switch when cells are simultaneously received from every input port. Although the bandwidth demand is usually less, and techniques exist for reducing the demand on the switch fabric, the aggregate throughput still requires the buffer memory bandwidth to be some multiple of the port link rate. Input-queued switches require the least memory bandwidth because each queue module need only buffer cells at the arrival rate of a single port, rather than at a multiple of that rate or at the aggregate arrival rate of the entire switch module. Thus, the input-queued architecture provides superior scalability and is better suited to ultra-broadband ATM switches.
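The worst-case bandwidth comparison above can be sketched numerically. The functions below are an illustration (not taken from the source) of the memory bandwidth each structure demands for an N-port switch with a per-port link rate R.

```python
# Illustrative worst-case buffer-memory bandwidth for the three queuing
# structures, for an N-port switch with per-port link rate R (in Gb/s).
# The formulas follow the reasoning in the text; names are illustrative.

def shared_memory_bw(n_ports, link_rate):
    # every cell is written to and read from one shared RAM,
    # so the RAM must sustain 2*N*R
    return 2 * n_ports * link_rate

def output_queue_bw(n_ports, link_rate):
    # each output buffer may receive cells from all N inputs
    # simultaneously while draining at R
    return (n_ports + 1) * link_rate

def input_queue_bw(link_rate):
    # each input queue buffers cells at a single port's arrival
    # rate: one write plus one read per cell time
    return 2 * link_rate
```

For example, with 32 ports at 2.5 Gb/s per port, a shared memory must sustain 160 Gb/s and each output buffer 82.5 Gb/s, but each input queue only 5 Gb/s, which illustrates the scalability advantage claimed for input queuing.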
Bottlenecks still develop in conventional input-queued switches when queued cells are scheduled for transmission. Conventional input-queued switches utilize scheduling algorithms, such as the round-robin matching algorithm, which achieve only approximately 58% utilization of the available throughput. Round-robin switches operate in the following fashion. Unmatched inputs send requests to every output for which they have queued cells. If an unmatched output receives a request, it grants the request next appearing in a round-robin schedule, beginning with the highest priority element. The output notifies each input whether or not its request was granted, and a pointer is incremented past the granted input. A phenomenon termed head-of-line (HOL) blocking occurs when the highest priority cell at each input blocks the scheduling of the remaining cells queued behind it.
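One iteration of the request/grant mechanism described above can be sketched as follows. This is a simplified illustration in the spirit of round-robin matching schemes such as iSLIP; the function and parameter names are illustrative, not taken from the source.

```python
# Simplified single-iteration sketch of round-robin matching:
# inputs request, outputs grant round-robin from a pointer, inputs
# accept round-robin from their own pointer, pointers advance past
# the matched element.

def round_robin_match(requests, grant_ptr, accept_ptr):
    """requests[i] is the set of outputs for which input i has queued cells.
    grant_ptr[j] / accept_ptr[i] are per-output / per-input round-robin
    pointers (mutated in place). Returns a dict of matched input -> output."""
    n = len(requests)
    # Grant phase: each output picks the requesting input at or after its pointer.
    grants = {}  # input -> list of granting outputs
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if j in requests[i]:
                grants.setdefault(i, []).append(j)
                break
    # Accept phase: each input accepts the granting output at or after its pointer.
    match = {}
    for i, outs in grants.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match[i] = j
                # advance pointers one past the matched element
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
                break
    return match
```

Note that when input 0 requests outputs {0, 1} and input 1 requests output {0}, both outputs grant to input 0 in the first iteration and input 1 goes unserved, illustrating how a single head-of-line cell per input limits the match.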
One technique proposed to overcome this defect is a neural network based cell scheduler, proposed in "The Performance Analysis and Implementation of an Input Access Scheme in a High-Speed Packet Switch", IEEE Transactions on Communications, vol. 42, pp. 3189-3199, December 1994. While the neural network achieves additional throughput, its practical implementation is questionable because of the large number of neurons (the square of the number of input ports) and interconnections (the cube of the number of input ports) which are required. Further improvements are required to provide a practical and efficient input-queued ATM switch which realizes high throughput and meets other practical requirements, such as scalability, fast response time, and low circuit complexity with low transistor and interconnect counts.
Accordingly, it is an object of the present invention to provide an improved input-queued ATM switch having a high throughput potential and a practical, scalable hardware implementation.
Another object of the present invention is to provide an input-queued ATM switch that avoids head-of-line blocking and is capable of achieving nearly 100% bandwidth utilization.
An additional object of the present invention is to provide an input-queued ATM switch having separate queues for each of a plurality of input ports, each input queue maintaining separate virtual queues for a plurality of output ports, the switch having a cell scheduler which considers multiple cells corresponding to separate output ports received from each input queue in a single selection process.
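The per-input buffer with separate virtual queues for each output port described above is commonly known as virtual output queuing. A minimal sketch of such a structure, with illustrative class and method names, follows; the `heads()` method yields the candidate cells, one per destination port, that a scheduler could consider in a single selection process.

```python
from collections import deque

# Minimal sketch of a per-input buffer maintaining one virtual queue
# per output port (virtual output queuing); names are illustrative.

class VirtualOutputQueues:
    def __init__(self, n_outputs):
        self.queues = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, dest_port):
        self.queues[dest_port].append(cell)

    def heads(self):
        """Head-of-line cell for each destination port (None if empty);
        these are the candidates offered to the scheduler in one pass."""
        return [q[0] if q else None for q in self.queues]

    def dequeue(self, dest_port):
        return self.queues[dest_port].popleft()
```

Because cells destined for different outputs wait in different queues, a cell blocked at one virtual queue's head cannot delay cells bound for other outputs, which is how this structure avoids head-of-line blocking.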
A further object of the present invention is to provide an input-queued ATM switch including a matrix cell scheduling unit which provides a cell transmission schedule that fills a traffic matrix with queued entries and resolves conflicts to maximize transmission opportunities for remaining ports.
A still further object of the present invention is to provide an input-queued ATM switch including a matrix cell scheduling unit using a traffic matrix having a set of entries corresponding to assigned priority levels of the highest priority cells queued in an input port queue, and which successively chooses from the set of entries by assigning a weight to each entry that depends upon other remaining entries in a common row or column, choosing the heaviest entry in the traffic matrix, and reducing the traffic matrix upon each selection of the heaviest entry, to thereby resolve conflicts so that only one entry per row and per column is selected.
Yet another object of the present invention is to provide an input-queued ATM switch including a multi-tag input queue buffer which assigns priority levels to queued cells stored in randomly accessible cell rooms according to a predetermined function, maintains the priority levels as tags in order of their priority for each destination port, and which sends the highest priority tags for each destination port to the cell scheduler as entries corresponding to the highest priority queued set of cells, wherein the cell scheduler assigns weights to the entries depending on other entries originating from, or destined for, a common port, and which maximizes throughput by conducting an iterative search for the heaviest entry in a traffic matrix formed by the entries, and reducing the traffic matrix upon each selection of the heaviest entry, to thereby resolve conflicts so that only one entry per row and per column is selected.
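The select-heaviest-and-reduce loop described in the objects above can be sketched as follows. The weighting rule used here (entry priority reduced by the number of competing entries in the same row or column) is one plausible choice for illustration only; the source states only that the weight depends upon other remaining entries in a common row or column, and all names are illustrative.

```python
# Hedged sketch of the matrix scheduling loop: fill a traffic matrix
# with the priority levels of each input's head-of-line cells, weight
# each entry by the contention it faces, repeatedly select the heaviest
# entry, and delete its row and column so that only one entry per row
# and per column survives.  The weighting rule is an assumption.

def schedule_cells(matrix):
    """matrix[i][j] = priority of input i's head cell for output j (0 = none).
    Returns a list of (input, output) pairs forming a conflict-free schedule."""
    n = len(matrix)
    free_rows = set(range(n))
    free_cols = set(range(n))
    selected = []
    while True:
        best, best_weight = None, None
        for i in free_rows:
            for j in free_cols:
                if matrix[i][j] == 0:
                    continue
                # Assumed weight: priority minus count of competing entries
                # in the same row or column, so entries that block the
                # fewest alternatives are favored.
                contention = sum(1 for c in free_cols if matrix[i][c] and c != j)
                contention += sum(1 for r in free_rows if matrix[r][j] and r != i)
                weight = matrix[i][j] - contention
                if best_weight is None or weight > best_weight:
                    best, best_weight = (i, j), weight
        if best is None:
            break
        i, j = best
        selected.append((i, j))
        # reduce the matrix: this row and column can no longer be matched
        free_rows.discard(i)
        free_cols.discard(j)
    return selected
```

For instance, with matrix [[2, 1], [2, 0]], naively taking the highest priority entry (0, 0) would strand both remaining entries, whereas the weighted selection matches both ports, illustrating how the reduction step preserves transmission opportunities for the remaining ports.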