The world of computer and communications networking is continually evolving. More efficient and more effective devices and methods are being developed to overcome the bottlenecks in the network datapath.
One of the bottlenecks is the problem of oversubscription of resources in a network switch. Currently, network switches are implemented using line cards with multiple ingress (input) and multiple egress (output) lines. Merging data flows from ingress lines to the egress lines requires complex and sophisticated solutions to provide adequate service to the different data flows passing through the ingress lines. A challenge to the egress data flow merging problem is that some flows passing through the ingress/egress line card have minimum transmission requirements. As such, this traffic must be guaranteed a minimum amount of resources (e.g. transmission capacity and number of cells or DTUs required to transport the traffic).
There are currently a few solutions to this question of sharing line capacity between multiple ingress lines. The first solution is that of using a switch fabric with an overspeed factor of N (ideal output queuing switch), in which N is the number of input line cards. These switches have no input buffer and therefore the switch fabric is not a bottleneck for these switches. However, a speedup of N is not feasible for high capacity switches.
A second solution is that of using simple high-speed switches with a small speedup and using virtual output queues in the ingress line cards. These high-speed switch fabrics are intentionally simple and leave most of the work to the Traffic Management chips. Typically, the switch fabrics use some form of arbiter to resolve conflicts between simultaneous requests for a destination port from multiple source ports. Due to the high speed of such switches, typical arbiter implementations provide relatively simple scheduling algorithms, such as a hierarchy of strict priority among the classes and round robin among the ingress ports, without awareness of the QoS provisioning of each line card. Consequently, the bandwidth distribution among the source ports is dictated by the characteristics of the arbiter, rather than by the service requirements of each individual line card. When the switch experiences traffic oversubscription, the scheduling discipline of the switch arbiter will make the local traffic scheduling on the line card ineffective because the switch itself is the congestion point in the system.
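The behavior of such an arbiter can be illustrated with a minimal sketch. It is not the implementation of any particular switch fabric. The function names, the convention that a lower class number means a higher priority, and the assumption of permanently backlogged queues are hypothetical and chosen only for illustration.

```python
from collections import Counter

def arbitrate(requests, num_ports, rr_pointer):
    """Pick one (port, cls) winner for a single egress port in one cell time.

    requests:   set of (ingress_port, traffic_class) pairs with a pending cell.
                A lower class number means a higher priority (assumed convention).
    rr_pointer: dict mapping class -> ingress port to consider first.
    """
    if not requests:
        return None
    # Strict priority: only the highest-priority class with a request is served.
    top = min(cls for _, cls in requests)
    start = rr_pointer.get(top, 0)
    # Round robin among ingress ports within that class.
    for offset in range(num_ports):
        port = (start + offset) % num_ports
        if (port, top) in requests:
            rr_pointer[top] = (port + 1) % num_ports
            return port, top
    return None

def simulate(num_ports, cell_times, backlog):
    """Count grants per (port, cls) when every queue in `backlog` always has
    a cell pending, i.e. the egress port is persistently oversubscribed."""
    rr_pointer = {}
    grants = Counter()
    for _ in range(cell_times):
        winner = arbitrate(backlog, num_ports, rr_pointer)
        if winner is not None:
            grants[winner] += 1
    return grants

if __name__ == "__main__":
    # Three ingress ports, all backlogged in the same class.
    print(simulate(3, 300, {(0, 1), (1, 1), (2, 1)}))
```

In this sketch the three backlogged ingress ports receive equal shares of the egress port no matter what rates their line cards were provisioned with, because the arbiter has no input describing those rates.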
A third solution involves using a central scheduler. Some switch fabrics use a central scheduler that holds all the rate information of the egress line cards, and therefore could precisely distribute the egress bandwidth fairly between ingress line cards. Due to their complexity, these switches are not scalable and therefore cannot be used in high-speed, high-port-count switches/routers. The central scheduler needs to maintain a global state information database for all traffic flows in the system. In a typical switch system with N ports, such state information is on the order of N×N. Because of this N² context overhead, such a solution is not scalable.
A fourth solution involves managing the grant/request system between the ingress and the egress. Essentially, the ingress requests resources from the egress to allow the incoming data through the ingress line to exit through the egress line. When required, the egress then grants these requests and allows data to pass from the ingress to the egress. Some virtual output queuing switches implement per-class request/grant protocols. The Request messages are generated separately for each input queue, and the egress port has a distributed scheduler that is responsible for scheduling the requests for that particular port. Grant messages are generated and sent back by the fabric to the ingress line card, which then transmits a packet according to the input queue identifier in the Grant message. This mechanism requires the switch fabric to have sufficient overspeed, a dedicated channel, or efficient support for variable-size Request/Grant/Data messages. It also requires the egress port to implement a per-input-class scheduler. The overall cost and complexity of such a switch fabric are high. There is no known mechanism for scaling such a switch fabric to terabit speeds.
Unfortunately, none of the above solutions provides the flexibility required with a minimum of hardware/software. An ideal solution should provide each port/class output pair in a line card with its assigned committed rate. The solution should also be able to share extra transmission capacity (or bandwidth) between all line cards through some weighting/sharing factor. The solution should require minimal hardware and must consume only a small fraction of a switch fabric's resources. Any virtual output queuing switch must be supported, and the solution should be designed to work on slow-changing traffic.
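The two rate requirements above (a committed rate for each ingress, plus weighted sharing of whatever capacity remains) can be made concrete with a small arithmetic sketch. This is only an illustration of the stated goal, not a description of any particular mechanism. The function name `allocate`, the dictionary layout, and the assumption that the committed rates do not exceed the egress capacity are all hypothetical.

```python
def allocate(capacity, committed, weights):
    """Return the rate given to each ingress line card for one egress port.

    capacity:  total egress transmission capacity (any consistent rate unit).
    committed: dict mapping line card -> committed (guaranteed) rate.
    weights:   dict mapping line card -> sharing weight for the excess capacity.

    Assumes sum(committed.values()) <= capacity and a positive total weight;
    per-card demand is ignored to keep the sketch short.
    """
    total_committed = sum(committed.values())
    if total_committed > capacity:
        raise ValueError("committed rates exceed egress capacity")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    excess = capacity - total_committed
    # Each card keeps its committed rate and adds a weighted share of the excess.
    return {
        card: committed[card] + excess * weights[card] / total_weight
        for card in committed
    }

if __name__ == "__main__":
    rates = allocate(
        capacity=10.0,
        committed={"A": 2.0, "B": 3.0, "C": 1.0},
        weights={"A": 1, "B": 1, "C": 2},
    )
    print(rates)
```

With these illustrative numbers, 6 units are committed and 4 units are left over. The leftover is split 1:1:2, so card C receives twice the excess share of A or B while every card still gets at least its committed rate.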
It should be noted that the term data transmission unit (DTU) will be used in a generic sense throughout this document to mean units through which digital data is transmitted from one point in a network to another. Thus, such units may take the form of packets, cells, frames, or any other unit as long as digital data is encapsulated within the unit. Thus, the term DTU is applicable to any and all packets and frames that implement specific protocols, standards or transmission schemes. It should also be noted that the term digital data will be used throughout this document to encompass all manner of voice, multimedia content, video, binary data or any other form of data or information that has been digitized and that is transmitted from one point in a network to another as a payload of a data transmission unit.
For this document, the term “rate” is defined to mean the amount of data transmitted per unit time. Thus, any reference to “transmission rate” means how much data is transferred or transmitted in a given amount of time. “Rate” is not to be taken to mean the speed or velocity at which data travels through a transmission medium.