In order to prevent interference among the various subscriber-contracted data packet traffic flows in a router that target the same destination egress, and ultimately a common fiber optic transmission link or links or the like, a mechanism is needed that enables the router to maintain fairness among all the contracted flows to the same egress queue, both in sharing excess bandwidth and in any necessary dropping or discarding of data packets. The router, moreover, must be capable of providing the guaranteed, contracted-for service in terms of the bandwidth, latency, jitter and drop rate characteristics specified within a Service Level Agreement (SLA).
Prior art routing has heretofore been subject to the limitations of inefficient usage of egress bandwidth and of unfair dropping of competing ingress port data traffic under conditions of congestion.
Specifically, in prevailing architectures, when a plurality of ingress ports sends variable-sized data packets to a given egress queue, they do so without knowledge of the state of that egress queue. The contribution of an individual port to a given egress queue is accordingly determined entirely by the nature of the incoming data on that port. This gives rise to two problems, as follows.
First, since traffic patterns in networks are bursty and change dynamically over time, a particular port (say Port A) may not actually send its pre-subscribed or contracted-for average data rate to a particular egress, and thus does not fully utilize its contracted bandwidth over time. In such instances, another port (say Port B) may need to send more than its average contracted data rate. In principle, Port B should then be able to use the bandwidth left unused by Port A. Since, however, Port B has no knowledge of this unused available bandwidth, it usually ends up dropping the traffic that dynamically exceeds its average data rate, resulting in inefficient usage of egress bandwidth.
Secondly, when a plurality of ingress ports need to send data to a given egress port, there are times when the egress port is oversubscribed and congested, and some of the traffic targeted to it must accordingly be dropped. This dropping, however, needs to be fair among all the competing ingress ports: if one ingress port is contributing much more to the congestion, it should experience more drops than ingress ports that are contributing less. Prevailing architectures, unfortunately, do not allow the drop decision to take into account both the egress queue depth and the ingress traffic behavior, and conventional techniques therefore produce unfair drops. In particular, prior RED (Random Early Detect/Discard) mechanisms apply a drop probability function based on the queue depths of a virtual output queue maintained on the ingress side, as described, for example, in the article “Random Early Detection Gateways for Congestion Avoidance” by S. Floyd and V. Jacobson, IEEE/ACM Transactions on Networking, August 1993. They do not take into account the actual depth of the actual egress queue.
As will later be explained, the present invention, on the other hand, properly bases drops on the actual egress queue depth, thereby meeting the true intent of RED mechanisms. That intent is not achieved by drops based on the depth of a virtual egress or conventional output queue postulated on the ingress side, as is done in current input-buffered switch/router systems.
Present-day Random Early Detection is thus an output-queue-based mechanism in which the data drop (or mark) probability is proportional to the aggregate input data rate only. When the drop (or mark) decision is made at an output or egress queue, the mechanism cannot identify which flow a packet came from. When congestion is detected at an output queue, therefore, the RED function drops data packets randomly, irrespective of how much bandwidth a customer may have contract-subscribed: the higher a flow's data rate, the more of its packets are dropped. Prior RED systems may thus drop packets from a subscriber who is not exceeding its contract, and, under TCP congestion control, that customer may unfairly be forced to reduce its sending rate even though it has actually paid for that rate.
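For reference, classic RED can be sketched as below. This is a simplified Python rendering of the Floyd and Jacobson scheme (the count-based spacing of drops and the idle-time averaging correction are omitted), and the class and parameter names are illustrative. What it shows is the point made above: `should_drop` is given only the current queue length, so the decision cannot depend on which flow, or which contract, a packet belongs to.

```python
import random


class ClassicRed:
    """Simplified classic RED (after Floyd and Jacobson).

    Omitted for brevity: the count-based spacing of drops and the
    averaging correction applied after the queue has been idle.
    """

    def __init__(self, min_th, max_th, max_p, weight=0.002):
        self.min_th = min_th    # average depth at which early drops begin
        self.max_th = max_th    # average depth at which every packet is dropped
        self.max_p = max_p      # drop probability reached just below max_th
        self.weight = weight    # EWMA weight for the average queue depth
        self.avg = 0.0

    def update_average(self, queue_len):
        # Exponentially weighted moving average of the instantaneous depth.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len, rng=random):
        # Note: no flow identifier or contract is available to this decision.
        self.update_average(queue_len)
        return rng.random() < self.drop_probability()
```

Every flow sharing the queue sees the same probability, so a flow sending twice as many packets simply loses about twice as many, whether or not it is within its contract.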
Using traditional RED mechanisms, accordingly, a service provider cannot guarantee the promised, allocated bandwidth to a customer for TCP traffic. That customer's packets may be dropped because of the misbehavior of other customers, forcing the customer to reduce its data transfer rate even though it never over-subscribed. Since the traditional RED mechanism drops packets based only on the data accumulated in the output queue, unfairness among input flows results, as in the example described above.
In accordance with the present invention, on the other hand, fairness is introduced in that a packet is not dropped on the basis of the traffic rate alone. The invention instead enforces the rule that the data flow switch drops no packet of “in-contract traffic”, that is, traffic within the guaranteed bandwidth provided for that flow.
Over-subscribed flows, however, may violate their respective contracts to different degrees, and a flow that over-subscribes bandwidth should be penalized accordingly. In practice, some flows under-subscribe the bandwidth of their contracts while others over-subscribe their allocated bandwidth. The extra bandwidth available from under-subscribed flows may be less than the bandwidth over-subscribed by the others, so this excess bandwidth should be distributed fairly among the over-subscribed flows. In accordance with the invention, therefore, the share of excess bandwidth a flow may receive is proportional to the bandwidth in its contract. In other words, out-of-contract traffic should be dropped in proportion to its rate of over-subscription. This may be expressed mathematically as follows:
$$R_{\text{drop}} = \frac{R - R_{\text{contract}}}{R} \qquad (1)$$

$$D = T \times R_{\text{drop}} \qquad (2)$$

where $R$ is the rate of the traffic flow, $R_{\text{contract}}$ is the bandwidth the customer bought in its contract, $R_{\text{drop}}$ is the drop rate of the flow, $T$ is the traffic of the flow (the amount of data it sends), and $D$ is the amount of data dropped. Since IP traffic is bursty by nature, fairness in sharing the available excess bandwidth and in dropping data packets from all traffic flows is even more important, and is addressed by the present invention.
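Equations (1) and (2) can be checked numerically with a minimal sketch such as the following. The function names are illustrative, and clamping $R_{\text{drop}}$ at zero for flows at or under contract is an assumption reflecting the rule that in-contract traffic is not dropped.

```python
def drop_rate(rate, contract_rate):
    """Equation (1): R_drop = (R - R_contract) / R.

    Clamped to 0 for flows at or under contract (assumption reflecting the
    rule that in-contract traffic is not dropped).
    """
    if rate <= 0:
        return 0.0
    return max(0.0, (rate - contract_rate) / rate)


def dropped_amount(traffic, rate, contract_rate):
    """Equation (2): D = T * R_drop, with T the amount of data the flow sends."""
    return traffic * drop_rate(rate, contract_rate)


# A flow offering 150 Mb/s against a 100 Mb/s contract, observed for 1 s.
r_drop = drop_rate(150, 100)            # one third of the flow's traffic
d = dropped_amount(150, 150, 100)       # about 50 Mb over that second
```

Since $T = R \cdot t$ over an interval $t$, Equation (2) reduces to $D = (R - R_{\text{contract}})\,t$: the amount dropped is exactly the out-of-contract portion, here 50 Mb of the 150 Mb sent.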
Mechanisms for implementing such fairness also include input data packet forward processing systems that police the input flows in order to provide fair service to all of them. Such input policing, however, cuts the over-subscribed traffic irrespective of the bandwidth actually available at the egress side, so bandwidth not being used by other customer flows at that moment is lost. In addition, most Internet TCP/IP traffic is by nature very bursty, with the bursts of different traffic flows usually not occurring in the same phase. How to share the bandwidth is thus one of the most critical aspects of an Internet switch, and input policing is not an ideal mechanism for managing such bursty traffic: it wastes bandwidth, resources and money.
Further under the present invention, accordingly, the input or ingress packet forward processing system is caused to allow over-subscribed traffic to be transferred if excess bandwidth is available. The processing also provides a weight account of the extra bandwidth that a customer consumes, and a mechanism that enables the service provider to charge appropriately for it.
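One way to picture such accounting is a per-flow ledger that counts forwarded bytes separately for in-contract and excess traffic. The sketch below is hypothetical: `ExcessUsageLedger`, its methods, and flat per-byte pricing are illustrative assumptions, not a mechanism specified here.

```python
from collections import defaultdict


class ExcessUsageLedger:
    """Hypothetical per-flow account of forwarded traffic.

    Only packets that were actually forwarded should be recorded; bytes
    forwarded out of contract represent consumption of excess bandwidth.
    """

    def __init__(self):
        self.in_contract = defaultdict(int)  # flow_id -> bytes within contract
        self.excess = defaultdict(int)       # flow_id -> bytes using excess bandwidth

    def record(self, flow_id, nbytes, in_contract):
        if in_contract:
            self.in_contract[flow_id] += nbytes
        else:
            self.excess[flow_id] += nbytes

    def excess_charge(self, flow_id, price_per_byte):
        # Flat per-byte pricing is an illustrative assumption.
        return self.excess[flow_id] * price_per_byte


ledger = ExcessUsageLedger()
ledger.record("customer-A", 1000, in_contract=True)
ledger.record("customer-A", 500, in_contract=False)
ledger.record("customer-A", 250, in_contract=False)
```

Keeping the two counters apart is what allows excess usage to be billed on top of the contracted rate rather than folded into it.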
Under the invention, the packet-dropping decision is consolidated into a single traffic management (TM) process, because an end node has no way of knowing why its packet was dropped. To the end node, a packet dropped for some management reason is indistinguishable from one dropped by the RED mechanism. The end node may therefore reduce its sending rate to avoid congestion when, in actual fact, no congestion exists, so that the real drop probability is higher than necessary. For these reasons the present invention, as previously stated, consolidates the drop functions into one entity.
As earlier noted, to maximize throughput, the passing or dropping decision must also be based on the actual output queue situation. Hence an output-buffered switch is also needed to share resources efficiently when handling bursty traffic such as TCP/IP traffic, and it also creates a new business revenue path for service providers.
The preferred electronic switch fabric (ESF) is by nature an output-buffered switch, and as such it provides precisely the output queue status needed to make informed control decisions.
In its best mode, the invention preferably uses the type of output-buffered shared memory system described for the data-switch fabric system (ESF) switch router in U.S. patent application publication No. 2003/0043828 A1, Mar. 6, 2003, Method Of Scalable Non-Blocking Shared Memory Output-Buffered Switching Of Variable Length Data and Packets From Pluralities Of Ports At Full Line Rate, And Apparatus Therefor (U.S. patent application Ser. No. 09/941,144, filed Aug. 28, 2001). This system, moreover, is preferably addressed by the technique of U.S. patent application publication No. 2003/0120594 A1, Jun. 26, 2003, Method Of Addressing Sequential Data Packets From A Plurality Of Input Data Line Cards For Shared Memory Storage And The Like, And Novel Address Generator Therefor (U.S. patent application Ser. No. 10/026,166, filed Dec. 21, 2001, now U.S. Pat. No. 6,684,317). Other systems may also be suitable for some applications, but these preferred shared-memory techniques provide the advantage of scalable-port, non-blocking, shared-memory, output-buffered, variable-length queued data switching, with sequential data packet addressing particularly adapted for such shared-memory output-buffered switch fabrics and related memories.
Output-buffered switching alone, however, may not always be better than the earlier-described input-buffered switching for the purposes herein. An input-buffered switch, indeed, has the advantage over an output-buffered switch that it can identify different traffic flows, and therefore makes a measure of flow-based fairness dropping possible. It also allows the switch to drop data packets only from “out-of-contract traffic flows”, and provides a bandwidth-based billing mechanism for bursty traffic flows, as previously mentioned. In addition, it completely avoids the global synchronization associated with traditional RED mechanisms.
The present invention, accordingly, through its previously described novel approach of consolidating the drop functions into a single entity, provides both an input packet forward processing capability and an output-buffered switch capability, combined to support a much more sophisticated and improved type of flow-based fairness dropping. Moreover, unlike current input-buffered switching systems, which require ingress buffering because of switching limitations in the switch fabric, the invention does not use buffering at the input or ingress.
In summary, therefore, the passing or dropping decision under the invention is made based upon both the destination queue status and the input flow status, with the ESF providing the information on both.
The invention, furthermore, again unlike the prior art, bases the drop probability function on two parameters: (1) the over-subscribing rate of the current data flow, and (2) the actual egress queue depth. Depending upon the egress port situation, over-subscribed packets may or may not go through the switch. The drop rate is proportional to the degree to which the data flow over-subscribes, with the data flow being characterized by bandwidth and burst size.
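Before any digitization, such a two-parameter drop function might be sketched as follows. Only the two inputs come from the text: over-subscription as in Equation (1), and the actual egress queue fill. The multiplicative form, the linear congestion ramp, the `low`/`high` thresholds and the function name are assumptions chosen for illustration.

```python
def two_factor_drop_probability(rate, contract_rate, queue_fill, low=0.5, high=0.9):
    """Hypothetical drop probability from (1) over-subscription and (2) egress fill.

    rate, contract_rate: offered and contracted rates of the flow (same units).
    queue_fill: actual egress queue occupancy, 0.0 (empty) to 1.0 (full).
    """
    if queue_fill >= 1.0:
        return 1.0  # no buffer left, so even in-contract packets cannot be stored
    # Parameter (1): over-subscription as in Equation (1); 0 for in-contract flows.
    oversub = 0.0 if rate <= 0 else max(0.0, (rate - contract_rate) / rate)
    # Parameter (2): congestion factor ramping linearly from low to high fill.
    if queue_fill <= low:
        congestion = 0.0
    elif queue_fill >= high:
        congestion = 1.0
    else:
        congestion = (queue_fill - low) / (high - low)
    return oversub * congestion
```

With the egress queue lightly loaded, over-subscribed packets pass; once the queue reaches `high`, the probability equals $R_{\text{drop}}$ from Equation (1), and an in-contract flow sees zero probability until the buffer is actually full.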
To make implementation practical, in accordance with a further novel feature of the invention, the over-subscribing rate is digitized into three conditions or “colors”: (1) in-contract; (2) out-of-contract but within burst size; and (3) outside both contract and burst size. There is one drop probability function per “color”, and the three together implement a three-dimensional function that provides the new results attained herein.
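The three colors and their per-color drop functions can be sketched as below. How colors are assigned is not specified here, so the marker is a swapped-in technique: a simplified single-rate three-color marker in the style of RFC 2697 (srTCM, color-blind mode), in which a committed token bucket stands for the contract and an excess bucket for the burst allowance. The `DROP_CURVES` table and its numbers are hypothetical.

```python
from enum import Enum


class Color(Enum):
    GREEN = 1   # (1) in-contract
    YELLOW = 2  # (2) out-of-contract but within burst size
    RED = 3     # (3) outside both contract and burst size


class ThreeColorMarker:
    """Simplified single-rate three-color marker (color-blind, RFC 2697 style)."""

    def __init__(self, cir, cbs, ebs):
        self.cir = cir    # contracted rate, bytes per second
        self.cbs = cbs    # committed burst size, bytes
        self.ebs = ebs    # excess burst size, bytes
        self.tc = cbs     # committed bucket starts full
        self.te = ebs     # excess bucket starts full
        self.last = 0.0   # time of last refill, seconds (assumed non-decreasing)

    def _refill(self, now):
        tokens = (now - self.last) * self.cir
        self.last = now
        to_committed = min(tokens, self.cbs - self.tc)
        self.tc += to_committed
        # Tokens that overflow the committed bucket spill into the excess bucket.
        self.te = min(self.ebs, self.te + tokens - to_committed)

    def color(self, size, now):
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return Color.GREEN
        if self.te >= size:
            self.te -= size
            return Color.YELLOW
        return Color.RED


# Hypothetical drop curve per color: (start_fill, full_fill, max_p) over egress
# queue fill in [0, 1]. Green is dropped only when the queue is actually full.
DROP_CURVES = {
    Color.GREEN: (1.0, 1.0, 1.0),
    Color.YELLOW: (0.6, 0.9, 0.5),
    Color.RED: (0.3, 0.6, 1.0),
}


def drop_probability(color, queue_fill):
    if queue_fill >= 1.0:
        return 1.0  # buffer exhausted: nothing more can be stored
    start, full, max_p = DROP_CURVES[color]
    if queue_fill < start:
        return 0.0
    if queue_fill >= full:
        return 1.0
    return max_p * (queue_fill - start) / (full - start)
```

Color and egress queue fill together select a point on one of three curves, which is one concrete reading of the three-dimensional function described above.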
Under the invention, thus, data packets from an under-subscribed or under-used allocated flow are guaranteed not to be dropped when the destination egress queue is not over-booked. Packets from over-subscribed flows, moreover, are dropped only when either no excess bandwidth is allowed for that flow or no excess bandwidth is available at the destination egress port. Excess bandwidth from under-subscribed or under-used flows is desirably distributed among the over-subscribed flows in proportion to the contracted bandwidth of each flow, or is distributed based upon other factors, for example as controlled by the setting of a pre-specified set of drop functions. Thus, by default, the drop rate of each flow is proportional to its percentage of over-subscribed bandwidth, or it may be controlled by setting the drop functions so as to bias them intentionally toward certain flows.
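The default proportional distribution of excess bandwidth can be sketched as follows. Redistributing the share left over by flows whose demand is already met (a water-filling step) is an assumption added so that no excess bandwidth is left idle; the function name and the `(contract_rate, offered_rate)` input format are illustrative.

```python
def distribute_excess(excess, flows):
    """Split excess egress bandwidth among over-subscribed flows.

    flows: dict of flow_id -> (contract_rate, offered_rate); contract rates
    are assumed positive. Returns flow_id -> bandwidth granted above contract.
    """
    # Only flows offering more than their contract ask for excess bandwidth.
    demand = {f: offered - contract
              for f, (contract, offered) in flows.items() if offered > contract}
    grant = {f: 0.0 for f in demand}
    active = set(demand)
    remaining = float(excess)
    while active and remaining > 1e-12:
        pool = remaining
        total_contract = sum(flows[f][0] for f in active)
        satisfied = set()
        for f in active:
            # Share proportional to the flow's contracted bandwidth,
            # capped at what the flow still wants.
            share = pool * flows[f][0] / total_contract
            give = min(share, demand[f] - grant[f])
            grant[f] += give
            remaining -= give
            if grant[f] >= demand[f] - 1e-12:
                satisfied.add(f)
        if not satisfied:
            break  # every active flow took its full share; the pool is used up
        active -= satisfied  # leftover is re-split among still-hungry flows
    return grant


flows = {"A": (100, 50), "B": (100, 200), "C": (300, 600)}
grants = distribute_excess(80, flows)
```

With 80 units of excess, B (contract 100) and C (contract 300) receive 20 and 60, in the 1:3 ratio of their contracts; A, being under its contract, asks for and receives nothing extra.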
The methodology of the invention for attaining an improved fair distribution to the over-subscribed flows, moreover, is easily configured, where desired, to simulate traditional RED mechanisms, though with the distinct improvements earlier discussed.
The technique of the invention, furthermore, is of quite broad application, being particularly useful in the transmission of variable length data packets with configurable adaptive output scheduling, which enables simultaneous transmission on a common transmission link, such as a fiber optic link, of differentiated services for various different traffic types. These may range from high-priority real-time voice to financial transactions or the like, in a converged network environment as described in co-pending U.S. patent application Ser. No. 10/702,152, “Method Of And Apparatus For Variable Length Data Packet Transmission With Configurable Adaptive Output Scheduling Enabling Transmission On The Same Transmission Link(s) Of Differentiated Services For Various Traffic Types”, filed Nov. 5, 2003. Such a system provides for the execution of the different QoS (quality of service) algorithms used with these traffic types while they co-exist in a converged network environment, preserving the respective service characteristics of real-time or high-priority traffic and providing supplemental bandwidth allocation, all while maximizing link utilization. This result is attained through fine and balanced control of which type of traffic is transmitted on the link for a given duration, and of how much of that traffic is transmitted.
The present invention brings to such converged network environments, moreover, a further universal refinement of the aforementioned guarantee of contracted-for bandwidth to respective customers, and of the novel RED mechanism that provides vastly improved fairness both in sharing unused bandwidth among the over-subscribing customer data flows and in data dropping. All of this is consolidated into a single entity, as earlier mentioned, which also enables the new business opportunity of billing for over-subscribed usage of excess available bandwidth.