The present embodiments relate to computer networks and are more particularly directed to a network system in which the upstream switch transmission rate is controlled in response to traffic buffering in the adjacent downstream switch.
Ethernet networks are one type of network that has found favor in various applications in the networking industry, for various reasons. For example, Ethernet is a widely used and cost-effective medium, with numerous interfaces, and is capable of communications at various speeds up to the Gbps range. Ethernet networks may be used to form a Metro Ethernet network, which is generally a publicly accessible network that provides a Metro domain, typically under the control of a single administrator, such as an Internet Service Provider (“ISP”). Metro Ethernet may be used to connect to the global Internet and to connect between geographically separated sites, such as between different locations of a business entity.
An Ethernet network includes various nodes for transporting traffic across the network, where such nodes include what are referred to in the art as switches or routers. For the sake of consistency, such nodes are hereafter referred to in this document only as switches or switch nodes, while one skilled in the art will appreciate that such switching functionality may be employed in other network devices. The switches implement known techniques for servicing traffic that arrives from different nodes and for minimizing transient (i.e., short term) congestion at any of the nodes. IEEE 802.3x is the IEEE standard that addresses such congestion through link-level flow control. Under IEEE 802.3x, and in the event of congestion in a buffer corresponding to a switch input port, the switch provides “backpressure” by sending a pause message to any directly upstream Ethernet switch that has an output port that is transmitting to the input port that has developed the congestion. Such congestion is detected by a switch in response to its buffering system reaching a threshold, where once that threshold is reached and without intervention, the switch becomes unable to properly communicate its buffered packets onward to the link extending outward from that switch. In response to receiving the pause message, the upstream adjacent switch is thereby commanded to cease the transmission of data to the congested switch for a period of time specified in the pause message, thereby permitting the congested switch additional time to relieve its congested state by servicing the then-stored data in its buffering system. However, because IEEE 802.3x is a non-selective backpressure congestion control, all the traffic aggregates passing through the congested link are paused irrespective of their ongoing traffic rates. This results in unfairness, as non-aggressive sessions are penalized along with aggressive sessions.
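By way of illustration only, the threshold-triggered pause behavior described above may be sketched as follows. The sketch is a simplified model and not an implementation of the IEEE 802.3x frame format; the class names, the packet-count threshold, and the `quanta` field are assumptions introduced solely for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InputPort:
    """Downstream input port with a single shared buffer (hypothetical model)."""
    threshold: int             # assumed occupancy threshold, in packets
    pause_quanta: int = 100    # assumed pause duration carried in the message
    queue: deque = field(default_factory=deque)

    def receive(self, packet):
        """Buffer a packet; return a pause message once the threshold is reached."""
        self.queue.append(packet)
        if len(self.queue) >= self.threshold:
            return {"type": "PAUSE", "quanta": self.pause_quanta}
        return None


@dataclass
class UpstreamPort:
    """Upstream output port that honors pause messages (hypothetical model)."""
    paused_until: int = 0

    def on_pause(self, msg, now):
        # The pause applies to the whole port: every session is stopped,
        # regardless of how much traffic each session contributed.
        self.paused_until = now + msg["quanta"]

    def may_send(self, now):
        return now >= self.paused_until
```

Because `may_send` takes no session argument, the sketch makes the non-selective character of this approach explicit: a low-rate session sharing the port is held back for the same interval as the session that caused the congestion.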
Another approach has also been suggested for responding to congestion in Metro Ethernet networks. In “Selective Backpressure in Switched Ethernet LANs”, by W. Noureddine and F. Tobagi, published in Globecom 99, pp. 1256-1263, and hereby incorporated herein by reference, packets directed to a same Metro Ethernet network destination MAC address are stored in a specific output buffer within a node. When the packet occupancy within such a buffer reaches a threshold limit, backpressure is applied to all the adjacent upstream nodes that have a buffer containing packets of that corresponding MAC destination. However, such an approach has drawbacks. For example, the approach is non-scalable, as a node that switches traffic to n different MAC destinations requires n buffers (or buffer spaces). The number of buffers required increases further when traffic classes are introduced. Also, if one of the buffers is not optimally utilized, other traffic with a different MAC destination is not able to utilize the unused resources in the sub-optimal buffer(s), thereby leading to wastage. Further, each session's capacity requirement and path can vary with time as well as with network conditions and, hence, there is no provision for local Max-Min fairness.
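The per-destination buffering just described may be sketched, under simplifying assumptions, as follows. The sketch is not taken from the cited paper; the class name, the function name, and the packet-count threshold are hypothetical and serve only to show why the number of buffers grows with the number of MAC destinations.

```python
from collections import defaultdict, deque


class PerDestinationSwitch:
    """Node keeping one output buffer per destination MAC (hypothetical model)."""

    def __init__(self, threshold):
        self.threshold = threshold          # assumed per-buffer limit, in packets
        self.buffers = defaultdict(deque)   # destination MAC -> buffer

    def enqueue(self, dst_mac, packet):
        """Store a packet; return the congested destination MAC, or None."""
        buf = self.buffers[dst_mac]
        buf.append(packet)
        return dst_mac if len(buf) >= self.threshold else None


def upstream_targets(congested_mac, upstream_nodes):
    """Names of upstream nodes currently holding packets for the congested MAC."""
    return [
        name
        for name, node in upstream_nodes.items()
        if node.buffers.get(congested_mac)
    ]
```

In this sketch, `len(switch.buffers)` equals the number of distinct destinations seen, and free space in one destination's buffer cannot be used by another destination, which corresponds to the scalability and wastage drawbacks noted above.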
Two additional documents also suggest responses to congestion in Metro Ethernet networks. Specifically, in “A Simple Technique That Prevents Packet Loss and Deadlocks in Gigabit Ethernet”, by M. Karol, D. Lee, S. J. Golestani, published by ISCOM 99, pp. 26-30, and in “Prevention of Deadlocks and Livelocks in Lossless, Backpressure Packet Networks”, by M. Karol, S. J. Golestani, D. Lee, published by INFOCOM 2000, pp. 1333-1342, both hereby incorporated herein by reference, a buffer is described that is shared by more than one session, where a session is defined as a packet or packets communicated between a same ingress and egress Metro Ethernet network edge node (i.e., as identifiable by the addresses in the MAC-in-MAC addressing scheme used for Metro Ethernet networks). The buffer is divided into segments and each segment is given an identification number. Each segment is allowed to store packets with different MAC addresses at the same time, but an arriving packet can only be stored in a segment that currently has packets with the same MAC addresses. If a segment fills to its limit, the node disallows any arriving packets from being stored not only in the congested segment but also in other segments whose identification number is smaller than that of the congested segment. At the same time, a backpressure message is sent to every adjacent upstream node. The upstream nodes will then temporarily stop serving all buffer segments that have an identification number equal to or smaller than that of the congested segment in the downstream node. Thus, the upstream node is prevented from transmitting not only to the segment that was filled, but also to other segments (i.e., those with a smaller identification number). These segments also will be temporarily prevented from accepting any arriving packets. These approaches therefore do not provide for fairness.
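The blocking rule for the segmented buffer may be illustrated with the following simplified sketch. It is not the algorithm of the cited documents: the rule that selects a segment for an arriving packet is omitted, segments are assumed to be numbered from 1, and all names and the packet-count capacity are hypothetical.

```python
class SegmentedBuffer:
    """Shared buffer divided into numbered segments (hypothetical model)."""

    def __init__(self, num_segments, capacity):
        self.capacity = capacity   # assumed per-segment limit, in packets
        self.segments = {i: [] for i in range(1, num_segments + 1)}
        self.blocked_up_to = 0     # segments with id <= this refuse arrivals

    def admit(self, seg_id, packet):
        """Try to store a packet; return (status, backpressure_id or None)."""
        if seg_id <= self.blocked_up_to:
            return ("rejected", None)
        seg = self.segments[seg_id]
        seg.append(packet)
        if len(seg) >= self.capacity:
            # A full segment blocks itself and every lower-numbered segment,
            # and its id is reported to the adjacent upstream nodes.
            self.blocked_up_to = max(self.blocked_up_to, seg_id)
            return ("stored", seg_id)
        return ("stored", None)


def servable_segments(upstream, congested_id):
    """Segments an upstream node may still serve after a backpressure message."""
    return [i for i in upstream.segments if i > congested_id]
```

For example, once segment 2 of a three-segment buffer fills, arrivals for segments 1 and 2 are rejected, and an upstream node may serve only its segment 3, even though segment 1 played no part in the congestion.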
In addition to the limitations of the approaches noted above, it has been observed in connection with the preferred embodiments that congestion response at the Ethernet layer may be further improved with consideration of traffic qualifiers at a different network layer. In particular, under the International Organization for Standardization's Open Systems Interconnection (“ISO/OSI”) model, there are seven network layers. Layer 2 of the ISO/OSI model describes the data link, which includes the logical connections to a network packet destination, using a network interface which includes the Ethernet interface and the concept of Ethernet addresses. The Ethernet address is a 48-bit address, often referred to as a Media Access Control (“MAC”) address. Thus, each Ethernet device has a unique MAC address and the packet header at this layer includes both a source and a destination MAC address so as to properly traverse the layer. Layer 3 of the ISO/OSI model is the network layer and includes the Internet (or Internetwork) Protocol (“IP”), which permits routing of packets from one network to another. In this layer 3, “Differentiated Services” (“DiffServ”) are expected to be widely deployed. In particular, “Assured Services” of the DiffServ architecture facilitate a priority-based drop precedence in the event of congestion. Packets of a session/flow are colored, in descending order of importance (i.e., in increasing order of drop precedence), as green, yellow, and red, depending on the rate, thereby leading to a highly scalable congestion management in ISO/OSI layer 3. As a consequence, the present inventors have observed that Ethernet, being the ISO/OSI layer 2 transportation technology, would be greatly improved by supporting such differentiation in a manner that is not only scalable but is also consistent with the Assured Forwarding model of DiffServ.
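The rate-based coloring and the resulting drop precedence may be sketched as follows. This is a deliberately simplified illustration in which a measured rate is compared against two thresholds; practical DiffServ markers are typically based on token buckets. The function names and the parameters `committed_bps` and `peak_bps` are assumptions introduced for this example.

```python
DROP_ORDER = ["red", "yellow", "green"]   # highest drop precedence first


def color(rate_bps, committed_bps, peak_bps):
    """Color a packet from its session's measured rate (assumed thresholds)."""
    if rate_bps <= committed_bps:
        return "green"
    if rate_bps <= peak_bps:
        return "yellow"
    return "red"


def shed(packets, n_drop):
    """Drop up to n_drop packets: red first, then yellow, then green."""
    dropped = 0
    for c in DROP_ORDER:
        survivors = []
        for p in packets:
            if p["color"] == c and dropped < n_drop:
                dropped += 1
            else:
                survivors.append(p)
        packets = survivors
    return packets
```

Under congestion, a call such as `shed(queue, 3)` removes red packets before any yellow packet and yellow packets before any green packet, so sessions that stay within their committed rate are the last to lose traffic.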
In view of the above, there arises a need to address the drawbacks of the prior art as well as the considerations of other layer technologies, as is accomplished by the preferred embodiments described below.