The present embodiments relate to computer networks and are more particularly directed to a Metro Ethernet network system in which its nodes transmit upstream pause messaging to cause backpressure for only selected upstream switches.
Metro Ethernet networks are one type of network that has found favor in various applications in the networking industry, and for various reasons. For example, Ethernet is a widely used and cost-effective medium, with numerous interfaces and capable of communication at various speeds up to the Gbps range. A Metro Ethernet network is generally a publicly accessible network that provides a Metro domain, typically under the control of a single administrator, such as an Internet Service Provider (“ISP”). Metro Ethernet may be used to connect to the global Internet and to connect between geographically separated sites, such as between different locations of a business entity. Also, the Metro Ethernet network is often shared among different customer virtual local area networks (“VLANs”), where these networks are so named because a first VLAN is unaware of the shared use of the Metro Ethernet network by one or more additional VLANs. In this manner, long-standing technologies and infrastructures may be used to facilitate efficient data transfer.
A Metro Ethernet network includes various nodes for routing traffic within the network, where such nodes include what are referred to in the art as switches or routers and are further distinguished as edge nodes or core nodes based on their location in the network. Edge nodes are so named because they provide a link to one or more nodes outside of the Metro Ethernet network and, hence, logically they are located at the edge of the network. Conversely, core nodes are inside the edges defined by the logically perimeter-located edge nodes. In any event, both types of nodes employ known techniques for servicing traffic arriving from different nodes and for minimizing transient (i.e., short term) congestion at any of the nodes. Under IEEE 802.3x, which is the IEEE standard on congestion control, and in the event of such congestion, a node provides “backpressure” by sending pause messages to all upstream Metro Ethernet nodes, that is, those that are transmitting data to the congestion-detecting node. Such congestion is detected by a node in response to its buffering system reaching a threshold, where once that threshold is reached and without intervention, the node will become unable to properly communicate its buffered packets onward to the link extending outward from that node. In response to such detection, the node transmits a pause message to every upstream adjacent node, whereby all such adjacent nodes are commanded to cease the transmission of data to the congested node, thereby permitting the congested node additional time to relieve its congested state by servicing the then-stored data in its buffering system.
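By way of illustration only, the threshold-triggered pause behavior described above may be sketched as follows. The sketch is a simplified model and not an implementation of IEEE 802.3x; the names `Node`, `threshold`, `upstream`, `paused`, and `demo_pause` are hypothetical, and the threshold is assumed to be a simple packet count.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    threshold: int  # hypothetical buffer occupancy limit, counted in packets
    buffer: list = field(default_factory=list)
    upstream: list = field(default_factory=list)  # adjacent upstream nodes
    paused: bool = False  # True once a downstream node has sent a pause message

    def receive(self, packet):
        """Buffer a packet; on reaching the threshold, pause every upstream neighbor."""
        self.buffer.append(packet)
        if len(self.buffer) >= self.threshold:
            # The pause is indiscriminate: every adjacent upstream node is
            # commanded to stop, whether or not it contributed to the congestion.
            for neighbor in self.upstream:
                neighbor.paused = True


def demo_pause():
    a = Node("A", threshold=10)
    b = Node("B", threshold=10)
    c = Node("C", threshold=3, upstream=[a, b])
    # Only node A sends traffic to node C in this example.
    for i in range(3):
        c.receive(("from A", i))
    return a.paused, b.paused
```

In this sketch, `demo_pause()` returns `(True, True)`: node B is paused although it transmitted nothing to node C, which illustrates why a pause message directed to all upstream nodes may be regarded as coarse.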
Another approach also has been suggested for responding to congestion in Metro Ethernet networks. In “Selective Backpressure in Switched Ethernet LANs”, by W. Noureddine and F. Tobagi, published in Globecom 99, pp. 1256-1263, and hereby incorporated herein by reference, packets directed to a same Metro Ethernet network destination MAC address are stored in a specific output buffer within a node. When the packet occupancy within such a buffer reaches a threshold limit, backpressure is applied to all the adjacent upstream nodes that have a buffer containing packets of that corresponding MAC destination. However, such an approach has drawbacks. For example, the approach is non-scalable, as a node that switches traffic to n different MAC destinations requires n buffers (or n portions of buffer space). The number of buffers required increases further when traffic classes are introduced. Also, if one of the buffers is not optimally utilized, other traffic with a different MAC destination is not able to utilize the unused resources in the sub-optimal buffer(s), thereby leading to wastage. Further, the capacity requirement and path of each session can vary with time as well as with network conditions and, hence, there is no provision for local Max-Min fairness. Particularly, in this existing approach, there is no scheme for differentiation among sessions, and the traffic of each of the sessions may vary with time. Some sessions may be idle and some may become active for a period of time, and so on. Thus, there is a need for an “arbitrator” to fairly allocate bandwidth according to the status of the sessions. Max-Min fairness is an outcome of one such arbitrator for bandwidth. Under Max-Min fairness, the session that requires the least bandwidth is first satisfied/allocated by the arbitrator and the procedure is repeated recursively for the remaining sessions until the available capacity is shared.
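As a non-limiting illustration of the Max-Min fairness procedure just described, the following sketch uses the commonly known progressive-filling form of the allocation. It is an assumption for purposes of example and is not taken from the cited reference; the function name `max_min_allocate`, the session labels, and the bandwidth units are hypothetical.

```python
def max_min_allocate(demands, capacity):
    """Share `capacity` among sessions, satisfying the smallest demand first.

    `demands` maps a session label to its requested bandwidth (any consistent unit).
    """
    allocation = {}
    remaining = capacity
    # Consider sessions in order of increasing demand.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        fair_share = remaining / len(pending)
        session, demand = pending[0]
        if demand <= fair_share:
            # The least demanding session is fully satisfied; repeat for the rest.
            allocation[session] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # No remaining session can be fully satisfied, so the
            # leftover capacity is split equally among them.
            for name, _ in pending:
                allocation[name] = fair_share
            break
    return allocation
```

For example, with requests of 2, 4, and 10 units and an available capacity of 12 units, `max_min_allocate({"s1": 2, "s2": 4, "s3": 10}, 12)` allocates 2 units to `s1`, 4 units to `s2`, and the remaining 6 units to `s3`.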
Two additional documents also suggest responses to congestion in Metro Ethernet networks. Specifically, in “A Simple Technique That Prevents Packet Loss and Deadlocks in Gigabit Ethernet”, by M. Karol, D. Lee, S. J. Golestani, published by ISCOM 99, pp. 26-30, and in “Prevention of Deadlocks and Livelocks in Lossless, Backpressure Packet Networks”, by M. Karol, S. J. Golestani, D. Lee, published by INFOCOM 2000, pp. 1333-1342, both of which are hereby incorporated herein by reference, a buffer is described that is shared by more than one session, where a session is defined as a packet or packets communicated between a same ingress and egress Metro Ethernet network edge node (i.e., as identifiable by the addresses in the MAC-in-MAC addressing scheme used for Metro Ethernet networks). The buffer is divided into segments and each segment is given an identification number. Each segment is allowed to store packets with different MAC addresses at the same time, but an arriving packet can only be stored in a segment that currently has packets with the same MAC addresses. If a segment fills to its limit, the node disallows any arriving packets from being stored not only in the congested segment but also in other segments whose identification number is smaller than that of the congested one. At the same time, a backpressure message is sent to every adjacent upstream node. The upstream nodes will then temporarily stop serving all buffer segments that have an identification number equal to or smaller than that of the congested segment in the downstream node. Thus, the upstream node is prevented not only from transmitting to the segment that was filled, but also to other segments as well (i.e., those with a smaller identification number). These segments also will be temporarily prevented from accepting any arriving packets. These approaches do not determine the source that causes the congestion.
Hence, there is a possibility that backpressure is applied to sources that are not causing the congestion, which is unfair in that those sources are penalized (i.e., via the cessation imposed by the backpressure) even though they are not the cause of the congestion. Further, the size of each segment is also rigid, that is, the number of packets that can be stored within a segment is fixed. Still further, the congestion mechanism is inefficient in that it is always triggered by the state of any one segment, even if the total packet occupancy in the buffer space, including potentially numerous other segments, has not reached a congestion state. Lastly, this approach has no provision for multi-class traffic.
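The segment-blocking rule summarized above may be illustrated, in simplified form, by the following sketch. It reflects only the behavior as characterized in this background discussion and is not an implementation taken from the cited documents; the function name `blocked_segments`, the per-segment `limit`, and the example occupancies are hypothetical.

```python
def blocked_segments(occupancy, limit):
    """Return the IDs of segments that stop accepting arriving packets.

    `occupancy` maps a segment identification number to its packet count;
    `limit` is the fixed (rigid) number of packets each segment can hold.
    """
    full = [seg_id for seg_id, count in occupancy.items() if count >= limit]
    if not full:
        return set()
    # A full segment blocks itself and every segment with a smaller
    # identification number, regardless of how empty those segments are.
    highest_full = max(full)
    return {seg_id for seg_id in occupancy if seg_id <= highest_full}
```

For instance, with four segments of 8 packets each and occupancies `{1: 0, 2: 1, 3: 8, 4: 2}`, the call `blocked_segments({1: 0, 2: 1, 3: 8, 4: 2}, 8)` returns `{1, 2, 3}`. Segments 1 and 2 are blocked even though the buffer as a whole holds only 11 of a possible 32 packets, which corresponds to the inefficiency noted above.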
In view of the above, there arises a need to address the drawbacks of the prior art, as is accomplished by the preferred embodiments described below.