The present invention relates to Connectivity and Fault Management (CFM) and, more particularly but not exclusively, to implementing CFM techniques over networks that employ link-aggregation mechanisms.
As the number of Ethernet services grows, service providers require a robust set of management tools to maintain Ethernet service networks. For example, a set of Ethernet operation, administration and maintenance (OAM) functions, which is also known as a set of Ethernet Connectivity and Fault Management (CFM) functions, defines certain capabilities that are needed in order to assure the integrity and reliability of the network. Moreover, in order to adapt Ethernet technology to a carrier-grade service environment, various Standards are being developed to provide advanced OAM/CFM capabilities across the entire network. For example, the IEEE 802.1ag Standard and ITU-T Recommendation Y.1731, which are incorporated by reference herein, define different CFM capabilities. By way of standardization, the CFM domain space is defined in terms of what are referred to as flow points. In the context of the IEEE 802.1ag specification suite, the flow points may be associated with maintenance entities (MEs) as defined in related Standards documentation. A port can implement multiple MEs of different types. A flow point at the edge of a CFM domain is a maintenance end-point (MEP). A flow point inside a domain and visible to an MEP is a maintenance entity group (MEG) intermediate point (MIP). Whereas MEPs are active MEs, which may be used by system operators to initiate and monitor CFM activity, MIPs passively receive and respond to CFM flows initiated by MEPs. Each one of the MIPs and MEPs has an identity that uniquely identifies it in the Layer 2 network. Usually, this identity is the MAC address of the interface with which the MEP or MIP is associated.
In parallel with the advancement of management tools for Ethernet service network maintenance, bandwidth has become a critical resource in embedded network devices. The importance of bandwidth has increased as new applications demand higher transfer speeds. One costly solution for this problem is a complete upgrade of the underlying physical layer or data-link layer technology. Such technologies usually provide an order-of-magnitude increase in available bandwidth. However, such an upgrade involves large cost outlays due to network structure changes, and higher infrastructure and deployment costs. Thus, an engineering solution that requires minimal alterations in existing network infrastructure is needed. Such a solution, which has been adopted by many network planners, is a link aggregation group (LAG) mechanism. A LAG is a group of two or more network links bundled together to appear as a single link. For instance, bundling two 100 Mbps network interfaces into a single link creates one 200 Mbps link. A LAG may include two or more network cards and two or more cables, but the software identifies the link as one logical link. This solution allows the network planner to use two or more existing physical network links, such as cables or ports, to transfer packets of the same data from one entity to the other, without changing the structural properties of the network. In other words, the two or more network links are used in parallel in order to increase the logical link speed beyond the physical limits of any one of them. The LAG connection allows two or more links to be aggregated together in such a manner that a destination entity can treat data that is received from a number of links as though it were received over a single logical link. For example, a LAG, which comprises N links, would consist of N parallel instances of point-to-point links, each of which is completely unaffected by being part of a group.
Examples of modules that implement the LAG mechanism are Cisco's Catalyst 6500 series and Juniper's T-series platform.
When a LAG entity receives a frame to forward, it determines to which of several output ports the frame is to be sent. The forwarding entity usually attempts to distribute the load evenly over the physical output ports of the aggregated logical link. Usually, the frame distribution is based on a predefined hashing function.
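By way of illustration only, the hash-based frame distribution described above may be sketched as follows. The sketch assumes that the hash input is the source and destination MAC address pair and that CRC-32 serves as the predefined hashing function; neither assumption is taken from the Standards or from any particular device, and the function name select_member_link is hypothetical.

```python
import zlib


def select_member_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Return the index of the LAG member link that carries a frame.

    Assumption: the hash key is the (source, destination) MAC pair and the
    hash is CRC-32. Real devices may hash on other header fields.
    """
    if num_links < 1:
        raise ValueError("a LAG needs at least one member link")
    # Normalize case so the same address pair always yields the same key.
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode("ascii")
    # The modulo maps the hash value onto one of the member links.
    return zlib.crc32(key) % num_links


if __name__ == "__main__":
    # Hypothetical address pairs, used only to show the distribution.
    frames = [
        ("00:11:22:33:44:01", "00:aa:bb:cc:dd:01"),
        ("00:11:22:33:44:02", "00:aa:bb:cc:dd:01"),
        ("00:11:22:33:44:03", "00:aa:bb:cc:dd:02"),
    ]
    for src, dst in frames:
        link = select_member_link(src, dst, num_links=2)
        print(f"{src} -> {dst} uses member link {link}")
```

Because the selection depends only on the address pair, all frames of a given conversation follow the same member link, while different conversations may be spread over different links.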
It should be noted that LAG is also known as Ethernet trunking, network interface card (NIC) teaming, port teaming, port trunking, and NIC bonding. LAG is based on the IEEE 802.3ad networking Standard, which is herein incorporated in its entirety by reference.
A new exemplary method for link aggregation is disclosed in U.S. Pat. No. 7,023,797, issued on Apr. 4, 2006. The patent discloses methods and apparatuses to calculate an output mask in a network-switching engine that can support multiple aggregation methods. An aggregation table is programmed to include physical link selection information for two or more aggregation methods that are associated with two or more different LAGs.
When a certain network, such as a local area network (LAN) or a virtual LAN (VLAN), employs LAG interfaces, some of the connectivity fault management functions as currently specified by the IEEE 802.1ag Standard and ITU-T Recommendation Y.1731 cannot be utilized and therefore cannot detect certain malfunctions. This inability arises because, when LAG interfaces are used, packets that are forwarded from one entity to another are not sent via a single, known, fixed network link but via a set of aggregated output links that together constitute a single logical port or link. The packets are distributed among the links by a balancing algorithm which is implemented locally. Therefore, the path of each packet cannot be predicted by the originating ME that initiates the CFM function. This unpredictability may affect the reception of reply messages (e.g., loopback or linktrace replies) and performance results such as frame delay variation. Moreover, as noted by the Standards, whenever one of the aggregated output links fails, the other aggregated output links in the group are configured to take up the traffic load that was being handled by the failed link in order to avoid disruptions in the communication among interconnected devices. Since the traffic load that was being handled by the failed link has been taken up by other aggregated output links in the group, the fault management functions of the CFM techniques cannot identify the failure. In networks that employ only single link connections, the failure identification is based on CFM techniques that detect link failures by multicasting or unicasting messages from a certain MEP to another ME. When LAG connections are used, messages are not transferred via one physical link, as described above, and, therefore, the failed link is not detected.
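The masking effect described above may be illustrated by the following simplified sketch. It assumes a model in which a CFM loopback message is hashed onto one of the currently active member links and in which a failed link is simply removed from the active set; the names pick_link and loopback_succeeds, the CRC-32 hash, and the flow key are hypothetical and are not drawn from the IEEE 802.1ag Standard or ITU-T Recommendation Y.1731.

```python
import zlib
from typing import Sequence, Set


def pick_link(flow_key: str, active_links: Sequence[int]) -> int:
    """Hash a flow onto one of the currently active member links."""
    if not active_links:
        raise RuntimeError("no active member links remain in the LAG")
    index = zlib.crc32(flow_key.encode("ascii")) % len(active_links)
    return active_links[index]


def loopback_succeeds(flow_key: str, all_links: Sequence[int],
                      failed_links: Set[int]) -> bool:
    """Model a loopback exchange across the LAG.

    Assumption: traffic of a failed link is taken up by the surviving
    links, so the probe is always hashed onto a working link.
    """
    active = [link for link in all_links if link not in failed_links]
    chosen = pick_link(flow_key, active)
    return chosen not in failed_links


if __name__ == "__main__":
    links = [0, 1, 2]
    probe = "mep-a->mep-b"  # hypothetical flow key for the CFM probe
    for failed in (set(), {0}, {1}, {2}):
        ok = loopback_succeeds(probe, links, failed)
        print(f"failed links: {sorted(failed)}, loopback reply received: {ok}")
```

In this model the loopback reply is received in every case, so the originating MEP observes no difference between a fully healthy LAG and a LAG in which one member link has failed.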
There is thus a widely recognized need for, and it would be highly advantageous to have, an apparatus, a method, and a system for implementing fault management functions in networks with LAG connections which are devoid of the above limitations.