Link aggregation is a technique for creating a single logical link from multiple physical links so that it can carry more data than any single link can carry. Link aggregation can also provide some protection against failures, because the traffic is distributed among the different physical links in the group. For example, if a link aggregation group (LAG) is composed of three links, each with a capacity of 1 Gb/s, it can transmit 3 Gb/s of data. Alternatively, the same LAG can be used to carry 2 Gb/s of data, which allows any one of the individual links to fail without impairing throughput. In a distributed system, link aggregation can also protect against equipment failure when each link is terminated on different network equipment. Because of these capabilities, and because LAG is relatively simple, it is widely used for connections between customer and provider networks.
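The capacity and protection trade-off in the three-link example reduces to simple arithmetic. The minimal sketch below assumes traffic can be spread evenly over the surviving links. Real LAGs typically hash flows onto member links, so this even spread is an idealization. The function names are hypothetical.

```python
def lag_capacity_gbps(num_links: int, link_gbps: float) -> float:
    """Aggregate capacity of the LAG when every member link is up."""
    return num_links * link_gbps


def protected_load_gbps(num_links: int, link_gbps: float, tolerated_failures: int = 1) -> float:
    """Largest load that still fits after `tolerated_failures` member links fail.

    Assumption: traffic is redistributed evenly over the surviving links.
    """
    if not 0 <= tolerated_failures < num_links:
        raise ValueError("tolerated_failures must be in [0, num_links)")
    return (num_links - tolerated_failures) * link_gbps


print(lag_capacity_gbps(3, 1.0))    # 3.0 Gb/s with all three links up
print(protected_load_gbps(3, 1.0))  # 2.0 Gb/s survives any one link failing
```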
It is desirable to perform loss measurements on logical links that use link aggregation. ITU-T Recommendation Y.1731, OAM functions and mechanisms for Ethernet based networks (hereinafter “Y.1731”), describes a protocol for determining end-to-end traffic loss in a network. In general, packet loss is determined by exchanging information between network entities about how many packets have been transmitted and received in a specified time period or periods. However, Y.1731 does not define how it should operate over link aggregation. Although Y.1731 describes a method for determining packet loss on individual links, it provides no explicit solution for combining this information into a total loss measurement for all links in an aggregation group.
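For a single link, this kind of loss measurement is a difference of counter differences. Frames transmitted during the period, minus frames received during the period, gives the frames lost. The sketch below simplifies that idea for one link. It is not the Y.1731 PDU format. `CounterSample`, `counter_delta`, `link_loss`, and the 32-bit wrap-around are assumptions chosen for illustration.

```python
from dataclasses import dataclass

COUNTER_MODULUS = 2**32  # assumption: 32-bit frame counters that wrap around


@dataclass(frozen=True)
class CounterSample:
    """Hypothetical snapshot of one link's frame counters at one instant."""
    tx: int  # frames transmitted by the sending end so far
    rx: int  # frames received by the receiving end so far


def counter_delta(current: int, previous: int) -> int:
    """Frames counted between two samples, tolerating one counter wrap."""
    return (current - previous) % COUNTER_MODULUS


def link_loss(prev: CounterSample, curr: CounterSample) -> int:
    """Frames lost on one link between two samples: delta tx minus delta rx."""
    return counter_delta(curr.tx, prev.tx) - counter_delta(curr.rx, prev.rx)


prev = CounterSample(tx=1_000, rx=990)
curr = CounterSample(tx=6_000, rx=5_970)
print(link_loss(prev, curr))  # 20 (5000 sent, 4980 received)
```

The modulo in `counter_delta` keeps the result correct when a counter wraps once between samples. For example, going from `2**32 - 10` to `90` counts as 100 frames.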
In some system architectures, the aggregate loss for a LAG can be determined by placing the packet counters at a point in the system where the traffic from all links in the aggregation group has been combined. Such systems are usually low-cost systems in which loss measurement and service monitoring may be less important. In a distributed packet system, however, there is often no single point at which all packets associated with a particular LAG can be combined and counted. In this case, the only solution is to measure the loss on each individual link and combine these measurements to obtain the total loss for the group. The problem with this approach is that, unless the per-link measurements are all taken at the same instant, each link's counts cover a slightly different interval. The combined total then does not accurately represent any single measurement period. Because the measurement period is typically between 100 ms and 1 s, even a small timing offset between links is a noticeable fraction of the period, so accurate alignment of the packet counters is challenging.
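Combining per-link results and quantifying the alignment problem can both be sketched briefly. `lag_loss` sums per-link transmitted and received counts for one period into a group total. `common_window_fraction` shows how much of a period all links actually share when their counters are sampled at slightly different times. Both names are hypothetical. The sketch assumes each link counts over a window of the same length that starts at its own sample time.

```python
from typing import Sequence


def lag_loss(per_link_tx: Sequence[int], per_link_rx: Sequence[int]) -> tuple[int, float]:
    """Sum per-link frame counts for one period; return (lost frames, loss ratio)."""
    if len(per_link_tx) != len(per_link_rx):
        raise ValueError("need one tx and one rx count per link")
    tx_total = sum(per_link_tx)
    rx_total = sum(per_link_rx)
    lost = tx_total - rx_total
    ratio = lost / tx_total if tx_total else 0.0
    return lost, ratio


def common_window_fraction(sample_times_ms: Sequence[float], period_ms: float) -> float:
    """Fraction of the period that every link's measurement window shares.

    Link i is assumed to count over [t_i, t_i + period_ms]; all windows overlap
    only between the latest start time and the earliest end time.
    """
    skew = max(sample_times_ms) - min(sample_times_ms)
    return max(0.0, period_ms - skew) / period_ms


print(lag_loss([5000, 5000, 5000], [4990, 5000, 4995]))  # (15, 0.001)
print(common_window_fraction([0.0, 2.0, 5.0], 100.0))    # 0.95
print(common_window_fraction([0.0, 2.0, 5.0], 1000.0))   # 0.995
```

With the same 5 ms sampling skew, only 95% of a 100 ms period is common to all three links, compared with 99.5% of a 1 s period. This illustrates why shorter measurement periods make counter alignment harder.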
In distributed network architectures, the individual links in a LAG may be terminated in different network entities. Distributed network architectures therefore present a problem similar to that of distributed systems, because there is no point at which the combined packets can be counted. In distributed network architectures, however, the problem is even more intractable, because the counters that must be aligned reside in separate network entities rather than within a single system.