IP networks today support a range of business-critical applications, and network performance problems can have serious adverse business consequences, such as Service Level Agreement (SLA) violations and revenue losses for the service provider, and outages and business service disruptions for the customer. The ability to proactively monitor a network's health is therefore vital to critical network management functions such as problem detection, troubleshooting, and SLA compliance monitoring.
Network traffic management includes the ability to accurately and scalably measure the one-way packet loss experienced by traffic along a specific path between routers in a network. Existing measurement methods include both passive and active techniques. In currently deployed passive methods, specialized and highly expensive high-speed traffic monitors are deployed at network elements along a path of interest. The network elements compile reports on the packets, either individually or in aggregate. These reports are either stored at the network element for subsequent retrieval by the network management system via Simple Network Management Protocol (SNMP), or communicated to a collector, as exemplified by NetFlow, the latter of which is routinely used to perform baseline loss measurements across network paths with no modification to its deployment. NetFlow is a network protocol developed by Cisco Systems to run on Cisco IOS-enabled equipment for collecting IP traffic information.
Currently, there are four basic approaches that are utilized for measuring one-way packet loss in a packet network. In the first, SNMP, which is used to access and/or exchange management information between network devices, is employed with interface counters to ubiquitously report aggregate packet drop counts from router queues. This approach has several drawbacks: losses not specific to an interface are not reported; temporal granularity is limited by the SNMP polling frequency (commonly several minutes); and polling intervals are unsynchronized across routers, which makes it difficult to compose link losses along a path.
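By way of illustration only, the composition of per-link losses into a path loss can be sketched as follows. The sketch assumes that drop and forward counts for each link have already been obtained over the same time interval (which, as noted above, unsynchronized SNMP polling does not guarantee) and that losses on different links are independent. The function names `link_loss_rate` and `path_loss_rate` are hypothetical and do not correspond to any SNMP or router interface.

```python
from math import prod


def link_loss_rate(dropped: int, forwarded: int) -> float:
    """Loss rate on one link from counter deltas over a single interval.

    Hypothetical helper: `dropped` and `forwarded` are assumed to be
    differences between two successive counter readings.
    """
    if dropped < 0 or forwarded < 0:
        raise ValueError("counter deltas must be non-negative")
    offered = dropped + forwarded
    # With no traffic offered in the interval, report zero loss.
    return dropped / offered if offered else 0.0


def path_loss_rate(link_loss_rates) -> float:
    """Compose per-link loss rates into an end-to-end path loss rate.

    Assumes independent losses, so a packet survives the path with
    probability equal to the product of the per-link survival rates.
    """
    rates = list(link_loss_rates)
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("loss rates must lie in [0, 1]")
    return 1.0 - prod(1.0 - rate for rate in rates)
```

When the polling intervals of the routers along the path do not coincide, the per-link rates fed to such a composition describe different time windows, which is the difficulty noted above.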
In active measurement, probe packets are introduced into the network by a special-purpose measuring device and dispatched to one or more destination network elements. Active performance measurements between host pairs can be used to directly measure packet loss rate, such as described in “Standardized active measurements on a tier 1 IP backbone,” IEEE Communications Magazine, May 2003 by L. Ciavattone, A. Morton, and G. Ramachandran. Coverage is limited to paths joining the deployed measurement hosts. In addition, the use of special-purpose measuring devices can incur significant equipment, management and administrative costs. Other active measurement techniques include single host-based approaches such as ping to report round-trip loss, and packet train-based methods such as pathchar (see A. B. Downey, “Using pathchar to estimate Internet link characteristics,” SIGCOMM, 1999). The latter requires high measurement bandwidth and loses resolution on higher speed links.
Another known approach is referred to as Network Performance Tomography, which shares many of the general properties of active measurement, but infers performance on component links by correlating measurements on intersecting paths through the network. See A. Adams, T. Bu, R. Cáceres, N. Duffield, T. Friedman, J. Horowitz, F. L. Presti, S. Moon, V. Paxson, and D. Towsley, “The use of end-to-end multicast measurements for characterizing internal network behavior,” IEEE Communications Magazine, May 2000. Correlated measurement generally requires finer resolution and more complexity in the measurement infrastructure, e.g., the ability for measurement endpoints to report observations on small groups of packets or even individual packets.
Passive measurement employs observations of a traffic flow at two measurement points to infer performance of the intervening path. For example, trajectory sampling as outlined in N. Duffield and M. Grossglauser, “Trajectory sampling for direct traffic observation,” IEEE/ACM Transactions on Networking, vol. 9, no. 3, pp. 280-292, June 2001, correlates sampling of traffic at different locations, with routers sampling a packet only if a hash calculated over packet fields that do not change in transit falls within a given set. See also T. Zseby, “Deployment of sampling methods for SLA validation with non-intrusive measurements,” Proceedings of Passive and Active Measurement Workshop (PAM), 2002. Hash-based selection is being standardized, but is not currently available as a standard router feature.
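The principle of hash-based selection can be illustrated with a minimal sketch. It is not the method of the cited references: the choice of SHA-256, the `threshold` and `modulus` parameters, and the function names `selected` and `estimate_loss` are all assumptions made for illustration, and each packet is represented simply by a byte string standing in for its invariant fields. Because both observation points apply the same deterministic rule, they select the same packets, and a selected packet seen at the ingress point but not at the egress point can be counted as lost.

```python
import hashlib


def selected(invariant_fields: bytes, threshold: int = 64, modulus: int = 1024) -> bool:
    """Return True if the packet falls in the sampling set.

    `invariant_fields` stands in for packet content that does not change
    in transit. The hash function and parameters are illustrative only.
    """
    digest = hashlib.sha256(invariant_fields).digest()
    value = int.from_bytes(digest[:4], "big") % modulus
    return value < threshold


def estimate_loss(ingress_packets, egress_packets, threshold: int = 64, modulus: int = 1024):
    """Estimate one-way loss from packets sampled at two observation points.

    Returns the fraction of sampled ingress packets that were not observed
    at egress, or None if no ingress packet was sampled.
    """
    ingress_sampled = {p for p in ingress_packets if selected(p, threshold, modulus)}
    egress_sampled = {p for p in egress_packets if selected(p, threshold, modulus)}
    if not ingress_sampled:
        return None
    lost = ingress_sampled - egress_sampled
    return len(lost) / len(ingress_sampled)
```

In this sketch the sampling rate is approximately `threshold / modulus`, so the estimate is drawn from only that fraction of the traffic and its accuracy depends on how many packets fall in the sampling set.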
It would therefore be advantageous to provide improved network measurement techniques that enable accurate and scalable measurement of the one-way packet loss experienced by traffic along a specific path between routers in a provider network, without the need to deploy specialized equipment in the network. The existing router features and measurement infrastructure can be exploited to provide a loss estimation technique using routinely collected sampled flow-level statistics. To the inventors' knowledge, no such system or method currently exists.