Analysing networks in order to obtain measurements indicative of network performance can be done using various techniques, including techniques involving active testing (in which traffic is sent over a network specifically for the purpose of conducting tests) and techniques involving passive testing (in which traffic already flowing across a network due to user activity is analysed).
Techniques involving passive testing can show performance of real applications as used by real users, but are generally limited to testing applications and networks being used at a particular time, and can make it hard to compare network performance since the traffic over which tests are being applied varies. Active testing using reference traffic sent across the network does not generally have this disadvantage.
Techniques involving active testing also have problems in what can be tested, however. Typically, active testing techniques either test services themselves (e.g. web page or video performance), or the underlying network. Testing is generally performed from a test-point to a service or to a test-server located in the network. By using multiple test-servers, network operators can get a view of performance across different paths or sub-paths of the network, but it is expensive to deploy and maintain test-servers on a large scale, and this may not give views of networks not under the operator's control unless test-servers are sited within them. There is therefore an interest in using basic network routing equipment to conduct tests, using basic tools such as “traceroute” and “ping”.
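“Traceroute” is a technique which exploits the fact that nodes in Internet Protocol (IP) networks generate a reply message to the sender of a message when that message's Time-To-Live (TTL) or hop-limit count expires. By sending messages with successively increasing TTL values, the sender can elicit a reply from each node along a path in turn.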
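By way of illustration only, the TTL-expiry behaviour described above can be sketched as a small simulation in Python. No real packets are sent; the path is modelled as a list of hypothetical node names, and the function names `send_probe` and `traceroute` are assumptions made for this sketch rather than part of any existing tool.

```python
def send_probe(path, ttl):
    """Forward a simulated probe along `path` with the given TTL.

    Each intermediate node decrements the TTL; the node at which it
    reaches zero replies to the sender instead of forwarding the probe.
    Returns (responding_node, reply_type).
    """
    for node in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return node, "time-exceeded"
    # The TTL survived every intermediate node, so the target itself replies.
    return path[-1], "destination-reply"


def traceroute(path, max_hops=30):
    """Discover the nodes on `path` by sending probes with TTL = 1, 2, 3, ..."""
    hops = []
    for ttl in range(1, max_hops + 1):
        node, reply = send_probe(path, ttl)
        hops.append(node)
        if reply == "destination-reply":
            break
    return hops


# Hypothetical path: three routers followed by the target node.
EXAMPLE_PATH = ["r1", "r2", "r3", "target"]
```

Calling `traceroute(EXAMPLE_PATH)` in this model returns the nodes in path order, one per TTL value, ending with the target.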
“Ping” is a technique which can be used to test the reachability of nodes in a network, and to measure the round-trip time (RTT) for messages sent from an originating node (such as a computer, server, router, etc.) to a destination node and back. Messages in accordance with the Internet Control Message Protocol (ICMP), referred to as “ICMP probes”, “probe messages”, or simply “probes”, may be sent from a sender acting as a testing-point, generally via one or more intermediate nodes, to a remote network node. If that node is the intended destination of or “target” for the probe (generally indicated in header information included in the probe), it sends an associated probe response message back to the sender, allowing the sender to confirm that the target has been reached and to measure the RTT (also known as latency).
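As a minimal sketch of what such a probe may look like, the following Python code builds an IPv4 ICMP echo request (type 8, code 0) with the standard Internet checksum. The function names, and the identifier, sequence number and payload values used with them, are assumptions chosen for illustration. Actually sending the probe and timing the response would typically require a raw socket with elevated privileges, and is omitted here.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an IPv4 echo request; the code field is 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used in ICMP headers."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP echo request probe.

    The identifier and sequence number are echoed back in the response,
    which lets the sender match each response to the probe that caused it.
    """
    # The checksum is computed over the message with its checksum field zeroed.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload
```

A sender would record a timestamp when transmitting such a probe and another on receiving the matching response; the difference between the two is the measured RTT.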
In the present context and below, it will be noted that the word “probe” is generally used in the sense of an “investigation” or one or more “investigative messages”, rather than a “sensor”. The probes concerned may therefore be one or more packets, or one or more of another type of message sent via a network.
Techniques such as the above are commonly used to determine the nodes located along a network path and also to analyse latency or latency variation between pairs of nodes. Overall latency can indicate how far away a node is, while the variation in latency, which may be caused by the filling of network queues, can be used as an indication of network congestion. Such techniques can provide a very fine-grained view of network performance at each node of every network path, allowing performance to be viewed by a network operator even in respect of nodes and paths across networks not under the operator's ownership or control.
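The distinction between overall latency and latency variation can be illustrated with a short Python sketch. It assumes, purely for illustration, that the minimum observed RTT approximates the uncongested latency to a node and that the spread above this minimum reflects queueing. The function name `summarise_rtts` and the sample values are hypothetical.

```python
from statistics import mean


def summarise_rtts(rtts_ms):
    """Summarise a series of RTT samples (in milliseconds) for one node."""
    baseline = min(rtts_ms)
    return {
        # Lowest RTT seen: a rough indication of how far away the node is.
        "baseline_ms": baseline,
        # Range of the samples: a rough indication of queue build-up.
        "variation_ms": max(rtts_ms) - baseline,
        # Average delay in excess of the baseline.
        "mean_excess_ms": mean(rtts_ms) - baseline,
    }


# Hypothetical RTT samples for a single node.
SAMPLES_MS = [10.0, 12.0, 14.0]
```

Other summary statistics (percentiles, standard deviation) could equally be used; the range is chosen here only because it is simple.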
A problem with such techniques is that results are not always reliable indicators of network performance. While actual network traffic passing through a node is generally handled in an optimised forwarding element of the node (“fast-path” processing), a “traceroute” response or “ping” will generally be handled by the node's general Central Processing Unit (CPU), and will generally involve the generation of a new packet or other such message (“slow-path” processing). Traceroute and ping measurements thus often indicate delays and losses that are not actually experienced by forwarded user traffic.
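The effect can be illustrated with a deliberately simplified Python model. It is a sketch under stated assumptions, not a description of any real router: each node is given a hypothetical fast-path forwarding delay and a hypothetical slow-path delay for generating a probe response, and all names and values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    fast_path_ms: float  # one-way delay for traffic forwarded through the node
    slow_path_ms: float  # CPU delay when the node itself answers a probe


def probe_rtt_ms(path, target_index):
    """RTT measured by a probe answered by path[target_index].

    The probe and its response cross every earlier node on the fast path
    (once in each direction); only the responding node uses its slow path.
    """
    transit = sum(node.fast_path_ms for node in path[:target_index])
    return 2 * transit + path[target_index].slow_path_ms


# Hypothetical path in which r2 has a busy CPU but forwards traffic normally.
PATH = [
    Node("r1", fast_path_ms=1.0, slow_path_ms=0.5),
    Node("r2", fast_path_ms=1.0, slow_path_ms=20.0),
    Node("r3", fast_path_ms=1.0, slow_path_ms=0.5),
]
```

In this model a probe to r2 reports 22.0 ms while a probe to r3, which passes through r2, reports only 4.5 ms, so the delay apparently observed at r2 is not one that forwarded traffic experiences.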
As a result, previous attempts to determine network performance using basic router functions such as traceroute and ping have often been flawed due to the possibly slow or variable handling of these probes (i.e. probe packets or other messages) by standard network equipment such as routers and other nodes, leading to mis-diagnosis of network problems. Many systems have therefore used specialised testing infrastructure (e.g. dedicated test-servers), but as indicated above, these can generally only give overall end-to-end path performance between the test point and wherever these test-servers are located.
There is thus a need for improved ways of testing network performance which are applicable even when using basic probe techniques such as “traceroute” and “ping” in IP networks.
The “Center for Applied Internet Data Analysis” (“CAIDA”) has developed a tool called “Scamper” for use in a project referred to as the “Archipelago” project. This is intended to allow bulk traceroute and ping measurements. They have published the following papers: “Challenges in Inferring Internet Interdomain Congestion” by M. Luckie, A. Dhamdhere, D. Clark, B. Huffaker, & K. Claffy, Internet Measurement Conference (IMC), November 2014, pages 15-22, which is available online at: https://www.caida.org/publications/papers/2014/challenges_inferring_interdomain_congestion/ and “Measurement and Analysis of Internet Interconnection and Congestion” by D. Clark, S. Bauer, K. Claffy, A. Dhamdhere, B. Huffaker, W. Lehr, & M. Luckie, Telecommunications Policy Research Conference (TPRC), September 2014, which is available online at: https://www.caida.org/publications/papers/2014/measurement_analysis_internet_interconnection/
These papers consider how data can be used to infer congestion, particularly between network domains, and discuss how to analyse the data to detect network problems.
Referring to other prior art citations, US2005122983 (“Gilmartin”) relates to calculating a VLAN latency measure, and in particular to calculating a multi-point VLAN latency measure without needing to know all of the details of the connection topology of the VLAN.
US2014269303 (“Comcast/Toy”) relates to managing congestion in a network. One method involves receiving delay information (representing link level delay, connection level delay, or class of service level delay) relating to one or more network points, comparing delay information to a threshold, and if the delay information exceeds the threshold, executing a congestion control process associated with the one or more network points.
EP1206085 (“Infonet”) relates to methods and apparatus for automated service level agreements.
An IETF Network Working Group Internet Draft entitled “A Round-trip Delay Metric for IPPM” dated November 1998 and authored by G. Almes, S. Kalidindi and M. Zekauskas defines a metric for round-trip delay of packets across Internet paths.
A “Tech Notes” publication from Cisco Systems entitled “Understanding the Ping and Traceroute Commands” (http://www.cisco.com/image/gif/paws/12778/ping_traceroute.pdf) dated January 2010 illustrates the use of the ping and traceroute commands and, with the aid of some debug commands, captures a more detailed view of how these commands work.
US2010/315958 (“Luo et al”) relates to methods and apparatus for measuring network path quality in a non-cooperative manner, and involves sending a probe consisting of probe data packets to a remote node and receiving a response consisting of at least one response data packet therefrom.