Communication networks are in wide use in many technological fields including distributed computing, data exchange and telecommunication applications. Communication networks generally include many nodes, such as bridges, LAN switches, routers, cross-connections and telephone switches. The networks further include communication links, such as cables, point-to-point radio connections and optical fibers, which connect the nodes. The networks also include ports, generally within some of the nodes, for attaching external devices such as computers, terminals, handsets, and multiplexers. These external devices are referred to as end-points, or hosts.
A major issue in both newly-deployed and existing communication networks is testing and trouble-shooting, i.e., checking whether the network is operating according to its specifications and, if not, determining the cause of the network's inadequate performance (for example, the identity of a faulty unit or link). Dedicated point-to-point testing equipment is a commonly-used network testing tool. Such equipment is described, for example, in U.S. Pat. No. 5,477,531, whose disclosure is incorporated herein by reference. Usually, dedicated point-to-point testing equipment requires two users to coordinate their operations in order to identify a misbehaving component of the network. To test a large network, the testing equipment must be moved between many ports of the network.
End-to-end tests of network response times and delays provide useful information regarding the operational status of the network. Such tests are helpful in determining that a fault or network overload has occurred. For example, in end-to-end timing testing, packets of a given size are sent from a source node to a destination node, which measures and reports the packet arrival times. In response time testing, the destination node sends a correlated echo packet back to the source, which measures and reports the round-trip time elapsed between sending the original packet and receiving the echo packet. Excessive delay or jitter in delivery of the packets indicates that a problem exists. End-to-end tests by themselves, however, provide no further information as to the source and location of the problem within the network.
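By way of illustration only, the response time measurement described above can be sketched in Python as follows. The function names, the use of integer millisecond timestamps keyed by a packet identifier, and the definition of jitter as the mean absolute difference between consecutive round-trip times are assumptions made for this sketch; they are not taken from any particular testing tool.

```python
from statistics import mean


def round_trip_times(sent, echoed):
    """Map each packet id to its round-trip time, or None if no echo arrived.

    sent   -- {packet_id: time the original packet was sent}
    echoed -- {packet_id: time the echo packet was received at the source}
    """
    return {
        pid: (echoed[pid] - t_sent if pid in echoed else None)
        for pid, t_sent in sent.items()
    }


def summarize(rtts, max_delay):
    """Summarize a response time test (hypothetical report format)."""
    received = [r for r in rtts.values() if r is not None]
    # Assumed jitter definition: mean absolute change between consecutive RTTs.
    changes = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "mean_rtt": mean(received) if received else None,
        "jitter": mean(changes) if changes else 0,
        "lost": len(rtts) - len(received),
        "excessive": any(r > max_delay for r in received),
    }


# Three packets sent one second apart; the echo of packet 3 never arrives.
rtts = round_trip_times({1: 0, 2: 1000, 3: 2000}, {1: 10, 2: 1030})
report = summarize(rtts, max_delay=25)
```

In this sketch the report can flag that a delay threshold was exceeded or that a packet was lost, but it carries no indication of which node or link was responsible.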
RMON (Remote Network Monitoring) is a family of standards defining information that a network administrator can use to monitor, analyze, and troubleshoot a distributed network from a central site. These standards, which are an extension of the Simple Network Management Protocol (SNMP), specify the information that a network monitoring system is expected to provide. RMON first became a standard in 1992 in Request for Comments (RFC) 1271 of the Internet Engineering Task Force (IETF). It is currently specified as part of the IETF Management Information Base (MIB) in RFC 1757, entitled “Remote Network Monitoring Management Information Base.” More recently, RMON Version 2 (sometimes referred to as “RMON2”) was specified in IETF RFC 2021. These standard documents are incorporated herein by reference.
RMON can be supported by hardware monitoring devices (known as “probes”) and/or by software agents embedded in network nodes and other elements. For example, Cisco's line of LAN switches includes software in each switch that can trap information as traffic flows through the switch and record the information in its MIB. RMON specifies nine kinds of information to be collected by probes and agents, including packets sent, bytes sent, packets dropped, statistics by host and by conversation between two sets of addresses, and certain kinds of events that have occurred. RMON information groups eight and nine are based on trapping or capturing specified types of packets, to provide network alarms and enable traffic decoding and analysis. RMON probes and agents are typically controlled by a management station, using SNMP commands. These SNMP commands are described, for example, in SNMP, SNMPv2 and RMON: Practical Network Management, by William Stallings (Second Edition, Addison Wesley, 1996), which is incorporated herein by reference.
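The kinds of counters mentioned above (packets, bytes, statistics by host and by conversation) can be pictured with the following minimal Python sketch. The class `TrafficStats` and its fields are hypothetical names introduced for illustration; the sketch does not reproduce the actual RMON MIB object definitions or any SNMP interaction.

```python
from collections import Counter


class TrafficStats:
    """Illustrative per-segment counters, loosely modeled on the text above."""

    def __init__(self):
        self.packets = 0
        self.octets = 0
        self.by_host = Counter()          # bytes sent, keyed by source address
        self.by_conversation = Counter()  # bytes, keyed by (source, destination)

    def observe(self, src, dst, length):
        """Update all counters for one observed packet of `length` bytes."""
        self.packets += 1
        self.octets += length
        self.by_host[src] += length
        self.by_conversation[(src, dst)] += length


stats = TrafficStats()
stats.observe("10.0.0.1", "10.0.0.2", 100)
stats.observe("10.0.0.1", "10.0.0.3", 50)
stats.observe("10.0.0.2", "10.0.0.1", 60)
```

A management station would, under this picture, read such counters remotely rather than compute them itself; how they are exposed is outside the scope of the sketch.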
Other types of network monitoring tools are also known in the art. For example, Network Associates (Santa Clara, Calif.) offers the “Sniffer” line of network analysis products. The capabilities of these products include packet capturing, whereby filters based on pattern matching and/or Internet Protocol (IP) addresses enable selected frames to be captured and displayed. Further details regarding these products are available at www.sniffer.com. Another tool that is commonly used in diagnosing IP-based routing is TraceRoute, which is described, for example, by Huitema, in Routing in the Internet (Prentice Hall, 1995), page 45, which is incorporated herein by reference. TraceRoute is used to determine a network path that an IP packet could traverse from a specific host to reach an intended destination, and to identify possible network problems in this context. It is available as an application in most operating systems that implement IP.
TraceRoute discovers the intermediate hops traversed by a packet by adjusting the “Time to Live” (TTL) parameter in each of a sequence of IP packets. It relies on the fact that the TTL is reduced by one at each hop as the packet passes through the network, and that a router that receives an IP packet whose TTL has reached zero sends an error message back to the sender. The first packet in the sequence is sent from the host with a TTL of one, and the TTL of each subsequent packet is incremented by one. TraceRoute monitors the error messages sent back from the routers in the network with respect to each of the packets in turn, and thus tracks the packets downstream progressively until the ultimate destination has been reached. When multiple paths are available in the network, however (as is the case in most large IP networks), there is no assurance that all of the packets in the sequence will follow the same path. In this context, the information provided by TraceRoute is of little use in end-to-end tracking of packets or in determining packet transmission delays over different hops along the route.
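The TTL mechanism and its multi-path limitation can be illustrated with a small Python simulation. This is a sketch under stated assumptions: the network is modeled simply as a list of router names ending in the destination, the functions `send_probe` and `trace` and the router names are hypothetical, and no real packets are sent.

```python
from itertools import cycle


def send_probe(path, ttl):
    """Simulate one probe along `path` (routers followed by the destination).

    Each router reduces the TTL by one; the router at which the TTL reaches
    zero reports an error. Returns (responder, reached_destination).
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return router, False
    return path[-1], True


def trace(choose_path, max_hops=30):
    """Send probes with TTL 1, 2, 3, ... and record who answers each one.

    `choose_path` is called once per probe, modeling the fact that the
    network, not the sender, decides which path each packet follows.
    """
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, done = send_probe(choose_path(), ttl)
        hops.append(responder)
        if done:
            break
    return hops


path_a = ["R1", "R2", "R3", "dst"]
path_b = ["R1", "R4", "R5", "dst"]

# Single path: the reported hops match the real route.
single = trace(lambda: path_a)        # ['R1', 'R2', 'R3', 'dst']

# Two paths used alternately: the reported hops mix both routes.
alternating = cycle([path_a, path_b])
mixed = trace(lambda: next(alternating))  # ['R1', 'R4', 'R3', 'dst']
```

In the second case the reported sequence R1, R4, R3 corresponds to neither of the two actual routes, which is the difficulty described above when successive packets do not follow the same path.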
U.S. Pat. No. 5,812,529, whose disclosure is incorporated herein by reference, describes a system and method for acquiring network performance data, built around a “mission server,” which interfaces with clients to receive requests for “missions.” A typical mission includes operations such as transmission and reception of data packets among devices connected to segments of the network. The mission is performed and/or supported by “sentries,” typically software agents running on the network devices. The sentries carry out mission operations in response to commands from the mission server, and report back to the mission server on the mission results.
U.S. Pat. Nos. 5,838,919 and 5,881,237, whose disclosures are incorporated herein by reference, describe methods, systems and computer program products for testing of network performance using test scenarios that simulate actual communications traffic between network end-points. Specific test protocols are assigned to end-point nodes on the network. Typically, the nodes are paired, and one of the nodes in the pair communicates the protocol to the other, associated node. A console node sets up the test protocols, initiates their execution and receives data on the test performance from the end-point nodes.