Packet switched networks typically include a plurality of network devices (e.g., end user terminals, computers, routers and switches) interconnected by transmission links. Such networks are commonly used today for data-oriented applications such as delivering email and web content. Multimedia and real-time applications (e.g., streaming audio, video on demand, and voice applications) running on the same packet switched network, though less common than the data-oriented applications, are gaining acceptance. Packet switched networks are different from the circuit switched networks that have traditionally been used for telephone communication. In a circuit switched network a pair of endpoints communicate by establishing a connection, which behaves as if endpoints are connected to the same wire. In packet switched networks, however, many participants compete for the same network resources (i.e., routers, switches, and links).
The well-known ISO-OSI seven-layer reference model (International Organization for Standardization, Open Systems Interconnection) was developed to help describe computer networks. Two important layers of this model are used throughout this document. Layer 2, the data-link layer, refers to communication within a LAN, such as what Ethernet provides. Layer 3, the network layer, refers to communication across networks that may span multiple LANs, such as what the Internet Protocol (IP) provides.
We can think of devices that operate primarily at layer 2 as layer-2 devices. For example, the primary function of an Ethernet switch is to forward Ethernet traffic in units called frames to the port on the path towards the destination device. Thus, a switch is considered a layer-2 device. It should be noted that switches often have management agents that operate at layer 7 (the application layer) and require a layer-3 component to communicate with the management station. Despite having a component that operates at layer 3, a switch is still considered a layer-2 device because its primary function (namely forwarding Ethernet frames) is applied at layer 2.
Similarly, layer-3 devices are those devices that operate primarily at layer 3. An example of a layer-3 device is an IP router. The primary function of such a device is to process IP packets and forward them to the interface towards the destination. Routers require hardware that creates a layer-2 (and layer-1, the physical layer) frame to send the packet to the neighboring device. Despite the existence of such hardware, the router is considered a layer-3 device because the primary function is to process layer-3 packets.
A recent trend has been to combine functionality of a switch and a router in a single box. Such devices, called layer-3 switches, have characteristics of both layer-2 and layer-3 devices. A layer-3 switch can be treated as two separate devices, a layer-2 switch and a layer-3 router, connected by its internal backbone bus.
A subnet is an important concept in a network, such as an IP network. A subnet can be defined as a set of network addresses (or the devices using those addresses) that can communicate directly at layer-3. That is, the physical path between the addresses may contain any number of layer-2 devices (such as switches), but no other layer-3 devices. A router is a device that sends traffic between subnets.
Subnets can also be defined in terms of IP addresses. An IP address consists of 32 bits (or 4 octets, represented as the decimal value of each octet separated by periods). The example IP address, 192.168.3.106, corresponds to the binary representation shown in the first row of Table 1.
An IP address can be divided into two parts: the subnet address and the host address, where the first (most significant) N bits of the address are the subnet address and the remaining bits are the host address. All addresses belonging to the same subnet have the same subnet address, and each host within the subnet has a distinct host address. Thus, a subnet can be defined as the combination of a subnet address and N, the number of significant bits used in the subnet address. It is convenient to construct a subnet mask (or network mask) as a bit field where the first N bits are set to one and the remaining bits are set to zero. For example, written in the same dotted decimal notation as an IP address, a subnet mask of 23 bits is 255.255.254.0. Thus, an address belongs to a subnet if and only if the result of applying the network mask to the address (i.e., the logical AND operation is applied between the binary representations of the address and the mask) is equal to the subnet address.
An important address in an IP subnet is the broadcast address. Packets sent to the broadcast address are sent to every host in the subnet. The broadcast address is, by definition, the address in the subnet with the largest possible host address (i.e., every bit in the host address is set to 1). Table 1 shows an IP address, subnet mask, subnet address, host address, and broadcast address for the example host and subnet.
TABLE 1
Subnet Address Example

                     Dotted Decimal    Binary
IP Address           192.168.3.106     11000000 10101000 00000011 01101010
Subnet Mask          255.255.254.0     11111111 11111111 11111110 00000000
Subnet Address       192.168.2.0       11000000 10101000 00000010 00000000
Host Address         0.0.1.106         00000000 00000000 00000001 01101010
Broadcast Address    192.168.3.255     11000000 10101000 00000011 11111111
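The arithmetic behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the described system; the function names (subnet_mask, subnet_fields, in_subnet) are chosen here for the example only.

```python
import ipaddress

MASK32 = 0xFFFFFFFF  # an IPv4 address is 32 bits wide


def subnet_mask(prefix_len: int) -> int:
    """Mask with the first prefix_len bits set to one and the rest zero."""
    return (MASK32 << (32 - prefix_len)) & MASK32


def subnet_fields(address: str, prefix_len: int) -> dict:
    """Return the rows of Table 1 for one address, in dotted decimal."""
    addr = int(ipaddress.IPv4Address(address))
    mask = subnet_mask(prefix_len)
    host_bits = ~mask & MASK32           # ones where the host address lives
    fields = {
        "ip": addr,
        "mask": mask,
        "subnet": addr & mask,           # logical AND with the mask
        "host": addr & host_bits,        # the remaining (host) bits
        "broadcast": (addr & mask) | host_bits,  # every host bit set to 1
    }
    return {name: str(ipaddress.IPv4Address(v)) for name, v in fields.items()}


def in_subnet(address: str, subnet_address: str, prefix_len: int) -> bool:
    """An address is in the subnet iff (address AND mask) == subnet address."""
    masked = int(ipaddress.IPv4Address(address)) & subnet_mask(prefix_len)
    return masked == int(ipaddress.IPv4Address(subnet_address))
```

Calling subnet_fields("192.168.3.106", 23) reproduces the dotted decimal column of Table 1, and in_subnet("192.168.3.106", "192.168.2.0", 23) is true because the masked address equals the subnet address.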
Not long ago, the standard network layout used a separate switched network for each department and geographical location (e.g., a floor and wing of a building) and several layer-3 routers between the switched networks. The recent popularity of Virtual LANs (VLANs) has resulted in an increase in the size of fast switched networks and a decrease in the dependence on routers. Today, it is common to use a single switched network for an entire building or campus with a single edge-router for each switched network. This shift underscores the importance of the layer-2 topology in enterprise networks.
FIG. 1 shows an example of a simple layer-3 network. The network consists of three hosts (H1, H17, and H19), three routers (R3, R7, and R11), one firewall (FW20), three subnets (N2, N16, and N18), the addresses used on the routers (e.g., A4, A9, A10) and several communication links (shown as lines connecting network elements). The figure also shows the route tables for each of the routers. The route tables each have three columns (Subnet, Address, and Type). A route table is indexed by the Subnet field; that is, when the router needs to look up a route in its route table for a packet, it finds the entry whose Subnet field contains the destination address in the packet header. The second column, Address, is either (1) the address of the next router along the path toward the destination, or (2) the address belonging to the router itself on the same subnet as the destination, if it is the last router along the path. The third column indicates which type of address is used: the type is indirect if the address belongs to a neighboring router and direct if the subnet is directly connected to the router. Note that some direct route entries have been omitted to simplify the example.
To illustrate how routers operate, consider the case where H1 sends a message to H19. Each host is configured to send traffic to its nearest router, called its default router (or default gateway); in this case, H1's default router is R3. Every device (host, router, etc.) is only allowed to send packets to devices on the same subnet; to send packets to devices on other subnets, the packet must go to a router. In this case, H1 needs to send the packet to its default router, R3, because H1 is on N2 and H19 is on N18. Upon receiving the packet, R3 looks up the destination address, H19, in its route table. It finds that H19 belongs to subnet N18, corresponding to the third entry. Based on that route entry, R3 sends the packet toward A8, which belongs to R7. When R7 looks up the destination address, H19, in its route table, it finds that the destination belongs to a subnet, N18, that is directly connected to the router. Thus, R7 can send the packet directly to H19.
When a router encounters a packet whose destination address does not match any entry in its route table, it sends the packet to the default address. For example, in FIG. 1, the route table of R7 contains no entry for subnet N16. If R7 receives a packet destined for N16, it sends the packet to A12 by default.
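The route lookup described above, including the fallback to a default address, can be sketched as follows. The numeric subnets and addresses are hypothetical stand-ins, since the contents of FIG. 1 are not reproduced in the text; the tie-break that prefers the most specific matching subnet is likewise an assumption of this sketch, not something stated above.

```python
import ipaddress
from typing import List, NamedTuple


class Route(NamedTuple):
    subnet: ipaddress.IPv4Network  # the Subnet column
    address: str                   # the Address column
    kind: str                      # the Type column: "direct" or "indirect"


def lookup(table: List[Route], default_address: str, destination: str) -> Route:
    """Find the entry whose subnet contains the destination, else the default."""
    dest = ipaddress.IPv4Address(destination)
    matches = [route for route in table if dest in route.subnet]
    if not matches:
        # No entry matches: send the packet to the default address.
        return Route(ipaddress.IPv4Network("0.0.0.0/0"), default_address, "indirect")
    # Assumption: if several entries match, use the most specific subnet.
    return max(matches, key=lambda route: route.subnet.prefixlen)


# Hypothetical table for a router such as R7: one directly connected subnet
# (standing in for N18), one subnet reached through a neighbor (standing in
# for N2), and a default address (standing in for A12).
R7_TABLE = [
    Route(ipaddress.IPv4Network("10.0.18.0/24"), "10.0.18.1", "direct"),
    Route(ipaddress.IPv4Network("10.0.2.0/24"), "10.0.7.3", "indirect"),
]
R7_DEFAULT = "10.0.7.12"
```

With this table, a destination in 10.0.18.0/24 resolves to the direct entry, while a destination in an unlisted subnet such as 10.0.16.0/24 falls through to the default address.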
Informally, the path between a pair of devices in a network consists of the intermediate devices and links traversed by the packets sent between the pair. In the example above, routers R3 and R7 are on the path from H1 to H19.
FIG. 2 shows an example of a layer-2 network based on subnet N2 of FIG. 1. It consists of four hosts (H1, H20, H21, and H22), one router (R3), four switches (S30-S33), the ports on the switches (I60-I72), and several communication links (shown as lines connecting network elements). The figure also shows the Forward Table for each switch. The Forward Table has two columns, address and port, which map the address to the port along the path toward the host using the address.
As an example of how typical switches operate, consider the first hop of the path from H1 to H19 above; the first layer-3 hop is from H1 to R3 on subnet N2. First, H1 sends the frame using R3 (more precisely, R3's physical address) as the destination address on H1's only link (i.e., to I60 on S30). Upon receiving the frame, S30 looks up the destination address, R3, in its Forward Table, which indicates that I61 should be used to get to the destination. Thus, S30 sends the frame through I61, which connects to I63 on S31. Next, S31 sends the frame out to I64 as indicated in its Forward Table entry for R3. The frame then arrives at S32, whose Forward Table's entry indicates that S32 should forward the frame on port I67. Finally, the frame arrives at R3 because the router is connected to I67. It should be noted that other switched layer-2 network technologies (e.g., asynchronous transfer mode (ATM), token ring) operate differently, but still fit into this framework.
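The hop-by-hop forwarding just described can be modeled with two small dictionaries. This is only a sketch: the tables hold just the entries mentioned in the text for destination R3, device names are used in place of physical addresses, and an unknown destination raises an error here rather than being handled the way a real switch would.

```python
# Forward Table entries named in the text, keyed by switch.
FORWARD_TABLES = {
    "S30": {"R3": "I61"},
    "S31": {"R3": "I64"},
    "S32": {"R3": "I67"},
}

# Device reached by sending a frame out of each port, per the walkthrough.
PORT_LEADS_TO = {"I61": "S31", "I64": "S32", "I67": "R3"}


def trace_frame(first_switch: str, destination: str, max_hops: int = 16) -> list:
    """Return the (switch, egress port) pairs a frame traverses."""
    hops = []
    device = first_switch
    for _ in range(max_hops):
        if device == destination:
            return hops
        table = FORWARD_TABLES.get(device)
        if table is None or destination not in table:
            raise LookupError(f"no forward entry for {destination} at {device}")
        port = table[destination]
        hops.append((device, port))
        device = PORT_LEADS_TO[port]
    raise RuntimeError("hop limit reached; the tables may contain a loop")


print(trace_frame("S30", "R3"))
# → [('S30', 'I61'), ('S31', 'I64'), ('S32', 'I67')]
```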
As data traverses a network, each packet experiences delay at each of the network devices and links along the path. Delays at devices are based primarily on the state of switches and routers at the time packets are presented (e.g., if the router has a long queue, the packet may sit at the router until all the data ahead of it in the queue is transmitted). Delays due to the links are fixed and depend on (1) the time to send the signal over long distances and (2) the bandwidth of the link (i.e., the maximum transfer rate). Similarly, each packet is subject to being discarded along the path for a variety of reasons, including transmission errors (e.g., due to line noise) and the state of network devices (e.g., a full queue).
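As a rough worked example of the two fixed link-delay components, the sketch below adds the propagation time (distance divided by signal speed) to the transmission time (packet size divided by bandwidth). The default signal speed of 2.0e8 m/s is an assumed approximation for signals in cable or fiber, and the function name is hypothetical.

```python
def link_delay_seconds(length_m: float, packet_bits: int, bandwidth_bps: float,
                       signal_speed_mps: float = 2.0e8) -> float:
    """Fixed delay of one link: propagation time plus transmission time."""
    propagation = length_m / signal_speed_mps    # (1) time for the signal to travel
    transmission = packet_bits / bandwidth_bps   # (2) time to put the bits on the link
    return propagation + transmission


# A 1500-byte packet over a 1,000 km link at 10 Mb/s:
# 0.005 s propagation + 0.0012 s transmission, about 6.2 ms in total.
delay = link_delay_seconds(1_000_000, 1500 * 8, 10_000_000)
```

Queueing delay at devices is deliberately left out, since it depends on the state of the switches and routers at the time the packet arrives.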
Emerging applications for use on present and proposed future data networks include so-called Voice over IP (VoIP) applications, and other multimedia applications, that permit data networks carrying computer and other traditional forms of data to also carry coded voice signals using standard Internet Protocol (or other data protocol) techniques. VoIP applications are those for which voice communications are carried over an IP network for at least some of their transit between one or more calling stations and one or more called stations. Though VoIP applications promise increased network efficiencies and lowered cost for voice calls, use of such VoIP applications has thus far been relatively limited because existing and proposed networks are characterized by performance characteristics, including packet loss and packet delay, which, while tolerable for most data applications, give rise to user-perceived impairments that compare unfavorably with traditional voice communications, such as those over the public switched telephone network (PSTN). See, for example, a paper by S. Pracht and D. Hardman, entitled “Voice Quality in Converged Telephony and IP Networks,” January 2001, available from Cisco World magazine.
Recent industry trends show that delivery of multimedia content over data networks has many benefits for a wide range of applications. A significant challenge to the widespread use of such multimedia applications is ensuring the availability of a minimum quality of service (QoS), especially in networks using IP, a protocol that generally provides only best effort delivery of packets. IP does have a notion of Type of Service (TOS) that allows hosts to classify their traffic for different QoS properties (see also DiffServ, below), but this mechanism is seldom utilized in practice.
VoIP applications constitute a further challenge for data networks since they involve delivery of voice and data content, each having different QoS requirements and sensitivities. While applications delivering voice packets are especially sensitive to delay, jitter, and packet loss, many data applications will perform satisfactorily under the same conditions of delay or jitter. For example, in transferring a large file, the user is only concerned with the total time to send the file (e.g., it is acceptable to have periods where no data is sent so long as the total time to transfer the entire file is not affected). It is not acceptable, however, for voice traffic to be silent for seconds while the speaker is trying to talk. Hence, a data network that performs satisfactorily for some applications does not necessarily lend itself to a successful VoIP implementation.
Prior art on discovering layer-3 topology includes academic papers and tools. Several papers have been published that automatically discover a map of the layer-3 topology but provide limited information about paths between devices in the network. One paper (R. Siamwalla, R. Sharma, and S. Keshav, “Discovering Internet topology,” 1999) presents and compares ping-, traceroute-, and Domain Name System (DNS)-based techniques to obtain the layer-3 topology. Ping is a program in which one host sends a particular Internet Control Message Protocol (ICMP) message (an echo request) to another host, which in turn replies with another ICMP message (an echo reply). Traceroute is a program that traces the sequence of routers along a path. It does so by sending an IP packet with a small value in the Time To Live (TTL) field in the IP header. Each router decrements the TTL field by one and is required to send an ICMP message (time exceeded) to the sender if the TTL value reaches 0. Traceroute uses the source address of the ICMP message to determine which router is N hops away (where N is the value set in the TTL field). By repeating this process for increasing values of TTL (e.g., starting with 1 and counting up until it reaches the destination address), it learns of all the routers along the path.
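The TTL mechanism that traceroute relies on can be illustrated with a small simulation. No packets are sent; the path is supplied as a list, and the function names are hypothetical. The sketch starts at a TTL of 1, as common traceroute implementations do.

```python
def probe(path: list, ttl: int):
    """Simulate one probe; return (responding device, reached destination?)."""
    *routers, destination = path
    for router in routers:
        ttl -= 1                   # each router decrements the TTL by one
        if ttl == 0:
            return router, False   # this router reports that the TTL expired
    return destination, True       # the probe survived every router


def traceroute(path: list, max_ttl: int = 30) -> list:
    """Collect the responders for TTL = 1, 2, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, done = probe(path, ttl)
        hops.append(responder)     # the responder is the device ttl hops away
        if done:
            break
    return hops


print(traceroute(["R3", "R7", "H19"]))
# → ['R3', 'R7', 'H19']
```

The simulated path mirrors the H1 to H19 example, where R3 and R7 are the routers on the path.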
Other examples of prior network topology discovery at layer 3 are described in, for example, B. Huffaker, M. Fomenkov, D. Moore, and k. c. claffy, “Macroscopic Analyses of the Infrastructure: Measurement and Visualization of Internet Connectivity and Performance,” in Proc. of PAM2001-A Workshop on Passive and Active Measurements, (Amsterdam, Netherlands), Apr. 23-24, 2001; R. Govindan and H. Tangmunarunkit, “Heuristics for Internet Map Discovery,” in Proc. of the 2000 IEEE Computer and Communications Societies Conf. on Computer Communications (INFOCOM-00), (Los Alamitos, Calif.), IEEE, Mar. 26-30, 2000; H. Burch and B. Cheswick, “Mapping the Internet,” IEEE Computer, vol. 32, pp. 97-98, April 1999. These papers mainly focus on mapping the topology of the Internet backbone rather than that of an enterprise network.
Among the tools that discover layer-3 topology, Skitter dynamically discovers and displays the Internet topology as well as performance measurements. Skitter uses a variation of traceroute that sends ICMP probes instead of User Datagram Protocol (UDP) probes. The probes are sent from a set of geographically distributed servers. Skitter has several different views of the topology based on IP address, IP connectivity, geographic location, and performance. It does not attempt, however, to find paths between arbitrary endpoints. Another tool, Mercator, adds a technique to identify where IP addresses from separate paths belong to the same router. It finds paths from a single centralized location. Finally, another tool mapped nearly 100,000 networks in an attempt to visualize the interconnections in the Internet. This approach used a combination of Border Gateway Protocol (BGP) routing tables, which can be obtained directly from routers, and traceroute. See, for example, Y. Rekhter and T. Li, “A Border Gateway Protocol 4 (BGP-4),” March 1995, RFC 1771.
Simple Network Management Protocol (SNMP) is an industry standard protocol for communicating management information to and from devices on a network (e.g., routers, switches, printers, etc.). See, for example, J. Case, M. Fedor, M. Schoffstall, and J. Davin, “A Simple Network Management Protocol (SNMP),” May 1990, RFC 1157 or W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2. Reading, Mass.: Addison-Wesley, 3rd ed., January 1999.
Nearly all new network-attached products for sale to businesses include an SNMP agent (i.e., a software module on the devices for processing SNMP requests). SNMP is a lightweight protocol that allows SNMP clients (e.g., a management tool) to obtain information from or configure devices with an SNMP agent. The meaning of the information that SNMP carries is specified by the Management Information Base (MIB). See, for example, M. Rose and K. McCloghrie, “Concise MIB Definitions,” March 1991, RFC 1212; K. McCloghrie and M. Rose, “Management Information Base for Network Management of TCP/IP-Based Internets: MIB-II,” March 1991, RFC 1213. MIBs are organized in a hierarchical tree where different organizations own separate branches of the tree. For example, the MIB-II branch is controlled by the Internet Engineering Task Force (IETF), a standards body, and any company can have its own branch under the enterprises node.
SNMP-based approaches for discovering layer-3 devices have been demonstrated in commercial tools. For example, SolarWinds, a network management tool, includes a component for discovering devices on the network using ping, DNS queries, and SNMP queries. The topology discovery process performs a breadth-first search that starts at a seed router and proceeds to the routers listed in each visited router's route table.
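A breadth-first search of this general kind can be sketched as below. This illustrates the search strategy only and is not the implementation of any commercial tool; get_next_hops is a hypothetical callback standing in for whatever query (e.g., an SNMP read of a route table) returns the routers named in a given router's route table.

```python
from collections import deque
from typing import Callable, Iterable, List


def discover_routers(seed: str,
                     get_next_hops: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit routers breadth first, starting from the seed router."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        router = queue.popleft()
        order.append(router)
        for neighbor in get_next_hops(router):
            if neighbor not in seen:   # visit each router only once
                seen.add(neighbor)
                queue.append(neighbor)
    return order


# Hypothetical next-hop data for three routers.
NEXT_HOPS = {"R3": ["R7", "R11"], "R7": ["R3", "R11"], "R11": ["R3"]}
found = discover_routers("R3", lambda router: NEXT_HOPS.get(router, []))
# found == ["R3", "R7", "R11"]
```

A router that cannot be queried simply contributes no neighbors here, so anything reachable only through it stays undiscovered.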
Previous SNMP-based approaches to find the layer-3 path between arbitrary hosts have been demonstrated to work when SNMP is available on all intermediate routers and the IP address of the first router is known. One such approach is described in D. Zeltserman and B. Puoplo, Building Network Management Tools with Tcl/Tk, Upper Saddle River, N.J.: Prentice Hall, January 1998. It starts from a given router, finds the routing entry towards the destination, uses its next-hop address field to find the next router, and iterates until the destination is reached. This approach fails when any router in the path is inaccessible. Because the routing information is collected at run-time, it has the advantage that the routes are current. However, such an approach is inefficient for finding several routers at once because some route tables take a long time to retrieve (we have observed some that take as long as 15 minutes to retrieve). The authors suggest an improvement in which each hop is reduced to 33 or fewer lookups: the destination address is masked with each possible netmask in turn, and the result is used as the table index, until a suitable entry is found.
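The bound of 33 lookups follows from there being 33 possible netmask lengths (0 through 32 bits). The sketch below shows one way such an indexed lookup could work; it is not taken from the cited book. get_entry is a hypothetical stand-in for a single indexed read of a route entry, and the table contents are invented for illustration.

```python
import ipaddress


def find_route(get_entry, destination: str):
    """Try each netmask length from 32 down to 0; return (entry, lookups used)."""
    dest = int(ipaddress.IPv4Address(destination))
    lookups = 0
    for prefix_len in range(32, -1, -1):
        mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
        candidate = str(ipaddress.IPv4Address(dest & mask))
        lookups += 1
        entry = get_entry(candidate)      # one indexed read, not a table walk
        # Accept the entry only if its own mask length matches the one tried.
        if entry is not None and entry["prefix_len"] == prefix_len:
            return entry, lookups
    return None, lookups


# Hypothetical route table indexed by subnet address.
TABLE = {
    "10.0.18.0": {"prefix_len": 24, "next_hop": "10.0.3.8"},
    "0.0.0.0": {"prefix_len": 0, "next_hop": "10.0.3.12"},
}

entry, used = find_route(TABLE.get, "10.0.18.19")   # matches at /24 after 9 lookups
```

The mask-length check matters: for a destination such as 10.0.16.17, short masks already yield the index 0.0.0.0, but the default entry is accepted only when the mask length tried is 0, which is the 33rd lookup.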
A few commercial products claim to provide layer-3 topology discovery. Well-known examples include HP OpenView 6.2, Computer Associates Unicenter Network and Systems Management 3.0, and IBM's Tivoli. Since the approaches used by each tool are proprietary, the details of each tool cannot be presented here. Only a few tools claim to provide information about layer-3 paths in a network. For example, see Peregrine Systems, Inc., “InfraTools Network Discovery”; Cisco, “CiscoWorks2000.”
Limited literature is available on layer-2 topology discovery. An approach to generate the layer-2 topology between switches was presented in a paper, Y. Breitbart, et al., “Topology discovery in heterogeneous IP networks,” in Proc. of the 2000 IEEE Computer and Communications Societies Conf. on Computer Communications (INFOCOM-00), (Los Alamitos, Calif.), pp. 265-274, Mar. 26-30, 2000 and improved upon in another, B. Lowekamp, D. R. O'Hallaron, and T. R. Gross, “Topology Discovery for Large Ethernet Networks,” in ACM SIGCOMM 2001, (San Diego, Calif.), pp. 237-248, Aug. 27-31, 2001. This approach operates by processing the forwarding tables obtained from each switch via SNMP.
Some switch vendors have produced commercial tools that use proprietary MIB extensions to generate the layer-2 topology in a network consisting only of their products. See, for example, Hewlett-Packard Co., HP Toptools 5.5 User Guide, 2001. A few commercial tools have recently added claims to provide layer-2 topology discovery in heterogeneous networks. The techniques used by these tools are proprietary. See, for example, Peregrine (as above) and Hewlett-Packard Co., “Discovering and Mapping Level 2 Devices.”
The prior work presented above for layer-2 topology discovery has certain limitations. Only one other approach finds a path between arbitrary hosts, but: (a) it cannot automatically obtain the first router in a path, (b) the path stops at the first non-SNMP-enabled device in the path, and (c) the path analysis is done on the live network, which is inefficient when a large number of paths are needed. The layer-2 topology algorithms described above perform poorly (e.g., can fail to produce any correct links) when a single forward entry is missing or incorrect. Furthermore, the approaches have not been demonstrated to work on networks using VLANs. No previous techniques have been presented to relate or to combine layer-2 and layer-3 paths.
Several mechanisms are currently available to manage the allocation of network resources among network users in efforts to optimize QoS in the network. In one example, an emerging Differentiated Services (DiffServ) approach allows a communications provider or a network user to mark packets with different settings to associate them with different grades of network service. See, for example, S. Blake et al, IETF RFC 2475, “An Architecture for Differentiated Services,” December 1998; and W. Stallings, “Differentiated Services,” Communications Systems Design, vol. 6, no. 2, February 2000. Such differentiated services allow the network to allocate network resources among classes of packets and, ultimately, among network users. In addition, some devices permit control over the rate that traffic is sent across portions of the network, thus permitting communications providers to control the offered load applied to a network.
Two simple techniques for network management, ping and traceroute, are described above. Ping can be used to determine if a network end station can be reached and is operational. Traceroute techniques can determine the layer-3 hop-by-hop path and round-trip time to a network end station. Other proposed techniques actively probe a network by transmitting additional packets into the network and measuring the end-to-end delay and packet loss rate across these networks.
These approaches suffer several shortcomings when applied to large-scale network performance management. First, ping can only test a connection from a testing point to a remote location. To test paths between network ingress and egress points, a network operator must perform ping operations between all edges of the network. While traceroute can determine the path being taken by packets across the network, it cannot distinguish between packet loss and non-responding systems such as firewalls and the like. Likewise, it can only compute the round-trip delay (including the systems' processing delay).
Prior attempts to identify data networks that are suitable for VoIP applications and techniques for optimizing existing networks for VoIP applications have included those used with networks carrying traditional data applications. However, such prior test and measurement techniques often suffer from limitations in recognizing network characteristics that prove of great importance to voice users. Thus, as noted above, suitable packet delay characteristics (as well as jitter and packet loss) prove to be of special importance in successful implementation of VoIP applications. Moreover, most voice traffic over data networks (as in traditional voice networks) involves two-way communications (or more, e.g., for multiparty conferencing) over respective data links, with delay in each link being important to perceived call quality.
Because many present and proposed VoIP applications are intended for use over private corporate, government or other institutional networks, and because such networks are also required to carry a variety of other traffic, at least some of which has an assigned priority, it often proves necessary to design and operate networks to be used for VoIP applications with such priorities clearly in mind. Thus, it is important to measure existing and proposed traffic flows in view of such priorities and in view of inherent requirements of VoIP applications.
Because many corporate and other private networks include a large number of operational nodes (computers, user data terminals, voice terminals, routers, switches, etc.) each interconnected with one or more other nodes over a variety of data links, the complexity of such networks often poses severe planning and operational difficulties. Such difficulties are compounded by the variability of traffic, including VoIP traffic, especially in times of network overload or failure. Increases in steady state and peak traffic demands, and newly emerging traffic patterns or actual or potential performance bottlenecks are often difficult to anticipate or quickly recognize using present network monitoring techniques.
Traffic matrices between sources and destinations in the network are often used for tracking network traffic patterns. A traffic matrix has the source as one axis, the destination as the second axis, and a measure of traffic during some interval of time (e.g., packets per second or bytes per second) as the entry in the matrix. Using a set of such matrices from a set of appropriate intervals, a communications provider can track trends in load offered to its network, thus providing a basic tool for network engineering. One existing network monitoring system measures offered packet load and can record information to create a traffic matrix, but cannot track actual network performance. This system tracks sequences of packets between source and destination addresses as a router processes them and reports this information to a central system. By combining such records from several packet switches, it is possible to compute the number of packets and the number of bytes of packet traffic between ingress and egress points of a network. This tool, however, does not provide a means for computing network loss or delay during specific intervals, nor does it provide means for sectionalizing such performance metrics.
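A traffic matrix of the kind described can be built from per-flow records as in the sketch below. The record layout (source, destination, byte count) and the function name are assumptions made for this illustration, not the format of any particular monitoring system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def traffic_matrix(records: Iterable[Tuple[str, str, int]],
                   interval_seconds: float) -> Dict[Tuple[str, str], float]:
    """Map each (source, destination) pair to its average bytes per second."""
    totals = defaultdict(float)
    for source, destination, byte_count in records:
        totals[(source, destination)] += byte_count
    return {pair: total / interval_seconds for pair, total in totals.items()}


# Hypothetical records collected during one 60-second interval.
records = [("A", "B", 3000), ("A", "B", 1500), ("B", "A", 600)]
matrix = traffic_matrix(records, 60)
# matrix == {("A", "B"): 75.0, ("B", "A"): 10.0}
```

Comparing such matrices across successive intervals shows trends in offered load, but, as the text notes, it says nothing about loss or delay.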
A network testing tool known as Chariot, marketed by NetIQ Corp., provides predictive information relating to the impact of introducing a new application on a data network. This and other products of NetIQ are described generally in their publication Managing the Performance of Networked Applications. General descriptive materials relating to a Chariot Voice over IP module are also available from NetIQ.
Commercial tools for network performance monitoring and management currently available include Hewlett-Packard's HP Openview, Lucent's VirtualSuite, Patrol DashBoard, described at bmcsoftware, “PATROL DashBoard,” Omegon's NetAlly described in “NetAlly White Paper,” the Felix project from Telcordia Technologies described in C. Huitema and M. W. Garrett, “Project Felix: Independent monitoring for network survivability,” and open source MRTG. Such commercial tools provide detailed network statistics, but are limited in their ability to export the data to other tools for cooperative analysis purposes.
Tools for testing performance of multimedia applications (specifically, VoIP) include the above-cited NetAlly and Chariot tools, as well as Hammer, described in Empirix, “Test and Monitoring Solutions for Web, Voice, and Network Applications,” and VoIP Explorer. While these tools differ in the way they inject voice traffic, they collect similar end-to-end measurements including delay, jitter, and packet loss.
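For orientation, end-to-end delay, jitter, and packet loss can be derived from per-packet send and receive timestamps roughly as follows. This is a generic sketch and not the method of any tool named above; the jitter estimate uses the smoothed interarrival formula from RFC 3550 (RTP), and the input layout is an assumption.

```python
def summarize(sent: dict, received: dict) -> dict:
    """sent and received map sequence number -> timestamp in seconds."""
    if not sent:
        raise ValueError("no packets were sent")
    delivered = sorted(seq for seq in sent if seq in received)
    delays = [received[seq] - sent[seq] for seq in delivered]
    jitter = 0.0
    for previous, current in zip(delays, delays[1:]):
        # RFC 3550 estimator: J = J + (|D| - J) / 16, where D is the change
        # in one-way delay between consecutive delivered packets.
        jitter += (abs(current - previous) - jitter) / 16.0
    return {
        "loss": 1.0 - len(delivered) / len(sent),
        "mean_delay": sum(delays) / len(delays) if delays else None,
        "jitter": jitter,
    }


# Four packets sent 20 ms apart; the third is lost and the fourth is delayed.
stats = summarize({1: 0.00, 2: 0.02, 3: 0.04, 4: 0.06},
                  {1: 0.05, 2: 0.07, 4: 0.13})
```

The one-way delay figure is meaningful only if the sender and receiver clocks are synchronized; the jitter estimate is unaffected by a constant clock offset because it depends only on differences between delays.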
Other tools that provide some testing functionality for assessing networks for possible VoIP applications include those from Agilent Technologies. Agilent Technologies's suite of tools includes three main components: Voice Quality Tester (VQT), IP Telephony Analyzer, and IP Telephony Reporter. Voice Quality Tester measures voice quality objectively, without having human listeners. This system supports one-way and round-trip delay measurements, echo, and clarity (a measure of voice quality). IP Telephony Analyzer captures RTP packets and calculates various performance metrics, such as packet loss, delay, and jitter for each RTP stream. Additionally, for each connection and protocol, it collects statistics on the number of frames, bytes, and frame errors, and the utilization. IP Telephony Reporter merges the call quality statistics provided by VQT and the packet network statistics provided by the IP Telephony Analyzer by importing result files from both of the components. Agilent's suite measures the impact of IP telephony equipment on voice quality rather than the impact of the data network on quality.
Cisco Systems provides a solution described in “Cisco VoIP Readiness Net Audit,” that uses proprietary SNMP-based tools for data collection from network devices. The goal of this solution is to assess the general health of the network. The service focuses on performance analysis of routers and switches and delivers an executive report describing the overall network performance and VoIP readiness. It does not integrate voice quality statistics with network device statistics.
Each of the prior tools mentioned above proves useful in particular circumstances and provides a part of the set of tools required to assess a network for multimedia application readiness. None of these prior tools, however, fully integrates voice quality metrics with statistics for network devices on the voice path to the degree desired for the multimedia applications of current and future importance. Moreover, making selections from the variety of existing tools to accomplish the desired high degree of integration is non-trivial since each tool has different interfaces, data formats, and limited data import/export support. Another major obstacle for integration of disparate tools is that the granularity of time measurements tends to be different for each tool. Few commercial tools provide fine time granularity measurements (i.e., monitoring on the order of seconds). Furthermore, most of these tools require the use of a graphical user interface (GUI), which would require extensive manual intervention to compose sophisticated tests.
Thus, above-cited prior art techniques, while useful in particular circumstances, suffer from one or more limitations relating to completeness of monitoring or analysis of network entity performance, integration between network measurement, analysis and visualization, or in ease of use in connection with a variety of multimedia and other non-traditional applications.