This invention relates generally to data networks and more specifically to performance measurements on data networks.
The Internet is an example of a large and widely used data network. The Internet interconnects many computers, often owned by different entities, creating the ability for widespread information exchange. The computers connected by the network are sometimes called “nodes”. The nodes might take the form of “routers” or “servers”.
A server is generally a node that runs an application. Often it provides information when requested. However, a server could also run programs on data or store data in response to user commands.
In contrast, a “router” is a computer that is connected to many other nodes and passes data messages in the direction of an intended destination computer. All nodes on the Internet communicate using Internet Protocol (IP). Under the Internet Protocol, messages are broken up into “datagrams” or “packets” of data. Each datagram follows a prescribed format. One of the prescribed fields is a destination address. Each node has an address assigned to it and the destination address allows a computer to be specified to receive a particular datagram of data. The router reads the destination addresses of datagrams. If a computer with that address is attached to the router, the message can be sent directly to that computer.
If the destination computer is not attached to a router that receives the datagram, the router can pass the datagram on to another router. Sometimes, though, a datagram never reaches its destination address. It is possible that the datagram contains an error and specifies a non-existent address. Or the server at the destination address might not be functioning. Or, the path to the destination address might be blocked. If there were no way to eliminate such datagrams from the network, the number of such datagrams would build up over time. Ultimately, all the routers on the network would do nothing but send datagrams that could not reach their destinations.
To avoid this, the Internet Protocol requires that each datagram contain a source address and a field called “Time To Live” or TTL. The TTL field is a counter that is initially set to a value between 1 and 255. Each time the message is sent from one router to the next, the TTL field is reduced by 1. When the TTL field reaches 0, the message is not passed on to another router. Rather, that router creates a new message that has in its destination field the address of the source of the old message. This new message contains data that signals to the source of the first message that the destination did not receive the datagram because it took too long to find a path to the destination address.
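The TTL mechanism can be illustrated with a minimal simulation. The sketch below does not represent any real router implementation: the names `Datagram` and `send` and the router labels are hypothetical, and it assumes that each router decrements the counter before forwarding and discards the datagram once the counter reaches zero.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datagram:
    source: str        # address of the sending node
    destination: str   # address of the intended receiver
    ttl: int           # Time To Live counter
    payload: str = ""


def send(datagram: Datagram, routers: List[str]) -> Optional[Datagram]:
    """Pass a datagram through `routers` in order (hypothetical path).

    Returns None if the datagram gets past the last router, otherwise the
    expiry notice that the discarding router addresses to the source.
    """
    ttl = datagram.ttl
    for router in routers:
        ttl -= 1  # each router reduces the counter by 1
        if ttl <= 0:
            # The datagram is dropped here; a new message goes back to
            # the source of the old message.
            return Datagram(
                source=router,
                destination=datagram.source,
                ttl=64,
                payload="datagram expired",
            )
    return None  # delivered to the destination


path = ["router-1", "router-2", "router-3"]
notice = send(Datagram("10.0.0.5", "192.0.2.7", ttl=2), path)
delivered = send(Datagram("10.0.0.5", "192.0.2.7", ttl=4), path)
```

In this sketch a datagram sent with a TTL of 2 is discarded by the second router, which addresses its notice to the original source, while a TTL of 4 is enough to get past all three routers.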
In addition, each datagram has an identification number. In general, each source will increment the identification number field by one for each datagram that it sends. The identification number helps the original source computer identify which datagram was not received. It was intended to be used by transport layer protocols; however, it is not used by any existing implementations. As will be described in greater detail below, this field may also be used in a novel way to measure response time of a network.
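The fields discussed so far (source address, destination address, TTL and identification number) occupy fixed positions in the 20-byte IPv4 header. The following sketch packs and unpacks such a header with Python's standard `struct` module. The helper names `build_header`, `parse_header` and `next_identification` are hypothetical, the checksum is left at zero, and the per-datagram increment with wrap-around at 16 bits is an assumption that follows the general behavior described above.

```python
import socket
import struct

# Fixed 20-byte IPv4 header without options, in network byte order:
# version/IHL, type of service, total length, identification,
# flags/fragment offset, TTL, protocol, checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def next_identification(current: int) -> int:
    """Increment the 16-bit identification number, wrapping at 65535."""
    return (current + 1) & 0xFFFF


def build_header(source: str, destination: str, ttl: int, identification: int) -> bytes:
    version_ihl = (4 << 4) | 5  # IPv4, header length of five 32-bit words
    return IPV4_HEADER.pack(
        version_ihl, 0, IPV4_HEADER.size, identification & 0xFFFF, 0,
        ttl, 17, 0,  # protocol 17 is UDP; checksum is not computed in this sketch
        socket.inet_aton(source), socket.inet_aton(destination),
    )


def parse_header(raw: bytes) -> dict:
    (_, _, _, identification, _, ttl, _, _, src, dst) = IPV4_HEADER.unpack(raw[:20])
    return {
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "ttl": ttl,
        "identification": identification,
    }


ident = 65535
first = parse_header(build_header("192.0.2.1", "198.51.100.7", 64, ident))
ident = next_identification(ident)
second = parse_header(build_header("192.0.2.1", "198.51.100.7", 64, ident))
```

A receiver that sees two such headers from the same source can tell them apart by the identification value alone, even when every other field is identical.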
Using the Internet Protocol, many computers, operated by different entities, can all communicate. However, the distributed nature of the Internet also creates special challenges when things do not work right.
A user attempting to exchange information with a particular computer over the Internet has to transmit messages through several “administrative domains.” An administrative domain represents a portion of the network managed by a particular entity. If the communication fails or takes too long, a user might have difficulty knowing where the problem resides.
At a high level, traditional Internet communication can be thought of as passing through five administrative domains. A consumer has a computer, such as a PC. The PC represents one administrative domain because it is under the control of the user. The PC is connected to the Internet through an access provider network. The access provider network is administered by an access provider, such as the local phone company or DSL provider. The access provider network enables communication with an Internet Service Provider (ISP). The ISP maintains an ISP network that provides a connection to the Internet. Within the Internet, routers pass messages to enable communication with servers. Many entities control individual routers in the Internet, but the Internet as a whole can be thought of as one administrative domain. The server is generally under control of a single company or entity, which represents yet a further administrative domain.
If a consumer experiences a problem communicating with a particular server, the consumer will often not know where the problem lies. However, the consumer often pays the access provider or ISP for service. When problems arise, the consumer will often call a call center run by the access provider or ISP.
It would be very desirable if the call center operator could provide fast and accurate information about the source of the problem. If the problem resides in the administrative zone controlled by the entity that runs the call center, it would be desirable to quickly identify the problem and make arrangements to have it repaired. But, where the problem resides elsewhere, it would be desirable to be able to identify that the problem resides in a different administrative domain. Timely information would reduce the burden on the company running the call center and also reduce customer frustration.
Some tools are available to diagnose a network problem. “PING” is a network tool that is often installed on networked computers. The PING tool sends a message to a particular computer and determines whether a response is received. This tool can verify that a connection exists. But, if no connection exists, the tool cannot provide an indication of the source of the problem. Nor will the tool be able to identify the source of a bottleneck or similar problem that slows, but does not block, communication.
Traceroute is another such tool. Traceroute uses the time to live field in IP datagrams. A source running traceroute will send multiple datagrams to a particular destination. In the first datagram, the time to live field will be set to 1. This causes the datagram to expire at the first router in the path. That router sends back a “datagram expired” message. The message includes in its header the address of the router that sent it, which tells the source the identity of the first router in the path. The source running traceroute then sends another datagram, with the time to live field incremented by one. This datagram will expire at the next router in the path and a “datagram expired” message will be generated by that router. As successive datagrams are sent, with successively higher values of time to live, successive routers in the path respond, providing their IP addresses to the source. At some time to live value, the datagram will reach the destination computer before expiring. That computer will respond to the traceroute message with a “Destination Unreachable/Port Unreachable” message. In this way, the source can determine the address of every router in the path to the destination. However, traceroute only provides the path to a particular computer. It does not provide any indication of whether performance is being hindered by excessive traffic at one of the nodes in the path.
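The traceroute procedure can be sketched as a short simulation. The code below does not send real packets and is not the actual traceroute program: `probe` and `traceroute` are hypothetical names, the path is supplied as a list of router addresses, and a probe with a time to live of n is assumed to expire at the n-th router in that list.

```python
from typing import List, Tuple


def probe(path: List[str], destination: str, ttl: int) -> Tuple[str, str]:
    """Simulate one probe; return (address of responder, message type)."""
    if ttl <= len(path):
        # The probe expires at the ttl-th router, which reports back.
        return path[ttl - 1], "datagram expired"
    # The probe survives every router and reaches the destination.
    return destination, "port unreachable"


def traceroute(path: List[str], destination: str, max_ttl: int = 30) -> List[str]:
    """Discover router addresses by sending probes with increasing TTL."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = probe(path, destination, ttl)
        if message == "port unreachable":
            break  # the destination itself answered, so the path is complete
        hops.append(responder)
    return hops


routers = ["10.0.0.1", "10.0.1.1", "10.0.2.1"]
discovered = traceroute(routers, "192.0.2.7")
```

Each loop iteration corresponds to one datagram sent by the source. The result lists only addresses, which mirrors the limitation noted above: the path is revealed, but nothing about how heavily loaded each router is.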
Performance information might be obtained through the use of SNMP information stored in the routers in the path. Routers generally store information about the messages they pass. Statistics about the volume of traffic might, for example, reveal a particular router is overloaded and is therefore the bottleneck. However, a problem with employing SNMP tools is that they are available only to users who have administrative privileges on the router being tested. Because the routers are usually in an administrative zone operated by an entity other than the one that operates the call center, it is unlikely that the appropriate access to the router will be available to use these tools.