1. Field of the Invention
The present invention relates to a method and apparatus for monitoring data transmission through a communications network while the communications network is in service. More particularly, the present invention relates to a network monitoring system having endpoint probes that transmit and receive message data over a packetized switching network to exchange information allowing the probes to measure network performance metrics, such as network availability, data delivery ratio and round trip delay in the communications network.
2. Description of the Related Art
From both an end user and a service provider standpoint, there is an increasing need to accurately measure operational performance of data communications networks. Communications networks, especially packetized data networks, are currently utilized in various applications for transmission and reception of data between parties at different locations. A typical data transmission system includes a plurality of end user sites and a data packet switching network, which resides between the sites to facilitate communications. Each site is connected to the switching network via an access channel (i.e., a channel connecting a site to a communications system), wherein transmission circuits, preferably virtual circuits, establish paths between the sites through the access channel and the switching network.
Packetized data networks typically format data into packets for transmission from one site to another. In particular, the data is partitioned into separate packets at a transmission site, wherein the packets usually include headers containing information relating to packet data and routing. The packets are transmitted to a destination site in accordance with any of several conventional data transmission protocols known in the art (e.g., Asynchronous Transfer Mode (ATM), Frame Relay, High Level Data Link Control (HDLC), X.25, IP tunneling, etc.), by which the transmitted data is restored from the packets received at the destination site.
Packetized data communications are especially appealing for common carrier or time-shared switching systems, since a packet transmission path or circuit is unavailable only during the time when a packet utilizes the circuit for transmission to the destination site, thereby permitting other users to utilize that same circuit when the circuit becomes available (i.e., during intervening periods between packet transmissions). The access channel and each individual transmission circuit typically have a maximum data carrying capacity or bandwidth that is shared among the various users of the network. The access channel has a fixed bandwidth, and its utilization is typically measured as an aggregate of the individual circuit utilizations, while the individual circuits may be utilized by several users, wherein each user may utilize an allocated portion of the circuit.
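By way of illustration only, the aggregation of individual circuit utilizations into an access channel utilization may be sketched as follows. The function name, the circuit identifiers, and the sample traffic figures are hypothetical and are chosen merely to show the arithmetic; the sketch assumes that each circuit's traffic is counted in bits over a common measurement interval.

```python
def access_channel_utilization(circuit_bits, channel_bps, interval_s):
    """Return the fraction of access channel capacity used during an interval.

    circuit_bits: mapping of circuit identifier -> bits carried in the interval
                  (hypothetical bookkeeping, one entry per virtual circuit)
    channel_bps:  fixed bandwidth of the access channel, in bits per second
    interval_s:   length of the measurement interval, in seconds
    """
    if channel_bps <= 0 or interval_s <= 0:
        raise ValueError("bandwidth and interval must be positive")
    # Aggregate the traffic of every circuit sharing the access channel.
    total_bits = sum(circuit_bits.values())
    return total_bits / (channel_bps * interval_s)


if __name__ == "__main__":
    # Two circuits sharing an assumed 1,536,000 bit/s access channel for one second.
    sample = {"vc1": 384_000, "vc2": 192_000}
    print(access_channel_utilization(sample, 1_536_000, 1.0))
```

In this sketch the two circuits together occupy three eighths of the channel capacity during the interval, even though neither circuit alone approaches the channel bandwidth.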
Typically, when a party needs to send and receive data over distances, the party (end user) enters into a service contract with a service provider to provide access to a data communications network. Depending on an individual end user's needs, the service contract may include provisions that guarantee certain minimum performance requirements that the service provider must meet. For example, if the end user expects to send and receive a certain amount of data on a regular basis, the end user may want the service provider to guarantee that a certain minimum bandwidth will be available to the end user at all times. Certain end user applications are sensitive to transmission delays and/or the loss of data within the network (i.e., failure to successfully deliver data packet(s) to their destination). Specifically, while loss of data packets can generally be detected by end users (via information provided in the data transmission protocol), and lost packets can be retransmitted, certain applications cannot function when the percentage of lost data exceeds a given level. Thus, the end user may want the service provider to guarantee that the average or minimum ratio of data units delivered by the network to data units offered to the network at the far-end is above a certain percentage and/or that the average or maximum transmission delays will not exceed a certain duration.
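By way of example, a delivery guarantee of the kind described above may be checked with a calculation along the following lines. This is a minimal sketch under stated assumptions: the function names, the convention that an interval with no offered data counts as fully delivered, and the sample threshold are all hypothetical and are not drawn from any particular service contract.

```python
def data_delivery_ratio(delivered, offered):
    """Ratio of data units delivered by the network to data units offered to it."""
    if delivered < 0 or offered < 0:
        raise ValueError("counts must be non-negative")
    if delivered > offered:
        raise ValueError("cannot deliver more units than were offered")
    if offered == 0:
        # Assumed convention: nothing was offered, so nothing was lost.
        return 1.0
    return delivered / offered


def meets_delivery_guarantee(delivered, offered, min_ratio):
    """True when the measured ratio is at or above the guaranteed minimum."""
    return data_delivery_ratio(delivered, offered) >= min_ratio


if __name__ == "__main__":
    # 1000 units offered, 990 delivered, against an assumed 99.5% guarantee.
    print(data_delivery_ratio(990, 1000))
    print(meets_delivery_guarantee(990, 1000, 0.995))
```

A corresponding check for a delay guarantee would compare a measured average or maximum delay against the contracted duration in the same manner.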
From a service provider's perspective, it would be competitively advantageous to be able to demonstrate to potential and existing end users that the service provider is capable of meeting and does meet such network performance metrics. Thus, the capability to provide analysis of network system performance at the service level, i.e., service level analysis (SLA), particularly in the context of network systems that share bandwidth between sites, would be advantageous from both an end user and service provider standpoint.
Various systems have been proposed which provide some measure of network system performance. Specifically, a number of techniques for measuring round trip delay (RTD) of data transmitted between two sites are known. For example, U.S. Pat. No. 5,521,907 to Ennis, Jr. et al., the disclosure of which is incorporated herein by reference in its entirety, discloses a system for passively measuring the round trip delay of data messages sent between two sites. More specifically, a console triggers probes at two sites to store data packets being sent between the two sites. The probes generate unique packet signatures based on the data in the packets, and timestamp the signatures. By matching signatures from the two probes and comparing the corresponding timestamp values, the console can determine the round trip delay between the sites. This technique requires the storage, transmission and processing of a significant amount of data, particularly if implemented to periodically monitor all virtual circuits existing between a set of sites. That is, the passive probes cannot individually determine round trip delay, and each probe must store and transmit a substantial amount of data to the console, which is required to correlate signature and timestamp data from different sites.
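The general idea of console-side signature matching may be illustrated by the following sketch, which is not taken from the referenced patent. It assumes, for illustration only, that a signature is a truncated SHA-256 digest of the packet payload, that each probe reports a log mapping signatures to local timestamps, and that the round trip delay is estimated by summing the mean timestamp differences observed in each direction, so that any fixed offset between the two probe clocks cancels. All function names are hypothetical.

```python
import hashlib
from statistics import mean


def signature(payload):
    """Derive a compact signature from packet data (assumed: truncated SHA-256)."""
    return hashlib.sha256(payload).hexdigest()[:16]


def probe_log(packets):
    """Build a probe's log from (payload, local_timestamp) pairs."""
    return {signature(payload): stamp for payload, stamp in packets}


def one_way_deltas(sender_log, receiver_log):
    """Timestamp differences for packets whose signatures appear in both logs."""
    matched = sender_log.keys() & receiver_log.keys()
    return [receiver_log[sig] - sender_log[sig] for sig in matched]


def round_trip_delay(a_sent, b_received, b_sent, a_received):
    """Console-side estimate; returns None when either direction has no match."""
    forward = one_way_deltas(a_sent, b_received)   # includes +clock offset
    reverse = one_way_deltas(b_sent, a_received)   # includes -clock offset
    if not forward or not reverse:
        return None
    return mean(forward) + mean(reverse)
```

Even in this simplified form, both probes must retain a signature and timestamp for every observed packet and forward them to the console, which reflects the storage and transmission burden noted above.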
U.S. Pat. No. 5,450,394 to Gruber et al., the disclosure of which is incorporated herein by reference in its entirety, discloses a technique for determining round trip delay in which measurement cells containing timestamp information are sent between two nodes. A first node transmits a measurement cell with a first timestamp to a second node, and the second node replies with a measurement cell containing additional timestamp information which can be used by the first node to determine the round trip delay. Because the technique relies, in part, on timestamps already present in PM OAM (performance management operations, administration and maintenance) ATM cells, the technique is specific to the ATM protocol and cannot readily be adapted to other data protocols or be expanded to monitor other service level performance metrics. Further, the technique does not allow both nodes to measure the round trip delay of the same sequence of cells (i.e., either only one of the two nodes measures round trip delay or the two nodes measure delays of different transmitted cell sequences).
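A timestamp exchange of this general kind may be sketched as follows. The sketch does not reproduce the cell format of the referenced patent; it assumes, for illustration, that the reply carries the first node's original transmit timestamp together with the second node's receive and transmit timestamps, and that the first node subtracts the second node's turnaround time from the total elapsed time. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplyCell:
    """Assumed contents of the reply measurement cell (all times in microseconds)."""
    t1_request_sent: int      # first node's clock
    t2_request_received: int  # second node's clock
    t3_reply_sent: int        # second node's clock


def round_trip_delay(reply, t4_reply_received):
    """Network round trip delay as seen by the first node.

    Each difference is taken between timestamps from a single clock, so the
    two nodes' clocks need not be synchronized with one another.
    """
    elapsed = t4_reply_received - reply.t1_request_sent            # first node clock
    turnaround = reply.t3_reply_sent - reply.t2_request_received   # second node clock
    if turnaround < 0 or elapsed < turnaround:
        raise ValueError("inconsistent timestamps")
    return elapsed - turnaround


if __name__ == "__main__":
    cell = ReplyCell(t1_request_sent=1000,
                     t2_request_received=500200,
                     t3_reply_sent=500250)
    print(round_trip_delay(cell, t4_reply_received=1120))
```

In such an exchange only the node that originated the first cell obtains the measurement, which is consistent with the limitation noted above that both nodes do not measure the round trip delay of the same sequence of cells.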
Further, while it is possible for individual switches in existing network systems to indicate how many packets of data have been dropped by the switch, there are no known systems capable of measuring a rate of successful (or unsuccessful) data delivery on a service level, e.g., over a particular virtual circuit or to a particular end user.
The problem of providing service level analysis of network performance is complicated by the fact that many switching networks comprise interworked systems using plural, different data transmission protocols (e.g., an ATM switching network interworked with a Frame Relay switching network), thereby forming a so-called "interworked" network. Such interworked networks are becoming more common, and present an additional challenge to designing a service level analysis tool that employs a standard message structure and messaging protocol useful for communicating between any two sites. Existing systems relying on inter-site or inter-probe messages to assess system performance are generally incapable of operating across interworked networks.
Accordingly, there remains a need for a system capable of providing service level analysis (SLA) of communications network performance, especially in packetized, interworked data networks, to provide end users and service providers with information relating to performance metrics, such as round trip delay, data delivery ratio, and other metrics, such as the percentage of time the network is available.