In the field of Internet Protocol (IP)/Multi-Protocol Label Switching (MPLS) communications, it is known to verify whether two data network nodes can reach each other by employing functionality provided by a “ping” command and a “traceroute” command. The functionality of the ping and traceroute commands is described in Internet Engineering Task Force Request For Comments (RFC) 1147, which is incorporated herein by reference. A short summary of the relevant concepts of the ping and traceroute commands follows:
Persons of ordinary skill in the art would understand that data communications networks conveying data packets in accordance with the IP protocol and/or the MPLS protocol do so in accordance with a store and forward discipline. At each data network node in a communications network, a packet is received via an input port, stored, an output port determined in real-time, and the packet is forwarded over the determined output port. Real-time port determination is known as routing functionality and is performed by a router network element. The real-time determination of the output port is made dependent on a variety of factors including: destination addressing information held in packet headers, forwarding class associativity, packet traffic differentiation, operational states of inter-connecting links between network nodes, transport bandwidth availability over links, packet processing bandwidth availability at data network nodes in the path, etc.
Persons of ordinary skill in the art would understand that data communications networks conveying data packets in accordance with the IP protocol do so in accordance with a best-effort packet transport discipline. The best-effort discipline does not guarantee that data packets will reach their destinations, does not guarantee bounded packet arrival latencies, does not guarantee bounded packet arrival jitter, etc. In fact, packets specifying the same source network address and the same destination network address do not necessarily follow the same transport path in a data communications network, which is known in the art as loose source routing.
The real-time output port determination described above may lead to situations in which packet transport loops are established. Each IP packet carries a Time-To-Live (TTL) specification in its header, which is an integer header field value initially set by a source data network node sending the packet (or a gateway at an edge between a customer network and a service provider network) and decremented by each data transport node forwarding the packet. When the TTL value reaches zero (0), the packet is discarded.
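The TTL mechanism described above can be illustrated with a minimal sketch. The function name `forward_with_ttl`, the node names, and the returned dictionary are hypothetical and chosen for illustration only. The sketch assumes that only forwarding nodes decrement the TTL and that a packet is discarded as soon as the value reaches zero.

```python
from itertools import cycle


def forward_with_ttl(forwarding_nodes, initial_ttl):
    """Simulate TTL handling along a sequence of forwarding nodes.

    forwarding_nodes: iterable of node names that forward the packet
                      (assumption: the destination itself is not listed).
    initial_ttl:      TTL value set by the source node.
    """
    ttl = initial_ttl
    hops = 0
    for node in forwarding_nodes:
        ttl -= 1          # each forwarding node decrements the TTL
        hops += 1
        if ttl <= 0:      # TTL exhausted: the packet is discarded here
            return {"delivered": False, "discarded_at": node, "hops": hops}
    # All forwarding nodes were traversed before the TTL expired.
    return {"delivered": True, "discarded_at": None, "hops": hops}


# A three-node path is traversed when the TTL is large enough.
print(forward_with_ttl(["R1", "R2", "R3"], initial_ttl=4))

# A transport loop between R1 and R2 ends when the TTL runs out.
print(forward_with_ttl(cycle(["R1", "R2"]), initial_ttl=5))
```

In the looping case the packet never reaches a destination, but the TTL bounds the number of forwarding operations, so the loop cannot persist indefinitely.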
Although simple, this approach puts a lot of pressure on IP network design to ensure that only a small number of data transport nodes, and therefore interconnecting links, are traversed between a source data network node and a destination data network node. Physical implementations of interconnecting links vary and may include additional data/packet transport protocols. Therefore, from the point of view of connectivity verification, the data communications network infrastructure between two interfaces on two corresponding data transport nodes is abstracted as a “hop”.
As mentioned herein above, the best-effort packet transport discipline does not guarantee bounded packet arrival latencies. Latency is the amount of time it takes for a packet to traverse a communications network from its source data network node to its destination data network node. Latency is typically measured in milliseconds and includes physical data transport delays associated with the physical conveyance of packets over physical interconnecting links, as well as packet processing delays incurred by packets while being stored at transport network nodes, in a transport path between the source network node and the destination network node, while pending determination of output ports.
As mentioned herein above, the best-effort packet transport discipline does not guarantee bounded packet arrival jitter. Jitter is a measure of the variation of packet inter-arrival delays, and relates to a measure of the standard deviation of a group of delays incurred by a group of individual data packets typically associated with a data stream used in provisioning a data service.
The service provisioning, which is beyond the scope of the present description, is dependent on the resultant Quality-of-Service provided. Quality-of-Service is a combination of bandwidth, arrival delay, and jitter specifications for a particular data service provisioned end-to-end over a given interconnecting communications network infrastructure.
A person skilled in the art would understand that the MPLS transport protocol has been developed in order to provide high Quality-of-Service packet transport. Although delays associated with physical packet propagation over physical interconnecting links can only be reduced to a certain extent, the MPLS technology provides: bandwidth reservation on the interconnecting links to ensure resource availability, strict (pre-specified) routing/transport paths to minimize packet processing delays along the path, and consolidated multi-transport layer switching minimizing switching delays at switching network nodes in the path. Packets having the same source network address and the same destination network address may follow different transport paths dependent on a Service Level Agreement (SLA) specification for each packet.
It is the adherence to a service level agreement in an MPLS environment, and the need to adhere to a service level agreement specification in a best-effort IP environment that is being addressed in the present description.
The implementation of ping and traceroute functionality includes the return conveyance of at least one individual echo return Internet Control Message Protocol (ICMP) packet, a packet probe, in a data communication network between a source network node and a destination network node to verify connectivity therebetween.
The extent to which connectivity is verified by ping probe packets relates to reachability, see FIG. 1. Ping probe packets carry a TTL value, and therefore reachability includes an assessment as to whether there is at least one bounded sequence of interconnecting links which can be traversed by a packet conveyed between the source network node and the destination network node before the expiration of the TTL. It is emphasized that each ping probe packet tests connectivity between a pair of pre-specified source and destination network nodes.
Besides testing reachability, each ping probe packet is also stamped with a time stamp value corresponding to the time at which the ping probe packet was issued by the source network node, enabling the calculation of the aggregate return transport delay upon the return of the ping probe packet at the source network node. In sending a group of ping probe packets, the corresponding group of aggregate return transport delays is used to determine: minimum delay, maximum delay, average delay (in milliseconds), and jitter. The determined minimum delay, maximum delay, average delay, and jitter are referred to as packet transport statistics.
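The packet transport statistics described above can be sketched as follows. The function names `round_trip_delay_ms` and `transport_statistics` and the sample delay values are hypothetical. Jitter is computed here as the population standard deviation of the delays, which is one reasonable reading of the standard deviation measure mentioned earlier and is an assumption of this sketch.

```python
import statistics


def round_trip_delay_ms(sent_at_s, returned_at_s):
    """Aggregate return transport delay in milliseconds.

    sent_at_s:     time stamp carried in the probe (seconds).
    returned_at_s: time at which the probe came back to the source (seconds).
    """
    return (returned_at_s - sent_at_s) * 1000.0


def transport_statistics(delays_ms):
    """Derive minimum, maximum, average delay and jitter from a group of delays."""
    if not delays_ms:
        raise ValueError("at least one returned probe is required")
    return {
        "min_ms": min(delays_ms),
        "max_ms": max(delays_ms),
        "avg_ms": statistics.fmean(delays_ms),
        # Assumption: jitter taken as the population standard deviation.
        "jitter_ms": statistics.pstdev(delays_ms),
    }


# Illustrative delays for a group of three returned ping probe packets.
print(transport_statistics([10.0, 12.0, 14.0]))
```

Probes that are never returned contribute no delay sample. A fuller treatment would also report the number of lost probes alongside these statistics.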
The extent of connectivity verification performed by employing traceroute packets, as they are known, relates to network node discovery in a path between a source network node and a destination network node, see FIG. 2. Implementing traceroute functionality employs groups of ICMP echo return packets directed towards the destination network node and bearing increasing TTL values. Traceroute packets are returned to the source network node when the TTL value is decremented to zero; therefore, the use of increasing TTL values in sending the traceroute packets discovers intermediary transport network nodes incrementally further along a path between the source network node and the destination network node.
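The incremental discovery performed by traceroute can be sketched as a small simulation. The names `send_probe` and `traceroute` and the node names are hypothetical. The sketch assumes a stable path and sends a single probe per TTL value rather than a group of probes.

```python
def send_probe(intermediate_nodes, destination, ttl):
    """Return the name of the node that answers a probe sent with the given TTL."""
    for node in intermediate_nodes:
        ttl -= 1                 # each intermediary node decrements the TTL
        if ttl == 0:
            return node          # TTL expired here: this node returns the probe
    return destination           # TTL was sufficient to reach the destination


def traceroute(intermediate_nodes, destination, max_ttl=30):
    """Discover nodes along the path by probing with increasing TTL values."""
    discovered = []
    for ttl in range(1, max_ttl + 1):
        responder = send_probe(intermediate_nodes, destination, ttl)
        discovered.append(responder)
        if responder == destination:
            break                # destination reached, discovery is complete
    return discovered


print(traceroute(["R1", "R2", "R3"], "D"))  # ['R1', 'R2', 'R3', 'D']
```

Each increase of the TTL by one reveals the node one hop further along the path, until the destination itself answers.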
Making reference to FIG. 3, for a pre-established source routed Label Switched Path (LSP), physical network nodes incrementally further along the LSP transport path may not return traceroute packets, as the traceroute packets are encapsulated while in transport through the LSP, the TTL value only being decremented at the distal end of the LSP, which does return traceroute packets. Traceroute packets are of course returned by network nodes beyond the distal end of the LSP.
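The effect of LSP encapsulation on node discovery can be sketched in the same spirit. The function names and node names are hypothetical, and the flag marking whether a node decrements the TTL is a modelling assumption: nodes interior to the LSP are taken not to decrement it, while the distal end of the LSP and ordinary IP nodes do.

```python
def probe_through_lsp(nodes, destination, ttl):
    """Return the node answering a probe; nodes is a list of (name, decrements_ttl)."""
    for name, decrements_ttl in nodes:
        if not decrements_ttl:
            continue             # probe is encapsulated: TTL left untouched
        ttl -= 1
        if ttl == 0:
            return name          # this node returns the traceroute packet
    return destination


def visible_hops(nodes, destination, max_ttl=30):
    """List the nodes a source would discover with increasing TTL values."""
    seen = []
    for ttl in range(1, max_ttl + 1):
        responder = probe_through_lsp(nodes, destination, ttl)
        seen.append(responder)
        if responder == destination:
            break
    return seen


# Hypothetical path: one IP node, two LSP interior nodes, the LSP distal end,
# and one IP node beyond the LSP.
path = [
    ("A", True),
    ("LSP-mid-1", False),
    ("LSP-mid-2", False),
    ("LSP-egress", True),
    ("B", True),
]
print(visible_hops(path, "D"))  # ['A', 'LSP-egress', 'B', 'D']
```

The two interior nodes never appear in the result, which matches the observation that only the distal end of the LSP and the nodes beyond it return traceroute packets.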
In a best-effort IP environment, it cannot be guaranteed that all traceroute packets are routed the same way, as packet processing conditions change dynamically at network nodes between the source and the destination network nodes. A degree of stability in a communications network is expected, although not guaranteed, which, when traceroute packets are sent in relatively rapid succession, results in the group of traceroute packets following substantially the same transport path.
Information held in returned traceroute packets is used to extract transport delay information. Statistical information is derived from successive sequences of traceroute packets. Therefore transport delay and jitter profiles can be provided for each determined transport path between a pair of network nodes in a communications network. The extent to which these delay and jitter profiles can be used to derive per-hop statistics is left to higher level applications interpreting the statistical information, higher level applications which are beyond the scope of the present description.
Having provided an overview of ping and traceroute functionality, it is important to emphasize that ping and traceroute packets are sent from a source network node and returned to the same source network node. The resulting statistics are also made available by, and at, the source network node.
Service providers include organizations and communications network infrastructure providing communications services to customers. Services include best-effort packet transport, MPLS packet transport, as well as differentiated services such as Virtual Local Area Networking (VLAN) in support of Virtual Private Network (VPN) connectivity.
Currently, service providers make extensive use of ping and traceroute functionality, but can verify connectivity only on a very limited basis. Typically, operations management personnel need to manually log in to each remote source network node via a Command Line Interface (CLI), issue the necessary ping and/or traceroute commands from a prompt, specifying network node addressing manually, capture the output of the console, and retrieve the output from the remote source network node.
In a service provider managed communications network it is more important to verify connectivity between individual routers. Routers include physical router communications network nodes as well as virtual routers associated with switching communications network nodes. Referring to FIG. 4, five fully meshed routers R1, R2, R3, R4 and R5 are shown providing VPN services VPN1 and VPN2. Connectivity verification for VPN1 between Location 1 and Location 3 can be performed manually in two steps: a first ping/traceroute test T1 is run from R1 towards R3 and a second ping/traceroute test T2 is run from R3 towards R1. Each time a ping/traceroute test is run, the operator has to log in to the source router, run the ping/traceroute test, and retrieve the results.
If connectivity verification is required between all peer routers in VPN1, more test steps are required: ping/traceroute test T3 verifies connectivity from Location 2 to Location 3, another ping/traceroute test would be necessary to verify connectivity from Location 3 to Location 2, and another two ping/traceroute tests would have to be run between Location 1 and Location 2.
The operator has to perform more ping/traceroute tests for the other VPNs such as VPN2 between Location 2 and Location 4.
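The number of one-directional tests implied by the above can be enumerated with a short sketch. The function name `directed_tests` is hypothetical, and the VPN membership used here (VPN1 spanning Locations 1, 2 and 3; VPN2 spanning Locations 2 and 4) is inferred from the description of FIG. 4 rather than stated as a list, so it should be read as an assumption.

```python
from itertools import permutations


def directed_tests(vpn_locations):
    """Map each VPN to the ordered (source, destination) pairs to be tested."""
    return {
        vpn: list(permutations(locations, 2))   # one test per direction per pair
        for vpn, locations in vpn_locations.items()
    }


# Assumed VPN membership, inferred from the text describing FIG. 4.
vpns = {
    "VPN1": ["Location 1", "Location 2", "Location 3"],
    "VPN2": ["Location 2", "Location 4"],
}

tests = directed_tests(vpns)
for vpn, pairs in tests.items():
    print(vpn, len(pairs), "tests")
print("total:", sum(len(pairs) for pairs in tests.values()))
```

For a VPN with n locations the count is n × (n − 1), so the number of manual log-in, run, and retrieve cycles grows quadratically with the number of locations in each VPN.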
In performing connectivity verification in two separate steps between each pair of locations, it is not obvious to operations management personnel which router IP address and VLAN IDentifier (VPN1/VPN2) to use from which router. This level of operator involvement is inadequate, as CLI command entry is a very time consuming, complex, and error prone procedure leading to large operational overheads incurred by service providers. In particular, manual command entry makes it impossible for connectivity verification to be performed in a timely manner in an environment in which a large number of customers subscribe to a corresponding large number of VPNs serviced by a service provider using an infrastructure of a large number of communications network nodes interconnected via a large number of links. Meaningful statistics need to be derived from a large number of ping/traceroute tests performed in a relatively short period of time.
Packet traffic patterns vary over a period of time and are typically cyclical over the time of a day and cyclical over a week. Therefore it is important to both customers and service providers that connectivity verification be performed during peak hours (business hours and evenings) and peak weekdays (workdays and weekends). It is therefore apparent that if manually directed connectivity verification is time consuming, then manual connectivity verification within test windows would be impossible due to the overwhelming operational overheads involved. The number of connectivity verification tests grows with the number of location combinations for each VPN, making connectivity verification even more complex and time consuming.
The closest prior art relates to network topology discovery and includes:
A prior art U.S. Pat. No. 6,502,130 B1 entitled “System and Method for Collecting Connectivity Data of an Area Network” which issued on Dec. 31, 2002 to Keeler, Jr. et al. describes a system and method which collects dynamic connectivity data from an area network interconnecting multiple computing devices. The dynamic connectivity information is combined in a data warehouse with static network information, relating to the various users and their privileges. The combined data stored in a data warehouse permits the identification of each user and the various privileges of the user, correlated by connection port. The connectivity data is collected using commands in the simple network management protocol (SNMP). SNMP commands query all network devices such as hubs, routers, and gateways to other networks to obtain port connectivity information such as the identity of the ports being used by each network user. Although inventive, the solution proposed by Keeler Jr. et al. only achieves Open Systems Interconnect (OSI) Layer 2 and 1 connectivity discovery in support of billing applications for users subscribing to roaming network access services. Keeler Jr. et al. do not address issues related to ensuring adherence to service level agreements in real-time.
A prior art U.S. Pat. No. 6,205,122 B1 entitled “Automatic Network Topology Analysis” which issued on Mar. 20, 2001 to Sharon et al. describes a system and method for automatic detection of physical network topology, by correlating information from computers connected to a network. Although inventive, the solution presented by Sharon et al. does not address issues related to ensuring adherence to service level agreements in real-time.
A prior art U.S. Pat. No. 6,397,248 B1 entitled “System and Method to Discover End Node Physical Connectivity to Networking Devices” which issued on May 28, 2002 to Iyer describes an apparatus and method for determining physical connectivity between end nodes and networking devices within a network. Iyer addresses issues related to the SNMP protocol's inability to ascertain the physical connection between end nodes and networking devices. Although inventive, the solution presented by Iyer does not address issues related to ensuring adherence to service level agreements in real-time.
A prior art U.S. Pat. No. 6,405,248 B1 entitled “Method and Apparatus for Determining Accurate Topology Features of a Network” which issued on Jun. 11, 2002 to Wood describes a method for determining accurate topology features of a given network utilizing source address tables. The solution proposes acquiring source address table information from each port of each network switching node at regular intervals to determine when a particular source address was learned and when discarded. The source address information is used to issue Address Resolution Protocol (ARP) queries to ensure that the source address information is valid. While inventive, the solution presented by Wood does not address issues related to ensuring adherence to service level agreements in real-time.
A prior art U.S. Pat. No. 5,974,237 entitled “Communications Network Monitoring” which issued on Oct. 26, 1999 to Shurumer et al. describes a proprietary method for monitoring a communications network comprising a plurality of node equipment such as switches, and link equipment such as fiber optic links in which proprietary performance parameters of individual vendor specific components of the node equipment are used to determine an overall proprietary performance parameter for the node equipment. By comparing like proprietary performance parameters for individual network elements, the performance of different types of proprietary network elements can be compared with each other. Parameters which can be monitored include quality of service, cell discard, cell loss, and other measures of network performance. Connection tracing through the plurality of node equipment and link equipment is used employing proprietary means to provide topology discovery. While inventive, the solution presented by Shurumer et al. does not address issues related to ensuring adherence to service level agreements in real-time.
Other developments include a prior art U.S. Pat. No. 6,222,827 B1 entitled “Telecommunications Network Management System” which issued on Apr. 24, 2001 to Grant et al., which describes a system for managing a Synchronous Digital Hierarchy (SDH) network and proposes the tracking and processing of network related data in support of specifying connectivity parameters for establishing data pipes. The solution relates to a network management system which forms an overall view of the network and its condition, from which the system gives configuration commands to each transmission equipment so that all configuration changes can be performed significantly more rapidly. While inventive, the solution presented by Grant et al. does not address issues related to ensuring adherence to service level agreements in real-time.
Reducing operating expenditures is important to service providers. Addressing these concerns is especially important in large and complex service provider IP/MPLS communications networks. There is therefore a need to solve the above mentioned issues.