The operators and users of enterprise networks prefer that their networks be predictable and provide consistent performance. Predictability and consistency are often more important than the raw capabilities of the network, i.e. a network that provides a consistent medium throughput is often considered more desirable than a network which provides very high throughput at some times, but performs poorly at other times. For many business applications, it is important that transactions be completed in a predictable manner while the time taken for the transactions to complete is relatively unimportant (provided it does not exceed a reasonable limit).
Prior art solutions provide network predictability by preconfiguring the network. This does not work in an IP network, because IP is dynamic and connectionless, and therefore relatively unpredictable. The typical enterprise network environment consists of several campus area networks interconnected by a wide area backbone network. The campus networks usually deploy high-speed links, and perform reasonably well. Congestion tends to occur in the backbone network, which consists of relatively slow point-to-point links, and in those campus networks which house the servers.
An approach is needed which will provide predictability on an IP backbone network, and do so for backbones with varying degrees of capability. If the network provider can predict the performance of the network, then it can implement service level agreements. A Service Level Agreement (SLA) is a formal contract entered into by a service provider and its customers. The service provider contracts to transport packets of electronic data between customer premises networks (branch offices, data centers, server farms, etc.) across the provider's backbone network with certain assurances on the quality of the service. The SLA specifies customer expectations of performance in terms of parameters such as availability (bound on downtime), delay, loss, priority and bandwidth for specific traffic characteristics. An SLA includes acceptable levels of performance, which may be expressed in terms of response time, throughput, availability (such as 95% or 99% or 99.9%), and expected time to repair.
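To make the availability figures concrete, the following minimal sketch converts an availability percentage into the downtime it permits over a measurement period. The function name and the 30-day period are assumptions chosen for illustration; an actual SLA defines its own measurement period.

```python
def allowed_downtime_hours(availability_percent, period_hours=30 * 24):
    """Downtime bound implied by an availability target.

    period_hours defaults to a hypothetical 30-day month (720 hours);
    this is an illustrative assumption, not a value from any SLA.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (100 - availability_percent) / 100


for target in (95, 99, 99.9):
    print(f"{target}% availability allows about "
          f"{allowed_downtime_hours(target):.2f} hours of downtime")
```

Under these assumptions, 95% availability permits roughly 36 hours of downtime per 30-day period, 99% roughly 7.2 hours, and 99.9% roughly 0.72 hours (about 43 minutes).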
SLAs vary greatly from one network to the next, and from one application to another running on the same network. They are normally based on some level of expected activity. For example, if a large airline wants to ensure that the lines at the ticket counter do not get overly long due to poor response time at the ticketing terminals, some estimate must be made of expected workload, so that the network administrator can be prepared with the necessary resources to meet that workload and still remain compliant with the performance terms of the SLA. Another example is audio/video conferences where a certain level of service needs to be guaranteed.
Managing an SLA is an important task because of the revenue implications of failure to support mission-critical business applications. The problem is exacerbated by the diversity of the traffic and by the poor and varying quality of service differentiation mechanisms within the backbone networks. Commercially significant traffic must be prioritised above workloads which do not have a critical time dependency for the success of the business. Many of these workloads in an IP environment are far more volatile than those which have traditionally been encountered in the prior art. In order to meet customer requirements in this environment, a service provider must provide a large excess capacity at correspondingly high charges.
This situation dramatizes the need for effective tools which can monitor the performance of the IP network or system delivering a service over the IP network. Also, there is a need for effective controls which allow the service provider of an IP network to manipulate the priority of the various workloads to be managed.
U.S. Pat. No. 6,459,682 shows a method of controlling packet traffic in an IP network of originating, receiving and intermediate nodes to meet performance objectives established by service level agreements. Traffic statistics and performance data such as delay and loss rates relating to traffic flows are collected at intermediate nodes. A control server processes the collected data to determine data flow rates for different priorities of traffic. A static directory node is used to look up inter-node connections and determine initial traffic classes corresponding to those connections. The rates are combined with the initial traffic classes to define codes for encoding the headers of packets to determine their network priority.
U.S. Pat. No. 6,519,264 shows a method for measuring a rate of message element traffic over a message path in a communications network. The path includes at least one connection and is associated with a maximum rate of transmission. The path is periodically polled for transmission of a message element, the polling being performed at a polling rate associated with polling intervals which are at least as frequent as the maximum rate of transmission. If transmission of a message element is detected during a polling interval, a running count of such detection is incremented, the running count of detection being associated with the connection over which the message element was detected. If transmission of a message element is not detected, a running count of such non-detection is incremented, the running count of non-detection being associated with inactivity of the message path. During each polling interval, an oldest stored value is retrieved from a memory which includes a preselected number of stored values that correspond to an equal number of most recent sequential events of detection and non-detection. Each stored value which represents an event of detection corresponds to an identifier denoting the connection over which the message element was detected. Each stored value which represents an event of non-detection corresponds to an identifier denoting inactivity of the message path. Following retrieval during each polling interval, the running count of detection associated with the connection corresponding to the identifier of the retrieved value is decremented if the retrieved value represents an event of detection. The running count of non-detection is decremented if the retrieved value represents inactivity. The retrieved value is thereafter replaced with a value corresponding to an identifier which denotes the connection over which the message element was detected if transmission was detected. Otherwise the retrieved value is replaced in the memory with a value corresponding to an identifier which denotes inactivity if transmission was not detected. The foregoing steps are repeated for so long as the measurement is undertaken. The rate of message element traffic over a connection of the message path is proportional to the running count of detection associated with the connection.
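The polling scheme summarised above amounts to a sliding window of recent events with one running count per connection. The following is a minimal sketch of that idea, not the implementation of the cited patent. The class and method names are hypothetical, the memory is assumed to start filled with inactivity entries, and the scaling of the count to a rate assumes that polling occurs exactly at the maximum transmission rate.

```python
from collections import Counter, deque

IDLE = None  # identifier denoting inactivity of the message path


class SlidingWindowRateMeter:
    """Hypothetical sliding-window traffic meter (illustrative sketch)."""

    def __init__(self, window_size):
        self.window_size = window_size
        # Memory of the most recent events; assumed to start as all inactivity.
        self.window = deque([IDLE] * window_size)
        # Running counts per identifier (connection ids and IDLE).
        self.counts = Counter({IDLE: window_size})

    def poll(self, connection_id=IDLE):
        """Record one polling interval.

        connection_id names the connection on which a message element was
        detected, or IDLE if nothing was detected in this interval.
        """
        # Increment the running count for the newly observed event.
        self.counts[connection_id] += 1
        # Retrieve the oldest stored value and decrement its running count.
        oldest = self.window.popleft()
        self.counts[oldest] -= 1
        # Replace the retrieved value with the identifier of the new event.
        self.window.append(connection_id)

    def rate(self, connection_id, max_rate):
        """Estimated rate, proportional to the connection's running count."""
        return max_rate * self.counts[connection_id] / self.window_size


meter = SlidingWindowRateMeter(window_size=4)
for event in ("A", "A", IDLE, "B"):
    meter.poll(event)
print(meter.rate("A", max_rate=100.0))
```

Because each interval adds one event and removes one, the counts always sum to the window size, so every update costs constant time regardless of how many connections share the path.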
U.S. Pat. No. 6,363,056 shows a network monitoring method where incoming data packets are time stamped by an ingress node and time stamped again by an egress node. The difference between the two time stamps serves to calculate the delay.
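The time stamp approach can be illustrated with a short sketch. The record type and field names below are hypothetical, and the calculation assumes that the ingress and egress node clocks are synchronised; any clock offset between the nodes would appear directly as an error in the computed delay.

```python
from dataclasses import dataclass


@dataclass
class StampedPacket:
    """Hypothetical record of one packet's two time stamps, in seconds."""
    packet_id: int
    ingress_ts: float  # stamped by the ingress node
    egress_ts: float   # stamped by the egress node


def one_way_delay(packet):
    """Delay is the difference between the egress and ingress time stamps."""
    return packet.egress_ts - packet.ingress_ts


def mean_delay(packets):
    """Average delay over a non-empty collection of stamped packets."""
    delays = [one_way_delay(p) for p in packets]
    return sum(delays) / len(delays)


samples = [
    StampedPacket(1, ingress_ts=10.000, egress_ts=10.250),
    StampedPacket(2, ingress_ts=10.500, egress_ts=11.000),
]
print(f"mean delay: {mean_delay(samples):.3f} s")
```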
Other network monitoring and management tools are commercially available from Brix Networks (http://www.brixnetworks.com/) and Ipanema Technologies (http://www.ipanematech.com/).
A common disadvantage of prior art network monitoring and control programs is the cost of performing the network measurements, especially in terms of additional network load.