A Service Level Agreement (SLA) is a contract between two parties, either a service provider and a subscriber or two service providers, that specifies, in terms of metrics, the type and level of service that can be obtained from the service provider. The metrics are measurable QoS parameters with agreed-upon threshold values. Failure to meet these thresholds results in penalties, which are degrees of compensation mutually agreed upon by the service provider and the customer. Network and service performance must be monitored for degradation in order to avoid SLA violations, which might cause the provider to pay penalties. Therefore, real-time traffic flows across network segments need to be closely monitored for performance degradation.
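As an illustration only, the notion of an SLA as a set of measurable metrics with agreed-upon thresholds and penalties can be sketched as follows. The class name `Threshold`, the function `violations`, the metric names, and the flat per-violation penalty are hypothetical choices made for this sketch; they are not taken from any particular SLA.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    """One agreed-upon QoS threshold (all fields are illustrative)."""
    metric: str             # e.g. "latency_ms"
    limit: float            # agreed-upon threshold value
    higher_is_better: bool  # True for throughput, False for latency
    penalty: float          # hypothetical flat compensation per violation


def violations(measured: Dict[str, float],
               thresholds: List[Threshold]) -> List[Threshold]:
    """Return the thresholds that the measured values fail to meet."""
    breached = []
    for t in thresholds:
        value = measured.get(t.metric)
        if value is None:
            continue  # metric was not measured in this interval
        failed = value < t.limit if t.higher_is_better else value > t.limit
        if failed:
            breached.append(t)
    return breached


sla = [
    Threshold("latency_ms", 50.0, higher_is_better=False, penalty=100.0),
    Threshold("throughput_mbps", 10.0, higher_is_better=True, penalty=250.0),
]
breached = violations({"latency_ms": 72.0, "throughput_mbps": 12.5}, sla)
total_penalty = sum(t.penalty for t in breached)
```

In this sketch the latency threshold is breached while the throughput threshold is met, so only the latency penalty is owed.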
A real-time monitoring system generally monitors network traffic for violations of the threshold values agreed upon in SLAs and sends alarms to the operator when a violation occurs. Threshold violations are determined by monitoring all the SLAs, and the monitor-data for traffic analysis is generally collected by launching agents in all the network elements. Reporting tools have been developed for offline analysis of historical traffic data to generate trend reports on future traffic behavior. Real-time monitoring of SLAs has an impact on network traffic, since the agents installed in the various network elements continuously generate SLA-related monitor-data to help determine SLA violations.
Real-time monitoring of networks is generally performed using two types of methods: passive (non-intrusive) methods and active (intrusive) methods. Passive methods retrieve information from the packets received by a network element by reading the packet-header information; they are non-intrusive because they inject no additional packets. Active methods retrieve information related to performance parameters by injecting additional probe packets, inserted between normal data packets, across the network; they are intrusive because of these injected probe packets.
Real-time monitoring of SLAs is also performed by implementing a policy-based network architecture, in which the SLAs are translated into policies. The policies are then specified and entered into a Policy Server. These policies can be either dynamic or static. Dynamic policies specify how to react when a particular criterion is met. Static policies check the validity of defined criteria.
Policy servers read the defined policies from a repository, determine the targets, and forward the policy information to the appropriate policy agents/clients that reside in the network elements. The policy clients then translate the policy information received from the policy server into element-specific configurations. The policy clients continuously monitor these dynamic and static policies and either respond to the policy server or take some action based on the defined criteria.
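The distinction between static and dynamic policies, as evaluated by a policy client, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the classes `StaticPolicy` and `DynamicPolicy`, the function `evaluate`, the report strings, and the metric names are all hypothetical and do not correspond to any specific policy server product or protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class StaticPolicy:
    """Checks only whether a defined criterion is valid (holds)."""
    name: str
    is_valid: Callable[[Metrics], bool]


@dataclass
class DynamicPolicy:
    """Specifies how to react when a particular criterion is met."""
    name: str
    criterion: Callable[[Metrics], bool]
    action: Callable[[Metrics], str]


def evaluate(metrics: Metrics,
             static_policies: List[StaticPolicy],
             dynamic_policies: List[DynamicPolicy]) -> List[str]:
    """Hypothetical policy-client step: build reports for the policy server."""
    reports = []
    for p in static_policies:
        if not p.is_valid(metrics):
            reports.append(f"static:{p.name}:invalid")
    for p in dynamic_policies:
        if p.criterion(metrics):
            reports.append(f"dynamic:{p.name}:{p.action(metrics)}")
    return reports


static = [StaticPolicy("latency", lambda m: m["latency_ms"] <= 100.0)]
dynamic = [DynamicPolicy("overload",
                         lambda m: m["utilization"] > 0.9,
                         lambda m: "throttle")]
reports = evaluate({"latency_ms": 120.0, "utilization": 0.95}, static, dynamic)
```

Here the static policy only reports that its criterion no longer holds, whereas the dynamic policy also names a reaction.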
Regardless of the usefulness of network monitor-data to operators for the purposes of network management, monitoring will quickly be discontinued if the network overhead due to monitor-data increases to such a point that the flow of data through the network is adversely affected. Monitoring should be performed in such a way that the measurement tools can remain in operation at all times, especially during periods of high workload and system stress.
The present invention reduces the monitor-data flow across the network by monitoring only a subset of the SLAs and by optimally distributing the “intelligent” network probes across the network. The subset of SLAs to be monitored is identified based on an optimal analysis of the network load and past violation patterns. The optimal distribution of network probes is based on the “load similarity” of the nodes of the network. Because of this similarity, the probes in the selected network nodes can compute the network load due to the subset of SLAs by predicting the load at the rest of the nodes. This approach reduces the monitor-data flow across the network.
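The load-similarity idea can be illustrated with a minimal sketch. The text does not specify how similarity is measured or how nodes are grouped, so everything here is an assumption made for illustration: Pearson correlation of historical load as the similarity measure, a greedy grouping with a hypothetical threshold `min_corr`, and a simple ratio-of-means estimate for the nodes that carry no probe.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length load histories."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / sqrt(vx * vy)


def regionalize(loads: Dict[str, List[float]],
                min_corr: float = 0.9) -> Dict[str, List[str]]:
    """Greedy grouping (assumed scheme): a node joins the first existing
    representative it correlates with, else it becomes a representative."""
    regions: Dict[str, List[str]] = {}
    for node, series in loads.items():
        for rep in regions:
            if pearson(loads[rep], series) >= min_corr:
                regions[rep].append(node)
                break
        else:
            regions[node] = [node]
    return regions


def estimate_load(rep_now: float, rep_hist: List[float],
                  member_hist: List[float]) -> float:
    """Predict an unprobed node's load from its representative's current
    load, scaled by the ratio of their historical mean loads."""
    rep_mean = sum(rep_hist) / len(rep_hist)
    member_mean = sum(member_hist) / len(member_hist)
    return rep_now * member_mean / rep_mean if rep_mean else 0.0


history = {"A": [1.0, 2.0, 3.0, 4.0],
           "B": [2.0, 4.0, 6.0, 8.0],
           "C": [4.0, 3.0, 2.0, 1.0]}
regions = regionalize(history)
b_estimate = estimate_load(5.0, history["A"], history["B"])
```

With this grouping, only the representative nodes (the dictionary keys) would host probes, and the load at the remaining nodes is estimated rather than measured, which is what reduces the monitor-data flow.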
Revenue enhancement is one of the key requirements of any network provider. The key factors for sustained revenue generation are management of network capacity, penalty reduction by providing service assurance as contractually agreed in an SLA, and retaining the subscribers by reducing churn rate.
The objective of network capacity management is to be able to load the network to its capacity successfully. Reducing the monitor-data flow within a network makes more network bandwidth available for carrying payload and hence contributes to increased revenue for the network provider. An additional way for the provider to enhance revenue is to allow for over-subscription. Under such conditions, there is a possibility of network overload leading to operator SLA violations. Hence, there is a need to establish a balance between penalty reduction and over-subscription. The present invention aims at generating an operator violation alarm at a future time point based on cross-correlation of past and present usage pattern data. The more accurate the operator violation predictions, the higher the chances for the operator to increase revenue through over-subscription.
U.S. Pat. No. 6,147,975 to Bowman-Amuah for “System, method and article of manufacture of a proactive threshold manager in a hybrid communication system architecture” (issued Nov. 14, 2000 and assigned to AC Properties B.V. (NL)) describes a proactive threshold manager that forewarns service providers of an impending breach of contract. The threshold manager sends an alarm to the service provider when the current level of service falls short of the level that a service level agreement requires to be maintained.
The above-mentioned invention uses real-time monitor-data related to all service level agreements to check the current level of service, which adds to network congestion during times of network overload. In contrast, the present invention aims at reducing the monitor-data flow by monitoring a select set of SLAs and by applying a regionalization procedure that clusters the network nodes based on similar load behavior. In addition, the present invention generates alarms by combining two predictions: one of the current usage pattern and one of the past usage pattern, each made with the best possible model from a suite of forecasting models. Furthermore, the present invention makes the operator SLA violation prediction more realistic, and hence reduces the generation of false alarms, by analyzing the forecasted usage pattern trend and focusing on only those critical SLAs (CSLAs) for which the load due to the CSLA is below the contractually agreed-upon bandwidth for that CSLA.
U.S. Pat. No. 6,272,110 to Tunnicliffe et al. for “Method and apparatus for managing at least part of a communications network” (issued Aug. 7, 2001 and assigned to Nortel Networks Limited (Montreal, Canada)) describes managing a communications (customer's) network by predicting sequential future values of a time-series of data (representative of traffic levels in the network) to determine whether they exceed bandwidth levels as defined in the SLA. In contrast to the above-mentioned patent, the present invention analyzes the traffic data at the network level as opposed to the customer-specific sub-network level. Further, the present invention analyzes the load on the network from the network provider's perspective to reduce penalty situations.
Forecasting techniques based on time-series models have been applied to network resources in order to predict network traffic for the purposes of network management. “The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing,” by Rich Wolski, Neil Spring, and Jim Hayes, describes the Network Weather Service, which is a distributed, generalized system for producing short-term performance forecasts based on historical performance measurements. A suite of forecasting models is applied over the entire time-series, and the forecasting technique that has been most accurate over the recent set of measurements is dynamically chosen as the representative model.
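The dynamic selection scheme described above can be sketched as follows. This is not code from the Network Weather Service; it is a minimal illustration in which the three models in `MODELS`, the mean-absolute-error criterion, and the `window` parameter are assumptions chosen for brevity.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Tuple

Forecaster = Callable[[List[float]], float]

# Hypothetical suite of simple one-step-ahead forecasting models.
MODELS: Dict[str, Forecaster] = {
    "last_value": lambda h: h[-1],
    "running_mean": lambda h: mean(h),
    "window_median": lambda h: median(h[-3:]),
}


def recent_error(model: Forecaster, series: List[float], window: int) -> float:
    """Mean absolute one-step-ahead error over the last `window` points."""
    start = max(1, len(series) - window)
    errors = [abs(model(series[:t]) - series[t])
              for t in range(start, len(series))]
    return mean(errors)


def forecast_next(series: List[float], window: int = 5) -> Tuple[str, float]:
    """Choose the model that was most accurate recently and apply it."""
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    best = min(MODELS,
               key=lambda name: recent_error(MODELS[name], series, window))
    return best, MODELS[best](series)


chosen, prediction = forecast_next([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
```

On this steadily rising series the last-value model has the smallest recent error of the three, so it is the one selected. Because the choice is re-evaluated on every call, a different model can take over as the traffic pattern changes.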
“Design-Assisted, Real-time, Measurement-Based Network Controls for Management of Service Level Agreements,” by E. Bouillet, D. Mitra and K. G. Ramakrishnan, describes an SLA monitoring scheme that optimizes net revenue based on the flows related to a set of SLAs. This approach proposes two distinct phases: an offline phase that includes SLA crafting and an online phase that includes real-time SLA management. Revenue is generated by the admission of flows into the network, and a penalty is incurred when the service provider is not SLA compliant.
To summarize, the objectives of the present invention are the following:
An integrated Overload Monitoring System (OMS), for offline Critical Service Level Agreement (CSLA) Identification based on bandwidth (BW) and Near-Optimal Traffic Analysis for Forecasting BW-related CSLA Operator Violations at a future time point, comprising:
a) offline procedures for critical SLA identification, regionalization of network nodes, historical data based forecast model selection, overall load due to critical SLAs and offline operator SLA violation prediction at a future time point;
b) universal network probes that perform a near-optimal analysis of the network traffic for critical SLAs; and
c) online procedures for network traffic analysis by the network probes based on two distinct configurable clocks, online operator SLA violation prediction at a future time point and consistency based alarm generation.