Currently, dependability Service Level Agreements (SLAs) between service providers and service customers, and between service providers and solution vendors, are ambiguous, vague, and difficult to measure. As enterprises become increasingly dependent on networks for their critical business services, service dependability is becoming an important attribute for service providers to guarantee to service customers. Service dependability refers to the reliability, availability, maintainability, and survivability of services, networks, and network elements. The ability of a service provider to set clear and meaningful dependability SLAs can be a significant market differentiator. However, to be meaningful, an SLA must be measurable in a way that reflects the experience of the service end-user or end-device.
It is important to reflect properly what the end-user experiences. However, whether an end-user actually experiences a network event does not always determine how the event is classified. For example, users of a mission-critical service do not have to experience an outage for it to be identified as a service outage. This is especially true for safety-based services, where the service customer pays for around-the-clock service for the assurance that the service is ready for use when it is needed. In contrast, for a non-critical service such as web browsing, an outage may be considered a service outage only when directly experienced by the customer.
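The classification rule described above can be illustrated with a minimal sketch. The names `NetworkEvent`, `service_critical`, `experienced_by_user`, and `is_service_outage` are hypothetical and introduced only for illustration; the sketch also assumes the event has already been found to make the service unavailable for some interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkEvent:
    """Hypothetical record of an event that made a service unavailable."""
    service_critical: bool     # True for mission-critical or safety-based services
    experienced_by_user: bool  # True if an end-user or end-device noticed the event


def is_service_outage(event: NetworkEvent) -> bool:
    """Decide whether an unavailability event counts as a service outage.

    Assumption: critical services count every such event, because the
    customer pays for readiness; non-critical services count an event
    only when it was directly experienced.
    """
    if event.service_critical:
        return True
    return event.experienced_by_user
```

Under these assumptions, an unnoticed event on a safety-based service is still counted as an outage, while the same unnoticed event on a web-browsing service is not.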
Performance measurement features that measure packet delay, jitter, and integrity are common in conventional networks today. It is critical that monitoring the customer's service experience does not adversely impact network performance. Packet parameters must be measured at a granularity sufficient to detect the network failure events that matter to the customer's service type and to properly determine an event from an end-user or end-device perspective.
From the perspective of the end-user or end-device, performance thresholds differ depending on the usage profile stage (e.g., service access vs. service use) and the timeliness requirements of the service application type. For example, an event that causes a 30-second packet delay would be considered a service outage for telephony but not for email. A service failure for PSTN telephony occurs when a subscriber experiences a dial tone delay greater than 5 seconds or a ring back delay greater than 9 seconds when trying to access the service, or a delay greater than 5 seconds (or a disconnection that forces the subscriber to re-dial) when in a talk state. At these thresholds the subscriber would conclude that a service denial or premature disconnect had occurred. However, if the impairment lasted long enough to persist through three attempts (the point at which the typical subscriber abandons the call), the subscriber would consider the service unavailable. This threshold is considered to be 30 seconds.
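The PSTN telephony thresholds above can be expressed as a small classification routine. This is only a sketch: the names `Stage`, `FAILURE_THRESHOLD_S`, `OUTAGE_THRESHOLD_S`, and `classify_pstn_event` are hypothetical, and treating an impairment of exactly 30 seconds as an outage is an assumption, since the text does not say whether that boundary is inclusive.

```python
from enum import Enum


class Stage(Enum):
    """Usage profile stages named in the text (identifiers are illustrative)."""
    DIAL_TONE = "dial tone"   # service access
    RING_BACK = "ring back"   # service access
    TALK = "talk"             # service use


# Service failure thresholds in seconds, taken from the text.
FAILURE_THRESHOLD_S = {
    Stage.DIAL_TONE: 5.0,
    Stage.RING_BACK: 9.0,
    Stage.TALK: 5.0,
}

# Impairment duration at which the service is considered unavailable.
OUTAGE_THRESHOLD_S = 30.0


def classify_pstn_event(stage: Stage, delay_s: float, disconnected: bool = False) -> str:
    """Return "ok", "service_failure", or "service_outage" for one impairment."""
    # Assumption: the 30-second boundary is inclusive.
    if delay_s >= OUTAGE_THRESHOLD_S:
        return "service_outage"
    # A forced re-dial during the talk state counts as a failure regardless of delay.
    if stage is Stage.TALK and disconnected:
        return "service_failure"
    if delay_s > FAILURE_THRESHOLD_S[stage]:
        return "service_failure"
    return "ok"
```

For instance, `classify_pstn_event(Stage.RING_BACK, 7.0)` returns `"ok"` because the ring back threshold is 9 seconds, whereas the same 7-second delay at the dial tone stage is classified as a service failure. Other application types would need their own threshold tables, which the text does not specify.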
In today's communications networks, measurement of failure event data has focused on improving the effectiveness and efficiency of operations personnel. Network elements generate warning alarms and failure event data that enable operations personnel to diagnose and field-repair the equipment. Network products exist today that can assign timing information to these alarms and data to facilitate the automated reporting of reliability-related metrics. However, this is not sufficient to provide meaningful SLAs, since the information does not reflect the impact from the service customer's perspective.
The next generation network is a multi-service network that provides a wide range of service applications, each with its own threshold criteria for service failure and service outage. The application types are real-time interactive, such as voice, video conferencing, e-gaming, and financial transactions; real-time non-interactive, such as video and television; and non-real-time, such as email and file downloads.
Currently, operations personnel must capture, analyze, and compute data in order to track service dependability performance so that it can be compared to objectives. This requires a significant amount of manual effort and usually considers only network element failure modes, even though network failures such as cable cuts may be a considerable contributor to end-to-end service dependability issues. Therefore, it would be advantageous to have an autonomous system that can measure, analyze, and report network failure events in terms of dependability parameters and statistical dependability metrics.
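As an illustration of the kind of computation such a system might automate, the sketch below derives a few common dependability metrics from a list of outage intervals. It is not drawn from the text: the names `Outage` and `dependability_metrics` are hypothetical, the choice of metrics is an assumption, and the outages are assumed to be non-overlapping and to fall entirely within the reporting period.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Outage:
    """Hypothetical outage record; times are seconds from the period start."""
    start_s: float
    end_s: float


def dependability_metrics(outages: List[Outage], period_s: float) -> Dict[str, Optional[float]]:
    """Summarize outages over a reporting period of period_s seconds.

    Assumes outages do not overlap and lie within the period.
    """
    downtime_s = sum(o.end_s - o.start_s for o in outages)
    uptime_s = period_s - downtime_s
    count = len(outages)
    return {
        # Fraction of the period during which the service was up.
        "availability": uptime_s / period_s,
        "downtime_s": downtime_s,
        "outage_count": float(count),
        # Means are undefined when no outage occurred.
        "mean_time_to_restore_s": downtime_s / count if count else None,
        "mean_uptime_per_outage_s": uptime_s / count if count else None,
    }
```

With two 60-second outages in a one-day period (86,400 seconds), the total downtime is 120 seconds and the mean time to restore is 60 seconds. Comparing the resulting availability with an SLA objective would then be a simple threshold check.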