In a communication network operated by a service provider, the service provider offers bandwidth in the network to customers. The service provider typically has a Service Level Agreement (SLA) with its customer, whereby the service provider commits to provide communication services with service level guarantees to the customer and receives compensation according to the payment schedule in the SLA as long as the provider achieves its service commitments. SLAs commonly include penalties when service commitments are not met, for example, as a result of a link failure in the network. During a subsequent network recovery period, service to a customer is disrupted. Accordingly, there is a need for accurate tabulation and measurement of service outage times for the customer.
The communication network, or more particularly a portion thereof, may fail for various reasons, including a software defect or equipment failure. When a failure is sensed by other network elements adjacent to the failed portion of the network, signalling standards may require that all calls affected by the failure be released, thus causing all of the bearer channel cross-connects relating to those calls to be released. If a call control entity (for example, a call processor supporting switched virtual circuits or SPVC services) on a first network element fails, all of the signalling interfaces with other network elements managed by the call processor will be lost. Adjacent network elements or nodes will thus presume that the bearer channels associated with the failed signalling interfaces are no longer operable. This causes the adjacent network elements or nodes to signal this status across the network and release all cross-connects to the bearer channels composing the call. Ultimately, the failure in the signalling network will be signalled back to the calling and called services, which terminate their sessions.
A similar situation occurs upon the failure of a network link or line card module carrying user traffic. The failure of this link or card is detected by the network elements, which then release all cross-connects for the bearer channels composing the calls.
As the number of connections across a physical link increases in a communication network, so does the time required to release, reroute and restore these connections in the event of a failure of a network element. In a signalled network, for example, the rate of restoration varies by network but may be on the order of 100-1000 connections per second. Therefore, rerouting a large number of connections, for example 10,000, may require (in an ideal, uncongested network) 10-100 seconds to complete. The same holds as the number of connections traversing a single physical entity (link or node) increases: the restoration time increases. Furthermore, the number of physical entities that release messages must traverse toward the originating or source nodes for each connection being rerouted affects the delay in restoring the connections. From an SLA perspective, the outage time recorded should accurately represent the duration for which each traffic-carrying connection is unavailable.
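The restoration-time figures above follow from dividing the number of connections by the restoration rate. The following is a minimal sketch of that arithmetic only, assuming an ideal, uncongested network with a constant restoration rate; the function names are hypothetical and are not part of any described system.

```python
def restoration_time_seconds(num_connections: int, rate_per_second: float) -> float:
    """Time to restore all connections at a constant rate (idealized assumption)."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return num_connections / rate_per_second


def restoration_time_range(num_connections: int,
                           slow_rate: float,
                           fast_rate: float) -> tuple[float, float]:
    """Return (best case, worst case) restoration time in seconds."""
    best = restoration_time_seconds(num_connections, fast_rate)
    worst = restoration_time_seconds(num_connections, slow_rate)
    return best, worst


# 10,000 connections restored at 100 to 1000 connections per second
best_case, worst_case = restoration_time_range(10_000, slow_rate=100, fast_rate=1000)
```

With these inputs the range is 10 to 100 seconds, matching the figures given in the text. In a congested network the effective rate would be lower and the time correspondingly longer.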
In typical prior art systems and methods, service downtime is measured from the viewpoint of a source node, using only that source node's clock, as that source node receives a release message and a subsequent connect message. Therefore, propagation delays for release messages arriving at the source nodes, and queuing of release messages at each intermediate node before processing, are not measured as part of the downtime. This untracked propagation delay and queuing time can represent a significant portion of the total time that service to a customer is disrupted. As a result, typical prior art systems and methods for measuring service outage times do not scale well in larger networks due to the increasing network database size and message traffic.
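The gap between the downtime recorded at the source node and the time service is actually disrupted can be illustrated with a small model. This is a sketch under stated assumptions only: all times are placed on a single hypothetical common timeline, each intermediate node contributes a fixed propagation and queuing delay to the release message, and the names `Hop`, `release_arrival_delay`, `measured_downtime` and `actual_outage` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    """Delay added to a release message at one intermediate node (hypothetical model)."""
    propagation_s: float  # time for the message to reach this node
    queuing_s: float      # time the message waits before being processed


def release_arrival_delay(hops: List[Hop]) -> float:
    """Total delay before the release message reaches the source node."""
    return sum(h.propagation_s + h.queuing_s for h in hops)


def measured_downtime(release_received_s: float, connect_received_s: float) -> float:
    """Downtime as seen by the source node's own clock (prior art viewpoint)."""
    return connect_received_s - release_received_s


def actual_outage(failure_s: float, connect_received_s: float) -> float:
    """Time the connection was really unavailable, measured from the failure."""
    return connect_received_s - failure_s


# Illustrative values only: failure at t = 0, two intermediate nodes,
# and a reroute that completes 6 seconds after the release arrives.
failure_time = 0.0
path = [Hop(propagation_s=0.5, queuing_s=2.0), Hop(propagation_s=0.5, queuing_s=1.0)]
release_received = failure_time + release_arrival_delay(path)
connect_received = release_received + 6.0

untracked = actual_outage(failure_time, connect_received) - measured_downtime(
    release_received, connect_received
)
```

In this example the source node records 6 seconds of downtime while the connection was unavailable for 10 seconds; the 4-second difference is the propagation and queuing delay of the release message, which the source-node measurement does not capture.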
Thus, there is a need for a system and method for providing service availability data that improves upon the prior art systems.