High availability is a critical system requirement in Internet Protocol (IP) networks and other telecommunication networks that support applications such as telephony, video conferencing, and online transaction processing. Outage measurement is essential for assessing and improving network availability. Most Internet Service Providers (ISPs) measure outages either automatically, using tools such as Network Management System (NMS)-based polling, or manually, using a trouble ticket database.
Two metrics have been used to measure network outages: network device outage and customer connectivity downtime. Because of scalability limitations, most systems provide outage measurements only up to the ISP's access routers. Any outage measurements and calculations between the access routers and customer equipment must be performed manually. As networks grow, this process becomes more tedious, time-consuming, error-prone, and costly.
Present outage measurement schemes also do not adequately address the need for accuracy, scalability, performance, cost efficiency, and manageability. One reason is that end-to-end monitoring from an outage management server to customer equipment adds overhead on the network path and therefore scales poorly. The multiple hops from an outage management server to customer equipment also decrease measurement accuracy. For example, some failures observed between the management server and customer equipment may be caused not by customer connectivity outages but by outages elsewhere in the IP network. Finally, server-based monitoring tools require a dedicated server to perform network availability measurements and require ISPs to update or replace existing outage management software.
Several existing Management Information Bases (MIBs), including the Internet Engineering Task Force (IETF) Interface MIB, the IETF Entity MIB, and other Entity Alarm MIBs, are used to monitor the up/down state of objects. However, these MIBs do not track outage data such as accumulated outage time and failure count per object, and they lack the data storage capability that certain outage measurements may require.
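To make the missing data concrete, the following is a minimal sketch, assuming a monitor that receives up/down state reports with timestamps in seconds, of a per-object record that accumulates outage time and counts failures from those transitions. The class, field, and method names (`OutageRecord`, `on_state_change`, `availability`) and the object identifier `"Serial0/0"` are hypothetical illustrations; they are not objects defined in any IETF MIB and not the method of the present invention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutageRecord:
    """Hypothetical per-object record: accumulated outage time and failure count."""
    object_id: str
    accumulated_outage_time: float = 0.0  # total seconds spent in completed outages
    failure_count: int = 0                # number of up -> down transitions seen
    down_since: Optional[float] = None    # start time of the ongoing outage, if any

    def on_state_change(self, is_up: bool, timestamp: float) -> None:
        """Update the record from one up/down state report."""
        if not is_up and self.down_since is None:
            # Up -> down: a new failure begins.
            self.failure_count += 1
            self.down_since = timestamp
        elif is_up and self.down_since is not None:
            # Down -> up: close the outage and add its duration.
            self.accumulated_outage_time += timestamp - self.down_since
            self.down_since = None
        # Repeated reports of the current state change nothing.

    def availability(self, start: float, now: float) -> float:
        """Fraction of the interval [start, now] during which the object was up,
        counting any outage that is still in progress at `now`."""
        outage = self.accumulated_outage_time
        if self.down_since is not None:
            outage += now - self.down_since
        return 1.0 - outage / (now - start)


# Usage: two outages (60 s and 30 s) with a duplicate "down" report in between.
rec = OutageRecord("Serial0/0")
rec.on_state_change(False, 100.0)
rec.on_state_change(True, 160.0)
rec.on_state_change(False, 500.0)
rec.on_state_change(False, 510.0)  # duplicate report, ignored
rec.on_state_change(True, 530.0)
print(rec.failure_count, rec.accumulated_outage_time)  # 2 90.0
```

Keeping the record on the monitored device, rather than on a remote management server, is one way such data could avoid the multi-hop accuracy problem described above; this placement is an assumption of the sketch.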
The present invention addresses this and other problems associated with the prior art.