In a large network-based service environment, such as a Voice-over Internet Protocol (VoIP) network, establishing an end-to-end service may involve executing several applications. Each such application may run on a subset of network elements and may depend on a subset of other applications running on other network elements. Failure of one of these applications, or of a network element hosting it, can delay service processing or cause it to fail outright, which may be intolerable given the real-time nature of the communication service.
Applications can deploy mechanisms to detect failures of the applications they depend on or of the network elements hosting them. However, as a network grows and more vendors contribute products to it, operational status communications grow in proportion to the square of the number of network elements, which can impose significant overhead. In addition, application dependencies are typically neither symmetric nor fully meshed, so each individual application or network element must be manually configured to monitor its remote peers. This can place a significant operational burden on network administrators. Moreover, incompatibility and interoperability problems in a multi-vendor, multi-technology environment can prevent network service providers from implementing such monitoring consistently throughout a network. Hence, there is a need for an improved system and method of health monitoring and fault mitigation in a network system.