1. Field of the Invention
The invention disclosed herein relates, in general, to switching system equipment. More specifically, the present invention relates to a method for intelligent distributed health monitoring and fault repairing in switching system equipment.
2. Description of the Related Art
The emergence and penetration of communications technology have changed the way we work, communicate, and socialize. Businesses today rely heavily on communications technology and, hence, connectivity has become a prerequisite for every business. Further, switching technologies such as Multiprotocol Label Switching (MPLS) have helped businesses harness the potential of communications technology to the fullest because of the connectivity they provide to remote locations. In large switching systems in particular, the overall availability and reliability of connectivity services depend on the proper functioning of the hardware and software components of the switching systems as well as of the peripheral components. Accordingly, various health monitoring mechanisms are used to track the status of the hardware and software components in large switching systems.
The health monitoring aspects of a distributed packet switching platform play a vital role in ensuring the reliability and availability of the services it provides in various deployment networks, including service provider networks, data centers, cloud computing, high performance scientific computing clusters, storage area networks, and time sensitive financial and health services networks. Since health monitoring determines the robustness and availability of the individual switching nodes, it ultimately determines the overall health and safety of the network interconnecting routers and switches. The quality and efficiency of the health monitoring system in switching platforms directly determine the cost of maintenance and servicing related downtime, the cost of timely repairs, and the forecasting of equipment lifetime. Thus, health monitoring is of critical importance for reducing the overall costs of network deployment and maintenance, leading to market differentiation and growth of opportunities for revenue generation.
Currently, the health monitoring mechanisms deployed for switching systems are only able to verify whether the hardware and software components of the switching systems are able to exchange periodic health check request messages and response poll messages. These health monitoring mechanisms conclude that the switching systems are available solely on the basis of a successful exchange of these health check request messages and response poll messages. Usually, these health monitoring mechanisms work well in smaller setups. However, in a large distributed switching system, there are cases in which components experience operational faults but are still able to exchange poll messages. Such cases provide false information about the availability and health of the switching systems and cause the actual operational failures to be overlooked.
Further, conventional health monitoring mechanisms do not offer self-corrective provisions and, therefore, are incapable of triggering any corrective action when operational failures in the switching systems are identified. Furthermore, current health monitoring mechanisms rely solely on poll messages and do not use statistical performance measurement or threshold violation monitoring as a tool to identify potential problems in the switching systems.
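By way of illustration only, the limitation of poll-only monitoring described above can be sketched as follows. The sketch is not drawn from any particular switching system: the names `ComponentStatus`, `poll_only_healthy`, and `threshold_healthy`, the choice of packet drop rate as the measured statistic, and the 1% threshold are all hypothetical and are assumed solely to contrast a poll-based check with a threshold violation check.

```python
from dataclasses import dataclass


@dataclass
class ComponentStatus:
    """Hypothetical snapshot of one switching-system component."""
    name: str
    responds_to_poll: bool    # did the component answer the health check request?
    packet_drop_rate: float   # fraction of packets dropped in the last interval


def poll_only_healthy(status: ComponentStatus) -> bool:
    """Conventional check: healthy whenever the poll response arrived."""
    return status.responds_to_poll


def threshold_healthy(status: ComponentStatus, max_drop_rate: float = 0.01) -> bool:
    """Poll check combined with a threshold check on a performance statistic.

    The 1% default threshold is an assumed value for illustration.
    """
    return status.responds_to_poll and status.packet_drop_rate <= max_drop_rate


# A component that still answers polls while dropping 20% of its traffic.
degraded = ComponentStatus("line-card-3", responds_to_poll=True, packet_drop_rate=0.20)

print(poll_only_healthy(degraded))   # True: the fault is overlooked
print(threshold_healthy(degraded))   # False: the threshold violation is detected
```

In this sketch, the degraded component is reported as available by the poll-only check even though it is experiencing an operational fault, whereas the check that also examines a performance statistic flags it.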
The lack of effective health monitoring mechanisms and the absence of self-corrective provisions impair the reliability and performance of transport tunnel services in MPLS and Generalized Multi-Protocol Label Switching (GMPLS) traffic engineered networks and in Layer Two Virtual Private Networks.
From the foregoing discussion, it can be observed that the existing methods and mechanisms used for monitoring the health of switching systems are inadequate to ensure high performance, reliability, and availability of the switching systems. Firstly, these mechanisms rely only on poll messages exchanged with the various components of the switching systems. Further, these mechanisms do not offer any self-corrective provisions. Therefore, there is a need for a method and system for intelligent distributed health monitoring in switching system equipment, which overcomes some or all of the limitations identified above.