In a communications network, there is a need to provide a high level of service availability for data traffic travelling in the network. Accordingly, redundant datapaths are provided for network elements in the communications network. If there is a problem with a particular network element, such as a node or a link, the data traffic is re-routed onto an alternate datapath. Because the service availability of each node and link may affect the overall service availability of the network, it is also necessary to monitor each node and link for faults in order to maintain a high level of service availability for those nodes and links.
For example, a node comprising a routing switch may be monitored for faults so that its service availability can be maintained at a high level. While providing redundant datapaths within the routing switch partially addresses the issue of maintaining high service availability, it is also desirable to be able to isolate a fault, and to repair or replace any faulty components within the routing switch, so that the redundancy built into the routing switch continues to be fully functional.
In the prior art, various solutions have been proposed for isolating errors in a node, such as a routing switch, so that a faulty component or field replaceable unit (FRU) can be identified and replaced. However, in more complex configurations that produce multiple fault indications, the source of a fault may be uncertain. This is particularly problematic where the fault indications occur at an interface linking a component to one or more other components. While a step-by-step manual test of each component may eventually identify the faulty component through trial and error, the process can be unreliable and time-consuming.
Thus, there is a need for a more comprehensive method and system for analysing and correlating errors occurring within a group of components or FRUs, such that identification of the faulty FRU is improved.