1. Field of the Invention
The present invention relates to the field of network management. More specifically, the present invention relates to the self-diagnosis of faults within data networks.
2. Related Art
Today's high-speed data networks are heterogeneous, increasingly complex, and increasingly data intensive. As a result, the networks are becoming more difficult to manage. Network engineers (NEs) manage the data networks and must be familiar with the topology of their specific data network and the behavior of all the devices on that network, and must be able to process huge amounts of seemingly unrelated data.
An important activity in network management is network diagnosis. A single fault in the data network can produce hundreds and even thousands of alarms that must be analyzed by the NE, which is a prohibitive task. Traditional network fault diagnosis requires the direct involvement of an NE, who analyzes large amounts of seemingly unrelated fault data to determine what is causing the data network to operate improperly.
The NE must have expertise in troubleshooting, an understanding of network device behavior, and specific knowledge of the network being managed, such as its topology, typical traffic conditions, applications, and legacy systems. One problem with the management of data networks is that fewer NEs with the necessary specialized expertise are available. To compensate for this shortage, each NE is responsible for a broader range of network management duties. However, the NE's time is then allocated inefficiently. An NE spends an inordinate amount of time monitoring and troubleshooting a data network in order to resolve problems on the network. That time could be better spent accomplishing other network management tasks.
Prior Art FIG. 1 is an illustration of the traditional fault management process. An NE 150 has access to network data 130 and alarms 120 from a data network (e.g., local area network (LAN) 110). A network management tool can be used to monitor and collect performance data in the form of remote monitoring (RMON) data (e.g., RMON-1 and RMON-2), alarms, or events where preset thresholds have been crossed. This data is aggregated and displayed to the NE in the form of display data 140, such as graphs and tables.
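The threshold-crossing events mentioned above can be illustrated with a minimal sketch. It assumes a simple list of utilization samples and a single rising threshold; the names Alarm and rising_alarms are hypothetical and are not taken from any network management tool, and the logic is a simplification rather than the alarm semantics defined by RMON.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    """Hypothetical record of one threshold-crossing event."""
    metric: str
    value: float
    threshold: float


def rising_alarms(samples, threshold, metric="utilization"):
    """Return one Alarm each time a sample rises above the threshold.

    An alarm is raised only on the transition from at-or-below the
    threshold to above it, so a sustained overload yields a single alarm
    rather than one alarm per sample.
    """
    alarms = []
    above = False  # whether the previous sample was above the threshold
    for value in samples:
        if value > threshold and not above:
            alarms.append(Alarm(metric, value, threshold))
        above = value > threshold
    return alarms


if __name__ == "__main__":
    # Illustrative samples only (percent link utilization).
    for alarm in rising_alarms([10, 50, 90, 95, 40, 85], threshold=80):
        print(alarm)
```

Even with this kind of de-duplication, every monitored metric on every device can raise its own alarms, which is how a single fault comes to produce a large volume of alarms for the NE to analyze.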
Typically, a troubleshooting episode is triggered by a user (not shown) of the data network 110. The user contacts the NE 150 with a problem regarding the network 110. For example, the user may be experiencing slow responses through the network 110, or the user may have lost connectivity.
At this point, it becomes the duty of the NE to isolate the fault and take corrective action. The NE can analyze the display data 140 to manually troubleshoot the problem. Many times, the display data 140 is insufficient, and so the NE must query the data network, as represented by the path 180 of queries, to further isolate and diagnose the fault. These queries usually take the form of scripts, such as ping or traceroute, or involve the use of sniffers. The network 110 then sends query results back to the NE along path 185.
Block 190 illustrates a flow chart of the process followed by the NE 150 to perform troubleshooting. In step 160, the NE 150 analyzes the fault data presented and diagnoses the fault. In step 165, the NE 150 develops and then implements a plan to correct the problem. Thereafter, in step 170, the NE must verify that the problem has been eliminated. An NE may submit a report of the incident in step 175.
Thus, network diagnosis in the prior art was a manual process conducted by the NE. As FIG. 1 illustrates, the NE is responsible for isolating and identifying faults that are causing the problem. This can be time consuming and tedious, especially for large enterprise networks. As such, the analysis of fault data by the NE in today's larger heterogeneous networks is prone to error due to the large amounts of fault data to be manually processed.
Thus, a need exists for lessening the burden on a network engineer in the process of diagnosing faults within a data network. A further need exists for increasing efficiency in the process of diagnosing faults within a data network.