Monitoring large-scale networks is a distinct challenge, and there are two fundamental approaches to it. One approach is to use a central system (e.g., a network management system, NMS) that polls all the devices in the network and tries to determine what is happening. The other approach is to distribute the intelligence among the network devices so that they can self-diagnose problems and report only exceptions, rather than large volumes of data confirming that everything is still operating normally.
There is significant industry research and development in the area of network device management, but little in distributed router management with intelligent agents. Most networks currently use a centralized NMS, with local agents on most network devices operating as clients of the NMS. In this arrangement, local monitoring is performed by software running on each node, which sends data to the central NMS; the NMS ultimately provides the intelligence and management capabilities.
As network complexity grows, it is unlikely that central systems can keep pace and scale with network growth. The disadvantages of the current centralized NMS approach include the following:
- Centralized monitoring, diagnostic, and maintenance tools require significant network resources to manage the network effectively. To constrain resource usage in a large network, generally only a limited amount of information is gathered from the devices, and as a result many problems are not detected until after they occur.
- Preventative maintenance requires processing large volumes of detailed information, the vast majority of which results from normal operation. In a large network, centralized systems cannot handle this amount of information well enough to offer significant preventative maintenance capabilities.
- Central systems depend on resources to function, such as the ability to retrieve data from the nodes. When the network is misbehaving, which is precisely when the maintenance systems must be operational, these resources may not be available; i.e., the central system may be unable to communicate with the devices in its network.
- The network topology must be known when using centralized management solutions.
It is also known to use systems external to a network device that analyze a specific aspect of node behavior, such as routing, in greater depth. While these external mechanisms may provide a more thorough analysis of the device operation, they use extensive computing resources. As a result, equipping a network device with such a complex mechanism is not economically feasible, given the limited physical resources available at the node. In addition, the external systems do not report problems immediately because they rely on off-line computation; there is no way to obtain summarized results quickly enough to react promptly to a detected anomaly.
Some network devices are equipped with embedded monitoring engines. However, the existing embedded mechanisms are rudimentary and too limited to provide the network operator with meaningful information on the behavior of the host network device. Such embedded mechanisms generally compare counter values with fixed thresholds and trigger an alert when a threshold is exceeded.
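The fixed-threshold comparison described above can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the counter names (`if_in_errors`, `cpu_load_pct`), the limits, and the `ThresholdRule` structure are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """A fixed threshold on one counter (hypothetical structure)."""
    counter: str  # name of the monitored counter, e.g. "if_in_errors"
    limit: int    # fixed value; an alert fires when the counter exceeds it


def check_thresholds(counters: dict[str, int], rules: list[ThresholdRule]) -> list[str]:
    """Return one alert message per rule whose counter exceeds its fixed limit."""
    alerts = []
    for rule in rules:
        value = counters.get(rule.counter, 0)  # missing counters are treated as 0
        if value > rule.limit:
            alerts.append(f"ALERT: {rule.counter}={value} exceeds {rule.limit}")
    return alerts


rules = [ThresholdRule("if_in_errors", 100), ThresholdRule("cpu_load_pct", 90)]
print(check_thresholds({"if_in_errors": 250, "cpu_load_pct": 40}, rules))
# ['ALERT: if_in_errors=250 exceeds 100']
```

Each rule sees only a single counter value compared against a constant, with no history and no correlation between counters. That is the limitation the paragraph attributes to existing embedded mechanisms.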
There is a need to provide a network device (such as a router or a switch) with an agent for distributed monitoring and diagnosis of network operation. Such an agent would act as an intelligent distributed agent that filters the data provided to the NMS, thereby reducing the management traffic overhead sent to the central management system.
There is also a need to provide a network device with a multistage intelligent monitoring and diagnostic agent that incrementally triggers the resource-consuming monitoring and analysis modules at the router as needed.
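One way to picture incremental triggering is a chain of stages ordered by cost, where a more expensive stage runs only if the cheaper one before it has flagged a possible anomaly. The sketch below assumes that interpretation. The three stage functions, the metric names (`drops`, `drop_rate`, `route_changes`), and their thresholds are all hypothetical and were chosen only for illustration. The document does not specify the stages.

```python
from typing import Callable, Optional

# A stage inspects the node's metrics and returns a finding, or None if all is well.
Stage = Callable[[dict], Optional[str]]


def cheap_counter_check(metrics: dict) -> Optional[str]:
    # Stage 1 (hypothetical): inexpensive check run on every monitoring cycle.
    return "drop counter rising" if metrics.get("drops", 0) > 0 else None


def per_interface_analysis(metrics: dict) -> Optional[str]:
    # Stage 2 (hypothetical): costlier per-interface analysis.
    bad = [name for name, rate in metrics.get("drop_rate", {}).items() if rate > 0.01]
    return f"high drop rate on {', '.join(bad)}" if bad else None


def deep_diagnosis(metrics: dict) -> Optional[str]:
    # Stage 3 (hypothetical): most expensive step, e.g. checking routing instability.
    return "suspected route flap" if metrics.get("route_changes", 0) > 5 else None


def run_multistage(metrics: dict, stages: list[Stage]) -> list[str]:
    """Run stages in order of cost and stop at the first stage that finds nothing.

    Only the accumulated findings would be reported to the NMS, so a healthy
    node reports nothing and never runs the expensive stages.
    """
    findings = []
    for stage in stages:
        result = stage(metrics)
        if result is None:
            break  # no anomaly at this level: do not escalate further
        findings.append(result)
    return findings


STAGES = [cheap_counter_check, per_interface_analysis, deep_diagnosis]
print(run_multistage({"drops": 0}, STAGES))  # []
print(run_multistage({"drops": 12, "drop_rate": {"eth1": 0.05}, "route_changes": 2}, STAGES))
# ['drop counter rising', 'high drop rate on eth1']
```

In this sketch, the escalation order stands in for the resource-consumption ordering. A real agent would also need to decide when to re-arm stages and how to rate-limit reports, and neither question is addressed here.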