Computer networks are becoming larger and more complex. Managing such networks often involves monitoring deployed nodes on the network (e.g., computers, servers, routers, sub-networks, network-enabled devices, and the like). This monitoring process may involve a variety of parameters that are important to the system manager and to the health of the network.
Monitoring performed by a client network management system can include measuring and collecting performance data of servers and other computer systems in the network. Performance measurement and system health monitors can measure and collect data needed to diagnose a problem with a system on the network. Such monitors can use a measurement engine that acquires desired system metrics (e.g., CPU utilization, percentage of memory used, and the like). This data can then be used for generating performance reports and for aiding system operators in diagnosing system problems such as a memory bottleneck. Those skilled in the art will appreciate that a significant amount of data may be necessary to diagnose potential system problems.
Examples of known performance measurement and system health monitors can include commercially available software systems, such as MeasureWare available from Hewlett-Packard Company and Patrol available from BMC Software, Inc. Known performance measurement and system health monitors typically require the customer to define performance thresholds. When performance crosses the defined thresholds, an alert is generated to notify system administrators or support personnel, perhaps accompanied by a static set of recommendations or corrective actions.
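By way of illustration only, the threshold-based alerting described above can be sketched as follows. This is a minimal sketch and does not represent the implementation of any particular product: the names `Threshold` and `check_thresholds`, the metric keys, the limits, and the advice strings are all hypothetical, and the sketch assumes that a threshold is crossed when a sampled value strictly exceeds the customer-defined limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A customer-defined limit for one metric (hypothetical structure)."""
    metric: str   # key of the metric in a sample, e.g. "cpu_utilization_pct"
    limit: float  # value above which an alert is raised
    advice: str   # static recommendation attached to the alert


def check_thresholds(sample, thresholds):
    """Return one alert string per threshold that the sample exceeds.

    `sample` maps metric names to measured values. Metrics missing from
    the sample are skipped rather than treated as violations.
    """
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(
                f"ALERT {t.metric}={value:.1f} exceeds {t.limit:.1f}: {t.advice}"
            )
    return alerts


# Example values are assumptions chosen only for illustration.
thresholds = [
    Threshold("cpu_utilization_pct", 90.0, "review running processes"),
    Threshold("memory_used_pct", 85.0, "consider adding memory"),
]
sample = {"cpu_utilization_pct": 97.0, "memory_used_pct": 40.0}

for alert in check_thresholds(sample, thresholds):
    print(alert)
```

In this sketch nothing is reported until a limit has already been exceeded, and the advice attached to each alert is fixed in advance rather than derived from the state of the system.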
Threshold-based performance monitoring is reactive in the sense that customers are not made aware of an emerging problem until a threshold is reached. Experts can be assigned to customers with performance problems; however, such experts are usually limited in number and in how many customers they can help. Known systems and methods do not causally link performance improvement or degradation with configuration changes that may be a factor. Such systems and methods require expertise on the part of the customer's information technology (IT) staff to evaluate the relative merits of static advice sets and to determine what course of action should be attempted first. One customer does not automatically benefit from learning at other customer sites because prior threshold-based performance monitoring is localized to a customer site.