This invention relates to a method and apparatus for anomaly detection in a network management system.
Known network management tools that monitor computer systems are manually configured for usage patterns, thresholds, and other characteristics. Configurations are manually customized by administrators who observe data for every computer system in the network and determine key performance indicators (KPIs). Typically such configurations are bundled with a computer network management tool as part of a product. The problem with bundling configuration data as a product is that it is often not what the customer actually needs, because customer needs and networked computer systems vary widely. Off-the-shelf configuration data assumes specific KPIs and requires administrator skill and time to tune and establish actual baselines for every KPI on every computer system. Such tuning is prone to human error. In addition, when a new KPI is added, or an existing KPI is changed, a vendor is often required to update a package before it can be used by the network management tool.
Network management tools include performance management tools such as IBM Tivoli Monitoring (ITM), IBM Tivoli Composite Application Manager, and IBM Tivoli Netcool Performance Management (TNPM); fault management tools such as IBM Netcool OMNIbus; and service monitoring tools such as IBM Tivoli Business Service Manager (TBSM). These tools are configured on installation to look at certain KPIs and to notify operators when their values cross a predefined threshold. As a result, a threshold frequently has to be retuned when the usage pattern of the resource being monitored changes. IBM, Netcool, OMNIbus and Tivoli are registered or unregistered trademarks of International Business Machines Corporation in the US and/or other countries.
Typically a company defines performance thresholds (for example, for central processing unit usage and response times) and raises an alarm when a defined threshold is breached. One problem with this approach is that threshold definitions take a long time to establish if the number of false alerts and missed alerts is to be kept low. Configuring threshold definitions is a time-consuming and expensive process because it requires a deep understanding of the underlying platform.
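By way of illustration only, the static threshold approach described above can be sketched as follows. This is a minimal sketch and is not taken from any of the named products: the KPI names, the limit values, and the names `Threshold` and `check_thresholds` are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A manually defined, static limit for one KPI (hypothetical structure)."""
    kpi: str
    limit: float


def check_thresholds(samples, thresholds):
    """Return (kpi, value, limit) for every sample that exceeds its static limit."""
    limits = {t.kpi: t.limit for t in thresholds}
    alarms = []
    for kpi, value in samples:
        limit = limits.get(kpi)
        # KPIs without a configured threshold are silently ignored.
        if limit is not None and value > limit:
            alarms.append((kpi, value, limit))
    return alarms


# Illustrative limits that an administrator would have to choose and maintain.
thresholds = [
    Threshold("cpu_usage_percent", 90.0),
    Threshold("response_time_ms", 500.0),
]

# Illustrative observations, not real measurements.
samples = [
    ("cpu_usage_percent", 72.0),
    ("cpu_usage_percent", 95.5),
    ("response_time_ms", 480.0),
    ("response_time_ms", 730.0),
]

alarms = check_thresholds(samples, thresholds)
for kpi, value, limit in alarms:
    print(f"ALARM: {kpi}={value} exceeds limit {limit}")
```

In such a sketch every limit is a fixed number, so a routine peak and a genuine fault are treated identically, and each limit has to be revisited by hand when the usage pattern of the monitored resource changes.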
One approach to reducing configuration requirements is to provide a set of configuration settings for each metric. These configuration settings are usually grouped in metadata “packs”, and a different pack is needed for each operating system. Each pack can take weeks to build because of the number of data sources to which it must connect.