1. Field of the Invention
The present invention relates generally to network management, and more particularly, to monitoring health problems of network devices and services of a managed network environment.
2. Related Art
As computer networks have become more prevalent in corporate and other operating environments, network management software that is capable of solving network problems automatically and remotely has become more crucial. One of the major goals of any efficient network administration setup is the specification and measurement of acceptable performance thresholds for each machine in the network without creating additional network traffic. Network management software typically manages and automates administrative tasks across multiple machines in a network. Typical network management software allows administrators to measure log events and view status when performance criteria are not acceptable. Unfortunately, however, the administrator is often not informed of problems on the network by network management software until after one or more end users of the network have been affected.
Accordingly, there exists a need in the art for a proactive diagnosis of network management problems in a timely manner. There is further a need for a complete, global view of the network environment, including a view of all critical components. There exists a need to quickly display to the administrator of a network the health problems associated with devices and services on the network, and to provide the administrator with the capability to quickly respond to and correct pending network problems before end users of the network are impacted.
The Simple Network Management Protocol (SNMP) and Common Management Information Protocol (CMIP) are network management protocols that provide a generic mechanism by which different manufacturers' equipment and/or services can be monitored and controlled from a management system, such as a UNIX server. A network component or service provided on a managed network can be monitored and controlled using a management protocol to communicate management information between network components and services on the network. A network component can include networked personal computers, workstations, servers, routers, bridges, print servers, print queues, and printers. Network services, particularly in an Internet environment, can include electronic mail (e-mail), browsers, and service level agreements. There exist several key areas of network management including fault management, configuration management, security management, performance management, and accounting management. With the ability to instruct a network component to report events and the ability to start processes on a network component, the network can be manipulated to suit changing conditions within a network system.
A key mechanism by which various network devices communicate with a management system is via SNMP traps or CMIP events. Hereafter, “events” will be used to refer to either SNMP traps or CMIP events. Events allow for unsolicited notifications to be sent from one network device or service to another. This same mechanism can be used for communication between various cooperating software components within the management system.
There are several software products that receive events and allow a user to manage network devices. One of these products, Network Node Manager (NNM) from Hewlett-Packard Company of Palo Alto, Calif., enables a user to manage network devices using a graphical user interface (GUI) along with graphically representing relationships between network devices. Hereafter “NNM” will be used to generically refer to a product that receives events and allows a user to manage network devices, such as HP's Network Node Manager. From the NNM console, a user is able to discover and display all of the network devices on the network and to proactively monitor and manage all servers on the network. This makes it easy to determine the network status or to follow the path of a failed print job, for instance, and determine the point at which it failed. Because it is easy for a user to see how a network is configured, it is easy to manage network devices and optimize the configuration. For instance, a configuration may be optimized by balancing the number of print queues per print server or the number of print servers per file server. Any network device, such as NetWare® file servers, print servers, print queues, and printers, may be managed by an NNM. (NetWare is a trademark of Novell, Inc.) During initialization of the NNM, network devices are automatically discovered and added to a topology database. Each network device is graphically represented by an icon on the NNM console. Using NNM, a user can proactively monitor and manage all network devices on a managed network. A user can monitor the state of a network device over various periods of time by keeping trend data. A user can use trend thresholds to troubleshoot problems on network devices or to plan future expansion of network devices, such as increasing volume and disk sizes, or increasing the number of users allowed access to a server at one time.
All events are assigned a default severity which can be overridden by the user. The NNM utilizes registration files for user configurable information. The severity level of each event that is received by the NNM that corresponds to a particular network device is represented by a unique color. The severity level of a network device is indicated on the NNM console by the color of the network device's icon. For instance, by default, a critical event is indicated to the user when a network device icon on the NNM map changes color to red, indicating a critical status related to that network device. Thus, the current status of the entire network can be easily inspected by a user using the color status indications of the network device icons.
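By way of illustration only, the relationship between event severity, user override, and icon color described above may be sketched as follows. This is a minimal sketch and not the NNM implementation: the severity names, every color other than red for critical, the override table, and the rule that an icon takes the color of the most severe pending event are all assumptions made for the example.

```python
# Illustrative sketch only. Only "critical -> red" comes from the text;
# the other severity names and colors are hypothetical.
SEVERITY_ORDER = ["normal", "warning", "minor", "major", "critical"]
DEFAULT_COLORS = {
    "normal": "green",
    "warning": "cyan",
    "minor": "yellow",
    "major": "orange",
    "critical": "red",
}


def effective_severity(event_type, default_severity, user_overrides):
    """Return the user's override for this event type, else the default."""
    return user_overrides.get(event_type, default_severity)


def icon_color(events, user_overrides=None):
    """Pick an icon color for one device.

    events: list of (event_type, default_severity) pairs for the device.
    Assumption: the icon shows the color of the most severe event.
    """
    overrides = user_overrides or {}
    worst = "normal"
    for event_type, default_severity in events:
        severity = effective_severity(event_type, default_severity, overrides)
        if SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(worst):
            worst = severity
    return DEFAULT_COLORS[worst]
```

In this sketch, a device with a pending critical event yields a red icon unless the user has overridden the severity of that event type. The color alone carries no information about which event produced it.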
While the occurrence of a critical event for a network device is depicted by a red icon or other indication for that device, the simple color indication of a red icon, for instance, does not, in and of itself, communicate to the user exactly the nature of the critical event that caused the icon to change to a red color. There is an unmet need in the art for a user, such as an administrator of the network environment, to be able to not only know that the icon for a particular device indicates the occurrence of a critical event, but to also be able to quickly and readily ascertain the exact nature of that critical event.
Network printers are graphically represented with a printer icon representing each of the network printers on the network. A user can remotely determine the “health” status of any of the network printers visually. The LED status on the network printer can then be browsed to determine if the printer needs to be serviced or if human intervention is required. For instance, it can be determined if a printer has any of the following problems: Out of paper; Out of ink; Paper jam; Door open; Toner low; Printer problem; and Bin full. A drawback of this approach, however, is that the exact nature of the critical event, e.g. door open, has to be determined by looking at the problem network printer device itself and cannot be determined remotely by looking at the red color icon of the problem network printer on the NNM network console.
Servers are graphically represented with a server icon representing each of the servers on the network. A server running the appropriate agent software may be managed by a user from the NNM console. A server running the appropriate agent software responds to management data requests from the NNM console and transmits alarms from the server to the NNM console. This makes it possible for the NNM to display real-time server performance and configuration data on those servers and to monitor key performance statistics including: CPU utilization; number of users; number of connections; memory usage and configuration; installed software; and disk and volume usage. Thresholds can be set on these parameters to cause an SNMP trap, or they can be graphed by the NNM to evaluate history or trends. Parts of a server may also be viewed when troubleshooting a problem. Viewing components of a server's configuration (the network interfaces, for example) might help solve a critical problem with the server.
Server faults may be managed by monitoring key parameters of the servers, such as CPU load and available disk space, as well as noting significant events, such as NetWare Loadable Modules (NLMs) being unloaded or trustee rights changing. These conditions may be monitored directly at the servers and passed to the NNM via SNMP traps. For file servers, a user can obtain current and historical trend data and set alarm thresholds for trend parameters so that the user is notified when a threshold is passed.
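As a hedged illustration of the threshold mechanism described above, the following sketch compares sampled server parameters against user-set alarm thresholds. The parameter names, the `Threshold` type, and the use of a plain dictionary in place of an actual SNMP trap are assumptions made for the example; no real agent or SNMP library is modeled.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    parameter: str      # hypothetical parameter name, e.g. "cpu_load_percent"
    limit: float
    above: bool = True  # True: alarm when value > limit; False: when value < limit


def check_thresholds(sample, thresholds):
    """Return one alarm record per threshold that the sample has passed.

    sample: dict mapping parameter name to its current value.
    A real agent would send an SNMP trap; here a dict stands in for it.
    """
    alarms = []
    for t in thresholds:
        value = sample.get(t.parameter)
        if value is None:
            continue  # parameter not reported in this sample
        passed = value > t.limit if t.above else value < t.limit
        if passed:
            alarms.append(
                {"parameter": t.parameter, "value": value, "limit": t.limit}
            )
    return alarms
```

For example, with a threshold of 90 on CPU load and a lower threshold of 100 on free disk space, a sample reporting a CPU load of 95 and 500 units of free disk space would produce a single alarm for CPU load.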
Novell's NetWare Management Agent (NMA) Management Information Bases (MIBs) and trap definitions are integrated into NNM. NNM may be configured to integrate the NMA traps with associated Novell “NetExpert” help text. When an SNMP alarm is sent to an NNM console, the alarm can be reviewed for more detailed help text describing the problem. The alarm, however, is not directly correlated to the red icon indicating that a particular network device is having a problem. This means that the process of reviewing the alarm sent to the NNM console is separate from the process of viewing a red icon on the NNM console and that these processes are not correlated. The user can also follow detailed instructions that guide the user through a series of steps to resolve the problem discovered by the NMA agent.
Referring to FIG. 3, IP-centric group views 60 for graphically displaying network devices, according to the prior art, are shown. User interface 62 contains a representation of the network indicated by IP Internet icon 64. Double-clicking the IP Internet icon 64 will result in the presentation of user interface 66 having containers 68, 70 for the group views of the network indicated by NW-Servers:GOTO icon 68 and NT-Servers:GOTO icon 70. Double-clicking on NW-Servers:GOTO icon 68 will result in the presentation of user interface 72 containing the NW-Server-related network devices discovered by NNM during initialization. Three NW-Server-related network device icons are shown, each representing an individual network device: nwstrn0a icon 74, nwstrn0b icon 76, and nsmdem3 icon 78. This group view configuration is considered IP-centric (Internet Protocol Centric) because during network device discovery all network devices are initially contained in a single group view that is presented by double-clicking on the IP Internet icon 64. A user may manually construct basic group views such as NW-Servers and NT-Servers, shown as NW-Servers:GOTO icon 68 and NT-Servers:GOTO icon 70, respectively.
NodeView is a product that enhances products, such as NNM, that receive events and allow a user to manage network devices. Using NodeView, related network devices are automatically grouped into maps represented by group icons. Group views are hardwired into the NodeView code itself. Referring to FIG. 4, device-centric group views 80 for graphically displaying network devices, according to the prior art, are shown. User interface 82 contains a representation of the network on top of background 91, a map of the United States. The top-level network is indicated by Internet icon 84. The group views of the network are represented by NW-Servers icon 90, NT-Servers icon 92, Web-Servers icon 86, HP-Printers icon 88, and DMI-Clients icon 94. This group view configuration is considered device-centric because during network device discovery related network devices are automatically grouped into group views represented by group view icons. Double-clicking on a group icon will explode a map, hereafter referred to as a “group view”, showing all the related devices that were previously discovered in the topology database. For instance, double-clicking on NW-Servers icon 90 will explode to a NetWare Servers group view showing all of the NetWare servers that were discovered in the topology database. A group view of related devices provides a user with a simple way to monitor and launch applications using the menubar and NetWare tool launcher from a single view of the managed environment. The menubars, popup menus, and toolbar remain consistent for each of the group views provided by NodeView.
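For illustration, the device-centric grouping described above may be sketched as a fixed table that maps a device type to a group view. This is a minimal sketch and not NodeView code: the device type strings and the shape of the topology records are hypothetical, while the group names follow FIG. 4.

```python
from collections import defaultdict

# Fixed table compiled into the program, standing in for group views that
# are hardwired into the code. The type strings are hypothetical.
GROUP_FOR_TYPE = {
    "netware_server": "NW-Servers",
    "nt_server": "NT-Servers",
    "web_server": "Web-Servers",
    "hp_printer": "HP-Printers",
    "dmi_client": "DMI-Clients",
}


def build_group_views(topology):
    """Group discovered devices into group views by device type.

    topology: list of dicts with "name" and "type" keys, standing in for
    records of the topology database. Devices of unknown type are skipped.
    """
    views = defaultdict(list)
    for device in topology:
        group = GROUP_FOR_TYPE.get(device["type"])
        if group is not None:
            views[group].append(device["name"])
    return dict(views)
```

Because the table in this sketch is fixed in the code, adding or changing a group view requires changing the program itself rather than a user setting.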
In the prior art, the group views are hardwired into the NodeView code itself. This means that a NodeView user cannot select his/her own choices for group views nor dynamically update this selection. There is therefore an unmet need in the art to allow a user to be able to dynamically configure group view information. Additionally, the menubars, popup menus, and toolbar are not individually configured for a selected group view, but rather remain consistent regardless of whether an item is only applicable for certain group views and meaningless for others. There is therefore an unmet need in the art to allow the menubars, popup menus, and toolbar to be context sensitive to the group view.