Computer systems linked to each other in a communication network are commonly used in businesses and like organizations. Computer system communication networks ("networks") are growing in size--as measured by the number of applications and the number of users they support--due to improvements in network reliability and the recognition of associated benefits such as increased productivity.
As the size of networks increases and as organizations become more reliant on such networks, the importance of effective network management tools also grows. In response to the need for standardization of such tools, primarily to control costs but also because components in a network are likely to originate from many different vendors, the Simple Network Management Protocol (SNMP) was developed and widely adopted. There have been a number of management information bases (MIBs) defined since adoption of SNMP, such as MIB-II, Remote Network Monitoring (RMON), and RMON2.
SNMP, RMON and RMON2 thus are network management software tools that provide a set of standards for network management and control, including a standard protocol, a specification for database structure, and a set of data objects. RMON and RMON2 are implemented in a network through MIBs which contain instructions specifying the data that are to be collected, how the data are to be identified, and other information pertinent to the purpose of network monitoring. In the prior art, the MIBs are implemented through RMON probes to monitor the local areas of the network. (An RMON probe typically is a computer system strategically located within the network so as to monitor a local area of the network.) The network monitoring information obtained by the RMON probes is communicated to a central computer system that is accessible by the network manager.
Prior art network monitoring and management tools have trouble aiding the network manager in determining whether a problem within the network is associated with the network equipment itself or with the computer systems coupled to the network. If this information were known, it would allow the network manager to identify and implement the appropriate corrective action. For example, if a user places a request for a particular application from a client computer system to a server computer system and a response is delayed or is not received, the prior art network management tools do not give the network manager enough information to identify whether the problem is occurring because of a bottleneck in the network equipment or because the client or server computer system is not functioning properly.
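By way of illustration only, the distinction drawn above between a delay attributable to the network equipment and a delay attributable to a computer system can be sketched as follows. The sketch is not part of the prior art tools described; the names RequestTiming, split_delay and likely_source, and the 0.5 second limits, are hypothetical and chosen solely for this example. It assumes that four timestamps are available for a single request, two taken from the client's clock and two from the server's clock.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical timestamps, in seconds, for one client request.

    client_sent and client_received are read from the client's clock;
    server_received and server_replied are read from the server's clock.
    """
    client_sent: float
    server_received: float
    server_replied: float
    client_received: float


def split_delay(t: RequestTiming) -> tuple[float, float]:
    """Return (network_seconds, server_seconds) for one request.

    Only differences taken on the same clock are used, so the two
    clocks do not need to be synchronized with each other.
    """
    total = t.client_received - t.client_sent
    server = t.server_replied - t.server_received
    network = total - server
    return network, server


def likely_source(t: RequestTiming,
                  network_limit: float = 0.5,
                  server_limit: float = 0.5) -> str:
    """Classify a delay using assumed, illustrative limits."""
    network, server = split_delay(t)
    slow_network = network > network_limit
    slow_server = server > server_limit
    if slow_network and slow_server:
        return "both"
    if slow_network:
        return "network"
    if slow_server:
        return "system"
    return "neither"


# The server holds the request for 2.0 s; transit accounts for about 0.2 s.
slow_server_case = RequestTiming(0.0, 0.1, 2.1, 2.2)
# The server answers in about 0.1 s; transit accounts for about 1.9 s.
slow_network_case = RequestTiming(0.0, 1.0, 1.1, 2.0)
```

Under these assumptions, likely_source(slow_server_case) indicates a system problem and likely_source(slow_network_case) indicates a network problem. This is the determination that, as described above, the prior art tools do not give the network manager enough information to make.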
Effective network monitoring and management tools are also needed in order for vendors of network management services to demonstrate compliance with the governing service level agreement (SLA). Many businesses contract with vendors for network management services. Such contracts are typically implemented with SLAs which specify metrics against which the provider of the network management services is measured. These metrics are used to quantify standards of performance that allow businesses to assess not only the performance of the network but also the performance of the network management services provider. Prior art network management tools generally do not provide effective means for monitoring the network and facilitating compliance with the requirements contained in the SLAs.
The prior art network monitoring and management tools are problematic because they do not provide the network manager with sufficient and readily accessible information enabling him or her to quickly pinpoint the source of a problem and solve it. In the prior art, the network manager must look at various sources of information, typically beginning with available network information from the RMON probes, to try to identify the cause of a problem. Only after reviewing the available network information may the network manager conclude that the problem is not with the network equipment but with a server or client computer system on the network. At this point, the network manager (or an equivalent system manager) begins a lengthy process of researching potential causes from the system perspective. While there may be some degree of manual coordination of the network and system efforts to identify the cause of a problem, the network tools in the prior art are not capable of automatically facilitating a coordinated effort to an optimal extent. Thus, in the prior art it is not possible to quickly and automatically pinpoint a problem as either a network problem or a system problem.
Timely correction of problems on a network is essential because such problems reduce user productivity and because users expect fast service. Service level agreements also place a premium on timely resolution of network problems. Thus, the prior art techniques for monitoring networks and identifying problems and their causes are not responsive to the requirements of the users and the business served by the network. The prior art techniques are also not responsive to the needs of the network and/or system managers who are charged with accomplishing timely identification and resolution of problems.
Another disadvantage to the prior art is that the RMON probes are capable only of monitoring network performance. The RMON probes cannot monitor the performance of client and server computer systems and communicate information about system performance to the central computer system used by the network manager. Therefore, in the prior art, the monitoring tools do not provide information about system performance.
In one prior art system, a server computer system and a client computer system send messages, commonly referred to as "heartbeats," to each other to affirm that a connection exists and that both computer systems are functioning. However, the heartbeats are exchanged only between the lower levels of software in the computer systems (e.g., between the protocol stacks), and so do not provide an indication of a possible problem at the higher levels of software, such as "memory thrashing" within a computer system. Hence, a computer system may in fact be experiencing a problem that would not be detectable in the prior art, and the network/system manager may conclude based on the information available that the computer system is functioning satisfactorily.
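The limitation of such heartbeats can be illustrated with a minimal sketch. The sketch is a simulation only and does not reproduce any particular heartbeat implementation; the class SimulatedHost, its fields, and the page-fault threshold are hypothetical and are assumed solely for this example. The point shown is that a heartbeat answered by the protocol stack alone reports a host as functioning even while a higher-level problem is present.

```python
from dataclasses import dataclass


@dataclass
class SimulatedHost:
    """Hypothetical model of a host: a protocol stack plus higher-level state."""
    stack_up: bool = True          # lower-level software is able to respond
    page_faults_per_sec: int = 0   # illustrative higher-level load indicator

    def answer_heartbeat(self) -> bool:
        # Answered by the protocol stack alone; higher-level state
        # such as paging activity is never consulted.
        return self.stack_up

    def is_thrashing(self, threshold: int = 1000) -> bool:
        # The threshold is an assumption made for illustration only.
        return self.page_faults_per_sec > threshold


def heartbeat_check(host: SimulatedHost) -> str:
    """Report what a heartbeat-only monitor would conclude about a host."""
    return "functioning" if host.answer_heartbeat() else "unreachable"


healthy = SimulatedHost()
thrashing = SimulatedHost(page_faults_per_sec=5000)
```

In this sketch, heartbeat_check returns "functioning" for both hosts, although thrashing.is_thrashing() is true. A monitor relying on the heartbeat alone therefore cannot distinguish the two cases.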
Another drawback to the prior art is that the limited network and system information that is available to the network manager is not historical; that is, information regarding the recent performance of the network and system preceding the occurrence of a problem is not retrievable by the network manager. As such, the network manager can only view network or system performance after a user has identified a problem. Thus, in the prior art, valuable historical information that may aid the network manager in understanding the source of a problem is not available.
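To make concrete what "historical" information means in this context, the following minimal sketch keeps a bounded record of recent performance samples so that the samples preceding a reported problem can be retrieved afterward. It is offered as an illustration under stated assumptions and is not a description of any prior art tool; the class MetricHistory, its capacity, and the sample values are hypothetical.

```python
from collections import deque


class MetricHistory:
    """Hypothetical bounded store of recent (timestamp, value) samples."""

    def __init__(self, capacity: int = 5) -> None:
        # A deque with maxlen discards the oldest sample automatically
        # once the capacity is reached.
        self._samples: deque[tuple[float, float]] = deque(maxlen=capacity)

    def record(self, timestamp: float, value: float) -> None:
        self._samples.append((timestamp, value))

    def before(self, problem_time: float) -> list[tuple[float, float]]:
        """Return retained samples taken strictly before problem_time."""
        return [(t, v) for t, v in self._samples if t < problem_time]


history = MetricHistory(capacity=3)
for t, v in [(1, 10.0), (2, 12.0), (3, 95.0), (4, 97.0)]:
    history.record(t, v)

# Samples that preceded a problem reported at time 4.
preceding = history.before(4)
```

With a capacity of three, the sample taken at time 1 has been discarded, and preceding holds the samples taken at times 2 and 3. The fixed capacity is a design choice that bounds storage at the cost of limiting how far back the record extends.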
Thus, a need exists for a method to monitor a computer system communication network that readily detects a problem and permits the network manager to quickly identify the cause of the problem. A need further exists for a method that accomplishes the above and enables the network manager to demonstrate compliance with the provisions of the governing SLA. A need also exists for a method that accomplishes the above and is compatible with SNMP as currently employed. The present invention satisfies these needs. These and other objects and advantages of the present invention will become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.