Computer systems linked to each other in a communication network are commonly used in businesses and like organizations. Computer system communication networks ("networks") are growing in size--as measured by the number of applications and the number of users they support--due to improvements in network reliability and the recognition of associated benefits such as increased productivity.
As the size of networks increases and as organizations become more reliant on such networks, the importance of effective network management tools also grows. In response to the need for standardization of such tools, primarily to control costs but also because components in a network are likely to originate from many different vendors, the Simple Network Management Protocol (SNMP) was developed and widely adopted. A number of management information bases (MIBs) have been defined since adoption of SNMP, such as MIB-II, Remote Network Monitoring (RMON) and later RMON2. RMON and RMON2 provide the capability for remote network monitoring; that is, a network manager is able to monitor network performance from a central computer system that has access to other components on the network, referred to as RMON probes, that monitor local areas of the network.
SNMP, RMON and RMON2 thus are network management software tools that provide a set of standards for network management and control, including a standard protocol, a specification for database structure, and a set of data objects. RMON and RMON2 are implemented in a network through management information bases (MIBs) which contain instructions specifying the data that are to be collected, how the data are to be identified, and other information pertinent to the purpose of network monitoring. The MIBs are implemented through the RMON probes to monitor the local areas of the network.
Network managers use the RMON and RMON2 MIBs, accessed through SNMP, to collect information regarding the performance of the network. By collecting information about network performance and analyzing it, the network manager is able to recognize situations indicating that a problem is either present or impending.
For example, the network manager (or any of the network users, for that matter) may be interested in obtaining performance statistics such as the average and worst-case performance times and the reliability of the network for a particular application. Such applications generally describe a transaction between a user that is accessing the network through a client computer system and a server computer system that responds to the client computer system with the requested information. Network managers need performance statistics to help them manage and maintain the network and to plan for network improvements. For example, performance statistics can be used to recognize bottlenecks in the network before they cause problems so that corrective action can be taken. If the performance statistics indicate a growing load in one area of the network, network traffic (in the form of data packets that travel through the network's communication equipment) can be routed along a different path. Statistics accumulated over a longer period of time can be used to help decide whether it is necessary to expand particular areas of the network.
Performance statistics are also necessary for businesses and the like to determine whether the network support provided by a vendor of network management services is satisfactory. Many businesses contract with vendors for network management services. Such contracts are typically implemented with service level agreements (SLAs) which specify metrics against which the provider of the network management services is measured. These metrics are used to quantify standards of performance that allow businesses to assess not only the performance of the network but also the performance of the network management services provider. SLAs generally include a provision specifying metrics for performance time for critical applications, where performance time, for example, is considered to be the amount of time between the time a user submits a request via the network and the time the response to that request is received by the user. An effective network management tool should therefore provide a means for monitoring the network and gathering performance statistics for comparison against the requirements contained in the SLAs. However, as will be seen in the discussion below, the network management tools in the prior art do not provide a ready means of demonstrating compliance with SLAs.
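As an illustration of the comparison described above, the following is a minimal sketch of how average and worst-case performance times might be checked against SLA limits. The timestamps, the threshold names SLA_MAX_AVERAGE and SLA_MAX_WORST, and their values are hypothetical and are not taken from any particular SLA or tool.

```python
from statistics import mean

# Hypothetical (request_sent, response_received) timestamps in seconds,
# one pair per transaction for a critical application.
transactions = [(0.00, 0.25), (1.00, 1.50), (2.00, 2.75), (3.00, 3.25)]

# Performance time: time between submitting a request and receiving its response.
performance_times = [received - sent for sent, received in transactions]

average_time = mean(performance_times)
worst_case_time = max(performance_times)

# Hypothetical SLA limits, in seconds (assumed values for illustration only).
SLA_MAX_AVERAGE = 0.5
SLA_MAX_WORST = 1.0

compliant = average_time <= SLA_MAX_AVERAGE and worst_case_time <= SLA_MAX_WORST
print(f"average={average_time}s worst={worst_case_time}s compliant={compliant}")
```

Note that the sketch presupposes that each request can be paired with its response; as discussed below, that pairing is what the prior art tools do not readily provide.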
Prior art network management tools have trouble aiding the network manager in determining whether a problem within the network is associated with the network or with the system hardware supporting the network, so that the network manager can identify and implement the appropriate corrective action. For example, if a user places a request for a particular application to a server computer and a response is not received, the prior art network management tools do not generally identify whether the problem is occurring because of a bottleneck in the network or because the server is not functioning. Therefore, as will be seen in the discussion to follow, the network management tools in the prior art do not provide a ready means of monitoring performance of the entire network so that problems can be quickly detected.
With reference to FIG. 1, a prior art method used for network monitoring is illustrated for a simplified network 100. Network 100 is typically comprised of a plurality of client computer systems 110a, 110b and 110c networked with a number of different servers 130a, 130b and 130c. For this discussion, the focus is on client computer system 110c connected via communication lines 120 and 122 to server computer system 130c. Data packets (not shown) from client computer system 110c travel to server computer system 130c and back on either of communication lines 120 and 122, depending on the amount of traffic present on those lines due to simultaneous communications between client computer systems 110a and 110b and server computer systems 130a, 130b and 130c. The request data packets issued from client computer system 110c contain data that specify the address of client computer system 110c and the address of destination server computer system 130c, as well as other data pertinent to the application being used, such as data defining the request being made. The response data packets issued from server computer system 130c also contain the sender and destination address as well as other data needed to respond to the request.
With reference still to FIG. 1, coupled into communication lines 120 and 122 are other communications equipment such as switches 124 and 125 and routers 126 and 127. Also on communication lines 120 and 122 are RMON probes 140 and 142 (the term "RMON" refers to both RMON and RMON2). An RMON probe typically operates in a promiscuous mode, observing every data packet that passes through the communication line to which it is coupled, but only the packets on that line.
RMON MIBs provide the capability to define filters that can be used to limit the number of data packets observed by an RMON probe that are to be captured or counted. Filters are specified based on the type of data packet or other packet characteristics associated with the data contained within the data packet. Filters permit the RMON probe to screen observed data packets on the basis of recognition characteristics specified by the filter. Data packets are captured or counted by the RMON probe on the basis of a match (or a failure to match) with the specified recognition characteristics. Filters can be combined using logical "and" and "or" operations to define a more complex filter to be applied to data packets, thus focusing the screen onto a narrower group of data packets. Data packets that pass through the complex filter are selected for capture or counting and are referred to as a channel.
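The combination of filters into a channel can be sketched as follows. This is a simplified illustration only, not the RMON MIB's own filter encoding: the Packet fields, the helper names both, either and negate, and the protocol labels are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Packet:
    src: str       # sender address (hypothetical field)
    dst: str       # destination address (hypothetical field)
    protocol: str  # packet type (hypothetical field)


Filter = Callable[[Packet], bool]


def both(a: Filter, b: Filter) -> Filter:
    """Logical "and" of two filters."""
    return lambda p: a(p) and b(p)


def either(a: Filter, b: Filter) -> Filter:
    """Logical "or" of two filters."""
    return lambda p: a(p) or b(p)


def negate(a: Filter) -> Filter:
    """Select packets on a failure to match."""
    return lambda p: not a(p)


# Simple filters based on packet characteristics.
from_client: Filter = lambda p: p.src == "110c"
to_server: Filter = lambda p: p.dst == "130c"
is_http: Filter = lambda p: p.protocol == "http"

# A more complex filter that narrows the screen to one group of packets.
channel_filter = both(both(from_client, to_server), is_http)

observed = [
    Packet("110c", "130c", "http"),
    Packet("110a", "130c", "http"),
    Packet("110c", "130c", "ftp"),
    Packet("110c", "130c", "http"),
]

# Packets that pass the complex filter form the channel and are counted.
channel = [p for p in observed if channel_filter(p)]
channel_count = len(channel)
print(channel_count)
```

Composing small predicates keeps each recognition characteristic separate, so a narrower channel is obtained by adding one more "and" term rather than rewriting the filter.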
Packet monitoring using probes (as shown in FIG. 1) is problematic when data switching is used in network 100. Assume a user issues a request data packet (not shown) from client computer system 110c that is routed through communications line 120 to server computer system 130c. RMON probe 140 observes the request data packet and in this case, because of the filter specified, captures and counts the data packet. Server computer system 130c responds to the request data packet and transmits a response data packet (not shown). However, because of increased traffic on communications line 120, the response data packet is more efficiently routed back to client computer system 110c through communications line 122 and is observed by RMON probe 142. Because of the filter specified, RMON probe 142 also captures and counts the data packet.
In the prior art the RMON probes are only capable of making a count of the number of filtered data packets, which provides only a limited measure of the performance of the network. Moreover, because of the nature of switched networks, a request data packet may take one route from a client computer system to a server computer system while the response data packet takes a different route back. Thus, one drawback to the prior art is that the request and response data packets are not correlated, because they are counted by two different probes and each probe operates independently.
For example, the network manager would expect that the number of filtered response data packets and filtered request data packets would be equal, and if not, this would provide an indication of a potential problem on the network. However, this information only indicates the reliability of the network for carrying data packets, or the reliability of a server computer system to respond to a request, but does not provide a measure of the time it took to respond to the request. Therefore, another drawback to the prior art is that it does not measure performance times such as application response time, application processing time, or network latency, because packets might not be correlated if they are captured by different probes. Thus, in the prior art the network manager or a user does not have the desired information regarding the average and worst-case performance times. Hence, another drawback to the prior art is that the network services provider cannot readily demonstrate compliance with the governing SLA.
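The limitation described above can be sketched with two independent counters. The Probe class and its observe method are hypothetical stand-ins for RMON probes 140 and 142 and are not drawn from any RMON implementation; the sketch assumes every request travels on line 120 and every response returns on line 122.

```python
from collections import Counter


class Probe:
    """Hypothetical probe that only counts filtered packets by kind."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.counts: Counter = Counter()

    def observe(self, kind: str) -> None:
        # Only a count is kept; no timestamp or transaction identity is stored.
        self.counts[kind] += 1


probe_140 = Probe("140")  # coupled to communication line 120
probe_142 = Probe("142")  # coupled to communication line 122

# Three transactions: requests pass probe 140, responses pass probe 142.
for _ in range(3):
    probe_140.observe("request")
    probe_142.observe("response")

total_requests = probe_140.counts["request"] + probe_142.counts["request"]
total_responses = probe_140.counts["response"] + probe_142.counts["response"]

# Equal totals suggest reliability, but nothing here yields a response time.
counts_match = total_requests == total_responses
print(total_requests, total_responses, counts_match)
```

Because each probe holds only its own tallies, the most that can be derived is whether the totals agree; pairing a given request with its response, and hence timing it, would require information that neither counter retains.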
With reference again to FIG. 1, it is possible that, after the response data packet passes RMON probe 142 and is counted by RMON probe 142, a fault on communications line 122 may occur so that the response data packet is not delivered to client computer system 110c. For example, a failure of switch 125 may occur so that the response data packet is not able to complete its journey. However, in the prior art the response data packet may still be counted as a successful transaction. Thus, a disadvantage to the prior art is that a fault in the network may not be detected by the network monitoring software, and would only be eventually noticed by the user who did not receive a response to his/her request. Another drawback to the prior art is therefore that a fault in the network may not be noticed in a timely manner. An additional drawback to the prior art is that the accuracy of the performance statistics may be affected by the location of the RMON probes.
One prior art system attempts to address some of the disadvantages identified above by incorporating RMON into routers or switches instead of a probe, and adding a plurality of these components to the network. However, a disadvantage to this prior art system is that the speed at which the component (e.g., a switch) performs its primary function is significantly slowed by the addition of the network monitoring function, because of the complexity of RMON MIBs and the application of multiple filters. In addition, another drawback to this prior art system is that the cost of the component such as a switch is substantially increased by the incorporation of the RMON facilities. This prior art system also does not address the other disadvantages identified above, such as the inability to measure performance times and demonstrate compliance with SLAs in a switched communication system.
Accordingly, a need exists for a method to monitor a computer system communication network that readily and quickly detects and identifies a degradation of the network. A need further exists for a method that accomplishes the above and enables the network manager to demonstrate compliance with the provisions of the governing SLA. A need yet exists for a method that accomplishes the above and also provides an accurate measure of the network performance as well as its reliability. Finally, a need exists for a method that accomplishes the above and is cost-effective and compatible with the SNMP protocol that is currently employed. The present invention solves these needs. These and other objects and advantages of the present invention will become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.