Monitoring and evaluating the operation and performance of computer systems, networks and the like may be important for troubleshooting problems and evaluating ways of improving the operation or performance of the system or network. A typical system 100 for monitoring performance of different domains in a system or network is illustrated in FIGS. 1A and 1B. The domains may include host machines or processors 102, each running an operating system 104, application programs 106 operating on the host machines 102 and similar domains associated with a larger enterprise system, distributed network or the like. The hosts 102 may be web servers, component servers, application servers, database servers or the like. The operating system 104 on each host 102 may be a standard operating system such as Unix, Windows or the like.
A number of applications 106 may run on each host 102. The applications 106 may be divided into those applications 108 that are already instrumented or may be capable of being instrumented to collect performance data or metrics and those applications 110 that are incapable of being instrumented. Instrumentation involves the insertion of performance-gathering code or data structures within the software of an application, operating system or the like. Instrumentation may be done when the software is originally written or may be inserted later. Operating systems are typically instrumented at the time they are written or developed.
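As an illustration of the instrumentation described above, the following is a minimal sketch, not taken from any vendor tool, of performance-gathering code inserted around an application function. The names `METRICS`, `instrumented` and `handle_request` are hypothetical and chosen only for this example.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process data structure: metric name -> [call count, total seconds].
METRICS = defaultdict(lambda: [0, 0.0])


def instrumented(name):
    """Return a decorator that records call count and elapsed time under `name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Update the metrics even if the wrapped function raises.
                entry = METRICS[name]
                entry[0] += 1
                entry[1] += time.perf_counter() - start
        return wrapper
    return decorator


@instrumented("handle_request")
def handle_request(payload):
    """Stand-in for application work (an assumption for this sketch)."""
    return payload.upper()
```

Here the decorator plays the role of instrumentation inserted after the software was written: the body of `handle_request` is unchanged, and an agent could later read `METRICS` to collect the gathered data.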
An application agent 112 may be associated with each application 108 that is instrumented to gather performance data. The application agent 112 may collect the performance data associated with the application 108 in which the application agent 112 may be embedded. The application agent 112 may transport the collected data across the network to an application management station 114 for analysis and storage. The application agent 112 and management station 114 are usually proprietary to the vendor providing the tools. Accordingly, the application agent 112 must typically be used in conjunction with the management station 114 provided by the same vendor. Additionally, each vendor typically specializes in a specific domain and provides agents only for that domain.
Application agents 112 transmit the application performance data using Transmission Control Protocol (TCP) to the application management station 114. The connection-oriented TCP protocol can utilize significant resources of the associated application 108 compared to a connectionless protocol, such as User Datagram Protocol (UDP) or the like. TCP also creates additional dependencies or burdens on the startup of the application 108. Additionally, application agents 112 may not be able to be remotely controlled to alter the level or type of statistics or data being gathered or the frequency at which the data is gathered. Even if the operation of the application agent 112 may be altered, such a change may necessitate stopping the application 108 to make the change and then restarting the application 108.
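To make the contrast between connection-oriented and connectionless transport concrete, the following is a minimal sketch of sending one metric as a single UDP datagram. It is an illustration only and does not describe how the agents 112 operate; the function names and the JSON payload layout are assumptions made for this example.

```python
import json
import socket


def send_metric_udp(host, port, name, value):
    """Send one metric as a single datagram; no connection is set up first."""
    payload = json.dumps({"metric": name, "value": value}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # sendto() returns the number of bytes handed to the network stack.
        return sock.sendto(payload, (host, port))


def receive_metric_udp(sock, max_bytes=4096):
    """Read one datagram from an already-bound UDP socket and decode it."""
    data, _sender = sock.recvfrom(max_bytes)
    return json.loads(data.decode("utf-8"))
```

The sender does not wait for a handshake or acknowledgment, so it does not depend on the receiver being available when the application starts. The trade-off is that UDP does not guarantee delivery, so a datagram may be lost without the sender being notified.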
A user may access the performance data on the application management station 114 via a proprietary viewing console 116 that is usually supplied by the same vendor as the application agent 112 and management station 114. Multiple consoles 116 may be provided for simultaneous access by multiple users or workstations 120. Each user may also require a vendor specific client program 118 on his workstation 120 to communicate with an associated one of the proprietary consoles 116.
A system agent 122 may be associated with each host 102 to gather data regarding performance of the host 102, operating system 104 and any network associated with the host 102. The system agent 122 may not be associated with an intermediate data storage device and may be directly connected to a proprietary viewing console 124. There may be multiple instances of the proprietary console 124 for access by multiple users or workstations 120. The system agent 122 may be used to resolve performance bottlenecks on a real-time basis. Communication between the system agent 122 and the proprietary console 124 may use Simple Network Management Protocol (SNMP) or TCP, either of which consumes data processing resources of the host 102. The user may also need another vendor specific client program 126 to access one of the consoles 124 and retrieve or view the data.
The vendor of the operating system 104 may also provide native system monitoring tools including a native system agent 128 to collect performance statistics related to operation or performance of the host 102, operating system 104 and any network to which the host 102 may be coupled. The native system agent 128 may transfer any collected performance data to a local file system 130. The native agent 128 may collect data in the same address space as the process or operation being monitored and write any collected data directly to the local file system 130. Accordingly, no inter-process communication or protocol may be required. Another client program 132 may be needed on the user's workstation 120, however, to access the collected data on the local file system 130.
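The direct-to-file collection described above can be sketched as follows. This is a minimal illustration assuming a simple CSV layout of timestamp, metric name and value. The layout and the function names are hypothetical and are not taken from any native monitoring tool.

```python
import csv
import time


def append_sample(path, metric, value, timestamp=None):
    """Append one sample as a CSV row to a local file; no IPC or network is used."""
    if timestamp is None:
        timestamp = time.time()
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow([timestamp, metric, value])


def read_samples(path):
    """Read all samples back as (timestamp, metric, value) tuples."""
    with open(path, newline="") as handle:
        return [(float(ts), metric, float(value))
                for ts, metric, value in csv.reader(handle)]
```

In this sketch `read_samples` stands in for the separate client program that would be needed to access the collected data.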
Another system agent 134 from a third party vendor may also be associated with each host 102 and associated operating system 104. The agent 134 may be an extensively featured agent and may include other packaged software tools for data collection, trend analysis and modeling, all of which can consume host resources. Like other system agents, such as agents 122 and 128, the agent 134 only collects operating system, host and network data and does not collect application level metrics. The system agent 134 may transmit the collected data to a proprietary central management station 136 provided by the same vendor. The communication link between the system agent 134 and associated management station 136 may use multiple different protocols, such as TCP, SNMP, File Transfer Protocol (FTP) or a vendor proprietary protocol. Any of these protocols can utilize considerable overhead or data processing resources of the host 102.
The central management station 136 may transfer the collected data to a proprietary console 138 for real-time access by a user or to a proprietary file repository 140 for storage and further processing or analysis. There may be multiple instances of the proprietary console 138 for access by multiple users or workstations 120. Another vendor specific client program 142 may be needed on the user's workstation 120 to access the data via the proprietary console 138. Communication between the central management station 136, console 138 and client program 142 may use TCP or a vendor proprietary protocol.
The file repository 140 may store the collected data in a vendor proprietary format. The vendor may provide tools to export the data to a standard relational database (RDB) 144. Communication between the central management station 136, proprietary file repository 140 and relational database 144 may use TCP or FTP. Exporting the data to the relational database 144 and the use of TCP and FTP can utilize significant data processing resources.
Each of the system agents 122, 128 and 134 may be needed to collect certain data or metrics or to analyze and present the collected data in a particular way. Accordingly, there may be redundancy in the data collected. Additionally, the resources of the host 102 utilized by the multiple agents 122, 128 and 134 running concurrently can be significant.
In summary, current performance monitoring and analysis systems may be complex, requiring multiple components or tools for a user to retrieve, store and present performance data from different domains, such as applications, operating systems, hosts, networks and other domains. The multiple tools may come from an array of different vendors and utilize significant processing resources. There is no mechanism to integrate and consolidate the performance data collected by the different vendor tools, and the data may be redundant and stored in inconsistent formats. Further, the data collection agents are incapable of being controlled dynamically and require an application or operating system domain to be shut down and restarted to alter the operating parameters of the agents. The multiple, different proprietary viewing consoles and client programs on each user's workstation 120 can impose administrative constraints and requirements, such as maintenance, multiple user licenses and training to use and maintain the tools.
Accordingly, there is a need to provide a system and method to monitor performance that utilizes minimal resources and can integrate or consolidate and display the data collected from different domains simultaneously. There is also a need to provide a system and method to monitor performance that permits dynamic control of the tools without affecting the operation of the different domains. There is also a need to provide a system and method to monitor performance that uses a standard system-wide database for storing collected performance data and stores the data in a standard format. There is a further need to provide a system and method to monitor performance that uses tools written in a standard programming language to collect, analyze and present the collected data to minimize administrative constraints and requirements.