1. Field
The present invention relates to network management. More particularly, the present invention relates to a method, system, and data structure for monitoring transaction performance in a managed computer network environment.
2. Background Information
Information technology (IT) network management operators face technical challenges in identifying the cause of a network event that can impact the quality of service available to users (or clients) in a managed computer network. To identify the cause of a network event, or to take steps to prevent an event from occurring, operators monitor certain performance parameters of the network. Performance parameters measured by operators include the availability, the response time, and the volume (e.g., throughput) of the services provided in the network.
One challenge in measuring and monitoring network performance is gathering enough data to diagnose the cause of a network event without gathering so much data that the measurement system itself degrades network performance, or that operators cannot identify network failures in a timely manner.
Network performance monitoring systems fall into two broad categories: those intended for use by software developers in a laboratory (or testing) environment, and those intended for use by IT operators in a production network environment. The systems used by software developers trace process (or transaction) execution flows through a network while a test is being performed. Once the test completes, analytical tools use the trace data to produce information for optimizing the network performance. These tools can measure and produce significant amounts of data, since a high degree of measurement overhead can be tolerated in the non-production test environment.
High degrees of measurement overhead may not be acceptable in a production network environment. Consequently, production environment measurement systems tend to limit the number of network parameters measured to two or three parameters, such as the number of transactions that occur over a given period of time and the total time those transactions require to execute. Production environment monitoring tools then sample the measured data to produce statistical averages. Diagnosing the cause of a network event using these statistical averages can be challenging, as the statistics do not provide detailed information on the transaction execution flows themselves. Moreover, the diagnosis often takes place “off-line” using post-processing analytical tools, which can add to the time required to determine the cause of a network event.
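By way of illustration only, the following Python sketch shows how a monitor that keeps just a transaction count and a total execution time can report the same average for two very different intervals. The class name, function name, and sample values are hypothetical and are not taken from any particular measurement system.

```python
from dataclasses import dataclass


@dataclass
class TransactionCounters:
    """Two aggregate parameters kept per measurement interval (hypothetical names)."""
    count: int = 0
    total_seconds: float = 0.0

    def record(self, elapsed_seconds: float) -> None:
        # Only the aggregates are updated; the individual timing is discarded.
        self.count += 1
        self.total_seconds += elapsed_seconds

    def average(self) -> float:
        # Statistical average derived from the two counters.
        return self.total_seconds / self.count if self.count else 0.0


def summarize(samples):
    """Feed a list of per-transaction times (seconds) into fresh counters."""
    counters = TransactionCounters()
    for elapsed in samples:
        counters.record(elapsed)
    return counters


# Illustrative data: ten uniform transactions versus nine fast ones and one slow one.
uniform = summarize([1.0] * 10)
one_slow = summarize([0.5] * 9 + [5.5])
print(uniform.average(), one_slow.average())
```

Both intervals yield the same count, total, and average, so the single slow transaction in the second interval cannot be recovered from the aggregates alone.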
Production environment measurement systems measure response time from an end-to-end perspective, but the decomposition of the measured response time can be limited to observable events that occur at the client site, such as a connection setup time or a document load time. The ability to further decompose end-to-end response time by observing and correlating related network events on back-end systems, such as web servers, application servers, and database servers, can aid operators in identifying and correcting network failures.
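As a minimal sketch of such a decomposition, the following Python fragment splits an end-to-end response time into per-component contributions and reports the largest contributor. It assumes, for simplicity, that the components execute sequentially without overlap and that their timings have already been correlated to a single transaction; the function name, component names, and values are hypothetical.

```python
def decompose_response_time(end_to_end_seconds, component_seconds):
    """Split an end-to-end time into per-component parts (assumes no overlap).

    component_seconds maps a hypothetical component name to the time, in
    seconds, attributed to it for one correlated transaction.
    """
    breakdown = dict(component_seconds)
    # Whatever the observed components do not account for is reported separately.
    breakdown["unattributed"] = end_to_end_seconds - sum(component_seconds.values())
    # The observed component with the largest contribution.
    slowest = max(component_seconds, key=component_seconds.get)
    return breakdown, slowest


# Illustrative values only.
breakdown, slowest = decompose_response_time(
    4.0,
    {
        "connection_setup": 0.25,
        "web_server": 0.5,
        "application_server": 0.75,
        "database_server": 2.0,
    },
)
print(breakdown, slowest)
```

With only the client-side measurement available, an operator would see the 4.0 second total and the connection setup time; the per-tier figures are what would point toward the database server in this illustrative case.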