The present invention relates, generally, to a system and method for diagnosing the operational behavior of computer systems, especially, but not only, those running Internet applications and telecommunication applications. In particular, the present invention relates to diagnosing computer system operational behavior in terms of both application-level operation and system-level operation.
Pre-production testing of computer systems, including application-specific computer systems such as web servers, is critical to avoiding unexpected and expensive operational problems once the computer systems are deployed and relied upon. General areas of concern include processing bottlenecks and resource use problems (for example, memory and CPU usage). In particular, systems with varying loads (i.e., levels of use) or sustained long-term loads may be susceptible to operational problems at higher loads or after operation for long periods of time, although such problems may not be evident in small-scale pre-production testing. A particular, although not exclusive, example is a web server, where the load (for example, client-sent file requests) may vary widely according to the number of users accessing the server. Also, extended periods of high server loads may occur (for example, on an e-commerce web server during a peak shopping period).
Conventional testing systems for assessing computer system operation are known. However, conventional testing systems usually perform only a few functions, and do not provide a comprehensive diagnostic package.
For example, load test tools such as LoadRunner® are commercially available from Mercury Interactive of Sunnyvale, Calif. The LoadRunner® test tool is software-implemented and provides a mechanism for running load testing, collecting results, and reporting the collected results. However, it does not select or otherwise help to select specific parametric information to be collected or specific tests to be run. The computer system parameters measured are usually predetermined and not user-selectable. Thus, a system engineer or the like must be able to recognize which test data are of significance.
Similarly, conventional software-implemented application data collection tools are available, for example, from Keynote Systems Inc. However, the Keynote tool is limited to testing and monitoring web sites by deploying agents worldwide that test the client-level experience with respect to a given web site. The Keynote tool measures only user-level response times and throughput, and user-level errors (such as HTTP error codes). It does not monitor a computer system at the server level, nor does it provide any diagnostic functionality. A user must independently decide what tests should be run and must interpret the results.
SE Toolkit is a conventional freeware performance analysis package. SE Toolkit provides system-level diagnosis tools but does not analyze user-experience parameters at an application level. Its primary focus is the computer system, specifically, operating system resource parameters. Analysis of the system parameters is limited to flagging system parameters whose values exceed predetermined thresholds. Some problem-solving functionality is provided, but it is limited to relatively simple problems corresponding to measurement of a single parameter. SE Toolkit provides a language interpreter for creating or editing measurement rules.
Quest Software, Inc. of Irvine, Calif. offers a software package called Spotlight™ that graphically displays server processes and data flow in real time. However, a user must identify congestion and bottleneck problems and decide what action to take. Spotlight™ is application-specific, so a given version of Spotlight™ is limited to analyzing server activity when running a specific application.
Quest also offers a software package called Foglight™ for monitoring application-wise use of system resources. In a particular embodiment, specific software "cartridges" are provided for specific applications to be monitored. Thus, general monitoring of applications with the Quest cartridges is not possible.
In contrast to the foregoing, the present invention relates to a diagnostic system for a computer system and a method for diagnosing computer system operation. An integrated treatment of both application-level and system-level operation with respect to a plurality of operational parameters is provided. As a result, the present invention can, for example and without limitation, identify performance problems, identify causes of the performance problems, make recommendations for overcoming performance bottlenecks, and provide some form of an index or relative score indicative of each operational parameter that has been examined. The present invention also can analyze the scalability of a computer system.
The present invention uses one or more of load and no-load test runs, time-dependent data (such as soak testing), and empirically obtained information (such as field performance data). The input data may be either numerical parametric information or qualitative descriptions of operational behavior that may be internally reduced to quantitative values.
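By way of illustration only, the reduction of a qualitative description to quantitative values might be sketched as follows. The phrase vocabulary, the numeric scores, and the function name `quantify` are hypothetical assumptions made for this sketch and are not taken from the specification.

```python
from typing import List

# Hypothetical vocabulary: every phrase and score below is an illustrative
# assumption, not a value defined by the specification.
TREND_SCORES = {
    "drops suddenly": -2.0,
    "declines slowly": -1.0,
    "stays constant": 0.0,
    "ramps slowly": 1.0,
    "rises sharply": 2.0,
}


def quantify(description: str) -> List[float]:
    """Reduce a comma-separated qualitative description to numeric trend scores."""
    scores = []
    for phrase in description.lower().split(","):
        phrase = phrase.strip()
        if phrase not in TREND_SCORES:
            raise ValueError(f"unrecognized phrase: {phrase!r}")
        scores.append(TREND_SCORES[phrase])
    return scores


if __name__ == "__main__":
    print(quantify("Ramps slowly, stays constant, drops suddenly"))
```

A fixed vocabulary keeps the mapping deterministic; free-form user text would need a more tolerant parser than this sketch provides.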
Generally, the present invention includes a method of analyzing the behavior of a computing system. The term "behavior" may sometimes be used interchangeably with the term "signature" herein. The method includes identifying an undesirable computing system behavior and identifying its cause. Thereafter, the method includes recommending at least one of an application-based solution and a system-based solution to remedy the undesirable computing system behavior.
By developing a systematic, reusable approach to analyzing performance and scalability, the present invention significantly reduces the amount of time and resources required to gather and analyze the necessary performance data. The invention can be used effectively for any transaction-based application or operations support system. Examples of transaction-based applications include but are not limited to electronic mail, web servers, instant messaging, directory engines, web search engines, and single sign-on engines. Examples of operations support system applications include but are not limited to provisioning, trouble management systems, data reporting systems, and database information retrieval. The invention does not require a user to know software details of the application. In fact, the internal details of software are often hidden from users, so the invention assesses performance and scalability through empirical methods. The invention is particularly useful for applications that involve database access or applications that are coded using C, C++, Java or CORBA, although it is not limited in scope to these applications. These advanced technologies are often used to facilitate rapid development and maintainability, sometimes at the expense of performance and scalability. Ideally, the invention should be utilized during the performance testing cycle (before deployment), but it can also be utilized by applications that are already in the field.
In an example of a method according to the present invention, analyzing computer system performance may include applying a test sequence to a computer system running an application, collecting test results from the test sequence that reflect computer system performance, and identifying a characteristic computer system behavior based on the collected test results. On the basis of the identified computer system behavior, the method calls for either matching the identified computer system behavior with a reference computer system behavior having a known cause or, if the identified computer system behavior does not match a reference computer system behavior, recommending further action to obtain additional information usable to refine the identification of the characteristic computer system behavior. A computer-readable medium having computer-executable instructions for carrying out this method may also be provided.
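The match-or-recommend step of the method described above can be sketched as follows. This is a minimal illustration under stated assumptions: the trend labels, the 5% tolerance, the two reference behaviors, and all names (`ReferenceBehavior`, `classify`, `diagnose`) are hypothetical and chosen only to make the flow concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReferenceBehavior:
    name: str
    signature: Dict[str, str]  # parameter name -> expected trend label
    cause: str


@dataclass
class Diagnosis:
    matched: bool
    behavior: Optional[str]
    detail: str  # known cause on a match, otherwise a recommended next step


# Hypothetical reference library; entries are illustrative assumptions.
REFERENCES = [
    ReferenceBehavior(
        "resource leak",
        {"memory_mb": "rising", "throughput": "falling"},
        "allocated memory is not being released",
    ),
    ReferenceBehavior(
        "processing bottleneck",
        {"cpu_pct": "rising", "response_ms": "rising"},
        "processing demand is outgrowing CPU capacity",
    ),
]


def classify(series: List[float], tolerance: float = 0.05) -> str:
    """Label a measured series by its relative change from first to last sample."""
    first, last = series[0], series[-1]
    change = (last - first) / max(abs(first), 1e-9)
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "flat"


def diagnose(results: Dict[str, List[float]]) -> Diagnosis:
    """Match the observed behavior to a reference, or recommend further action."""
    observed = {name: classify(values) for name, values in results.items()}
    for ref in REFERENCES:
        if all(observed.get(k) == v for k, v in ref.signature.items()):
            return Diagnosis(True, ref.name, ref.cause)
    return Diagnosis(
        False, None, "no reference matched; collect additional parameters and retest"
    )


if __name__ == "__main__":
    print(diagnose({"memory_mb": [100, 150, 220], "throughput": [50, 45, 30]}))
    print(diagnose({"memory_mb": [100, 100, 101]}))
```

Comparing only first and last samples is a deliberate simplification; a practical signature would likely consider the whole shape of each series.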
In conjunction with the foregoing, the present invention also relates to a method for obtaining information for use in a computer diagnostic system. Such a method includes running one or more of a scalability performance test and a soak performance test on the computer system and extracting resultant parametric information reflecting both application behavior and system behavior during the test(s).
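As a rough sketch of extracting parametric information from a soak test, the following computes a least-squares growth rate for one application-level parameter and one system-level parameter. The sample field names (`hour`, `response_ms`, `memory_mb`) and the choice of a linear trend are assumptions made for illustration; the specification does not prescribe them.

```python
from typing import Dict, List, Sequence


def slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    den = sum((t - mean_t) ** 2 for t in times)
    if den == 0:
        raise ValueError("need samples at two or more distinct times")
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return num / den


def extract_soak_parameters(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Summarize application-level and system-level drift over a soak run.

    Each sample is assumed (hypothetically) to carry 'hour', 'response_ms'
    (application level) and 'memory_mb' (system level).
    """
    hours = [s["hour"] for s in samples]
    return {
        "response_ms_per_hour": slope(hours, [s["response_ms"] for s in samples]),
        "memory_mb_per_hour": slope(hours, [s["memory_mb"] for s in samples]),
    }


if __name__ == "__main__":
    run = [
        {"hour": 0, "response_ms": 100, "memory_mb": 500},
        {"hour": 1, "response_ms": 100, "memory_mb": 510},
        {"hour": 2, "response_ms": 100, "memory_mb": 520},
        {"hour": 3, "response_ms": 100, "memory_mb": 530},
    ]
    print(extract_soak_parameters(run))
```

In this example run, response time is steady while memory grows by a constant amount each hour, which is the kind of slow drift a soak test is meant to expose.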
Another method for obtaining information for use in a computer diagnostic system includes posing a first prompt for eliciting a qualitative description from a user of an operational behavior of a computer system, selecting a second prompt for eliciting an additional qualitative description of the operational behavior of the computer system based on the user response to the first prompt, and posing the second prompt. Eliciting a qualitative description in this manner can be beneficial because computer system behavior can be described in a simple manner, such as describing a graphical representation of CPU utilization ("slowly ramps to a constant level, then suddenly drops to a near-zero level").
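Selecting the second prompt from the response to the first can be sketched as a small lookup tree. The questions, the accepted answers, and the names `PROMPT_TREE`, `select_next_prompt`, and `interview` are hypothetical; they illustrate the flow only and do not reproduce any prompt set from the specification.

```python
from typing import Callable, Dict, List, Optional, Tuple

# prompt id -> (question text, {user answer -> id of the follow-up prompt})
# All questions and answers here are illustrative assumptions.
PROMPT_TREE: Dict[str, Tuple[str, Dict[str, str]]] = {
    "cpu_shape": (
        "How does CPU utilization change during the test?",
        {"ramps then drops": "drop_timing", "stays near 100%": "queue_growth"},
    ),
    "drop_timing": ("Does the drop coincide with errors reported to users?", {}),
    "queue_growth": ("Do response times keep growing while the CPU is busy?", {}),
}


def select_next_prompt(prompt_id: str, answer: str) -> Optional[str]:
    """Choose the follow-up prompt from the user's answer, or None to stop."""
    _, follow_ups = PROMPT_TREE[prompt_id]
    return follow_ups.get(answer.strip().lower())


def interview(ask: Callable[[str], str], first: str = "cpu_shape") -> List[Tuple[str, str]]:
    """Pose prompts in sequence, letting each answer select the next prompt."""
    transcript = []
    prompt_id: Optional[str] = first
    while prompt_id is not None:
        question, _ = PROMPT_TREE[prompt_id]
        answer = ask(question)
        transcript.append((question, answer))
        prompt_id = select_next_prompt(prompt_id, answer)
    return transcript


if __name__ == "__main__":
    # Scripted answers stand in for a live user in this sketch.
    scripted = {
        "How does CPU utilization change during the test?": "ramps then drops",
        "Does the drop coincide with errors reported to users?": "yes",
    }
    for question, answer in interview(scripted.__getitem__):
        print(question, "->", answer)
```

Passing the answer source as a callable keeps the sketch testable; an interactive session could pass `input` instead.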
A diagnostic system according to the present invention includes at least one input interface for receiving parametric information about a computer system, an information processing unit that receives and processes the parametric information from the at least one input interface, and an analysis engine that compares the processed parametric information with a plurality of reference computer system behaviors so that a specific behavior of the computer system can be identified and so that a solution to overcome the identified computer system behavior can be provided.
The present invention can be arranged as a client-server network, including a client for receiving parametric application-level and system-level performance information about a computer system, and a server for analyzing the parametric performance information received from the client and identifying an undesirable computer system behavior based on the parametric performance information, whereby the server is at least occasionally connected to said client.