Modern enterprise applications are characterized by multiple components deployed across multiple network tiers (sets of computers) accessed by users across a network. Examples of enterprise applications include Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Supply Chain Management (SCM), and Online Banking, Brokerage, Insurance and Retailing. An enterprise application typically provides a variety of business functions that users may execute. For example, an online stock trading application may provide some of the following business functions: log in, display account status, retrieve stock prospectus, sell stock, buy stock, log out.
When a user executes such a business function, a sequence of transactions is performed by the enterprise application, with each transaction consisting of a source component transmitting a request (via network message) to a destination component, often on another tier, and perhaps waiting for a reply message. The destination component processes the request and, in doing so, consumes local (server) resources such as CPU, disk input/output, and memory, and may generate subsequent requests (subtransactions) to other components.
The time that elapses between the user executing the business function (submitting his or her request) and the display of the results on the user's workstation is called the end user response time. The end user response time is typically the most critical measure of end user satisfaction with application performance. If the response times are too long, end users will be unsatisfied and many will take their business elsewhere.
In order to maintain and improve good end user performance, application and system managers must understand the current performance of their applications, be able to identify and predict current and future performance problems, and evaluate potential solutions to those problems.
In the prior art, complex systems in general and enterprise applications in particular have always been managed in part by rules of thumb. These rules derive crude solutions to common problems. For example: if server utilization exceeds 67%, upgrade the server computing (CPU) capacity. The manager of such a system or application obtains such rules of thumb from the system and application vendors, personal experience, training and research.
Unfortunately, such rules of thumb are highly unreliable for complex systems whose behavior is difficult to understand and predict such as enterprise applications. Such rules can suggest solutions that are expensive and ineffective and even counter-productive. For example, upgrading the server in the example above may be completely unnecessary to obtain good performance and may even degrade the performance seen by some application users.
Over the years, system managers have improved upon rules of thumb for performance management of enterprise applications by monitoring the performance behavior of production applications. Monitoring refers to the collection of performance data as the application executes in the production environment. However, monitoring tools provide only a subset of the data necessary to conduct an analysis of the performance of an enterprise application.
The performance data necessary to conduct such an analysis includes the following:
    Workload
        The number of users, what functions of the application they are using, and how frequently they execute such functions
    Application Workflow
        The flow of transactions (or messages) among components of the application that occurs when a particular business function is executed by a user
    Resource Consumption
        The resources consumed by the processing of each transaction, such as the following:
            CPU
            Disk input/output
            Memory
            Request and reply message sizes
    Hardware and System Topology (Infrastructure)
        The location, configuration and interconnection of all the hardware and system components
    Deployment
        The assignment of application components to infrastructure components
        The configuration of application components (e.g., number of threads, pool sizes, load balancing algorithms)
    Performance Measures
        End user response times and throughputs
        Server, interconnect and data link utilizations
        Queue lengths
The granularity at which such data is collected is critical to the usefulness of the data. For example, the disk input/output data is often collected in terms of total numbers of reads and writes, total read bytes and total write bytes that occur during the monitoring period. Unfortunately, the performance analyst needs to see a breakdown of that input/output by application, process (application component), service, and transaction.
In addition, it is particularly important to know the transaction workflow (the sequence of messages that results from the user's execution of a particular business function) and the resources consumed during the processing of each transaction. Unfortunately, most network monitoring solutions available today report data either as gross aggregations of millions of bytes transmitted during a monitoring period or at the packet or frame level of granularity, whereas a message is typically composed of a number of packets or frames. In addition, the monitoring data is typically collected separately for each network segment or tier, and the sets of data from the multiple segments or tiers are not correlated. So, it is very difficult to reconstruct the transaction workflow (sequences of messages corresponding to a particular business function) from the monitoring data.
In the prior art, some understanding of current application performance could be obtained through monitoring. However, the monitoring data from production systems will not by itself identify future performance problems or good solutions to current problems.
To successfully manage an enterprise application, one must not only understand its current performance but also predict its performance under different possible future situations. One must be able to answer questions such as the following:
    When will my current application break under increasing load?
    What will be the bottleneck device at that time?
    What changes to the infrastructure or application configuration will alleviate the bottleneck?
    Which of these possible changes will yield the best performance at the lowest cost?
There are several prior art techniques that have been developed to make such predictions, including trend analysis, load testing, analytic modeling and predictive simulation, which are described next.
Prior art trend analysis allows performance analysts to make predictions by analyzing trends taken from measurements of application performance under differing load, from either a production system or test lab. For example, if the average end user response time is R at load L and is roughly 2R under load 2L, one might infer a linear trend and project a response time of XR under load XL for any X.
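The linear projection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `linear_projection` and the sample loads and response times are hypothetical and do not come from any measured system.

```python
def linear_projection(load_a, resp_a, load_b, resp_b, target_load):
    """Project the response time at target_load along the straight line
    through two measured (load, response time) points."""
    if load_a == load_b:
        raise ValueError("the two measurements must be taken at different loads")
    slope = (resp_b - resp_a) / (load_b - load_a)
    return resp_a + slope * (target_load - load_a)


if __name__ == "__main__":
    # Hypothetical measurements: R = 1.0 s at load L = 100 users,
    # and roughly 2R = 2.0 s at load 2L = 200 users.
    projected = linear_projection(100, 1.0, 200, 2.0, 500)
    print(f"projected response time at 5L: {projected:.2f} s")
```

The projection simply extends the line through the two measurements, so it yields XR at load XL only when the response time really is proportional to load over the whole range of interest.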
Trend analysis has not been very successful for modern enterprise applications, even when much more sophisticated trend analysis techniques have been used, because the performance of such applications is often highly nonlinear in load. In addition, even when trend analysis predicts a performance bottleneck at a future load, it cannot predict the best solution to that bottleneck. Trend analysis is therefore an inadequate predictive technique.
Prior art load testing has also allowed performance analysts to make predictions. To understand how the performance of an application scales and otherwise behaves under increasing and varying load, many system managers configure a test version of the application in a laboratory and drive the application with an artificial load (a simulated set of users). The load is varied and the performance of the application is measured for each load. This approach is known as load testing. With it, one gains several advantages above and beyond rules of thumb and monitoring, including the following:
    Measurement of the performance of the application (as configured in the test lab) in response to increasing and varying load.
    Evaluation of the performance of different application configurations and infrastructures (hardware and system software) by implementing those configurations and infrastructures in the lab, load testing and measuring them.
Load testing has many drawbacks, including the following:
    It is difficult, expensive and time-consuming to configure a laboratory installation identical to the production one, because of the complexities and subtleties of modern enterprise applications and infrastructures. As a result, the test environment is often significantly different from the production environment and the predicted performance of the production system must be inferred from the test environment measurements.
    At best, load testing can identify potential future bottlenecks and other performance problems that may result under increasing or varying load, but cannot identify the solutions to those problems.
    It is prohibitively expensive and time-consuming to load test all potentially good configurations to improve or optimize performance, since those configurations often require expensive additional equipment or very time-consuming reconfigurations of the application components.
Load testing is inadequate as a comprehensive planning method. One technique that overcomes some of the time and expense in load testing alternative application and infrastructure configurations is prior art analytic modeling.
In analytic modeling, a set of mathematical equations relating model inputs to performance outputs is derived and solved. For example, consider an M/M/1 queue, which has a Poisson arrival process, exponentially distributed service times, and a single first-come, first-served server. The average response time, R, of such a system is given by the following equation, which holds when S is less than I:
    R = S/(1 − S/I),
where
    S = average service time
    I = average interarrival time
So, if S = 2 seconds and I = 3 seconds, then R = 2/(1 − 2/3) seconds = 6 seconds.
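The M/M/1 response time equation above can be evaluated directly. The following Python sketch is illustrative only; the function name `mm1_response_time` is an assumption, and the stability check reflects the fact that the equation is meaningful only when the average service time is less than the average interarrival time.

```python
def mm1_response_time(service_time, interarrival_time):
    """Average response time R = S / (1 - S/I) of an M/M/1 queue.

    service_time      -- S, average service time
    interarrival_time -- I, average interarrival time
    """
    if service_time <= 0 or interarrival_time <= 0:
        raise ValueError("times must be positive")
    if service_time >= interarrival_time:
        # Work arrives at least as fast as it can be served: no steady state.
        raise ValueError("unstable queue: S must be less than I")
    return service_time / (1.0 - service_time / interarrival_time)


if __name__ == "__main__":
    # The worked example from the text: S = 2 s, I = 3 s gives about 6 s.
    print(f"R = {mm1_response_time(2.0, 3.0):.3f} s")
    # Shortening the interarrival time (raising the load) shows how quickly
    # the response time grows as S/I approaches 1.
    for interarrival in (3.0, 2.5, 2.2, 2.1):
        r = mm1_response_time(2.0, interarrival)
        print(f"I = {interarrival:.1f} s -> R = {r:.1f} s")
```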
If an accurate, flexible, analytic model of an enterprise application could be constructed, then quick and inexpensive predictions of application performance could be made under varying future conditions.
Unfortunately, it is difficult to construct accurate analytic models of even the simplest modern computing environments. The size and complexity of modern enterprise applications and the fundamental limitations of the analytic modeling technique make the analytic approach far too complex and inaccurate for most important problems.
A superior prior art technique is predictive discrete-event simulation. In a predictive discrete-event simulation, a model is created that represents the enterprise application in simplified form. The simulation model operates as follows:
    The model maintains the following data structures:
        The current simulation time (clock)
        The current state of the system being modeled (e.g., where the transactions are, which resources they possess, the status of their outstanding requests for additional resources, and the queues of such requests)
        A list of pending events known to occur in the near future, maintained in time order
    A master event monitor drives the simulation model as follows:
        The next event on the pending event list is removed to become the current event (e.g., a transaction arriving at or departing from a queue)
        The simulation clock is advanced to the time of the current event
        The state of the simulation is updated to reflect the occurrence of the event (e.g., the transaction location is updated to show the departure from or arrival at a queue, or resources are released from or allocated to the transaction)
        New events are posted to the event list if appropriate (e.g., if a departure event is simulated, an arrival event at the next queue is typically placed on the event list)
        If the simulation clock has not reached the ending time, the master event monitor begins again with the first step above (removing the next event from the event list)
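The event loop described above can be sketched for the simplest possible system, a single first-come, first-served server with a fixed service time. This is a minimal illustration, not the simulation engine of any particular product; the function name `simulate_single_server`, the fixed service time and the event labels are all assumptions made for the example.

```python
import heapq
from collections import deque


def simulate_single_server(arrival_times, service_time, end_time):
    """Simulate one FCFS server and return the response time of each
    transaction that departs before end_time, in departure order."""
    clock = 0                      # current simulation time
    waiting = deque()              # state: arrival times of queued transactions
    in_service = None              # state: arrival time of transaction in service
    # Pending event list kept in time order; seq breaks ties deterministically.
    pending = [(t, seq, "arrival") for seq, t in enumerate(arrival_times)]
    heapq.heapify(pending)
    seq = len(pending)
    response_times = []

    while pending:
        time, _, kind = heapq.heappop(pending)   # remove the next event
        if time > end_time:
            break
        clock = time                             # advance the clock
        if kind == "arrival":                    # update the state
            if in_service is None:
                in_service = clock
                heapq.heappush(pending, (clock + service_time, seq, "departure"))
                seq += 1
            else:
                waiting.append(clock)
        else:  # departure
            response_times.append(clock - in_service)
            if waiting:
                # Start the next queued transaction and post its departure.
                in_service = waiting.popleft()
                heapq.heappush(pending, (clock + service_time, seq, "departure"))
                seq += 1
            else:
                in_service = None
    return response_times


if __name__ == "__main__":
    print(simulate_single_server([0, 1, 2], service_time=2, end_time=100))
```

With arrivals at times 0, 1 and 2 and a service time of 2, the three transactions depart at times 2, 4 and 6, giving response times of 2, 3 and 4 time units. The growth comes entirely from queueing, which is the contention effect that discrete event simulation is intended to capture.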
Discrete event simulation is a highly general and flexible technique. It applies to any system of discrete resources and transactions, regardless of complexity. Discrete event simulation is particularly effective in the representation of contention for resources—a key performance characteristic of complex systems. Therefore, it is a sufficient foundation for accurate prediction of the performance behavior of enterprise applications. The difficulty in applying this prior art technique lies in collecting data, analyzing data, and constructing the models. Traditionally these steps have been performed by hand and are error prone and time consuming.
The current invention focuses on automating these steps, reducing the errors and reducing the time required to complete a system analysis.
“Network performance management” refers to the performance analysis of computer networks such as those underlying enterprise applications. It includes reactive techniques, which identify current problems and react to them, as well as proactive techniques, which attempt to identify problems before they occur and avoid them through proactive corrective action.
One of the proactive techniques used in network performance management is discrete event simulation. Unfortunately, end users of an enterprise application may see poor performance even when the network performs well. Performance problems may exist with servers, middleware, databases, application configurations and other system components.
So, although network performance management using discrete event simulation is a major improvement over load testing and analytic techniques, it is inadequate as a comprehensive approach to enterprise application performance. A better approach, used by this invention, is based upon comprehensive “enterprise application modeling”.
In order to create and maintain user satisfaction with enterprise application performance, one must predict the performance of such applications under varying possible scenarios, identify performance problems that occur in such scenarios and identify the best solutions to those problems.
To predict the performance seen by end users of a modern enterprise application using a simulation model, all the components of the application and infrastructure that affect end user performance must be represented. These components include the following:
    Client Behavior: The clients, the requests they make of the application and the pattern and frequency of those requests.
    Application Architecture and Behavior: The application components, their interaction (in particular, the sequences of requests exchanged and processed in response to a business function request) and the resources consumed when processing user requests.
    Infrastructure: The internet, LAN, server, middleware, database and legacy components, and their interconnection and configuration.
    Deployment: The assignment of application components to infrastructure components and the configuration of those application components.
Since the performance seen by end users of enterprise applications may depend upon any and all of these items, all of them must be included in a performance model to accurately predict end user performance.
The current invention incorporates each of the above components in its performance models. Until the current invention, the data collection, data analysis, model generation and performance project management activities have been difficult, error prone, and time consuming.
This invention also enhances enterprise application modeling and applies a disciplined approach of load testing, data collection, automated data analysis, automated model generation and discrete event simulation to reduce the time required to produce enterprise application models and to increase their accuracy.
The invention requires raw performance data, including network traffic, server data and application data, collected for use in an enterprise application performance project. A variety of prior art products collect such data.
The collected data typically consists of a large number of low-level data files in varying formats that are not correlated or synchronized. The data must be analyzed for the following purposes:
    To verify that the data was collected correctly.
    To eliminate the large amount of extraneous data.
    To raise the abstraction level of the data.
    To correlate the data obtained from the various sources.
    To recognize the sequences of messages (or transactions) forming each business function.
    To derive the resources consumed when processing each transaction.
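One of the purposes listed above, raising the abstraction level of the data, can be illustrated by aggregating packet-level records into message-level records and ordering them in time. The Python sketch below is hypothetical throughout: the record fields `src`, `dst`, `msg_id`, `time` and `bytes` are assumptions made for the example, and real capture files generally do not carry a ready-made message identifier, so deriving one is part of the analysis problem.

```python
from collections import defaultdict


def packets_to_messages(packets):
    """Aggregate packet records into message records, ordered by start time.

    Each packet is a dict with the hypothetical keys
    'src', 'dst', 'msg_id', 'time' and 'bytes'.
    """
    grouped = defaultdict(lambda: {"start": None, "bytes": 0})
    for p in packets:
        key = (p["src"], p["dst"], p["msg_id"])
        message = grouped[key]
        message["bytes"] += p["bytes"]            # total message size
        if message["start"] is None or p["time"] < message["start"]:
            message["start"] = p["time"]          # earliest packet marks the start
    messages = [
        {"src": src, "dst": dst, "msg_id": msg_id,
         "start": m["start"], "bytes": m["bytes"]}
        for (src, dst, msg_id), m in grouped.items()
    ]
    return sorted(messages, key=lambda m: m["start"])


if __name__ == "__main__":
    sample = [
        {"src": "client", "dst": "web", "msg_id": 1, "time": 0.0, "bytes": 1000},
        {"src": "web", "dst": "db", "msg_id": 2, "time": 0.3, "bytes": 200},
        {"src": "client", "dst": "web", "msg_id": 1, "time": 0.1, "bytes": 500},
    ]
    for message in packets_to_messages(sample):
        print(message)
```

The time-ordered message list is the kind of intermediate form from which a sequence of inter-component transactions for one business function could then be recognized.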
In the prior art, data analysis is typically performed manually, using statistics packages and spreadsheets, an error-prone and time-consuming approach. This invention provides a semi-automatic solution to the general case of enterprise application performance data analysis.
After the data has been analyzed, a model must be created in order to use the data for predictive simulation. In the prior art, model creation is performed manually. A variety of modeling tools exist in the prior art, such as HyPerformix Infrastructure Optimizer™, SES/Workbench™ and Compuware Application Predictor, for creating models of computer hardware, software and networks. With these tools, a user constructs a model using a drag-and-drop GUI and may be able to import some collected data. However, the process of building the model is still error-prone and time-consuming, because of the following factors:
    Modern enterprise applications consist of a large number of components inter-related in complex ways. Models of such applications need to represent these components and their relationships and so tend to be large and complex.
    Most tools used to model enterprise applications do not contain adequate built-in domain knowledge of enterprise applications. For example, the user may need to program the concept of a “business function” as a sequence of inter-component transactions.
    The user often has to program sequences of actions in an unfamiliar modeling language rather than simply declaring the attributes of the system with a familiar GUI.
    The modeling language is insufficiently focused and overly general, thereby adding complexity and confusion to the process of data collection and data analysis to support the abstractions available in the modeling language.
The current invention addresses these factors as follows:
    It automatically generates models from automatically analyzed data, thereby greatly simplifying the process of model creation.
    It contains built-in knowledge of all the component types common to enterprise applications.
    It provides an intuitively familiar GUI, based upon spreadsheets, that is declarative rather than procedural.
    The modeling user interface is based upon a minimal parameter set for characterizing the performance of enterprise applications, thereby simplifying the process of data collection, data analysis, and model creation.
Finally, a performance modeling project consists of a sequence of steps such as the following:
    Load testing a laboratory configuration of the application.
    Collecting raw performance data from the application under load in the test laboratory.
    Analyzing the raw performance data to derive a higher level representation of the application performance behavior.
    Constructing a base performance model from the analyzed data.
    Executing the base performance model and comparing its predicted performance to the measured performance data to validate the model.
    Using the validated model to predict future performance problems and evaluate potential solutions through a set of what-if experiments.
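The validation step in the sequence above, comparing predicted performance to measured performance, can be sketched as a relative error check per business function. This is an assumption-laden illustration: the function name `validate_model`, the 10% default tolerance and the sample response times are hypothetical and are not taken from the source.

```python
def validate_model(predicted, measured, tolerance=0.10):
    """Compare predicted and measured response times per business function.

    Returns (errors, ok), where errors maps each business function to its
    relative error and ok is True when every error is within tolerance.
    The tolerance value is an illustrative assumption.
    """
    errors = {}
    for name, measured_value in measured.items():
        if measured_value <= 0:
            raise ValueError(f"measured value for {name!r} must be positive")
        errors[name] = abs(predicted[name] - measured_value) / measured_value
    ok = all(error <= tolerance for error in errors.values())
    return errors, ok


if __name__ == "__main__":
    # Hypothetical end user response times in seconds.
    measured = {"log in": 1.0, "buy stock": 2.0}
    predicted = {"log in": 1.05, "buy stock": 2.4}
    errors, ok = validate_model(predicted, measured)
    for name, error in errors.items():
        print(f"{name}: relative error {error:.0%}")
    print("model validated" if ok else "model needs refinement")
```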
Each of these steps may involve a large number of sub-steps and complex data manipulations and produce a large number of data files. In the prior art, the user must take all of the required steps in the correct order and use the correct input files to arrive at a usable model.
The current invention simplifies this process and reduces errors by providing a graphical centralization of all the steps and input/output files.