Information systems environments continue to grow more complex, especially where a number of software applications execute on the same network of servers, utilizing shared resources such as web servers or database servers to effect their respective functions. Moreover, a given application server may be dedicated to the execution of a number of software applications simultaneously, so that operating system resources are shared. The recent rise in popularity of virtual machines, wherein a single computer hardware platform simultaneously runs a plurality of operating systems, has helped isolate computer software applications from one another, at least in their interaction with the operating system. But the given software applications may still interact with similar resources on the network to which the application server is attached, including its own CPU and memory subsystems, web servers, database servers, storage resources and network resources. System managers must understand the current performance of their applications in order to maintain quality end user performance, to identify and predict future performance problems, to evaluate potential solutions to those problems and to proactively upgrade the systems.
To aid the system manager in this process, some software applications are instrumented to measure and report resource consumption for each transaction that is performed by the application. A transaction is typically an exchange of data with a given server or a given device with a well-defined beginning and end. The number of transactions per unit time will be referred to as the “load” or as the “transaction throughput” interchangeably in this document. If the transaction resource consumption data is logged sequentially in a file with the date and time at which each transaction was performed, one can analyze such data to generate a model that may be used to closely replicate the behavior of the software application. Two types of models commonly used in the art of information systems analysis are discrete event simulation models and analytic models based on queuing theory.
There is a significant drawback to the method of using resource consumption data from the given measured application: the generated model will only reflect resource consumption reported by the component of the given application that is measured. Resource consumption by unmeasured components of the application on the application server or on other servers will not be included in the generated model. Typical measured resource consumptions on the application server are the central processing unit (CPU) time, the storage disk bytes read and written, and data bytes read and written. As the combined application load for all applications on the server grows, so will the total resource consumption of all applications on the system of servers, and the resources of all the components of the system will be drawn down. Application load then affects the performance of resources not measured by the given applications and skews the generated model. The usage of resources by applications other than the given application is referred to as the background load throughout the rest of this document.
The present invention can be used to determine the background load imposed by multiple other applications that were executing on the same hardware infrastructure during measurement. It is especially useful if the measurements are taken from a system in production that is shared with the other applications as they normally run.
The present invention incorporates resource utilization data collected by one or more system monitors such as the HP OpenView Performance Agent (available from Hewlett Packard Corporation), Microsoft Performance Monitor (available from Microsoft Corporation), or Unix System Activity Report (sar) into the generated model, so that simulation of unmeasured application components is either included in each transaction, or a background load is estimated and added into the model.
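The idea of estimating a background load from monitor data can be illustrated with a minimal Python sketch. It is an assumption made for illustration, not a formula given in this section: the CPU time attributed to logged transactions is subtracted from the total CPU utilization reported by a system monitor over the same interval, and the remainder is treated as background load. The function name, parameters and numbers are hypothetical.

```python
def background_cpu_utilization(total_utilization, logged_cpu_secs,
                               interval_secs, cpu_count=1):
    """Estimate the CPU utilization not explained by logged transactions.

    total_utilization: fraction (0..1) of CPU capacity busy during the
        interval, as reported by a system monitor (assumed input).
    logged_cpu_secs: CPU seconds of each transaction in the transaction log
        for the same interval.
    """
    if interval_secs <= 0 or cpu_count <= 0:
        raise ValueError("interval_secs and cpu_count must be positive")
    # Utilization attributable to the measured application.
    measured = sum(logged_cpu_secs) / (interval_secs * cpu_count)
    # Clamp at zero in case of measurement noise or clock skew.
    return max(0.0, total_utilization - measured)


# Hypothetical numbers: the monitor reports 60% busy over a 60-second
# interval, and the log accounts for 15 CPU seconds in that interval.
background = background_cpu_utilization(0.60, [5.0, 4.0, 6.0], 60.0)
print(f"estimated background CPU utilization: {background:.2f}")
```

The subtraction assumes the monitor interval and the log interval are aligned; in practice the two data sources would need to be matched on their time stamps.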
A prior art example of a system of servers containing an application server configured to measure its own resource consumption is shown in FIGS. 1A and 1B. The system of servers 10 of FIG. 1A comprises an application server 20 (or a set of application servers), a set of database servers 30 and a set of web-interface servers 40 interacting with a user 50, which may be human or machine. The servers included in system of servers 10 are connected together by a network indicated by the lines drawn between them. In FIG. 1B, software application 60 runs on application server 20 as do a set of other software applications 65. Software application 60 and set of other software applications 65 utilize resources contained within the application server 20, the resources being one or more CPUs 70, random access memory (RAM) 72, local data storage (DISK(s)) 74, and data pipes 76. A data pipe is a useful communications construct in various operating systems; examples of a data pipe are a data path between IP addresses on a network, a data path between two applications on the same server, or a data path to the display device.
User 50 interacts with application 60 on the application server to operate in production or to run a set of load tests on application 60. User 50 accesses and operates application 60 through web-interface server 40. While operating, application 60 measures resource consumption of the resources and writes the results of the measurements to transaction log 80.
Transaction log 80, containing measured transaction resource consumption data, comprises a set of transactions generated by a given application and typically contains information such as that shown in the example of TABLE 1:
TABLE 1
Transaction Log

Date/Time        Trans name  CPU (secs)  Disk Bytes Read  Disk Bytes Written  Data Bytes Received  Data Bytes Sent
1/1/06 12:00:00  Buy         1.1         10383            8475                5069                 12393
1/1/06 12:00:05  Sell        0.8         38475            8392                7378                 17267
1/1/06 12:00:06  Buy         1.2         10573            8746                5182                 12582
The transaction log organizes transactions into rows, each transaction characterized by a set of data organized into columns. In TABLE 1, the transaction log contains a Date/Time column which contains a time stamp of when the data for a given transaction was measured, a Transaction Name column which contains an alphanumeric descriptor of each transaction, a CPU consumption column containing CPU usage for each transaction, the number of Disk Bytes Read during each transaction, the number of Disk Bytes Written during each transaction, the number of Data Bytes Received from all data pipes during each transaction and the number of Data Bytes Sent to all data pipes during each transaction.
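As an illustration of the row and column layout just described, the following Python sketch parses one log row into a record. The whitespace-delimited text layout, the class name TransactionRecord, the function name parse_log_line and the sample values are assumptions made for the example; the actual on-disk format of a transaction log is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TransactionRecord:
    timestamp: datetime        # Date/Time column
    name: str                  # Transaction Name column
    cpu_secs: float            # CPU consumption, in seconds
    disk_bytes_read: int
    disk_bytes_written: int
    data_bytes_received: int   # total over all data pipes
    data_bytes_sent: int       # total over all data pipes


def parse_log_line(line: str) -> TransactionRecord:
    """Parse one whitespace-delimited row (assumed layout, see lead-in)."""
    date, time, name, cpu, d_read, d_written, received, sent = line.split()
    return TransactionRecord(
        timestamp=datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S"),
        name=name,
        cpu_secs=float(cpu),
        disk_bytes_read=int(d_read),
        disk_bytes_written=int(d_written),
        data_bytes_received=int(received),
        data_bytes_sent=int(sent),
    )


# Hypothetical row; the numeric values are illustrative only.
record = parse_log_line("1/1/06 12:00:00 Buy 1.1 2000 1000 500 1500")
print(record)
```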
The transaction log from a load or stress test of an application or from an application deployed into production can have hundreds or even thousands of transactions during a measurement period of a few seconds.
Transaction logs may be generated by a software application for reasons other than performance monitoring, such as auditing, accounting, or recovery. But for measuring performance, the logs must contain a date/time stamp, the type and number of transactions performed (at that date/time), and the resource consumption relevant to the desired model.
Some software applications such as SAP have an embedded mechanism to add an extension to transaction processing, which is invoked at key points such as the completion of a transaction. If the required information is available at that time, this extension can be used to generate a transaction log without otherwise making changes to the application or underlying subsystem.
A discrete event simulation model can simulate each transaction as reported, at the time that it occurred. A more abstract model can be built by summarizing individual transactions into an average resource consumption per transaction, and simulating the average or peak throughput observed during the measurement period.
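The summarization step for the more abstract model might look like the following Python sketch. The log entries, the one-second bucketing used to find peak throughput, and the function name summarize are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log entries: (timestamp, CPU seconds, disk bytes read).
log = [
    (datetime(2006, 1, 1, 12, 0, 0), 1.0, 1000),
    (datetime(2006, 1, 1, 12, 0, 0), 2.0, 3000),
    (datetime(2006, 1, 1, 12, 0, 1), 3.0, 2000),
    (datetime(2006, 1, 1, 12, 0, 3), 2.0, 2000),
]


def summarize(entries):
    """Reduce individual transactions to per-transaction averages and
    average/peak throughput in transactions per second."""
    count = len(entries)
    if count == 0:
        raise ValueError("the log contains no transactions")
    timestamps = [ts for ts, _, _ in entries]
    # Count transactions in each one-second bucket.
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    # Measurement period in seconds, counting first and last buckets.
    span_secs = (max(timestamps) - min(timestamps)).total_seconds() + 1
    return {
        "avg_cpu_secs": sum(cpu for _, cpu, _ in entries) / count,
        "avg_disk_bytes_read": sum(rd for _, _, rd in entries) / count,
        "avg_throughput_tps": count / span_secs,
        "peak_throughput_tps": max(per_second.values()),
    }


summary = summarize(log)
print(summary)
```

Averaging discards the ordering and burstiness of the individual transactions, which is the trade-off between this abstract model and a simulation that replays each transaction at its recorded time.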
FIG. 2 shows a prior art performance improvement method 100 in which a generated model is used by the system manager to improve performance of an application in a system operating a plurality of applications or services. A system of servers 105 is operated by user 101 to run a software application (not shown) residing on the system. The application measures transaction data and records measured transactions into transaction log 108. Transaction log 108 is manipulated by a system manager 102 to generate a discrete event simulation model 110 for a given system state different from the state in which system 105 existed when transaction log 108 was recorded. The given system state is often one in which the application is loaded with more users than in the existing state. Discrete event simulation model 110 outputs a predicted set of resource usage parameters 112 such as CPU utilization or network bandwidth required. Predicted resource usage parameters 112 are then used by system manager 102 to evaluate the performance of system 105 in the given system state and to make hardware or software adjustments on system 105 accordingly to prepare it for the given system state.
Transaction log 108 of the prior art may not include all of the resource consumption associated with a transaction. Consider again the simple 3-tier system of FIGS. 1A and 1B. Transaction log 80 generated by application server 20 would include only the resource consumption for that server. It will omit CPU time, for example, consumed by one of the database servers 30, even when that database server is called by application 60. There is insufficient knowledge to accurately model the performance of application 60, especially in a situation when database server 30 is heavily loaded by requests from other running applications, many of which may be anonymous or may not generate their own transaction logs. In modern systems, a plurality of applications will typically run on a plurality of real or virtual machines.
There exists a need in the art of systems management for a performance tool to aid the system manager in the common situation where there is incomplete knowledge of application resource usage by applications other than the application of interest.
There also exists a need for a method for measuring and combining system-wide utilization data with the application transaction log to provide accurate performance models. More specifically, by making an assumption that the transaction throughput derived from the application transaction log is also representative of the transaction throughput for all other system components of the application (e.g. web server and database server in FIG. 1A), throughput data can be combined with resource utilization data collected during the same period from the other system components to estimate unknown transaction costs.
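One standard way to combine throughput with utilization is the utilization law of queuing theory, U = X × D, where U is the utilization of a resource, X is the throughput and D is the service demand per transaction, so that D = U / X. The Python sketch below applies it under the stated assumption that the throughput from the application transaction log also applies to the other component. The function name and the numbers are hypothetical, and the utilization law is offered here as an illustration rather than as the specific method of the invention.

```python
def estimate_cost_per_transaction(utilization, throughput_tps, cpu_count=1):
    """Estimate the unmeasured per-transaction cost on a component.

    utilization: fraction (0..1) of the component's CPU capacity that was
        busy during the period, from a system monitor.
    throughput_tps: transactions per second over the same period, derived
        from the application transaction log.
    Returns CPU busy-seconds per transaction (utilization law, D = U / X).
    """
    if throughput_tps <= 0:
        raise ValueError("throughput must be positive")
    return utilization * cpu_count / throughput_tps


# Hypothetical numbers: a database server CPU is 50% busy while the
# application log shows 25 transactions per second.
db_cpu_cost = estimate_cost_per_transaction(0.50, 25.0)
print(f"estimated database CPU seconds per transaction: {db_cpu_cost:.3f}")
```

Because the whole utilization is attributed to the logged transactions, any load placed on the component by other applications inflates this estimate.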
There also exists a need for a method by which data from the application server alone can be gathered and combined with the application transaction log to make a significant improvement in the performance model.