The performance of large computer networks and servers and the distributed applications run on them is an area of considerable interest to the global economy as businesses become more diverse and applications more complex. In order for network systems to remain reliable and available, system performance must be constantly monitored and tested. Additionally, maintaining performance during expansion of a network or the deployment of new servers and applications can be a considerable task.
Modern software applications are characterized by multiple components residing on multiple clients and servers or “tiers” connected by a network. Often a single network can support thousands of clients and servers and be widely geographically dispersed. These networks are known as “multi-tiered systems”. In many cases, a multi-tiered system includes use of the Internet to request and receive data for enterprise applications.
An enterprise application typically provides a variety of business functions that users may execute. For example, an online banking application may allow a user to access databases at a bank and manipulate data to check account status or transfer funds between accounts. The user's task is known as a business function.
When a business function is executed, a sequence of transactions is performed by the enterprise application operating on the components on the network. Each transaction consists of a request for data (or “stimulus”) and a reply. The request takes the form of packets of data. The request travels from a client through multiple tiers of servers on the network and returns as a reply. Along the way, each component processes the incoming request. Processing consumes local resources such as CPU time and disk reads and writes. Each component then transfers the request down the line of servers to a final database server. The database server retrieves the requested data and generates a reply in the form of data packets. The reply travels back through each server returning to the client to complete the business function. In addition, each component may generate other requests to other components resident on other servers on the network.
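The request and reply path described above can be illustrated with a minimal sketch. The tier names, the per-request CPU and disk figures, and the `Tier` and `run_transaction` helpers are all hypothetical, chosen only to show a request travelling down to the database server and the reply returning through each server; they are not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    cpu_ms: float   # hypothetical CPU time consumed while processing one request
    disk_ops: int   # hypothetical disk reads and writes per request


def run_transaction(tiers):
    """Trace one request down the tiers and its reply back to the client.

    Returns the ordered list of components visited plus the total CPU time
    and disk operations consumed while processing the incoming request.
    """
    # Request: client -> ... -> database server.
    path = [t.name for t in tiers]
    # Reply: back through each server until it reaches the client.
    path += [t.name for t in reversed(tiers[:-1])]
    total_cpu_ms = sum(t.cpu_ms for t in tiers)
    total_disk_ops = sum(t.disk_ops for t in tiers)
    return path, total_cpu_ms, total_disk_ops


# Assumed four-tier layout, for illustration only.
tiers = [
    Tier("client", 1.0, 0),
    Tier("web", 2.5, 1),
    Tier("app", 4.0, 2),
    Tier("db", 6.0, 8),
]
path, cpu_ms, disk_ops = run_transaction(tiers)
```

In this simplified sketch each component handles the request once; the additional requests that a component may issue to other servers are not modeled.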
In order to maintain and improve system performance, system managers must understand the performance of the applications running on the system, be able to identify and predict current and future performance problems, and evaluate potential solutions to those problems. The performance of the system is measured by analyzing production data. Production data consists of performance metrics such as percent CPU usage over a given period of time, the number of accesses to a hard disk drive, or the number of bytes of data transmitted through the network.
In the prior art, it is known to use discrete event simulators to aid in the analysis of network production data. A discrete event simulator is a software tool that is used to develop a detailed model of a multi-tiered system and applications developed on that system. One discrete event simulator known in the art is sold under the trademark “IPS” and is available from HyPerformix, Inc. of Austin, Tex.
FIG. 1 depicts how a prior art discrete event simulator is used in the system analysis. Resource utilization data 160 is derived from a set of deployed servers or a system under test 155. A discrete event model generator 165 is then used to create a discrete event model 170 of the processes running on the deployed servers or system under test 155. The discrete event model consists of a transaction flow, a simulation of resource consumption for each server, and the size of the data message received and sent during the operation. The discrete event model approximates and summarizes enterprise application transactions distributed over the network. In the prior art, discrete event models have required very detailed analysis of production data, including a time-consuming process of defining transaction paths for many requests made simultaneously by different applications. A prior art simulation model typically takes weeks to complete.
The discrete event model forms a set of instructions to the discrete event simulator 170 used to simulate the execution of the business function. The discrete event simulation can then be analyzed and observed to perform basic capacity planning analysis for the network. CPU and disk behavior can be estimated as well as multi-tiered system behavior. By changing the model and reprogramming the simulator, predictions can be developed for future system load performance and planning.
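As a rough illustration of what a discrete event simulation involves, the following sketch simulates a single server with a first-in, first-out queue by processing arrival and departure events in time order. It is a generic textbook-style example, not the commercial simulator named above; the `simulate` function, the fixed service time, and the arrival times are assumptions made for illustration.

```python
import heapq
from collections import deque

ARRIVAL, DEPARTURE = 0, 1


def simulate(arrivals, service_time):
    """Simulate one server with a FIFO queue.

    arrivals: request arrival times; service_time: fixed time per request.
    Returns (per-request response times, server utilization).
    """
    # Each event is (time, kind, request id); the heap yields them in time order.
    events = [(t, ARRIVAL, i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    waiting = deque()
    busy = False
    busy_time = 0.0
    responses = {}
    now = 0.0
    while events:
        now, kind, rid = heapq.heappop(events)
        if kind == DEPARTURE:
            responses[rid] = now - arrivals[rid]
            busy = False
        else:
            waiting.append(rid)
        # Start the next queued request whenever the server is idle.
        if not busy and waiting:
            nxt = waiting.popleft()
            busy = True
            busy_time += service_time
            heapq.heappush(events, (now + service_time, DEPARTURE, nxt))
    utilization = busy_time / now if now else 0.0
    return [responses[i] for i in range(len(arrivals))], utilization


# Three requests arriving one time unit apart, each needing two units of service.
response_times, utilization = simulate([0.0, 1.0, 2.0], service_time=2.0)
```

Changing the arrival times or the service time and re-running the simulation corresponds, in miniature, to changing the model to predict behavior under a different load.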
Production data may be collected by software applications known as resource monitors. Examples of resource monitors include Tivoli ITM, available from Tivoli; HP Measureware, available from Hewlett Packard of Palo Alto, Calif.; and BMC PatrolPerform, available from BMC.
The type of data collected by the resource monitors and the frequency of collection differ. For example, disk input/output data is often collected in terms of the total number of reads and writes, total read bytes and total write bytes that occur during the monitoring period. CPU usage is usually collected as a percentage of usage over time. The data is typically bulky, with data files ranging from tens of megabytes to multiple gigabytes in size. The data can also come from many sources; another example is performance monitoring programs that collect data directly from hardware devices such as CPUs and hard disk drives. The data is typically not isomorphic; that is, the data can have many formats. It is not uncommon to have multiple production data files that are logically related. For instance, they may capture activity on different network segments that overlap. The files must be merged and synchronized in order to be useful.
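The merging and synchronizing step can be sketched as follows. The two log layouts, the column names, the 60-second bucket width, and the `bucket` and `merge` helpers are hypothetical; real resource monitors each use their own formats. The sketch only shows the general idea of converting differently formatted timestamps to a common time base and combining the metrics per interval.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical CPU log: epoch seconds, comma separated.
CPU_LOG = """epoch_s,cpu_pct
1000,40
1030,60
1060,80
"""

# Hypothetical disk log: ISO 8601 timestamps, semicolon separated.
DISK_LOG = """timestamp;reads;writes
1970-01-01T00:16:40+00:00;120;30
1970-01-01T00:17:40+00:00;200;50
"""


def bucket(epoch_s, width=60):
    """Round a timestamp down to the start of its interval."""
    return int(epoch_s) // width * width


def merge(cpu_text, disk_text, width=60):
    """Align both logs on common intervals: average CPU, sum disk counts."""
    merged = defaultdict(lambda: {"cpu": [], "reads": 0, "writes": 0})
    for row in csv.DictReader(io.StringIO(cpu_text)):
        start = bucket(float(row["epoch_s"]), width)
        merged[start]["cpu"].append(float(row["cpu_pct"]))
    for row in csv.DictReader(io.StringIO(disk_text), delimiter=";"):
        epoch_s = datetime.fromisoformat(row["timestamp"]).timestamp()
        rec = merged[bucket(epoch_s, width)]
        rec["reads"] += int(row["reads"])
        rec["writes"] += int(row["writes"])
    rows = []
    for start in sorted(merged):
        rec = merged[start]
        cpu = sum(rec["cpu"]) / len(rec["cpu"]) if rec["cpu"] else None
        rows.append((start, cpu, rec["reads"], rec["writes"]))
    return rows


merged_rows = merge(CPU_LOG, DISK_LOG)
```

Averaging percentage metrics and summing counters within an interval is one reasonable choice here; a different aggregation may be appropriate depending on how each monitor reports its data.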
In the prior art, the format in which typical resource monitors collect data hinders the modeling process. Data preparation analysis tools have been developed to aid in this process; however, they do not support modeling a business function that utilizes more than one application, nor do other modeling methodologies provide for modeling a business function running on more than one server.
Also in the prior art, the discrete event models developed have been far too detailed to allow production data to be analyzed quickly enough to derive helpful predictions.
In a particular piece of prior art to Abu, et al., U.S. Pat. No. 6,560,569 B1, a system is disclosed which includes an input module, a construction module, a performance metrics module and an output module to create several models of a proposed information design system. The input module receives descriptive input, which is validated and transformed into quantitative output. The construction module uses this quantitative data and information from a library of hardware and software component models to create and calibrate one or more models. The performance metrics module calculates performance metrics for the models, which can then be compared based on these metrics. However, the method is extremely time intensive, requiring iterations at several points to verify and correct deficiencies in the models created. Additionally, the method requires a database of component models designed by the information system designer, who must determine the function of each subcomponent of each system before developing the models.
Therefore, a need exists for a way to analyze and prepare production data quickly to allow for performance modeling and analysis of a network and for efficient production of discrete event models for discrete event simulations.