The performance of large computer networks and servers and the distributed applications run on them is an area of considerable interest to the global economy as businesses become more diverse and applications more complex. In order for network systems to remain reliable and available, system performance must be constantly monitored and tested. Additionally, maintaining performance during expansion of a network or the deployment of new servers and applications can be a considerable task.
Modern software applications are characterized by multiple components residing on multiple clients and servers or “tiers” connected by a network. Often a single network can support thousands of clients and servers and be widely geographically dispersed. These networks are known as “multi-tiered systems”. In many cases, a multi-tiered system includes use of the Internet to request and receive data for enterprise applications.
An enterprise application typically provides a variety of business functions that users may execute. For example, an online banking application may allow a user to access databases at a bank and manipulate data to check account status or transfer funds between accounts. The user's task is known as a business function.
When a business function is executed, a sequence of transactions is performed by the enterprise application operating on the components on the network. Each transaction consists of a request for data (or “stimulus”) and a reply. The request takes the form of packets of data. The request travels from a client through multiple tiers of servers on the network and returns as a reply. Along the way, each component processes the incoming request. Processing can comprise a series of “loops” between servers requiring multiple “visits” to each server to process a single request. Processing consumes local resources such as CPU time and disk reads and writes. Each component then transfers the request down the line of servers to a final database server. The database server retrieves the requested data and generates a reply in the form of data packets. The reply travels back through each server returning to the client to complete the business function. In addition, each component may generate other requests to other components resident on other servers on the network.
In order to maintain and improve system performance, system managers must understand the performance of the applications running on the system, be able to identify and predict current and future performance problems, and evaluate potential solutions to those problems. The performance of the system can be measured by analyzing test data generated by automated load testing software. During a network load test, the intent is to drive the resource utilization (CPU and I/O) up to measurable levels in order to compute the cost of the resources used by a business function. Examples of load testing software are “Loadrunner”, available from Mercury Interactive, and “Silktest”, available from Segue. Test data consists of performance metrics such as percent CPU usage for a given period of time, the number of accesses to a hard drive or the number of bytes of data transmitted through the network.
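The cost computation mentioned above can be illustrated with a small worked example. The sketch below is only one plausible reading: it assumes that a single business function is the only significant load during the test, so that all measured CPU busy time can be divided evenly among its executions. The function name, parameters and sample figures are hypothetical and do not come from any load testing product.

```python
def cpu_seconds_per_execution(cpu_percent, interval_seconds, executions, cpu_count=1):
    """Average CPU seconds consumed by one execution of a business function.

    Assumption: the business function is the only significant load, so all
    measured busy time is attributed to it.
    """
    if executions <= 0:
        raise ValueError("executions must be positive")
    # Busy time = utilization fraction * length of the interval * number of CPUs.
    busy_seconds = (cpu_percent / 100.0) * interval_seconds * cpu_count
    return busy_seconds / executions


# Hypothetical measurement: 40% CPU over a 300 second test, 6000 executions.
cost = cpu_seconds_per_execution(40.0, 300.0, 6000)
print(f"{cost:.3f} CPU seconds per execution")
```

The same division applies to disk metrics, for example total read bytes divided by the number of executions, under the same single-workload assumption.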
The performance of a system can also be measured by the analysis of web log data and network flow trace data.
Web log data, or throughput data, is collected by web servers, which typically log each HTTP command sent to the server. The typical web log reports the IP address of the machine making the request, the time of the request, the HTTP command and the size of the request and reply messages. A typical web log report can be generated by Microsoft Internet Information Services, available from Microsoft Corporation and incorporated into Windows 2000 and Windows Server 2003.
The performance of a system can also be measured by analyzing network traces, available from packet sniffers, Ethernet sniffers or network or protocol analyzers. The packet sniffer captures each packet traveling along a computer network and decodes and analyzes its content according to the appropriate request for comments (RFC) documents or other specifications. Depending on the network structure, the packet sniffer can detect all or part of the traffic from a single machine operating within the network. However, there are methods which allow a sniffer to operate in “promiscuous mode” to detect everything on the network at the node or computer to which it is attached. When attached to a local area network, the packet sniffer is connected to a monitoring port that mirrors all packets passing through all ports of the switch.
In the prior art, it is known to use discrete event simulators to aid in the analysis of network data. A discrete event simulator is a software tool that is used to develop a detailed model of a multi-tiered system and applications developed on that system. One discrete event simulator known in the art is sold under the trademark “IPS” and is available from HyPerformix, Inc. of Austin, Tex.
It is also known in the prior art for discrete event simulators to use network flow trace data to produce transaction summaries through automated network data analysis. The transaction summary contains a listing of the network flow of a business function. The network flow is the path of the transactions required to complete a business function including the number of visits to each server, the size of the request being made and the size of the returning reply. However, the transaction summary only contains network information and transaction flow information.
For example, Table 1 shows a transaction summary listing a business function name, a visit count (equivalent to the number of “bounces” a message makes between servers), requester identity, replier identity, request size and reply size.
TABLE 1
Business Function  Requestor  Replier  Visit Count  Request Size  Reply Size  CPU Time  Read Count  Read Size  Write Count  Write Size
BF_GetStatement    Client     Web      1            486           2502        ??        ??          ??         ??           ??
BF_GetStatement    Web        App      2            542           4023        ??        ??          ??         ??           ??
BF_GetStatement    App        DB       4            22            538         ??        ??          ??         ??           ??
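The rows of Table 1 can be held as simple records, which makes it easy to see which fields are known from network data alone. The following is a minimal sketch, not part of any prior art tool: the class and function names are hypothetical, the “??” entries are represented as `None`, and the byte total assumes that the request and reply sizes apply to each visit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummaryRow:
    business_function: str
    requestor: str
    replier: str
    visit_count: int
    request_size: int                  # bytes per request
    reply_size: int                    # bytes per reply
    cpu_time: Optional[float] = None   # "??" in Table 1: not known
    read_count: Optional[int] = None
    read_size: Optional[int] = None
    write_count: Optional[int] = None
    write_size: Optional[int] = None


# Values copied from Table 1; resource fields stay unknown.
ROWS = [
    SummaryRow("BF_GetStatement", "Client", "Web", 1, 486, 2502),
    SummaryRow("BF_GetStatement", "Web", "App", 2, 542, 4023),
    SummaryRow("BF_GetStatement", "App", "DB", 4, 22, 538),
]


def network_bytes(rows):
    """Total bytes exchanged, assuming sizes are per visit (an assumption)."""
    return sum(r.visit_count * (r.request_size + r.reply_size) for r in rows)


def has_resource_data(row):
    """True only when every resource field of the row has been filled in."""
    return None not in (row.cpu_time, row.read_count, row.read_size,
                        row.write_count, row.write_size)


print(network_bytes(ROWS))                   # 14358
print(any(has_resource_data(r) for r in ROWS))  # False
```

Under the per-visit assumption the total is 1 x 2988 + 2 x 4565 + 4 x 560 = 14358 bytes, while none of the rows carries resource data.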
The transaction summary also contains information related to the flow of a transaction between and among servers on the network, such as shown in Table 2. Those skilled in the art will recognize that transaction flow can be much more complicated than the example shown in Table 2, including non-linear examples where branching of multiple threads is required. The prior art transaction summaries, however, do not provide a summary of resource information.
TABLE 2
Server Name  Resource consumption to be simulated
Client       Send 486 bytes request to web server
Web          Consume ?? seconds of CPU
Web          Perform ?? disk read operations, reading ?? bytes each time
Web          Perform ?? disk write operation, writing ?? bytes each time
Repeat       2 times
Web          Send 542 byte request to application server
App          Consume ?? seconds of CPU
App          Perform ?? disk read operations, reading ?? bytes each time
App          Perform ?? disk write operations, writing ?? bytes each time
Repeat       2 times
App          Send 22 byte request to database server
DB           Consume ?? seconds of CPU
DB           Perform ?? disk read operations, reading ?? bytes each time
DB           Perform ?? disk write operations, writing ?? bytes each time
DB           Send 538 byte reply to application server
App          Send 4023 byte reply to web server
Web          Send 2502 byte reply to client
The transaction summary also contains a HTTP map of business function names, HTTP patterns and pattern types as shown below in Table 3.
TABLE 3
Business Function  Http Pattern                      Pattern Type
BF_GetStatement    GET /bankapp/index.php.* HTTP     regexp
GetStyle           GET /bankapp/style.css            text
Login              GET /fmstocks7/ HTTP              text
View_Portfolio     GET /fmstocks7/Portfolio.* HTTP   regexp
Logout             GET /fmstocks7/Logout.aspx HTTP   text
The pattern type can be “text” which indicates a simple text comparison is required to identify the HTTP command in the web log. The pattern type can also be “regexp” which is understood as a “regular expression” requiring pattern matching of the HTTP command in the web log. An HTTP map can be used to identify executions of a business function in a web log.
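The use of an HTTP map to identify business function executions can be sketched as follows. This is an illustrative sketch only: it assumes that “text” means a substring comparison, that “regexp” patterns are searched anywhere in the HTTP command, and that the first matching entry wins. The function names and the sample log commands are hypothetical; the patterns are those of Table 3.

```python
import re
from collections import Counter

# (business function, HTTP pattern, pattern type), copied from Table 3.
HTTP_MAP = [
    ("BF_GetStatement", r"GET /bankapp/index.php.* HTTP", "regexp"),
    ("GetStyle", "GET /bankapp/style.css", "text"),
    ("Login", "GET /fmstocks7/ HTTP", "text"),
    ("View_Portfolio", r"GET /fmstocks7/Portfolio.* HTTP", "regexp"),
    ("Logout", "GET /fmstocks7/Logout.aspx HTTP", "text"),
]


def match_business_function(http_command, http_map=HTTP_MAP):
    """Return the first business function whose pattern matches, else None."""
    for name, pattern, pattern_type in http_map:
        if pattern_type == "text":
            # Assumption: a simple text comparison is a substring test.
            if pattern in http_command:
                return name
        elif pattern_type == "regexp":
            if re.search(pattern, http_command):
                return name
        else:
            raise ValueError(f"unknown pattern type: {pattern_type}")
    return None


def count_executions(http_commands):
    """Count executions of each business function in a list of commands."""
    matches = (match_business_function(c) for c in http_commands)
    return Counter(m for m in matches if m is not None)


# Hypothetical HTTP commands as they might appear in a web log.
log = [
    "GET /fmstocks7/ HTTP/1.1",
    "GET /fmstocks7/Portfolio.aspx?id=3 HTTP/1.1",
    "GET /fmstocks7/Logout.aspx HTTP/1.1",
    "GET /bankapp/index.php?acct=1 HTTP/1.1",
    "GET /favicon.ico HTTP/1.1",
]
print(count_executions(log))
```

Commands that match no entry, such as the favicon request in the sample, are simply left uncounted.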
FIG. 1 depicts how a prior art discrete event simulator is used in system analysis. Network flow trace data 157 is derived from a set of deployed servers or a system under test 155. Web log data 159 is collected from web servers as web log reports. Resource utilization data 160 is also collected from a set of deployed resource monitors on a system under test 155. A discrete event model generator 165 is then used to create a discrete event model 170 of the processes running on the deployed servers or system under test 155. The discrete event model consists of a transaction flow, a simulation of resource consumption for each server, and the size of the data message received and sent during the operation. The discrete event model approximates and summarizes enterprise application transactions distributed over the network. In the prior art, the discrete event models have required very detailed analysis of production data, which involves a time-consuming process of defining transaction paths for many requests made by different applications simultaneously. A prior art simulation model typically takes weeks to complete.
The discrete event model forms a set of instructions to the discrete event simulator 175 used to simulate the execution of the business function. The discrete event simulation can then be analyzed and observed to perform basic capacity planning analysis for the network. CPU and disk behavior can be estimated as well as multi-tiered system behavior. By changing the model and reprogramming the simulator, predictions can be developed for future system load performance and planning.
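The general idea of discrete event simulation can be illustrated with a toy example. The sketch below is not the commercial simulator named above and does not reproduce its model format; it is a generic single-server, first-in first-out simulation driven by a time-ordered event queue, with a fixed service time standing in for the CPU cost of one request. All names and numbers are hypothetical.

```python
import heapq
from collections import deque


def simulate_fifo_server(arrival_times, service_time):
    """Return the response time of each request at one FIFO server."""
    events = []   # heap of (time, sequence number, kind, request id)
    seq = 0
    for req_id, t in enumerate(arrival_times):
        heapq.heappush(events, (t, seq, "arrival", req_id))
        seq += 1

    waiting = deque()   # requests queued while the server is busy
    busy = False
    response = [None] * len(arrival_times)

    while events:
        now, _, kind, req_id = heapq.heappop(events)
        if kind == "arrival":
            if busy:
                waiting.append(req_id)
            else:
                busy = True
                heapq.heappush(events, (now + service_time, seq, "departure", req_id))
                seq += 1
        else:  # departure: the request has finished service
            response[req_id] = now - arrival_times[req_id]
            if waiting:
                nxt = waiting.popleft()
                heapq.heappush(events, (now + service_time, seq, "departure", nxt))
                seq += 1
            else:
                busy = False
    return response


# Three requests arriving 0.1 s apart, each needing 0.25 s of service.
times = simulate_fifo_server([0.0, 0.1, 0.2], 0.25)
print([round(t, 2) for t in times])   # [0.25, 0.4, 0.55]
```

Changing the service time or the arrival pattern and rerunning the simulation is the kind of what-if analysis used for capacity planning, here reduced to a single resource.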
Network test data may be collected by software applications known as resource monitors. Examples of resource monitors include Tivoli I™, available from Tivoli; HP Measureware, available from Hewlett Packard of Palo Alto, Calif.; and BMC PatrolPerform, available from BMC.
The type of data collected by the resource monitors and the frequency of collection differ from monitor to monitor. For example, disk input/output data is often collected as the total numbers of reads and writes, total read bytes and total write bytes that occur during the monitoring period, while CPU usage is usually collected as a percentage of usage over time. The data is typically bulky, with data files ranging from tens of megabytes to multiple gigabytes in size, and it can come from many sources, such as performance monitoring programs that collect data directly from hardware devices such as CPUs and hard disk drives. The data is typically not isomorphic; that is, the data can have many formats. It is not uncommon to have multiple production data files that are logically related. For instance, they may capture activity on different network segments which overlap. The files must be merged and synchronized in order to be useful.
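The merging and synchronizing step can be sketched in a few lines. This is a hedged illustration rather than a description of any particular data preparation tool: it assumes each source has already been parsed into records of the form (timestamp, source, metric, value), that each source is sorted by time, and that synchronization amounts to adding a known per-source clock offset. The record layout, function names and sample values are all hypothetical.

```python
import heapq


def shifted(records, offset_seconds):
    """Yield records with a clock offset applied to each timestamp."""
    for ts, source, metric, value in records:
        yield (ts + offset_seconds, source, metric, value)


def merge_monitor_streams(streams, offsets=None):
    """Merge time-sorted record streams into one time-ordered list.

    streams: dict mapping source name to an iterable of records.
    offsets: dict mapping source name to seconds to add (default 0).
    """
    offsets = offsets or {}
    adjusted = [shifted(recs, offsets.get(name, 0.0))
                for name, recs in streams.items()]
    # heapq.merge performs a lazy k-way merge of already sorted inputs.
    return list(heapq.merge(*adjusted, key=lambda rec: rec[0]))


# Hypothetical data: the database monitor's clock runs 5 seconds behind.
streams = {
    "web": [(10.0, "web", "cpu_pct", 35.0), (20.0, "web", "cpu_pct", 50.0)],
    "db": [(7.0, "db", "disk_reads", 120), (17.0, "db", "disk_reads", 90)],
}
merged = merge_monitor_streams(streams, offsets={"db": 5.0})
for record in merged:
    print(record)
```

Because the merge is lazy until the final `list` call, the same approach can be adapted to files too large to hold in memory by consuming the merged iterator directly.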
In the prior art, the format in which typical resource monitors collect data hinders the modeling process. Data preparation analysis tools have been developed to aid in this process; however, they do not support modeling a business function that utilizes more than one application, nor do other modeling methodologies provide for modeling a business function running on more than one server.
Also in the prior art, the discrete event models developed have been far too detailed to allow production data to be analyzed quickly enough to derive helpful predictions.
In a particular piece of prior art, U.S. Pat. No. 6,560,569 B1 to Abu, et al., a system is disclosed which uses an input module, a construction module, a performance metrics module and an output module to create and output several models of a proposed information design system. The input module receives descriptive input, which is validated and transformed into quantitative input. The construction module uses the quantitative input and information from a library of hardware and software component models to create and calibrate one or more models. The performance metrics module calculates performance metrics for the models, which can then be compared based on these metrics. However, the method is extremely time intensive, requiring iterations at several points to verify and correct deficiencies in the models created. Additionally, the method requires a database of component models designed by the information system designer, who must determine the function of each subcomponent of each system before developing the models.
Therefore a need exists for analyzing and preparing production data quickly to allow for performance modeling and analysis of a network and for combining resource data, transaction summaries and web log data to complete efficient production of discrete event models of business functions for discrete event simulations.