Distributed applications have become increasingly popular in recent years, particularly following the widespread diffusion of the Internet. In a distributed application, client computers access resources managed by server computers through a network. A typical example is that of an e-business application, wherein a user may download a login page, fill in a form with his/her username and password, and then receive information (for example, about a personal bank account) from the server.
Tools for monitoring the performance of distributed applications play a key role in their management. In particular, a system administrator can get instantaneous notification when a user is experiencing any problem (so that appropriate steps can be taken to remedy the situation); alternatively, the collected information can be logged and accurate counts tracked over time. For example, the information provided by the performance monitoring tools is essential for service level agreements or for threshold and/or availability monitoring; moreover, the same information is very useful to measure workloads for capacity planning and charge-back accounting.
However, these tools (like any measurement system) inevitably interfere with the quantities being measured; therefore, the correct tuning of the performance monitoring tools is of the utmost importance, in order to avoid adversely affecting operation of the whole system.
A solution known in the art for monitoring performance of distributed applications is provided by the Application Response Measurement (ARM) standard, as described in “The Application Response Measurement (ARM) API, Version 2”, Mark W. Johnson, Tivoli Systems, December 1997. The standard defines a set of API calls, which an application can use to ask an agent to measure transactions and to make the resulting information available to management applications. In this way, an accurate picture of the actual workload of the system can be obtained.
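As an illustration only, the start/stop pattern that such an agent exposes to an application can be sketched as follows. This is a minimal sketch and not the ARM API itself: the class MeasurementAgent, its start and stop methods, the record layout and the login function are all hypothetical names introduced for this example.

```python
import itertools
import time


class MeasurementAgent:
    """Hypothetical stand-in for a measurement agent; not the real ARM API."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._handles = itertools.count(1)
        self._open = {}     # handle -> (transaction name, start time)
        self.records = []   # completed measurements, readable by management tools

    def start(self, name):
        """Mark the beginning of a transaction and return a handle for it."""
        handle = next(self._handles)
        self._open[handle] = (name, self._clock())
        return handle

    def stop(self, handle, status="ok"):
        """Mark the end of a transaction and record its response time."""
        name, started = self._open.pop(handle)
        elapsed = self._clock() - started
        self.records.append({"name": name, "status": status, "elapsed": elapsed})
        return elapsed


def login(agent, username, password):
    """Application code with the measurement calls embedded by hand."""
    handle = agent.start("login")
    ok = bool(username) and bool(password)  # placeholder for the real work
    agent.stop(handle, status="ok" if ok else "failed")
    return ok
```

The clock is passed in as a parameter in this sketch so that the measurement logic can be exercised with deterministic timestamps; an actual agent would read a system clock.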
The ARM standard also supports the use of correlators, which provide the child/parent information needed to trace how transactions and corresponding sub-transactions relate to each other. The correlators are very useful to break down the complexity of the distributed application, so as to facilitate the analysis of the collected information. For example, when a transaction is slow it is possible to know which sub-transaction(s) contribute most to the delay.
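The child/parent relation carried by the correlators can be sketched as follows. This is a simplified illustration under stated assumptions: the correlator is modeled as a plain integer, and the class TracingAgent, its methods and the transaction names are hypothetical; the actual format of an ARM correlator is not reproduced here.

```python
import itertools


class TracingAgent:
    """Hypothetical agent that links sub-transactions to their parents."""

    def __init__(self, clock):
        self._clock = clock
        self._ids = itertools.count(1)
        self.records = {}   # correlator -> record dict

    def start(self, name, parent=None):
        """Start a transaction; `parent` is the parent's correlator, if any."""
        correlator = next(self._ids)
        self.records[correlator] = {
            "name": name,
            "parent": parent,
            "started": self._clock(),
            "elapsed": None,
        }
        return correlator

    def stop(self, correlator):
        """Stop a transaction and store its elapsed time."""
        record = self.records[correlator]
        record["elapsed"] = self._clock() - record["started"]

    def slowest_child(self, parent):
        """Return the name of the completed child that took the longest."""
        children = [r for r in self.records.values()
                    if r["parent"] == parent and r["elapsed"] is not None]
        if not children:
            return None
        return max(children, key=lambda r: r["elapsed"])["name"]


# Usage with a fake clock so that the timings are deterministic.
ticks = iter([0.0, 1.0, 2.0, 3.0, 8.0, 9.0])
agent = TracingAgent(clock=lambda: next(ticks))
root = agent.start("show_account")           # t = 0
auth = agent.start("authenticate", root)     # t = 1
agent.stop(auth)                             # t = 2, elapsed 1
query = agent.start("query_balance", root)   # t = 3
agent.stop(query)                            # t = 8, elapsed 5
agent.stop(root)                             # t = 9, elapsed 9
print(agent.slowest_child(root))             # prints "query_balance"
```

In this sketch the parent transaction takes 9 time units, of which 5 are spent in the "query_balance" sub-transaction, so the breakdown points at that sub-transaction as the main contributor to the delay.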
A distributed application must be correctly instrumented for monitoring its performance using the ARM standard. First of all, this procedure requires the identification of the key transactions to be monitored. The distributed application is then modified by embedding calls to the ARM APIs where necessary.
However, the solution described above is very rigid, since the key transactions must be defined statically. The cited document only suggests a technique for exploiting the format of the correlators so as to use the tracing selectively (for example, when the response time of a client begins to be unacceptable). However, once the calls to the ARM APIs have been inserted into the distributed application, no way is provided for controlling the transactions to be monitored dynamically; instead, any change requires the updating of the corresponding source code and its deployment to the different (client and server) computers where the distributed application is running.
Therefore, a wrong selection of the key transactions can be detrimental to the operation of the whole system. Particularly, when too few transactions are selected the collected information may be useless; conversely, monitoring too many transactions may result in application delays and system overhead.
Moreover, the instrumentation of the distributed application is not a tenable option when its source code is not available. This drawback is particularly acute for pre-loaded or packaged applications; a typical example is that of the browsers that are installed on millions of clients for accessing the Internet.
A different solution for monitoring performance of distributed applications where source code changes are not possible is described in “Service management using the application response measurement API without application source code modification”, Martin Haworth, Resource and Performance Management Solutions Network and System Management Division, Hewlett-Packard Company, June 1997. This article proposes capturing a script by recording the user actions on the client, for example, by means of a Remote Terminal Emulation (RTE) technique. The user script is edited to include calls to the ARM APIs for each desired transaction. The user script can then be scheduled to run at an appropriate interval.
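The script-based approach described above can be sketched, in very simplified form, as follows. This is only an illustration under stated assumptions: the recorded script is modeled as a list of (transaction name, action) pairs, and the functions replay_script and run_periodically are hypothetical names that do not come from the cited article.

```python
import time


def replay_script(steps, clock=time.perf_counter):
    """Replay recorded steps, timing each one as a synthetic transaction.

    `steps` is a list of (transaction name, callable) pairs; this format
    is an assumption made for the sketch.
    """
    timings = {}
    for name, action in steps:
        started = clock()                  # stands in for a "start" call
        action()                           # replay the recorded user action
        timings[name] = clock() - started  # stands in for a "stop" call
    return timings


def run_periodically(steps, interval_seconds, runs, sleep=time.sleep):
    """Run the script `runs` times, pausing between consecutive runs."""
    results = []
    for i in range(runs):
        results.append(replay_script(steps))
        if i < runs - 1:
            sleep(interval_seconds)
    return results
```

Every timing obtained in this way refers to the replayed script rather than to the actions of an actual user.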
However, the proposed technique only provides an emulation scenario, wherein the performance parameters that are measured are always artificial in nature. Therefore, this solution is very limited in that it cannot provide an accurate picture of the performance of the real transactions.