1. Field of the Invention
Embodiments of the disclosure relate in general to the field of computers and similar technologies, and in particular to software utilized in this field. Still more particularly, the disclosure relates to selectively monitoring transactions in a cluster computing environment.
2. Description of the Related Art
Monitoring the performance and availability of software applications, including those that may be spread across multiple physical systems and involve multiple physical resources, is a key task in system administration. This monitoring is typically performed by instrumenting software to include additional instructions, generally referred to as probes, that report performance information such as application response time. Performance monitoring may also be implemented by adding a software component, sometimes referred to as a plug-in, to the application. The plug-in is then invoked in-line with the application during the execution of a transaction. Regardless of how it is implemented, performance monitoring needs to take place in real time to be useful. As a result, any such monitoring imposes some degree of run-time performance overhead on the systems being monitored. Accordingly, there is a need for control mechanisms that provide an adequate degree of granularity when configuring performance monitoring activities.
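By way of illustration only, a probe of the kind described above can be sketched as a wrapper that records the response time of the code it instruments. The names `probe`, `response_times`, and `handle_request` are hypothetical and are not drawn from any particular monitoring product; the sketch also shows why even a trivial probe adds run-time work to every monitored call.

```python
import functools
import time

# Hypothetical in-memory sink for reported measurements.
response_times = []


def probe(func):
    """Wrap func so that each call reports its name and elapsed time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # This bookkeeping is the overhead incurred on every monitored call.
            response_times.append((func.__name__, time.perf_counter() - start))
    return wrapper


@probe
def handle_request(order_id):
    """Stand-in for application logic executed during a transaction."""
    return f"order {order_id} processed"


result = handle_request(42)
name, elapsed = response_times[-1]
print(result, name, f"{elapsed:.6f}s")
```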
Existing approaches include selectively turning performance monitoring on or off based on individual applications or logic components. For example, when a user initiates a transaction from a Web browser, the hypertext transfer protocol (HTTP) request is sent to a Web server, which in turn may make a call to an application server and a database server. Traditionally, the application components running on the Web server, the application server, and the database server, respectively, would all be monitored in order to determine the root cause of a performance problem. However, there are two major drawbacks to this methodology. First, when transaction monitoring is enabled for an application, all business transactions in the application are monitored, regardless of whether they are relevant to identifying the performance problem. This incurs more overhead in terms of CPU usage, memory, etc. than is necessary to solve the problem. Second, when transaction monitoring is enabled for an application, every transaction in the application generates additional monitoring information at the same level. The volume of accumulated data can become very high within a short period of time, thus incurring additional processing overhead. The high volume of data can also obscure the root cause of the problem due to the sheer quantity of irrelevant data the user has to review.
Another approach, currently used in the IBM Tivoli Monitor for Transaction Performance, associates each transaction with a token that embeds the entire monitoring configuration that should be used for the transaction. Each instrumented application has an entry point (e.g., a uniform resource locator, or URL) for monitoring each transaction (e.g., an HTTP request from a browser). Once these entry points are defined for the application, monitoring policies are associated with those entry points. The monitoring policy is represented as a token, which embeds all information necessary to monitor the transaction. However, no other control mechanism is provided for deciding when the transaction should be monitored, other than a predefined sampling rate. This can be problematic, as anomalies in the system could occur during transactions that are not being monitored.
In yet another approach, an application monitoring policy is implemented that includes a description of the server resources that are to be monitored, along with limits that each resource should not exceed. Examples of server resources that might be monitored include the amount of free virtual memory on a given server, processor utilization, current thread pool size, etc. If the application monitoring policy indicates that the transaction should be monitored, then any resource thresholds defined in the policy are checked. If no resource thresholds have been exceeded at the instant that the transaction arrives at the server, then the transaction is not monitored. If a predetermined resource threshold has been exceeded, then the transaction could potentially experience a performance degradation and should be monitored. Once a decision is made to monitor the transaction, the instrumentation probes monitor the transaction from that point forward.
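The threshold-based decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any existing product: the class `MonitoringPolicy`, the function `should_monitor`, and the resource names and limit values are hypothetical, and every resource is treated as having an upper limit for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringPolicy:
    """Hypothetical policy: an on/off indication plus per-resource upper limits."""
    enabled: bool
    limits: dict = field(default_factory=dict)  # resource name -> limit not to exceed


def should_monitor(policy, current_usage):
    """Decide, at the instant a transaction arrives, whether to monitor it."""
    if not policy.enabled:
        return False
    # Monitor only if at least one resource threshold has been exceeded.
    for resource, limit in policy.limits.items():
        if current_usage.get(resource, 0) > limit:
            return True
    return False


# Assumed example values, chosen only for illustration.
policy = MonitoringPolicy(
    enabled=True,
    limits={"cpu_percent": 85.0, "thread_pool_size": 200},
)
quiet = {"cpu_percent": 40.0, "thread_pool_size": 50}
busy = {"cpu_percent": 92.5, "thread_pool_size": 50}

print(should_monitor(policy, quiet))  # False: no threshold exceeded
print(should_monitor(policy, busy))   # True: CPU threshold exceeded
```

Note that the decision depends on the resource usage of the specific server at which the transaction arrives, which is why the server must be known before the check can be made.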
However, none of these solutions provides adequate granularity when the monitored systems are part of a clustered environment. A cluster of application servers typically has a load balancer in front of two or more application servers. The load balancer determines which back-end server should receive a current inbound transaction. Approaches that monitor a transaction only when a specific system resource is above or below some predefined threshold, as described above, do not work well in clustered environments. In order to monitor only those back-end servers that are experiencing system resource issues, the transaction performance monitor needs to know in advance to which of the back-end servers the current transaction is to be routed. For example, if an application server cluster has three back-end application servers and a load balancer on the front end, an incoming transaction could be routed to any of the three back-end application servers. If the goal is to monitor the transaction only when a resource threshold has been exceeded on one of these three back-end systems, the application monitoring policy needs to know to which of the back-end servers the transaction will be routed in order to determine whether or not it should be monitored. Otherwise, the transaction would have to be monitored on each back-end server regardless of the server's current resource usage.
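The three-server example above can be sketched as follows. The server names, the CPU readings, the threshold, and the round-robin routing strategy are all assumptions made for illustration; an actual load balancer may use a different strategy. The sketch shows that whether monitoring is warranted is only known once the routing target is known.

```python
import itertools

CPU_LIMIT = 85.0  # hypothetical resource threshold

# Hypothetical snapshot of CPU utilization on three back-end servers.
cluster_cpu = {"app1": 35.0, "app2": 91.0, "app3": 48.0}


def make_round_robin(servers):
    """Stand-in load balancer that cycles through the servers in order."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


def servers_over_limit(cpu_by_server, limit):
    """Return the set of servers whose CPU utilization exceeds the limit."""
    return {name for name, cpu in cpu_by_server.items() if cpu > limit}


route = make_round_robin(sorted(cluster_cpu))
stressed = servers_over_limit(cluster_cpu, CPU_LIMIT)

for txn_id in range(1, 4):
    # Before this call, the front end cannot tell whether the transaction
    # will land on a stressed server.
    target = route()
    needed = target in stressed
    print(f"txn {txn_id} -> {target}: monitoring needed = {needed}")
```

With these assumed values, only the transaction routed to `app2` warrants monitoring; a monitor that cannot anticipate the routing decision would have to monitor all three.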