1. Technical Field
The present invention relates to a method and system for determining the availability of applications, and in particular to a technique for determining the availability of applications in a multi-tier environment having redundant clusters of servers within each tier, and for isolating faults to the software processes impacting availability.
2. Related Art
Two conventional techniques address the problem of determining, for a complex application that runs over a number of nodes or tiers and involves redundant clusters of nodes within the same tier, (1) that a failure has occurred, (2) which software process or hardware device is responsible for the failure, and (3) which application transactions are impacted by the failure.
The first conventional technique involves component monitors that monitor software processes or hardware devices at an individual component level. For example, commercial component monitors are available for WebSphere® Application Server (WAS) (e.g., Introscope® and Tivoli® Monitoring for Web Infrastructure), and WebSphere® MQSeries® (MQ) (e.g., Tivoli® Monitoring for Business Integration and Omegamon® for MQ). WAS, WebSphere® MQSeries®, Tivoli® Monitoring for Web Infrastructure, Tivoli® Monitoring for Business Integration, and Omegamon® for MQ are available from International Business Machines Corporation of Armonk, N.Y. Introscope® is available from Wily Technology, Inc. of Brisbane, Calif. For components that lack a commercial monitor, such as a Lightweight Directory Access Protocol (LDAP) server running on UNIX, customized component monitors are developed. Component monitors provide performance information about software components and detect some classes of software errors; however, when a software hang occurs, these monitors provide a “false negative” (i.e., the application is not available, but a failure is not detected). Further, component monitors provide inadequate or no information regarding which application transactions are impacted as a result of a failure.
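The software hang case described above can be illustrated with a minimal sketch. The sketch assumes a simplified component monitor that checks only whether a process is still running; the names `ComponentStatus`, `component_monitor_reports_failure`, and `application_available` are hypothetical and do not correspond to the interface of any of the commercial monitors named above.

```python
from dataclasses import dataclass


@dataclass
class ComponentStatus:
    """Hypothetical snapshot of one software component."""
    process_running: bool  # the operating system still lists the process
    responding: bool       # the process actually completes requests


def component_monitor_reports_failure(status: ComponentStatus) -> bool:
    """Assumed process-level check: flags a failure only if the process is gone."""
    return not status.process_running


def application_available(status: ComponentStatus) -> bool:
    """The application is usable only if the process runs and responds."""
    return status.process_running and status.responding


# A hung process is still running but no longer completes requests.
hung = ComponentStatus(process_running=True, responding=False)

# The application is unavailable, yet the monitor reports no failure.
missed_failure = (not application_available(hung)
                  and not component_monitor_reports_failure(hung))
```

Under these assumptions, `missed_failure` is true for the hung process, whereas a process that has exited is flagged correctly. A real component monitor inspects more than process existence, so the sketch shows only the class of error at issue.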
The second conventional technique involves executing a series of synthetic transactions against a real production system to see whether the transactions produce a response that corresponds to a valid known state. This synthetic transaction technique suffers from a number of problems. First, synthetic transactions are not appropriate for all business applications (e.g., an application that updates a bank balance), since executing such a transaction would alter production data. Second, once a failure is detected by the synthetic transaction technique, it is not easy to determine which node or software process is responsible for the failure. Third, when load balancing technologies direct transactions, it is difficult for the synthetic transaction technique to direct synthetic transactions to specific nodes to provide complete coverage of an infrastructure. Fourth, synthetic transactions must be defined separately for each distinct application architecture. Finally, running synthetic transactions creates a substantial load on the production system.
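The second and third problems can be illustrated with a minimal sketch. The sketch assumes a round-robin load balancer in front of two redundant nodes; the names `make_load_balancer`, `synthetic_check`, `healthy_node`, and `failed_node` are hypothetical and are introduced only for illustration.

```python
from itertools import cycle
from typing import Callable, Dict

Node = Callable[[str], str]


def make_load_balancer(nodes: Dict[str, Node]) -> Callable[[str], str]:
    """Assumed round-robin balancer: the caller cannot choose the node."""
    rotation = cycle(nodes.values())

    def send(request: str) -> str:
        return next(rotation)(request)

    return send


def synthetic_check(send: Callable[[str], str], request: str, expected: str) -> bool:
    """Run one synthetic transaction and compare the response to the known state."""
    try:
        return send(request) == expected
    except Exception:
        return False


def healthy_node(request: str) -> str:
    return "OK:" + request


def failed_node(request: str) -> str:
    raise RuntimeError("node down")


send = make_load_balancer({"node_a": healthy_node, "node_b": failed_node})

# Each probe yields only pass or fail; nothing identifies the node that served it.
results = [synthetic_check(send, "ping", "OK:ping") for _ in range(4)]
```

Under these assumptions, the probes alternate between passing and failing, and the results alone do not indicate which node or software process is responsible for the failures or whether every node has been exercised.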
Thus, there exists a need in the art to overcome the deficiencies and limitations described above.