1. Field of the Invention
Embodiments of the present invention generally relate to cluster management and, more particularly, to a method and apparatus for proactively monitoring application health data to achieve workload management and high availability.
2. Description of the Related Art
A computing environment may include a computer cluster (e.g., a plurality of client computers coupled to a server computer, or a plurality of server computers in a peer-to-peer configuration) that hosts multiple applications. The applications generally depend on system resources, such as software resources (e.g., operating system (OS), device drivers and/or the like) and hardware resources (e.g., storage resources, processors and/or the like), in order to provide services to the client computers. In addition, the system resources are shared dynamically across the various applications in the computing environment. In operation, dependencies on the system resources may introduce a failure, which affects application performance within the cluster. For example, on occurrence of a failure and/or non-availability of computer resources, a particular application may become non-responsive and/or terminate.
Generally, a system administrator of the computing environment desires that the applications run continuously and/or uninterruptedly. For example, the system administrator monitors a status of the particular application using clustering software and/or health monitoring software. However, the status does not indicate application health of the particular application. Hence, the clustering software and/or the health monitoring software are not cluster-aware and cannot provide corrective measures that leverage a clustering infrastructure to ensure that the particular application is performing optimally.
Currently, the clustering software and/or the health monitoring software do not ascertain a cause of the failures in the clustering environment. If a failure does occur, the clustering software employs a single static (i.e., pre-defined) priority list of nodes to which a particular application can fail over. However, such a static list does not account for a cause of the failure. As such, the static priority list may indicate a target node that is also affected by the failure and is therefore not suitable for operating the particular application. For example, the static priority list may indicate that the particular application is to fail over to a Node 1, a Node 2 or a Node 3 in the stated order. Furthermore, the Node 1 and the Node 2 share a router that the Node 3 does not use. If the particular application is operating on the Node 1 when the router fails, the particular application will fail over to the Node 2 even though the failure most likely affects the Node 2 as well as the Node 1 but does not affect the Node 3. As such, the Node 3 is a better choice but is not selected because the static priority list does not account for the cause of the failure.
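By way of illustration only, the router scenario described above may be sketched in Python as follows. The names STATIC_PRIORITY, DEPENDS_ON, static_failover and cause_aware_failover, as well as the label "shared router", are hypothetical and are introduced solely for this sketch; they do not correspond to any particular clustering software. The second function is merely one assumed way of accounting for the cause of a failure and is shown only to contrast with the static selection.

```python
# Hypothetical sketch of the failover scenario; all names are assumptions.

# Static (pre-defined) priority list of failover targets, in the stated order.
STATIC_PRIORITY = ["Node 1", "Node 2", "Node 3"]

# Assumed dependency map: Node 1 and Node 2 share a router; Node 3 does not use it.
DEPENDS_ON = {
    "Node 1": {"shared router"},
    "Node 2": {"shared router"},
    "Node 3": set(),
}


def static_failover(current, priority):
    """Return the first listed node other than the current node.

    The cause of the failure is not considered.
    """
    for node in priority:
        if node != current:
            return node
    return None


def cause_aware_failover(current, priority, depends_on, failed_resource):
    """Return the first listed node that does not depend on the failed resource."""
    for node in priority:
        if node != current and failed_resource not in depends_on[node]:
            return node
    return None


if __name__ == "__main__":
    # The application runs on Node 1 when the shared router fails.
    print(static_failover("Node 1", STATIC_PRIORITY))
    print(cause_aware_failover("Node 1", STATIC_PRIORITY, DEPENDS_ON, "shared router"))
```

In this sketch, the static selection returns the Node 2, which depends on the failed router, whereas the selection that considers the cause of the failure skips the Node 2 and returns the Node 3.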
Accordingly, there is a need in the art for a method and apparatus for proactively monitoring application health data to achieve workload management and high availability.