1. Field of the Invention
Embodiments of the present invention generally relate to application process monitoring systems, and more particularly, to a method and apparatus for detecting the failure of an application process.
2. Description of the Related Art
Modern computer networks generally comprise a plurality of user computers connected to one another and to a computer server via a communication network. To provide redundancy and high availability of the information in applications that are executed upon the computer server, multiple computer servers may be arranged in a cluster, i.e., forming a server cluster. Such server clusters are available under the trademark VERITAS CLUSTER SERVER from Veritas Software Corporation of Mountain View, Calif. In a server cluster, a plurality of servers communicate with one another to facilitate failover redundancy such that when software or hardware, i.e., a computer resource, becomes inoperative on one server, another server can quickly execute the same software that was running on the inoperative server substantially without interruption. As such, user services that are supported by a server cluster are not substantially impacted by an inoperative server or inoperative software.
High Availability (HA) is the accessibility of resources in a computer system in the event of a software component failure within the system. In existing HA software, a high availability daemon (HAD) periodically monitors an application process to verify its “on-line” or operational status. The period of this monitoring can be configured by adjusting a monitoring frequency parameter. Because a failure can only be observed at the next monitoring cycle, the maximum amount of time required to detect the failure of an application process is equal to the time interval between monitoring cycles. Once the HAD determines that an application failure has occurred, a failover of the application can be initiated, i.e., the application can be restarted on the same server or on another server. In order to reduce the time of application failure detection, and thus improve the monitoring process, the monitoring frequency may be increased. However, a higher frequency of monitoring cycles places a correspondingly greater burden on the central processing unit (CPU) of the server.
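The tradeoff described above between detection time and monitoring load may be illustrated with a minimal sketch. The sketch is not the HAD implementation; it merely simulates periodic polling on an abstract clock, and the function name simulate_polling and its parameters are hypothetical names introduced here for illustration. It assumes that monitoring cycles occur at times 0, interval, 2 × interval, and so on, and that a failure is detected at the first cycle occurring at or after the failure.

```python
from typing import Optional, Tuple


def simulate_polling(failure_time: int, interval: int, horizon: int) -> Tuple[Optional[int], int]:
    """Simulate periodic monitoring of one application process.

    Hypothetical helper for illustration only. All times are in seconds.
    Returns (detection_time, polls_performed); detection_time is None if
    the failure is not observed before the horizon is reached.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    polls = 0
    t = 0
    while t <= horizon:
        polls += 1                   # each monitoring cycle costs CPU time
        if t >= failure_time:        # the process is seen to be off-line
            return t, polls
        t += interval                # wait for the next monitoring cycle
    return None, polls


# The process fails 61 seconds into the run, just after a 60-second cycle.
slow_detect, slow_polls = simulate_polling(failure_time=61, interval=60, horizon=600)
fast_detect, fast_polls = simulate_polling(failure_time=61, interval=5, horizon=600)

print("60 s interval: delay", slow_detect - 61, "s after", slow_polls, "polls")
print(" 5 s interval: delay", fast_detect - 61, "s after", fast_polls, "polls")
```

Under these assumptions, the 60-second interval detects the failure after a delay of 59 seconds using 3 monitoring cycles, whereas the 5-second interval detects it after a delay of 4 seconds but requires 14 monitoring cycles over the same period. The detection delay is bounded by the monitoring interval, while the number of cycles, and hence the CPU burden, grows as the interval shrinks.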
Thus, there is a need in the art for a more efficient method for detecting an application process failure.