1. Field of the Invention
Embodiments of the invention generally relate to redundant computer systems. More specifically, this disclosure relates to a method and apparatus for passive process monitoring, such as monitoring the termination of a process in a clustered computer system.
2. Description of the Related Art
Computer systems and their components are subject to various failures. These failures are generally related to devices, resources, applications, or the like. Many different approaches to fault-tolerant computing are known in the art. Fault tolerance is the ability of a system to continue to perform its functions even when one or more components of the system have failed. Fault-tolerant computing is typically based on replication of components (i.e., redundancy) and on ensuring equivalent operation among the replicated components. Fault-tolerant systems are typically implemented by replicating hardware and/or software (generally referred to as resources), such as by providing pairs of servers, one primary and one secondary. Such a redundant system is often referred to as a server cluster, clustered computer system, clustered environment, or the like. A server in a clustered environment is generally referred to as a node or cluster node. The failover of resources in the clustered system is handled by clustering software that is distributed among the cluster nodes.
The clustering software typically comprises cluster agents distributed throughout the clustered system (e.g., each node may include a cluster agent). The cluster agents monitor events associated with resources in the clustered environment for the purpose of detecting failures and initiating failovers. Conventionally, the cluster agents actively poll the operating systems managing the resources at specified intervals (e.g., every 60 seconds) for the current state of these resources. This active, periodic polling consumes additional computing resources, such as memory and processor resources. The consumed resources are then not available for use by other applications in the system. On systems with a large number of monitored resources, the polling mechanism can consume a relatively large percentage of computing resources.
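For illustration only, the conventional polling approach described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular clustering product: the monitored resources are assumed to be processes identified by PID on a POSIX system, and the names is_alive, poll_once, and run_agent are hypothetical.

```python
import os
import time
from typing import Callable, Iterable, List


def is_alive(pid: int) -> bool:
    """Ask the operating system whether a process with this PID exists.

    Assumption: POSIX semantics, where signal 0 performs only the
    existence and permission checks and delivers nothing.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but is owned by another user
    return True


def poll_once(pids: Iterable[int],
              check: Callable[[int], bool] = is_alive) -> List[int]:
    """Query every monitored process once; return the PIDs that are gone."""
    return [pid for pid in pids if not check(pid)]


def run_agent(pids: Iterable[int],
              interval_s: float = 60.0,
              cycles: int = 1,
              check: Callable[[int], bool] = is_alive,
              on_failure: Callable[[int], None] = print) -> None:
    """Hypothetical agent loop: poll all resources, then sleep, and repeat.

    Each cycle costs one check per monitored resource, whether or not
    any resource has changed state since the previous cycle.
    """
    monitored = set(pids)
    for cycle in range(cycles):
        for pid in poll_once(sorted(monitored), check):
            on_failure(pid)          # e.g., initiate a failover
            monitored.discard(pid)   # stop polling a terminated process
        if cycle < cycles - 1:
            time.sleep(interval_s)   # wait out the polling interval
```

In this sketch the work per cycle grows linearly with the number of monitored processes, and a termination may go unnoticed for up to one full polling interval.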
Therefore, there exists a need in the art for a method and apparatus for monitoring resources passively, without the need to actively poll for events related to the resources.