1. Field of the Invention
The present invention relates to the field of high availability and more particularly to nodal failover handling in a high availability network architecture.
2. Description of the Related Art
High availability relates to the allocation of computing resources to ensure reliability in a computing architecture. In this regard, high availability systems support mission critical application logic, even at the expense of high performance, in order to ensure the availability of a computing system during a given measured period. To achieve high availability, redundant computing resources are assigned to replace allocated computing resources in a failover mode so as to ensure availability of application logic irrespective of any failure conditions which may arise.
Clustered computing systems embody a type of network architecture supporting high availability. In clustered environments, a cluster of nodes supports a single computing mission: a lead node normally handles the computing mission while the remaining auxiliary nodes wait for a failover condition to arise in the lead node. During failover, an auxiliary node can be assigned responsibility to continue handling the computing mission so as to relieve the failed lead node. In this regard, the auxiliary node becomes the lead node. Where multiple auxiliary nodes support the lead node, a policy can determine which of the auxiliary nodes should become the lead node during a failover condition.
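By way of illustration only, the lead/auxiliary arrangement and policy-driven selection described above can be sketched as follows. This is a minimal sketch and not drawn from any cited reference: the names Node, Cluster, fail_over and lowest_priority_first, as well as the numeric priority field used as the policy input, are hypothetical and chosen solely for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    priority: int          # hypothetical policy input; lower value wins
    healthy: bool = True


# A policy maps the healthy auxiliary nodes to the one that should take over.
Policy = Callable[[List[Node]], Node]


def lowest_priority_first(candidates: List[Node]) -> Node:
    """Example policy (an assumption): pick the smallest priority value."""
    return min(candidates, key=lambda n: (n.priority, n.name))


class Cluster:
    def __init__(self, lead: Node, auxiliaries: List[Node], policy: Policy):
        self.lead = lead
        self.auxiliaries = list(auxiliaries)
        self.policy = policy

    def fail_over(self) -> Optional[Node]:
        """Promote an auxiliary node if the lead node has failed.

        Returns the new lead node, or None if no failover took place.
        """
        if self.lead.healthy:
            return None                      # no failover condition
        candidates = [n for n in self.auxiliaries if n.healthy]
        if not candidates:
            return None                      # nothing available to promote
        new_lead = self.policy(candidates)   # the policy chooses the successor
        self.auxiliaries.remove(new_lead)
        self.lead = new_lead                 # the auxiliary becomes the lead
        return new_lead
```

For example, given a lead node "a" and auxiliary nodes "b" (priority 2) and "c" (priority 1), marking "a" as unhealthy and calling fail_over() promotes "c" under this example policy, leaving "b" as the sole remaining auxiliary node.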
Managing a high availability computing architecture can be a daunting task, particularly when directing a transition of lead node responsibility from one node to another in a failover condition. At present, centralized management and control is preferred both for detecting a failover condition in a lead node and for assigning the lead node responsibility to an auxiliary node. For instance, in U.S. Pat. No. 7,139,930 to Mashayekhi et al. for FAILOVER SYSTEM AND METHOD FOR CLUSTER ENVIRONMENT, the determination and management of a failover condition is performed centrally for all nodes in a cluster. Likewise, in U.S. Pat. No. 6,961,768 to Davis et al. for STATUS POLLING FAILOVER OF DEVICES IN A DISTRIBUTED NETWORK MANAGEMENT HIERARCHY, a central controller detects and manages a failover condition in a high availability network architecture.
Centralized management of a failover condition in a high availability architecture can be effective in a tightly controlled environment of limited geographic scope. In the modern distributed computing environment, however, centralized management of a failover condition is not feasible due to the random addition and removal of nodes in a distributed cluster and the presence of security enforcement points that inhibit the penetration of a centralized controller into a particular node. Peer-to-peer techniques for detecting and managing failover conditions likewise fail in a distributed cluster for the same reasons.