1. Technical Field
The present invention relates generally to providing high availability of resources within a managed network. More specifically, the present invention is directed toward providing a high availability mechanism that is capable of operating in cooperation with telecommunications equipment management software running on an Operations, Administration, Maintenance, and Procedures (OAM&P) processor complex.
2. Description of Related Art
The management of a computer network is not a simple task. Today's networks are complex beasts. As organizations move more and more toward high connectivity, large networks containing a wide variety of hardware and software systems connected in bewildering topologies have begun to emerge. As networks become more complex, their upkeep becomes increasingly difficult. In the telecommunications domain, Operations, Administration, Maintenance, and Procedures (OAM&P) systems are software and hardware systems designed to assist network support personnel in the management of such networks and the network elements they contain.
An OAM&P system will typically include what is known as a telecommunications equipment management (TEM) subsystem. A TEM subsystem monitors the state of network equipment and handles equipment provisioning for field replaceable units (FRUs). Field replaceable units are units of equipment that can be replaced in the event of a failure.
While TEM assists human support personnel in handling equipment failures, in mission-critical applications, such as telephone communications, waiting for a support person to take care of a problem may be unacceptable. High-availability (HA) systems address this need by providing “failover” of failed resources. “Failover” means automatically switching from the failed resource to a backup or redundant resource. A “resource,” in this context, may be a hardware component or software component—essentially anything that is capable of failing.
CLUSTER SERVER™, produced by Veritas Software Corporation of Mountain View, Calif., is one example of an HA system that is commercially available. CLUSTER SERVER™ monitors groups of resources controlled by “clusters” of computer systems. In the event of a failure in a resource, CLUSTER SERVER™ can deactivate the resource and replace it with another “backup” resource (i.e., it performs a failover of the resource). CLUSTER SERVER™ is capable of monitoring a number of disparate resources concurrently and is sensitive to dependencies between resources. If necessary, CLUSTER SERVER™ can deactivate multiple resources in the correct order, when dependencies between the resources require it.
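The dependency-ordered deactivation described above can be illustrated with a minimal sketch. It is not the CLUSTER SERVER™ implementation or API. The resource names, the `depends_on` mapping, and the helper functions are all hypothetical, chosen only to show that a resource's dependents must be taken offline before the resource itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resource group: each resource maps to the set of
# resources it depends on. These names are illustrative assumptions.
DEPENDS_ON = {
    "application": {"database", "ip_address"},
    "database": {"disk_volume"},
    "ip_address": {"network_interface"},
    "disk_volume": set(),
    "network_interface": set(),
}


def affected_by(resource, depends_on):
    """Return the resource plus everything that directly or
    indirectly depends on it."""
    affected = {resource}
    changed = True
    while changed:
        changed = False
        for name, deps in depends_on.items():
            if name not in affected and set(deps) & affected:
                affected.add(name)
                changed = True
    return affected


def deactivation_order(resource, depends_on):
    """Order in which to take resources offline when `resource` fails:
    dependents first, the failed resource last."""
    affected = affected_by(resource, depends_on)
    # Restrict the dependency graph to the affected resources only.
    subgraph = {
        name: set(depends_on.get(name, ())) & affected for name in affected
    }
    # static_order() yields dependencies before their dependents,
    # which is a valid start-up order; shutdown is the reverse.
    startup = list(TopologicalSorter(subgraph).static_order())
    return list(reversed(startup))


if __name__ == "__main__":
    print(deactivation_order("disk_volume", DEPENDS_ON))
    # ['application', 'database', 'disk_volume']
```

In this sketch, a failure of the hypothetical `disk_volume` resource takes `application` offline first, then `database`, then `disk_volume`. Resources that do not depend on the failed resource, such as `ip_address`, are left untouched.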
HA systems in general, including CLUSTER SERVER™, may overlap in their responsibilities with TEM systems. Because both HA systems and TEM systems monitor the status of network resources and take action in response to the status of those resources, conflicts may arise between an HA system and a TEM system operating on the same network. For example, when support personnel use the TEM system to remove a resource from service and the HA system is not informed, the HA system may treat the removal as a failure and attempt an unwanted failover.
A need exists, therefore, for a system that can provide configurable HA features, while cooperating with existing TEM systems to avoid conflicts.