1. Field of the Invention
Embodiments of the present invention generally relate to cluster management and, more particularly, to a method and apparatus for providing high availability to service groups using robust nodes.
2. Description of the Related Art
In a typical computing environment, an organization may employ a number of technologies to process, store, protect, recover, produce and/or secure mission-critical data. For example, the organization may employ one or more virtualization techniques to create one or more abstract computer resources (e.g., virtual machines, virtual applications, virtual desktops, virtual hardware devices and/or the like) from physical computer resources. Moreover, the typical computing environment may include one or more nodes (e.g., computer systems) that provide application services to one or more client computers. For example, a datacenter (i.e., a cluster) may include several physical machines and/or virtual machines that manage one or more database systems (e.g., ORACLE Databases).
Datacenter application services are normally distributed and interdependent. To ensure high availability of such application services, the datacenter configures one or more nodes as redundant systems. When a current system fails, the application services are restarted on a target redundant system. As such, the failover process may be seamless or require only a minimal amount of downtime. A client producing mission-critical data using the application services may not even notice that the current system failed. Accordingly, the datacenter allocates various computer resources (e.g., memory, processor cycles and/or the like) at the redundant system to run the application services.
Current high availability solutions (e.g., VERITAS Cluster Server One) and related components (e.g., Policy Master, Agents and/or the like) utilize various target selection algorithms, none of which identify and/or report single points of failure (SPOFs) within a computer network. In other words, current high availability solutions for datacenters treat the computer network as a black box. Furthermore, these target selection algorithms are unable to distinguish between a particular redundant system that sits behind a single point of failure and an otherwise identical redundant system that is more robust and free of SPOFs. Selecting the redundant system behind the single point of failure leaves client computers vulnerable to failures. Consequently, a failure at that single point of failure renders the selected redundant system unavailable, which leads to downtime and a loss in productivity for the organization.
Therefore, there is a need in the art for a method and apparatus for providing high availability to service groups within a datacenter.