High-availability clusters (also known as HA clusters or failover clusters) are groups of computers that support running server applications with a minimum of downtime. A high-availability cluster uses groups of redundant computing resources in order to provide continued service when individual system components fail. More specifically, high-availability clusters eliminate single points of failure by providing multiple servers, multiple network connections, redundant data storage, etc.
Absent clustering, if a server running a particular application fails, the application would be unavailable until the server is restored. In a high-availability clustering system, the failure of a server (or of a specific computing resource used thereby such as a network adapter, storage device, etc.) is detected, and the application that was being run on the failed server is automatically restarted on another computing system (i.e., another node of the cluster). This process is called “failover.” As part of this process, high availability clustering software can configure the node to which the application is being moved, for example mounting a filesystem used by the application, configuring network hardware, starting supporting applications, etc. The high availability clustering system can also detect the failure of the application itself. In this case, a given number of attempts to restart the application are typically made, and if these restart attempts fail, the application is failed over to another node. In effect, the high availability clustering system monitors applications, the servers the applications run on, and the resources used by the applications, to ensure that the applications remain highly available.
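The restart-then-failover behavior described above can be sketched as follows. This is a minimal illustration only, not the logic of any particular clustering product: the `Node` and `Application` types, the `start` callback, the `recover` function and the `max_restarts` parameter are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical cluster node; `healthy` is False once the server has failed."""
    name: str
    healthy: bool = True


@dataclass
class Application:
    """Hypothetical monitored application.

    `start` is an assumed callback that tries to start the application on a
    node and returns True on success.
    """
    name: str
    start: Callable[[Node], bool]


def recover(app: Application, current: Node, nodes: List[Node],
            max_restarts: int = 3) -> Optional[Node]:
    """Return the node the application runs on after recovery, or None."""
    # If only the application failed, retry on the same node a bounded
    # number of times before giving up on that node.
    if current.healthy:
        for _ in range(max_restarts):
            if app.start(current):
                return current

    # Failover: restart the application on another healthy node.
    for candidate in nodes:
        if candidate is current or not candidate.healthy:
            continue
        if app.start(candidate):
            return candidate

    # No node could run the application.
    return None
```

When the server itself has failed (`current.healthy` is False), the local restart attempts are skipped and the application is failed over immediately.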
Virtualization of computing devices can be employed in high availability clustering and in other contexts. One or more virtual machines (VMs or guests) can be instantiated at a software level on physical computers (host computers or hosts), such that each VM runs its own operating system instance. Just as software applications, including server applications such as databases, enterprise management solutions and e-commerce websites, can be run on physical computers, so too can these applications be run on virtual machines. A high availability cluster of VMs can be built, in which the applications being monitored by the high availability clustering system run on and are failed over between VMs, as opposed to physical servers. In other words, the nodes of a high availability cluster can be in the form of VMs.
In some virtualization scenarios, a software component often called a hypervisor can act as an interface between the guests and the host operating system for some or all of the functions of the guests. In other virtualization implementations, there is no underlying host operating system running on the physical, host computer. In those situations, the hypervisor acts as an interface between the guests and the hardware of the host computer, in effect functioning as the host operating system, on top of which the guests run. Even where a host operating system is present, the hypervisor sometimes interfaces directly with the hardware for certain services. In some virtualization scenarios, the host itself is in the form of a guest (i.e., a virtual host) running on another host. The services described herein as being performed by a hypervisor are, under certain virtualization scenarios, performed by a component with a different name, such as “supervisor virtual machine,” “virtual machine manager (VMM),” “service partition,” or “domain 0 (dom0).” The name used to denote the component(s) performing specific functionality is not important.
Although conventional clustering solutions supporting failover allow high availability clusters of VMs, clustering solutions such as Veritas Cluster Server (VCS) and Microsoft Cluster Server (MSCS) are not conventionally compatible with certain features of virtualization environments such as VMware vSphere. For example, VMware vSphere provides features such as moving VMs between physical servers without downtime (called VMotion), moving VM disk files across shared storage arrays (called Storage VMotion), dynamic resource scheduling (called DRS), snapshots, and high availability for virtual machines in the virtualized environment when an underlying hypervisor fails (called VMware HA).
These features of VMware are not conventionally compatible with high availability clusters. In a high availability clustering environment, storage is shared by all the nodes of the cluster, such that the shared storage looks the same to each node. When failing over an application between nodes, the application can only be restarted on a node that has access to its associated data. Therefore, high availability clusters utilize storage that is shared across all the nodes, so that applications can be failed over between nodes of the cluster. Additionally, a cluster volume manager extends volume management across the multiple nodes of a cluster, such that each node recognizes the same logical volume layout, and the same state of all volume resources at all nodes. Under cluster volume management, any changes made to volume configuration from any node in the cluster are recognized by all the nodes of the cluster.
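The storage constraint on failover can be illustrated with a small sketch. It assumes a simplified model in which each node is described only by the set of volume names it can access; the function `eligible_targets` and its parameters are hypothetical and do not correspond to any real clustering API.

```python
from typing import Dict, List, Set


def eligible_targets(app_volumes: Set[str],
                     node_volumes: Dict[str, Set[str]],
                     failed_node: str) -> List[str]:
    """Return the names of nodes the application could be failed over to.

    A node qualifies only if it is not the failed node and it can access
    every volume the application needs.
    """
    return sorted(
        name
        for name, volumes in node_volumes.items()
        if name != failed_node and app_volumes <= volumes
    )


# With fully shared storage, every surviving node is a valid target.
shared = {"n1": {"data", "logs"}, "n2": {"data", "logs"}, "n3": {"data", "logs"}}
targets_shared = eligible_targets({"data", "logs"}, shared, failed_node="n1")

# If a node lacks access to one of the volumes, it drops out.
partial = {"n1": {"data", "logs"}, "n2": {"data", "logs"}, "n3": {"logs"}}
targets_partial = eligible_targets({"data", "logs"}, partial, failed_node="n1")
```

In this model, storage that is shared across all nodes is what keeps every surviving node available as a failover target.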
Under a virtualization environment such as VMware, a virtual disk can only be attached to one VM at a time. VMware does not extend the above-described features (e.g., VMotion, Storage VMotion, DRS, snapshots, and VMware HA) to support cluster-level shared storage and cluster volume management when virtual storage under a VMware environment is used as shared storage in a high availability cluster of VMs. As a result, an administrator is faced with choosing between these VMware features and the use of a high availability clustering solution such as VCS.
It would be desirable to address this issue.