Clusters are groups of computers that use groups of redundant computing resources in order to provide continued service when individual system components fail. More specifically, clusters eliminate single points of failure by providing multiple servers, multiple network connections, redundant data storage, etc. Clustering systems are often combined with storage management products that provide additional useful features, such as journaling file systems, logical volume management, multipath input/output (I/O) functionality, etc.
In a high-availability clustering system, the failure of a server (or of a specific computing resource used thereby such as a network adapter, storage device, etc.) is detected, and the application that was being run on the failed server is automatically restarted on another computing system. This process is called “failover.” The high-availability clustering system can also detect the failure of the application itself, and fail over the application to another node. In effect, the high-availability clustering system monitors applications, the servers the applications run on, and the resources used by the applications, to ensure that the applications remain highly available.
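By way of illustration only, the failover behavior described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular clustering product: the `Node` class, the `healthy` flag and the `failover` function are hypothetical names introduced for this example, and the policy of choosing the least-loaded healthy node is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node: a name, a health flag, and the apps it runs."""
    name: str
    healthy: bool = True
    apps: list = field(default_factory=list)


def failover(nodes):
    """Move every application off failed nodes onto healthy ones.

    Returns a list of (application, source node, target node) tuples.
    The target is the healthy node currently running the fewest
    applications (an assumed placement policy).
    """
    healthy = [n for n in nodes if n.healthy]
    moves = []
    for node in nodes:
        if node.healthy or not node.apps:
            continue  # nothing to fail over from this node
        if not healthy:
            raise RuntimeError("no healthy node available for failover")
        for app in list(node.apps):
            target = min(healthy, key=lambda n: len(n.apps))
            node.apps.remove(app)
            target.apps.append(app)
            moves.append((app, node.name, target.name))
    return moves


cluster = [Node("node1", apps=["database"]), Node("node2"), Node("node3")]
cluster[0].healthy = False  # simulate a detected server failure
print(failover(cluster))
```

In an actual clustering system the health flag would be driven by monitoring of the server, the application and the resources the application uses; here it is set by hand to keep the sketch self-contained.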
Virtualization of computing devices can be employed in high availability clustering and in other contexts. One or more virtual machines (VMs or guests) can be instantiated at a software level on physical computers (host computers or hosts), such that each VM runs its own operating system instance. Just as software applications, including enterprise-level applications such as databases, ecommerce engines and web servers, can be run on physical computers, so too can these applications be run on virtual machines. VMs can be deployed such that applications being monitored by the high-availability clustering system run on and are failed over between VMs, as opposed to physical servers. An application being provided with high availability can be run on a virtual machine which is in turn running on a host in a high-availability cluster. The virtual machine provides desired mobility and isolation of the application, whereas the underlying high-availability cluster provides highly available computing infrastructure. In some virtualization scenarios, the host itself is in the form of a VM (i.e., a virtual host) running on another (e.g., physical) host.
In some virtualization scenarios, a software component often called a hypervisor can act as an interface between the VMs and the host operating system for some or all of the functions of the VMs. In other virtualization implementations, there is no underlying host operating system running on the physical host computer. In those situations, the hypervisor acts as an interface between the VMs and the hardware of the host computer, in effect functioning as the host operating system, on top of which the VMs run. Even where a host operating system is present, the hypervisor sometimes interfaces directly with the hardware for certain services.
Contemporary business applications are rarely islands unto themselves, but instead are usually part of a multi-tier application stack. For example, a solution for providing ecommerce application services might require three separate applications: a database, an ecommerce engine and a web service. Not so long ago it would have been standard to deploy these three components on a single server. As datacenters have evolved, there has been a move away from the single server model, in order to provide greater scalability and more flexibility at lower cost. Because different tiers of the business application service have different requirements, it is desirable to run the multiple tiers on multiple servers (either virtual, physical or a combination of the two), sometimes under different operating systems, using different virtualization platforms and/or according to different configurations as desired. In the above ecommerce application service example, the web service application, the ecommerce application and the database application could each be run on a separate virtual or physical server under a different operating system, using different levels of virtualization provided by different virtualization platforms, and with different resource requirements as desired. In effect, the different tiers of the application service are separated, and as such can be implemented within entirely different environments.
Not only can local servers or other components within a given datacenter fail, but disastrous events can also cause the failure of an entire datacenter. For this reason, some high availability clustering and storage systems extend into wide-area clusters that support failover between separate clusters located at physically disparate datacenters (this can be thought of as a production site and a disaster recovery site, although in practice sometimes more than two physical sites are involved). Communication is established between the cluster at the production site and the one at the disaster recovery site over a network, the groups of resources used by the supported applications are maintained on both clusters, and data from the production cluster is replicated to the secondary cluster. Thus, not only can an individual application be failed over between servers within a cluster in response to a local server failure or similar event, but applications can be failed over between clusters in the event of a datacenter-level failure.
Where a disaster recovery site is used to replicate a production site on which high availability applications are run, it is important to validate that a wide-area failover between sites will be successful when needed. To do so, the applications are brought up on the disaster recovery site from time to time, to test the disaster recovery configuration and ensure that if the production site fails, the applications can be successfully failed over to the disaster recovery site. Such a level of testing is called a fire drill.
Conventionally, while such a test is being run on the disaster recovery site, the disaster recovery site should not bring up the applications on the same physical channel as the production site. This is because both the production and disaster recovery sites are connected to the same physical channel, and bringing up the applications on the disaster recovery site while the applications are running on the production site will result in DNS and IP address conflicts, as well as cross talk between the copies of the applications running on the different sites. This problem is exacerbated in the case of multi-tiered applications, which run on multiple hosts. Furthermore, conventional solutions for testing an application on a disaster recovery site while the application is running on the production site are limited to specific virtualization providers (e.g., VMware, Hyper-V, etc.). In a multi-tiered scenario, different virtualization products can be used on different tiers, and thus fire drill solutions that are not agnostic to the virtualization environment are not adequate in the case of a multi-tiered application that uses multiple virtualization platforms. Further complexity is added by the fact that different tiers of a multi-tiered application run on different (physical and/or virtual) hosts, whereas conventional fire drill solutions do not support failover of applications distributed across multiple hosts.
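The DNS and IP address conflicts noted above can be made concrete with a short sketch. It is illustrative only and rests on assumptions: the `find_conflicts` function, the representation of each site as a list of (hostname, IP address) pairs, and the example hostnames and addresses are all hypothetical and are not drawn from any particular product.

```python
def find_conflicts(production, fire_drill):
    """Return the hostnames and IP addresses claimed by both sites.

    Each argument is a list of (hostname, ip_address) pairs, one per
    application tier. If both sites share the same physical channel,
    any name or address appearing in both lists would collide.
    """
    prod_hosts = {host for host, _ in production}
    prod_ips = {ip for _, ip in production}
    drill_hosts = {host for host, _ in fire_drill}
    drill_ips = {ip for _, ip in fire_drill}
    return {
        "hostnames": sorted(prod_hosts & drill_hosts),
        "ips": sorted(prod_ips & drill_ips),
    }


# Hypothetical three-tier application running on the production site.
production = [
    ("web.example.com", "10.0.0.10"),
    ("shop.example.com", "10.0.0.20"),
    ("db.example.com", "10.0.0.30"),
]
# The same tiers brought up unchanged on the disaster recovery site.
fire_drill = list(production)

print(find_conflicts(production, fire_drill))
```

Because a replicated configuration carries the same names and addresses as the production configuration, every tier collides in this example, which is why the problem grows with the number of tiers and hosts.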
It would be desirable to address these issues.