High-availability clusters (also known as HA clusters or failover clusters) are groups of computers that support running server applications with a minimum of downtime. A high-availability cluster uses groups of redundant computing resources in order to provide continued service when individual system components fail. More specifically, high-availability clusters eliminate single points of failure by providing multiple servers, multiple network connections, redundant data storage, etc.
Absent clustering, if a server running a particular application fails, the application is unavailable until the server is restored. In high-availability clustering, the failure of a server (or of a specific computing resource used thereby, such as a network adapter, storage device, etc.) is detected, and the application that was being run on the failed server is automatically restarted on another computing system (i.e., another node of the cluster). This process is called “failover.” As part of this process, high-availability clustering software can configure the node to which the application is being moved, for example by mounting a filesystem used by the application, configuring network hardware, starting supporting applications, etc.
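By way of illustration only, the failover sequence described above (detect a failure, select another node, configure it, and restart the application) can be sketched in Python as follows. The `Node` class, its `healthy` flag and the `failover` function are hypothetical names introduced for this example; they simulate the behavior in memory and do not correspond to the interface of any particular clustering product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; `healthy` stands in for a real health probe."""
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)

    def prepare(self, app: str) -> None:
        # Stand-ins for the node configuration steps (assumed, simulated only).
        self.log.append(f"mount filesystem for {app}")
        self.log.append(f"configure network for {app}")

    def start(self, app: str) -> None:
        self.log.append(f"start {app}")


def failover(app: str, active: Node, standbys: list) -> Node:
    """Return the node that should run `app` after one monitoring pass."""
    if active.healthy:
        return active  # no failure detected, so the application stays put
    for candidate in standbys:
        if candidate.healthy:
            candidate.prepare(app)  # configure the target node first
            candidate.start(app)    # then restart the application there
            return candidate
    raise RuntimeError(f"no healthy node available for {app}")


# Example: the primary node fails and the application moves to the backup.
primary = Node("node-a")
backup = Node("node-b")
primary.healthy = False
new_active = failover("database", primary, [backup])
```

In this sketch the application is moved only when the active node is reported unhealthy, and the target node is configured before the application is started on it, mirroring the order of operations described above.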
Virtualization of computing devices can be employed in high-availability clustering and in other contexts. Operating system level virtualization is a virtualization method in which a single instance of an operating system with a single kernel supports multiple, isolated user-space level execution environments, each of which can be used to run a separate application. There are a number of scenarios in which it could be desirable to run multiple, isolated execution spaces within a single instance of an operating system, for example to isolate server applications with different security needs or required system configurations.
Different operating systems support operating system level virtualization, and use different names to describe this functionality. For example, isolated user space instances on a single instance of an operating system are known as zones under Solaris, jails under FreeBSD and various Linux-based operating systems, and WPARs under AIX. The generic term “container” is also sometimes used to denote an isolated user space instance. For consistency and readability, the term “container” will be used herein to denote an isolated user space instance running under any supporting operating system. It is to be understood that where the term “container” is used herein, the term refers to isolated user space instances on a single instance of an operating system generally, including those with other names, and those running under operating systems other than Solaris, FreeBSD, Linux and AIX.
It is to be understood that operating system level virtualization, in which multiple isolated containers run on a single instance of an operating system, is distinct from system level virtualization. In system level virtualization, one or more virtual machines (VMs or guests) can be instantiated at a software level on a physical computer (host computer or host), such that each VM runs its own operating system instance.
Just as server level software applications such as databases, enterprise management solutions and e-commerce websites can be run on physical computers, so too can server applications be run on operating system level containers or system level virtual machines. In order to provide an application with high availability, the application can be run on a container or virtual machine which is in turn running on a high-availability cluster.
Conventional clustering solutions allow failover between physical computers and/or between system level VMs, as well as between containers running on different operating system instances. However, existing HA clusters do not support failover between containers running on a single operating system instance, or between a container and the global user space of the operating system instance on which the container is running. It would be desirable to address this shortcoming of conventional clustering systems for a number of reasons. For example, running multiple operating system instances requires the utilization of additional hardware and software resources. Additionally, failing over applications between operating system instances imposes a time overhead and administrative burden.