1. Field of the Invention
One or more embodiments of the present invention relate generally to computational systems and, more particularly, to techniques for providing and managing highly-available systems.
2. Description of the Related Art
A wide range of redundancy techniques have been employed in highly-available systems. In general, such techniques seek to replicate hardware, systems, components, subsystems or even software so that, in the event of a failure, relevant functionality may be maintained or at least quickly recovered. Redundancy may be provided at any of a variety of levels. For example, in information storage or transmission, it is common to manage redundant storage or transmission using error correcting codes (ECC), cyclic redundancy checks (CRC) and/or storage array technologies such as RAID (“Redundant Array of Inexpensive Disks”), as often deployed in storage area network (SAN) architectures. Redundant subsystems such as power supplies or storage controllers are often employed to improve system availability.
In some fault-tolerant designs, fully redundant replicated hardware is employed at all levels and duplicate (and ostensibly identical) computations are executed on the replicated hardware so that computations may continue uninterrupted at least in the event of any single failure. However, the increased complexity of such systems has often made them practical only for the most mission-critical applications.
Clustering techniques, though not always deployed strictly for purposes of availability improvement, have long been employed to connect two or more computers together in such a way that they behave like a single computer. In general, clustering can be used for parallel processing, load balancing or fault tolerance. Some tightly coupled clustering techniques (e.g., techniques employing shared boot disks and memory under control of an operating system that coordinates operations of the several nodes) date back at least to the days of VAXcluster systems popularized by Digital Equipment Corporation. More recently, loosely coupled architectures have gained popularity. Typically, clustering software is employed in such systems to distribute load or coordinate failover amongst largely independent computer systems. Systems such as the Veritas™ Cluster Server available from Symantec Corporation are typical. Operating system- or application-level cluster technology has been deployed in various releases of Microsoft™ Windows operating systems and Microsoft™ SQL Server software available from Microsoft Corporation.
In recent years, virtualization technology (e.g., as implemented in products such as those of VMware, Inc.) has presented new challenges for high-availability systems as more and more virtual servers are run concurrently on a single physical server. As a result, clustering techniques have been adapted to server virtualization. Veritas™ Cluster Server for VMware® ESX Server™ is one example of such an adaptation, and Microsoft has proposed simple 2-node clusters of Windows operating system instances using Microsoft Virtual Server 2005.