This invention generally relates to computer system recovery in the case of faults or errors, and more specifically, to system recovery in a virtualized environment.
Virtual computing environments are quickly being adopted by many enterprises for a variety of data processing and storage needs. A virtual computing environment refers to a computer system in which a single physical machine may be observed as multiple virtual machines, and where a set of physical hardware resources can be used as multiple virtual resources. Each virtual machine can run its own operating system that may control the set of virtual hardware resources.
An important issue when designing a virtual computing environment is to provide for data backup and system recovery. One common way to address this issue is to use the operating system on the physical server. This approach has several disadvantages, however. For instance, backup agents may be required on the operating system, and in many situations each application, or each of several applications, running on the server may need its own separate backup agent.
Backup agents are processes running on the operating system and may require resources of the central processing unit (CPU). In a virtual environment in which, for example, ten virtual machines are running on one physical machine, the backup agents may require a significant portion of the CPU resources.
To achieve fast recovery from a virtual machine fault or error, a second computer, referred to as a standby computer, may be kept in the same state as the working virtual machine. As soon as the virtual machine incurs a fault or error, the standby computer takes over. This provides a very fast recovery but doubles the system requirements, because one standby system is needed for every working system.