The present invention relates to failure recovery in a virtual server environment.
In the computer systems and data centers of enterprises, an increase in the number of servers owned has resulted in increased operation and management costs. One solution to this problem is server virtualization technology, which enables a plurality of virtual servers to operate on a single physical server. The physical server includes resources such as a processor and a memory, which the server virtualization technology divides and assigns to different virtual servers such that a plurality of virtual servers are executed simultaneously on the single physical server. The need for server virtualization technology has increased because of the higher performance of processors and the lower cost of resources such as memories.
On the other hand, higher reliability is increasingly required of systems. This is because the more heavily an enterprise depends on a computer system, the greater the damage if that system fails. Generally, the reliability of a system is improved by providing an active server and a standby server such that the standby server takes over from the active server if the latter fails.
Given that both server virtualization and higher reliability are being pursued, it is natural that high reliability is also demanded in a virtualized server environment. However, these two requirements have characteristics which conflict with each other. For example, when a plurality of virtual servers are built on a physical server, a failure in the physical server would cause all the active virtual servers residing thereon to stop simultaneously. If a system is composed of a plurality of independent servers, a failure in a single physical server affects only a small range, whereas with the virtualization technology, which can concentrate a plurality of virtual servers on a single physical server, the same failure affects a wide range. For this reason, the reliability tends to be lower in virtualized environments. From the viewpoint of reliability, it is also contemplated to provide a plurality of virtual servers such that a failed server is replaced with another server. However, this solution requires a number of spare servers, licensing fees for the software on the spare servers, and the like, thus leading to an increased cost.
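The difference in the range affected by a single physical server failure can be illustrated with a minimal sketch. The server names, the placement tables, and the function affected_virtual_servers are hypothetical and are used only for illustration; they do not appear in the cited references.

```python
def affected_virtual_servers(placement, failed_physical_server):
    """Return the servers that stop when the given physical server fails.

    placement maps each (virtual) server name to the physical server
    it resides on. All names here are hypothetical.
    """
    return sorted(
        server
        for server, physical in placement.items()
        if physical == failed_physical_server
    )


# Three virtual servers concentrated on one physical server.
consolidated = {"vs1": "ps1", "vs2": "ps1", "vs3": "ps1"}

# Three independent servers, each on its own physical server.
independent = {"s1": "ps1", "s2": "ps2", "s3": "ps3"}

stopped_consolidated = affected_virtual_servers(consolidated, "ps1")
stopped_independent = affected_virtual_servers(independent, "ps1")

print(stopped_consolidated)  # all three virtual servers stop
print(stopped_independent)   # only the one server on ps1 stops
```

In this sketch, the failure of ps1 stops every virtual server in the consolidated placement but only one server in the independent placement, which corresponds to the wide and small affected ranges described above.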
JP-A-2005-173751 discloses a master system operation management method which provides virtual machines, one of which serves as a master system and another of which serves as a standby system for the master system, wherein data is synchronized between the two systems and the systems are switched with each other on a periodic basis. JP-A-2005-173751 describes that this method can also respond rapidly when the switching is triggered by a failure.
JP-A-2001-216171 discloses a virtual machine system which comprises a plurality of virtual machines built under a host operating system (OS) running on a physical machine, wherein one of the virtual machines is kept suspended as a standby system. In the event of a failure in the active system, a memory image of the virtual machines is transferred to the standby system so that the virtual machine serving as the standby system can be activated in a shorter time.