Currently, apparatuses such as a security gateway, a load balancer, and a network device operation control apparatus are configured by combining a plurality of servers. An apparatus configured by combining a plurality of servers typically has high reliability, such as continuity of service. Corporate activity now depends heavily on machine systems, and the potential loss from a stoppage of such a system is significant, which has prompted a demand for high reliability. In particular, the spread of the Internet and the emergence of new services such as video services are increasing Internet traffic, and such apparatuses are expected to make it possible to construct and operate, quickly and economically, a communication system that can execute high-speed processing and enable various services.
Among these apparatuses, those installed at sites that are supposed to offer smooth service, such as a data center or a carrier, and server systems in particular, may adopt a redundant configuration that includes redundant apparatuses and servers in addition to the apparatuses actually in operation, in order to reduce the period of time for which communication stops during maintenance work or upon occurrence of a failure.
As one method for realizing a redundant configuration, a server system may have an N+1 redundant configuration, in which one standby server is prepared in addition to N servers in use. Adopting the N+1 redundant configuration improves the reliability of the server system, in particular its availability.
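The availability benefit of the N+1 redundant configuration can be illustrated with a short calculation. The following is a minimal sketch, not a model taken from the related art: it assumes that each server is up independently with the same probability, that the system is available whenever at least N servers are up, and that switching to the standby server takes no time. The function names and the value 0.99 are hypothetical.

```python
from math import comb


def availability_at_least(required: int, total: int, a: float) -> float:
    """Probability that at least `required` of `total` servers are up.

    Assumption (not from the source): each server is up independently
    with the same probability `a`, and failover is instantaneous.
    """
    return sum(
        comb(total, k) * a**k * (1 - a) ** (total - k)
        for k in range(required, total + 1)
    )


def availability_without_spare(n: int, a: float) -> float:
    # All N servers in use must be up; there is no standby server.
    return availability_at_least(n, n, a)


def availability_n_plus_1(n: int, a: float) -> float:
    # N servers in use plus one standby: one failure can be tolerated.
    return availability_at_least(n, n + 1, a)


if __name__ == "__main__":
    n, a = 4, 0.99  # hypothetical values for illustration only
    print(availability_without_spare(n, a))
    print(availability_n_plus_1(n, a))
```

Under these simplifying assumptions, the configuration with one standby server is available more often than the same N servers without a standby server, because any single failure can be absorbed.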
For example, a blade system is known in which a plurality of central processing unit (CPU) blades are stored in a chassis, a virtual machine (VM) system realized by a virtual machine method is operated in each CPU blade, and a given server stands by as a redundant server, in order to achieve load leveling and avoid a stop of the system due to a failure. In such a system, by migrating a virtual system from a server in use to a redundant server when a failure or the like has occurred, smooth system operation can be realized.
In addition, a system is known in which, when a system operating on a certain CPU blade, for example a virtual system, is to be migrated to another CPU blade, the destination CPU blade is not limited to a predetermined one, and an optimal blade can be selected while taking into consideration the characteristics of the blades, the operation states of fans, power supplies, and the like, failure occurrence conditions, and the like. In such a system, a redundant blade does not have to be prepared in advance. Furthermore, by expressing the system operation condition of each CPU blade as a numerical value and configuring the system such that, when a failure has occurred in an original CPU blade, a CPU blade whose operation condition is the same as or better than that of the original CPU blade is selected, it is possible to reduce the time taken to complete the migration after the occurrence of the failure.
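The selection of a destination blade based on a numerical operation condition can be sketched as follows. This is an illustrative sketch only: the fields of `Blade`, the weights in `condition_score`, and the rule of taking the highest-scoring eligible blade are all assumptions introduced here, since the related art is described only as considering blade characteristics, fan and power supply states, and failure occurrence conditions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Blade:
    name: str
    fans_ok: int          # number of working fans (hypothetical metric)
    psus_ok: int          # number of working power supplies (hypothetical)
    recent_failures: int  # failures observed recently (hypothetical)
    busy: bool = False    # True if the blade cannot accept a migration


def condition_score(blade: Blade) -> int:
    # Hypothetical weights; a higher score means a better operation condition.
    return 10 * blade.fans_ok + 20 * blade.psus_ok - 50 * blade.recent_failures


def select_destination(
    original_score: int, candidates: Sequence[Blade]
) -> Optional[Blade]:
    """Return the best free blade whose score is at least `original_score`.

    `original_score` is the score recorded for the original blade before
    its failure. Returns None when no candidate qualifies.
    """
    eligible = [
        b for b in candidates
        if not b.busy and condition_score(b) >= original_score
    ]
    return max(eligible, key=condition_score, default=None)


if __name__ == "__main__":
    blades = [
        Blade("blade-1", fans_ok=4, psus_ok=2, recent_failures=0),
        Blade("blade-2", fans_ok=3, psus_ok=2, recent_failures=0),
        Blade("blade-3", fans_ok=4, psus_ok=2, recent_failures=1),
    ]
    chosen = select_destination(75, blades)
    print(chosen.name if chosen else "no eligible blade")
```

Because the scores are computed ahead of time, the choice at failure time reduces to a comparison of precomputed numbers, which is consistent with the stated aim of shortening the time from the failure to the completion of the migration.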
In addition, a system is known that includes servers in use, configured by a plurality of physical servers on which a plurality of virtual machines realized by the virtual machine method can operate, and a single standby server on which virtual machines operate. In this system, when a failure has occurred in one of the physical servers in use, an operating system (OS) that has operated on the physical server is activated as a standby virtual server, and when a failure has occurred in one of the physical servers on which one of the virtual machines operates, a virtual OS of the virtual machine is activated as a standby virtual server. In the failure recovery process executed by this system, when a failure event has occurred in one of the servers in use, the activation disk used by the server in which the failure event has occurred is assigned to the standby server, and the power of the standby server is turned on.
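The failure recovery process described above, in which the activation disk of the failed server is assigned to the standby server and the standby server is then powered on, can be sketched as follows. The `Server` class and the `recover` function are hypothetical names introduced for illustration. Powering off the failed server and detaching its disk before reassignment are assumptions added here so that two servers never hold the same disk; the related art as described does not state these steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str
    activation_disk: Optional[str] = None  # identifier of the boot disk
    powered_on: bool = False


def recover(failed: Server, standby: Server) -> None:
    """Hand the failed server's activation disk to the standby server."""
    if standby.powered_on or standby.activation_disk is not None:
        raise RuntimeError(f"{standby.name} is already in use")
    if failed.activation_disk is None:
        raise RuntimeError(f"{failed.name} has no activation disk to hand over")

    # Assumption: isolate the failed server before its disk is reused.
    failed.powered_on = False

    # Assign the activation disk used by the failed server to the standby.
    standby.activation_disk = failed.activation_disk
    failed.activation_disk = None

    # Turn on the power of the standby server; it boots from that disk.
    standby.powered_on = True


if __name__ == "__main__":
    in_use = Server("server-1", activation_disk="disk-server-1", powered_on=True)
    spare = Server("standby")
    recover(in_use, spare)
    print(spare)
```

Since the standby server starts from a powered-off state in this sketch, the service remains stopped while it boots, which corresponds to the cold standby behavior discussed in this section.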
In addition, a method for updating a file is known for a virtual server environment in which a plurality of VM systems operate as servers on a single physical server and duplex operation is realized by one blade that operates as a server in use and another blade that operates as a standby server. With this method, switching the operation from one blade to the other does not affect other virtual servers operating on the same physical server.
In general, when a plurality of virtual servers have been constructed on a physical server, all the virtual servers operating on the physical server stop if a failure occurs in the physical server, and therefore there has been a problem in that it is difficult to achieve high reliability for the system. On the other hand, when a server system is configured by a plurality of independent servers, the system as a whole does not stop even if a failure occurs in a single physical server, but there has been a problem in that the cost is high.
In addition, when a virtual server is used as a server in use on a physical server, there has been a problem in that overhead arises because input-output (I/O) operations of guest OSs are executed in parallel with one another.
In addition, when, in a pair of physical servers having a duplex configuration that includes a server in use and a standby server, the standby server is kept in a cold standby state during normal operation, communication and service stop for an extended period of time after a failure occurs, because the application for communication is activated only after configuration information has been transferred to the standby server. There has also been a problem in that it takes time for the standby server to begin operation after the occurrence of the failure.
Furthermore, in a method for realizing an N+1 redundant configuration according to the related art, either cold standby is performed after a failure occurs, or each application is required to support the N+1 redundant configuration. However, although an increasing variety of applications may support a hot-standby duplex configuration, there has been a difficulty in that it is costly to support a hot-standby N+1 duplex configuration.
In addition, although a method is possible in which a single standby server corresponds to N servers in use, it is difficult in terms of cost to incorporate, into a variety of communication applications, code that takes into consideration a redundant configuration for supporting an N+1 duplex configuration. This also decreases the reliability of the standby server.
Therefore, for a server system that includes a plurality of servers in an N+1 duplex configuration, a reliable redundant computer system that can suppress cost without sacrificing CPU performance and I/O performance is desired.
In addition, in order to improve reliability, a method for switching the operation, in particular a method for recovering from a failure, is desired for a server system having an N+1 redundant configuration that includes a plurality of servers in a single chassis.
Japanese Laid-open Patent Publication Nos. 2008-276320, 2010-211819, and 2010-003022 are examples of related art.