Conventionally, an information processing system with a plurality of servers has a failover capability with which, when any server has a fault, another server takes over the operation that is performed by the faulty server so as to prevent the service from being stopped.
An explanation is given below, with reference to FIGS. 18A to 18D, of examples of the failover capability. FIG. 18A is a diagram that illustrates a failover capability in an Active-Passive mode. In the example illustrated in FIG. 18A, an information processing system includes an operating server #1 that provides a service A and a service B and includes a standby server #2 that does not provide any services. In this information processing system, if an error occurs in the server #1, the server #2 takes over the provision of the service A and the service B, whereby each of the services is continuously provided.
FIG. 18B is a diagram that illustrates a failover capability in an Active-Active mode. In the example illustrated in FIG. 18B, the information processing system includes the server #1 that provides the service A and the server #2 that provides the service B. In this information processing system, if an error occurs in the server #1, the server #2 takes over the provision of the service A, whereby each of the services is continuously provided.
FIG. 18C is a diagram that illustrates a failover capability in an N-to-1 mode. In the example illustrated in FIG. 18C, the information processing system includes the server #1 that provides the service A, the server #2 that provides the service B, a server #3 that provides a service C, and a standby server #4. In this information processing system, if an error occurs in the server #1, the standby server #4 takes over the provision of the service A, whereby each of the services is continuously provided.
FIG. 18D is a diagram that illustrates a ring-type failover capability. In the example illustrated in FIG. 18D, the information processing system includes the server #1 that provides the service A, the server #2 that provides the service B, the server #3 that provides the service C, and the server #4 that provides a service D. In this information processing system, if an error occurs in any of the servers #1 to #4, a different server that is specified in a ring form takes over the service that is provided by the server where the error occurs, whereby each of the services is continuously provided. For example, in the information processing system, if an error occurs in the server #1, the server #2 takes over the provision of the service A, whereby each of the services is continuously provided.
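The four modes described with reference to FIGS. 18A to 18D differ mainly in which server is chosen to take over. The following is a minimal sketch of that choice, not an implementation taken from any of the systems described. The names `takeover_target` and `fail_over`, the mode strings, and the dictionary representation of the servers are hypothetical. The ring order is assumed to run from each server to the next one in the list and to wrap around from the last server to the first; the description above only states that the server #2 takes over for the server #1.

```python
def takeover_target(mode, active, standby, failed):
    """Return the server that takes over for `failed` under `mode`."""
    if mode in ("active-passive", "n-to-1"):
        # A dedicated standby server takes over (FIGS. 18A and 18C).
        return standby
    if mode in ("active-active", "ring"):
        # The next operating server in list order takes over (FIGS. 18B and 18D).
        # Wrapping from the last server to the first is an assumption.
        idx = active.index(failed)
        return active[(idx + 1) % len(active)]
    raise ValueError(f"unknown failover mode: {mode}")


def fail_over(assignments, mode, failed, standby=None):
    """Move the services of `failed` to its takeover target.

    `assignments` maps a server name to the list of services it provides.
    A new mapping is returned; the input is left unchanged.
    """
    active = [server for server in assignments if server != standby]
    target = takeover_target(mode, active, standby, failed)
    result = {server: list(services) for server, services in assignments.items()}
    result[target].extend(result.pop(failed))
    return result


# FIG. 18A: the standby server #2 takes over services A and B.
print(fail_over({"#1": ["A", "B"], "#2": []}, "active-passive", "#1", standby="#2"))

# FIG. 18D: the server #2 takes over service A from the server #1.
ring = {"#1": ["A"], "#2": ["B"], "#3": ["C"], "#4": ["D"]}
print(fail_over(ring, "ring", "#1"))
```

In this sketch the Active-Active mode is treated as a two-server ring, which matches the behavior described for FIG. 18B but is a simplification made for illustration.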
Furthermore, there is a known technology of injecting a simulated fault into a server that is included in an information processing system so as to perform a test to check whether the failover capability applied to the information processing system is properly implemented. For example, there is a known technology in which a simulated fault is generated in a server that is included in the information processing system by using a jig, such as a clip, or software for generating a simulated fault, and a test is performed to check whether the failover capability is properly implemented.
An explanation is given below, with reference to FIG. 19, of an example of the technology for performing a test to check whether the failover capability is properly implemented. FIG. 19 is an example of a flowchart that illustrates the steps of a cluster failover test. In the example illustrated in FIG. 19, an explanation is given of a case where a user performs a test to check whether a plurality of failover capabilities is properly implemented.
For example, a user designs a cluster of servers that are the subjects of a test (Step S1). Next, the user prepares the servers that constitute the designed cluster (Step S2) and installs an OS (Operating System) in each of the servers (Step S3).
The user then makes settings for the failover capability and the service to be implemented by each of the servers (Step S4). The user then uses a jig or software to inject a simulated fault into a server so as to generate an error (Step S5). Afterward, the user checks whether the set failover capability is properly implemented (Step S6).
Next, the user performs a failback operation to restore the status of the information processing system to the pre-error status (Step S7). Furthermore, the user determines whether a test has been performed on all of the failover capabilities (Step S8) and, if a test has been performed on all of the failover capabilities (Yes at Step S8), the test for the failover capability is terminated.
Here, depending on the type of injected fault, the data stored in the HDD (Hard Disk Drive) included in the server is damaged in some cases, so that the OS does not operate properly and, as a result, the failback operation cannot be performed. Therefore, if a test has not been performed on all of the failover capabilities (No at Step S8), the user determines whether damage has occurred in the data stored in the HDD of each of the servers (Step S9).
If damage has occurred in the data stored in the HDD of any of the servers (Yes at Step S9), the user returns to Step S3 and reinstalls the OS. Conversely, if no damage has occurred in the data stored in the HDD of any of the servers (No at Step S9), the user returns to Step S5 and injects a simulated fault into a server to generate an error. For specific details of the technologies described above, see, for example, Japanese Laid-open Patent Publication No. 2000-057108, Japanese Laid-open Patent Publication No. 56-021253, and Japanese Laid-open Patent Publication No. 07-262101.
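The control flow of Steps S1 to S9 can be traced with a short sketch, which makes visible how often the OS reinstallation at Step S3 interrupts the test. This is an illustrative model only, not the procedure of any cited publication. The function name `run_failover_tests` and the fault names are hypothetical, one injected fault is assumed per failover capability under test, and it is assumed that the settings of Step S4 are made again after each reinstallation at Step S3.

```python
def run_failover_tests(faults, damaging_faults):
    """Trace the steps visited when each fault in `faults` is injected in turn.

    `damaging_faults` is the set of faults assumed to damage the data
    stored in the HDD. Returns the list of step labels visited and the
    number of OS reinstallations that were needed.
    """
    # Design, server preparation, initial OS installation, and settings.
    trace = ["S1", "S2", "S3", "S4"]
    reinstalls = 0
    for index, fault in enumerate(faults):
        # Inject the fault, check the failover, attempt the failback,
        # and decide whether all capabilities have been tested.
        trace += ["S5", "S6", "S7", "S8"]
        if index == len(faults) - 1:
            break  # Yes at Step S8: the test is terminated.
        trace.append("S9")  # No at Step S8: check the HDD for damage.
        if fault in damaging_faults:
            # Yes at Step S9: reinstall the OS and redo the settings
            # (repeating Step S4 here is an assumption).
            trace += ["S3", "S4"]
            reinstalls += 1
    return trace, reinstalls


# Hypothetical fault names, used for illustration only.
trace, reinstalls = run_failover_tests(
    ["forced_power_off", "link_down", "memory_error"],
    damaging_faults={"forced_power_off"},
)
print(" -> ".join(trace))
print("OS reinstallations:", reinstalls)
```

Each fault in `damaging_faults`, other than the last fault injected, adds one pass through Steps S3 and S4 before the next fault can be injected.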
However, in the technology of injecting a fault by using a jig, software, or the like, if damage occurs in the data that is stored in the HDD of any of the servers, the OS has to be reinstalled; therefore, there is a problem in that the failover capability cannot be tested continuously.
For example, the OS data is sometimes damaged if the power source of a server is forcibly turned off while data is being written to the HDD included in the server or if uncorrectable data is injected into the HDD. In such a case, because the failback operation cannot be performed properly, the user reinstalls the OS. As a result, the user cannot test the failover capability continuously.